Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →
Devstral 2 (123B)
Model family: devstral
Mistral's 123B agentic coding flagship — 72.2% on SWE-Bench Verified — with a custom "modified MIT" license that has commercial-use restrictions tied to revenue. Powerful but legally non-trivial to deploy commercially.
Listing Notes
The license on Devstral 2 (123B) is the single most important thing to understand before evaluating this model. Mistral's announcement described it as "modified MIT," but as developers on X correctly flagged, the practical effect is closer to a proprietary license with MIT-style attribution: commercial use above a revenue threshold requires a separate paid agreement with Mistral. This is a fundamentally different license posture from Devstral Small 2 (24B), which is clean Apache 2.0. If you're choosing between the two for a commercial product, Devstral Small 2 is almost always the correct default — the license clarity is worth more than the 4-percentage-point SWE-Bench difference. Use Devstral 2 (123B) when the additional capability genuinely matters AND your deployment either falls below the revenue threshold OR you have a commercial license. On capability: this is genuinely the strongest open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. agentic coding model available. On infrastructure: the 123B dense architecture needs a multi-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. node (not laptop-feasible).
Identity
- Creator
- Mistral AI
- Model family
- devstral
- Release date
- 2025-12-09
Technical specs
- Parameter count
- 123B
- Context window
- 262K tokens
- Modalities
- Image Input
- Text
- Primary capabilities
- Coding
- Function Calling
- Instruction Following
- Long Context
- Tool Use
- Vision
License
- License
- Devstral 2 Community License (Mistral 'Modified MIT')
- Commercial use
- Conditional
Mistral calls this license "modified MIT" in their announcements, but developers have flagged the characterization as misleading — it is functionally closer to a proprietary license with MIT-style attribution requirements. Commercial use is CONDITIONAL: the license restricts revenue-generating commercial deployment above a threshold defined by Mistral. Exceeding the revenue threshold requires a separate commercial agreement with Mistral. This is a materially different license posture from the Apache 2.0 covering Devstral Small 2 (24B); do not assume the two Devstral 2 releases share the same terms. READ THE LICENSE TEXT CAREFULLY before deploying Devstral 2 in a revenue-generating product, and consider consulting counsel if your deployment approaches any meaningful scale. For commercial deployments without license ambiguity, Devstral Small 2 (24B) under Apache 2.0 is the cleaner alternative.
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- Api First Party
- Api Third Party
- Local Runtime Vllm
- Weights Download Hf
- Cost tier
- Mixed
- llm
- open-weight
- frontier
- long-context
- coding
- agentic
- vision
- eu-based
- custom-license
- commercially-restricted