← Back to hard AIs

Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Models · Mistral AI

Devstral 2 (123B)

Model family: devstral

Mistral's 123B agentic coding flagship — 72.2% on SWE-Bench Verified — with a custom "modified MIT" license that has commercial-use restrictions tied to revenue. Powerful but legally non-trivial to deploy commercially.

Listing Notes

The license on Devstral 2 (123B) is the single most important thing to understand before evaluating this model. Mistral's announcement described it as "modified MIT," but as developers on X correctly flagged, the practical effect is closer to a proprietary license with MIT-style attribution: commercial use above a revenue threshold requires a separate paid agreement with Mistral. This is a fundamentally different license posture from Devstral Small 2 (24B), which is clean Apache 2.0. If you're choosing between the two for a commercial product, Devstral Small 2 is almost always the correct default — the license clarity is worth more than the 4-percentage-point SWE-Bench difference. Use Devstral 2 (123B) when the additional capability genuinely matters AND your deployment either falls below the revenue threshold OR you have a commercial license. On capability: this is genuinely the strongest open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. agentic coding model available. On infrastructure: the 123B dense architecture needs a multi-GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. node (not laptop-feasible).

Identity

Creator
Mistral AI
Model family
devstral
Release date
2025-12-09

Technical specs

Parameter count
123B
Context window
262K tokens
Modalities
  • Image Input
  • Text
Primary capabilities
  • Coding
  • Function Calling
  • Instruction Following
  • Long Context
  • Tool Use
  • Vision

License

License
Devstral 2 Community License (Mistral 'Modified MIT')
Commercial use
  • Conditional

Mistral calls this license "modified MIT" in their announcements, but developers have flagged the characterization as misleading — it is functionally closer to a proprietary license with MIT-style attribution requirements. Commercial use is CONDITIONAL: the license restricts revenue-generating commercial deployment above a threshold defined by Mistral. Exceeding the revenue threshold requires a separate commercial agreement with Mistral. This is a materially different license posture from the Apache 2.0 covering Devstral Small 2 (24B); do not assume the two Devstral 2 releases share the same terms. READ THE LICENSE TEXT CAREFULLY before deploying Devstral 2 in a revenue-generating product, and consider consulting counsel if your deployment approaches any meaningful scale. For commercial deployments without license ambiguity, Devstral Small 2 (24B) under Apache 2.0 is the cleaner alternative.

Terms
  • Modification
  • Redistribution
  • Attribution

Access

Openness
  • Open Weight
Access methods
  • Api First Party
  • Api Third Party
  • Local Runtime Vllm
  • Weights Download Hf
Cost tier
  • Mixed

Full model card →