Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →

Catalog entry last reviewed 91 days ago.

Devstral Small 2 24B Instruct

Model family: devstral

Size

mid (24.0B params)

Context

262,144 tokens

Released

2025-12-09

Openness

open-weight

License

Apache 2.0 · commercial: yes

Cost tier

mixed

Rating

4.5 ★ — Strongest open-weight coding model in its size class, genuinely laptop-deployable, Apache 2.0, and competitive with much larger proprietary coding models on SWE-Bench. The rare case where "local, private, good" actually all hold at once.

Modalities

image-input, text

Capabilities

coding, function-calling, instruction-following, long-context, tool-use, vision

Access

api-first-party, api-third-party, local-runtime-llama-cpp, local-runtime-lm-studio, local-runtime-ollama, local-runtime-vllm, weights-download-direct, weights-download-hf

llm
open-weight
commercial-friendly
mid
long-context
coding
agentic
laptop-friendly
vision
eu-based
apache-licensed

Quick Take

Mistral's laptop-class coding specialist — 24B parameters, Apache 2.0, runs on a single consumer GPU, and beats 70B-class competitors on software-engineering benchmarks.

Plain-English Description

Devstral is Mistral's coding specialist family, and Devstral Small 2 24B Instruct is the version most teams should actually care about. The "Devstral 2" flagship at 123B parameters gets most of the benchmark headlines — it hits 72.2% on SWE-Bench Verified — but it carries a custom "modified MIT" license with commercial restrictions that limit its deployability. Devstral Small 2 24B is the cleanly-licensed Apache 2.0 option: smaller, still very capable, and specifically engineered to run on hardware you already have.

The sizing isn't accidental. Mistral explicitly built Devstral Small 2 to fit on a single RTX 4090 or a MacBook with 32GB of unified memory. That's a meaningful product decision for coding models because code is sensitive — teams often can't, or won't, ship source code out to third-party APIs for inference. With Devstral Small 2 quantized to 4-bit GGUF, you can run an entirely local coding agent on a developer's own laptop without any code leaving the device. At 68% on SWE-Bench Verified, that local deployment gives you coding capability that was proprietary-only a year earlier.

Devstral is purpose-built for agentic coding, not just code completion. The model is trained and instruction-tuned to operate tool-using software-engineering agents — exploring codebases across many files, running terminal commands, making coherent multi-file edits, recovering from errors, and holding long plans in context across hundreds of tool calls. Mistral recommends the OpenHands scaffolding and ships a companion CLI called Mistral Vibe for terminal-based development workflows. The 256K-token context window is specifically there to support whole-repository reasoning. Compared to IDE-completion-focused coding models (Codestral, the original one), Devstral is explicitly aimed at autonomous coding agents that do real engineering work over long sessions.

Best For

Private, local-first coding assistants for enterprise developers. Run it on developer laptops or a shared internal GPU. Code never leaves the organization.
Agentic coding workflows requiring long-context reasoning. Whole-repository edits, multi-file refactors, autonomous bug-fix agents — the 256K context window and the agentic post-training are there for this.
Cost-optimized hosted coding APIs. At $0.10 input / $0.30 output, Devstral Small 2 on Mistral's API is among the cheapest capable coding models available. For high-volume coding agent deployments where token cost matters, this is the economics play.
Teams who want an Apache 2.0 coding model. The cleanly-licensed alternative to Devstral 2 (123B, custom license) and to proprietary U.S. coding APIs. Modify, fine-tune, redistribute without friction.
Fine-tuning on proprietary codebases. The combination of open weights, permissive license, and manageable size (24B fits on a single node for full fine-tuning) makes it the practical choice for teams wanting to specialize a coding model on their own code.

Not For

Absolute top-tier SWE-Bench performance. The 123B Devstral 2 flagship scores 72.2% vs Devstral Small 2's 68%. If you need the highest benchmark number and can accept the custom license's commercial restrictions, the larger flagship is stronger. For most teams, the 4-point gap is worth the license clarity.
General-purpose chat, reasoning, or vision tasks. Devstral Small 2 has vision and general language capability but is specifically post-trained for coding. For mixed workloads, Mistral Small 4 (which absorbs Devstral's coding capability into a general-purpose model) is a better default.
Teams without engineering scaffolding. Devstral is designed to operate inside agentic scaffolds like OpenHands, Kilo Code, Aider, or Mistral Vibe. Using it as a naked chat model without tool-use orchestration leaves most of its capability on the table.
Extremely constrained hardware (less than 16GB VRAM). At aggressive quantization Devstral Small 2 can run on 12GB, but performance degrades. For truly small hardware, reach for Ministral 3 8B or 3B instead.

License — Plain-English Summary

Apache 2.0. Commercial use allowed, modifications allowed, redistribution allowed, include the license file. No conditions, no revenue caps, no special terms. This is the permissive Devstral. Do not confuse with the larger 123B Devstral 2 flagship, which uses a different license ("modified MIT") with commercial use restrictions tied to revenue — a meaningfully different legal posture.

How It Compares

vs. Devstral 2 (123B) — The 123B flagship is more capable on SWE-Bench (72.2% vs 68%) but carries a custom "modified MIT" license that restricts commercial use above a revenue threshold. For any commercial deployment where license clarity matters, Devstral Small 2 is the better starting point.
vs. Mistral Small 4 — Small 4 is a general-purpose model that absorbs Devstral's coding capability plus reasoning, vision, and agentic behavior. If you need mixed workloads, Small 4 is better. If you specifically need a coding-focused model with its full capability weighted toward software engineering, Devstral Small 2 is the specialist.
vs. Qwen 3 Coder Flash (30B) — Mistral claims Devstral Small 2 outperforms Qwen 3 Coder Flash on agentic coding benchmarks despite being smaller. Both are Apache 2.0. Close competitors; evaluate on your own workload.

Under the Hood

Devstral Small 2 is a 24B-parameter dense transformer post-trained for agentic software engineering. Architecturally it shares Ministral 3's structure with rope-scaling (inspired by Llama 4) and scalable-softmax attention. The 256K context window uses attention optimizations to avoid quadratic blow-up. The model supports Mistral's function-calling format natively and is compatible with OpenHands, Mistral Vibe, Kilo Code, Aider, and Cline as agentic scaffolds.

Benchmark performance as of launch: 68.0% on SWE-Bench Verified (real-world GitHub issues), noted by Hugging Face's Head of Product as potentially "the new local coding king." On the Trelis and Unsloth community evaluations, Devstral Small 2 generalizes well to fine-tuning and retains its agentic behavior through LoRA training when the audio tower is frozen.

Available on Mistral's API as devstral-small-2, on Hugging Face as mistralai/Devstral-Small-2-24B-Instruct-2512, and as GGUF quantizations via unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF and similar community releases for direct llama.cpp / Ollama / LM Studio use.

Cost

Self-hosted cost: $0.00 beyond compute
API input (per 1M tokens): $0.10
API output (per 1M tokens): $0.30
API providers: mistral, openrouter, fireworks
Notes: Same API pricing as Mistral Small 3.1 per Mistral's launch positioning. Self-hosting is free beyond compute costs; runs on a single RTX 4090 at Q4 quantization or on a 32GB Mac.

Pricing data is 91 days old. Verify with the source before relying on it.

Hardware requirements

Min VRAM: 16 GB
Recommended VRAM: 48 GB
Runs on laptop: Yes
Notes: Q4-quantized GGUF runs comfortably on a single consumer GPU (RTX 4090, RTX 3090, or similar). Full BF16 precision needs ~48GB VRAM. 32GB unified-memory Apple Silicon Macs handle it through llama.cpp / LM Studio.