Verify critical details — pricing, licensing, availability — with the model's source before business decisions. Full methodology →
Mistral Medium 3.1
Model family: mistral-medium
- llm
- proprietary
- closed-api
- mid
- long-context
- multilingual
- eu-based
- coding
- stem
Quick Take
Mistral's closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. enterprise tier — sits between Small 4 and Large 3 on capability and cost, with hybrid and on-prem deployment available but no published weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself..
Plain-English Description
Mistral Medium 3 was the first Mistral model to break the company's open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. convention. Released May 7, 2025 and updated to Medium 3.1 on August 13, 2025, the Medium tier exists for enterprise customers who want Mistral-quality models through a managed API without the operational lift of self-hosting and without needing the headline capability of Mistral Large. It's the middle of what Mistral calls a three-tier enterprise strategy: Small (self-host or cheap API), Medium (managed API only, enterprise features), Large (frontier capability, either API or demanding self-host).
The specs are deliberately modest in disclosure: 128K-tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. context windowThe maximum amount of text the model can "see" at once — prompt plus prior conversation plus any documents you give it. Measured in tokens (which are roughly three-quarters of a word each). A 128K context window is about 96,000 words of input — roughly a 400-page book. Larger context windows let the model work with bigger documents but cost more to run., text-only processing, no multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. input, and Mistral hasn't published parameter counts or architectural details. The model is proprietary and closed-weight; there is no open-sourceA stricter standard than open-weight: the weights, the training code, and the training data are all released publicly. Very few large language models meet the full open-source bar — most "open" models in the AI world are actually open-weight. When in doubt, check the license file and the creator's documentation. release planned. At $0.40 per million input tokens and $2 per million output tokens, pricing undercuts comparable capability tiers from U.S. competitors meaningfully. Mistral's own pitch at launch was that Medium 3 delivers "frontier performance at or above 90% of Claude Sonnet 3.7" — though as of April 2026, independent benchmark verification of that specific claim is thin. The Artificial AnalysisAn independent benchmarking site that runs standardized tests across commercial and open-weight models and publishes comparable results on capability, speed, and cost. Widely cited for API provider comparisons — if you want to know whether Llama 3.3 70B is faster on Groq or Together, Artificial Analysis is the reference. Intelligence Index assigns Medium 3 a composite score in line with Mistral's positioning, but granular breakdowns (MMLUA broad knowledge test covering 57 subjects from law and medicine to mathematics and history. Scores are reported as percentage correct. A score around 85% is strong for a frontier model; above 90% is state-of-the-art. MMLU is probably the most-cited benchmark in AI model comparisons, though it has known weaknesses — models can memorize the questions, and the test reflects a specific cultural and academic context.-Pro, HumanEvalA coding benchmark consisting of 164 Python programming problems. Scores reported as "pass@1" (percent solved on the first try) or "pass@10" (percent solved in at least one of 10 tries). A score of 90%+ on pass@1 is now routine for frontier code-specialized models. Useful as a signal but too simple to fully reflect real-world programming ability., SWE-bench) that developers typically use for model selection are sparse or missing.
The model's real differentiator is deployment flexibility at the enterprise tier. Medium 3.1 is available through Mistral's own API, AWS Bedrock, Azure AI Foundry, IBM WatsonX, and Google Cloud Vertex — the standard enterprise cloud distribution. More interestingly, Mistral offers hybrid and on-prem deployment arrangements for enterprise customers who need the model running in their own data center or VPC. The weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. don't leave Mistral's control, but the inferenceRunning a model to get outputs — as opposed to training it. When you send a prompt to ChatGPT, that's inference. Inference is much cheaper than training per operation but adds up quickly at scale. Pricing pages almost always refer to inference costs (per million tokens, per request, etc.), not training costs. can happen on customer infrastructure. This posture is similar to Anthropic's AWS Bedrock and Google Cloud Vertex enterprise arrangements and is mostly about procurement and data-residency rather than technical flexibility.
Best For
- Teams already using Mistral's API who want more capability than Small 4 without jumping to Large 3's cost profile. Medium 3.1 is the natural upgrade path within the Mistral API ecosystem.
- Enterprise workloads in financial services, energy, and healthcare where Mistral has existing beta-customer traction. Mistral has explicitly targeted and onboarded customers in these sectors; the model has been shaped by that feedback.
- Hybrid and on-prem deployments that need Mistral-quality models behind firewall. This is the tier where Mistral will negotiate bespoke deployment arrangements, including continuous pretrainingThe first and most expensive phase of training a model, where it learns general language and knowledge from enormous datasets — typically trillions of tokens of text scraped from the internet, books, code, and other sources. Pretraining produces a base model. Major labs spend millions to hundreds of millions of dollars on a single pretraining run. on private data.
- EU-compliant closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. workloads. For European teams that want a closed API with French jurisdiction rather than a U.S.-based vendor.
Not For
- Teams that value open weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself. for inspection, audit, or self-hosting flexibility. Mistral Medium 3.1 is closed. If openness is a requirement, reach for Mistral Small 4 or Large 3 instead.
- Workloads needing multimodalA model that can handle more than one type of input or output — typically text plus images, sometimes plus audio or video. "GPT-4 Vision" and "Llama 3.2 11B Vision" are multimodal models that accept both text and images. A text-only model is called "unimodal" but nobody uses that term; text-only is the assumed default. input. Text-only. For vision tasks, Mistral Small 4 or Large 3 are the options.
- Teams that rely heavily on independent benchmark coverage to select models. Medium 3.1 has much thinner third-party benchmark coverage than GPT-4-class or Claude-class models. Verifying performance for your specific workload requires running your own evals.
- Applications that would benefit more from reasoning-mode or extended-thinking capability. Medium 3.1 doesn't expose a configurable reasoning parameter the way Mistral Small 4 does, nor does it have a dedicated reasoning variant. For reasoning-heavy work, Small 4 with
reasoning_effort: highis often the smarter choice even at the lower capability tier.
License — Plain-English Summary
Mistral Medium 3.1 is a proprietary closed-APIA model that's only accessible through the creator's own API or product — you can't download it, run it yourself, or inspect its weights. GPT-4, Claude, and Gemini Pro are closed-API models. The tradeoff is convenience and often capability (closed-API models are frequently the strongest) versus loss of control over data, pricing, and availability. model. You pay per tokenThe basic unit of text a model reads and writes. Tokens are roughly three-quarters of a word in English — so 100 tokens is about 75 words. Models don't see letters or words directly; they see tokens. Pricing is almost always quoted per million tokens, and context windows are measured in tokens rather than words. to call Mistral's hosted APIAccessing a model by sending requests to the creator's (or a provider's) servers, typically pay-per-use. Hosted APIs handle all the operational work — scaling, hardware, uptime — in exchange for a per-token or per-request fee. Every closed-API model is hosted; many open-weight models are also available via hosted APIs from providers like Together, Fireworks, or Groq. (or one of the cloud partner deployments) and you get the right to use the model's outputs in your applications. You don't get the weightsThe numerical values inside a trained model that encode everything it has learned. A model is, functionally, a giant list of weights — tens of billions of numbers for a mid-sized model, hundreds of billions for a frontier model. "Open-weight" means those numbers are published. "Downloading the weights" means getting the actual file you'd need to run the model yourself., you can't fine-tuneA model that has been further trained on additional data to specialize it for a particular task, domain, or style. Fine-tuning a general model on medical literature produces a medical specialist; fine-tuning on your company's support tickets produces a support assistant that sounds like your team. Fine-tunes are much cheaper to create than training a model from scratch. the base modelA model straight out of pretraining, before any fine-tuning for chat or specific tasks. Base models predict the next token but don't follow instructions well — they'll continue your prompt rather than respond to it. Most people never use base models directly; they use the instruct-tuned or chat versions built on top. Useful mostly for researchers and people doing their own fine-tuning. outside of Mistral's enterprise program, and you can't redistribute the model. For most API consumers this is a normal arrangement — the same terms govern GPT, Claude, and Gemini API usage.
How It Compares
- vs. Mistral Small 4 — Small 4 is open-weightA model where the trained weights are freely downloadable — you can run it yourself without contacting the creator. Llama, Mistral, Qwen, and Gemma are open-weight. Open-weight does not mean open-source: the training data and code often stay private. The license still governs what you can do with the weights, including whether you can use them commercially. under Apache 2.0 and cheaper ($0.15 vs $0.40 input). Medium 3.1 is proprietary but is the tier where Mistral offers hybrid deployment and enterprise fine-tuning. For most teams, Small 4 is where you should start; Medium 3.1 is where you end up if Small 4's ceiling doesn't hold for your workload and you want Mistral-grade models without Large 3's self-hosting footprint.
- vs. Mistral Large 3 Instruct — Large 3 is the capability tier and is open-weight. Medium 3.1 is less capable but has meaningfully more enterprise flexibility (hybrid, VPC, negotiated fine-tuning) and is easier to adopt for teams without GPUThe specialized chip that runs most AI models. Originally designed for 3D graphics, GPUs turned out to be excellent at the math AI requires. Nvidia dominates the AI GPU market; common datacenter models include the H100, H200, and B200. Running an AI model without a GPU is possible but painfully slow for anything but the smallest models. infrastructure.
- vs. Claude Sonnet or GPT-4-class proprietary models — Similar tier, but Medium 3.1 is priced aggressively against U.S. competitors and carries EU-jurisdiction benefits that matter for some regulated deployments. Claude and GPT have meaningfully better independent benchmark coverage and mature tooling ecosystems.
Under the Hood
Mistral has not published detailed architectural or training specifications for the Medium tier. What's known from official documentation: 131K context windowThe maximum amount of text the model can "see" at once — prompt plus prior conversation plus any documents you give it. Measured in tokens (which are roughly three-quarters of a word each). A 128K context window is about 96,000 words of input — roughly a 400-page book. Larger context windows let the model work with bigger documents but cost more to run., text-only, API-only access, and a knowledge cutoff of June 30, 2025 for Medium 3.1. The composite Intelligence Index score from Artificial AnalysisAn independent benchmarking site that runs standardized tests across commercial and open-weight models and publishes comparable results on capability, speed, and cost. Widely cited for API provider comparisons — if you want to know whether Llama 3.3 70B is faster on Groq or Together, Artificial Analysis is the reference. suggests a model competitive with Llama 4 Maverick and Cohere Command A in capability tier, which aligns roughly with Mistral's "90% of Claude Sonnet 3.7" positioning.
No public version history or changelog is maintained beyond the 3.0 → 3.1 generation bump. This is unusual for an actively-marketed enterprise model in 2026 — OpenAI and Anthropic both publish detailed version notes — and represents a visibility gap that matters for enterprise procurement where version governance is a line-item concern.
Cost
- API input (per 1M tokens)
- $0.40
- API output (per 1M tokens)
- $2.00
- API providers
- mistral, openrouter, bedrock, azure, watsonx
- Notes
- Mistral's hosted API is the primary access path. Also available on AWS Bedrock, Azure AI Foundry, IBM WatsonX, and Google Cloud Vertex. Enterprise customers can negotiate fine-tuning agreements and hybrid on-prem deployments with Mistral directly, but the model weights are never released.