Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

Catalog entry last reviewed 91 days ago.

Mistral Small 4 Eagle

Model family: mistral-small

Eagle speculative-decoding head for Mistral Small 4. Pair it with the base model for higher inference throughput. It is an architectural extension, not a standalone model.

Listing Notes

This is not a standalone model. It is an Eagle-architecture speculative-decoding head designed to accelerate inference on Mistral Small 4. In speculative decoding, a small, fast draft model predicts several tokens ahead, and the main model then checks all of those draft tokens in a single parallel pass, accepting or rejecting each one. When enough drafts are accepted, the result is higher tokens-per-second throughput on the same hardware.

It is catalogued as a separate listing (rather than collapsed into Mistral Small 4's access_methods) because it is an architectural extension with its own checkpoint, not a quantization of the base weights. Pair it with mistralai/Mistral-Small-4-119B-2603 and run it through vLLM's speculative decoding support.
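The draft-then-verify loop described in these notes can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `target_model`, `draft_model`, `speculative_decode`, and `greedy_decode` are hypothetical names, the "models" are small arithmetic functions rather than neural networks, and only the greedy-verification variant is shown. It does not model Eagle's specifics, and real runtimes verify all drafts in one batched forward pass rather than in a Python loop.

```python
from typing import List, Tuple

VOCAB_SIZE = 7  # toy vocabulary, illustration only


def target_model(context: List[int]) -> int:
    """Stand-in for the large base model: deterministic next token."""
    return (sum(context) * 3 + 1) % VOCAB_SIZE


def draft_model(context: List[int]) -> int:
    """Stand-in for the small draft head: agrees with the target model
    except when the context length is a multiple of 3."""
    guess = target_model(context)
    return guess if len(context) % 3 else (guess + 1) % VOCAB_SIZE


def greedy_decode(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_model(tokens))
    return tokens


def speculative_decode(
    prompt: List[int], max_new_tokens: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification passes (each pass stands for one target forward pass)."""
    tokens = list(prompt)
    passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # 2. The target model checks the drafts. A real system scores all
        #    k positions in a single pass; this loop emulates that pass.
        passes += 1
        accepted: List[int] = []
        for proposed in draft:
            expected = target_model(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                accepted.append(expected)  # replace the first mismatch
                break
        else:
            # Every draft was accepted: the same pass yields a bonus token.
            accepted.append(target_model(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], passes


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2], max_new_tokens=12)
    print(out == greedy_decode([1, 2], 12), passes)
```

In this sketch every kept token is one the target model would have produced itself, so the output matches plain greedy decoding while needing fewer verification passes than generated tokens. How large that saving is in practice depends on how often the draft head agrees with the base model.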

Identity

Creator: Mistral AI
Model family: mistral-small
Release date: 2026-04-07

Technical specs

Parameter count

Small speculative-decoding head (heads of this kind typically have hundreds of millions of parameters) designed to predict draft tokens ahead of the main Mistral Small 4 model. Not usable standalone: it must be paired with the base Mistral Small 4 checkpoint.

Context window

262K tokens

Modalities

Image Input
Text

Primary capabilities

Chat
Instruction Following

License

License

Apache 2.0

Commercial use

Allowed

Terms

Modification ✓
Redistribution ✓
Attribution ✓

Access

Openness

Open Weight

Access methods

Local runtime (vLLM)
Weights download (Hugging Face)
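As a rough sketch of the vLLM route, the snippet below builds the keyword arguments for pairing the head with the base model. It assumes a vLLM release that accepts a `speculative_config` dictionary with `method`, `model`, and `num_speculative_tokens` keys. `EAGLE_HEAD` is a hypothetical placeholder, not a real repository ID, and `build_llm_kwargs` and `load_llm` are illustrative helper names. The right `method` value and draft length for this checkpoint may differ, so check the model card and the vLLM documentation for your version.

```python
from typing import Any, Dict

# Base model ID as given in the listing notes.
BASE_MODEL = "mistralai/Mistral-Small-4-119B-2603"
# Hypothetical placeholder: substitute the head's actual repo ID or local path.
EAGLE_HEAD = "path/to/eagle-head-checkpoint"


def build_llm_kwargs(
    base_model: str = BASE_MODEL,
    eagle_head: str = EAGLE_HEAD,
    num_speculative_tokens: int = 3,
) -> Dict[str, Any]:
    """Build keyword arguments for vllm.LLM with speculative decoding."""
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")
    return {
        "model": base_model,
        "speculative_config": {
            "method": "eagle",  # assumption: may differ for this checkpoint
            "model": eagle_head,
            "num_speculative_tokens": num_speculative_tokens,
        },
    }


def load_llm(**overrides: Any):
    """Create the engine. vLLM is imported lazily so the config helper
    above can be used and inspected without vLLM installed."""
    from vllm import LLM

    return LLM(**build_llm_kwargs(**overrides))


if __name__ == "__main__":
    print(build_llm_kwargs())
```

Keeping the configuration in a plain dictionary makes it easy to log or adjust the draft length before committing to loading a model of this size.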

Cost tier

Self-hosted only

Sources

Full model card →

llm
open-weight
commercial-friendly
inference-acceleration
speculative-decoding
eu-based
apache-licensed
architectural-extension