Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.
Voxtral Mini 4B Realtime
Model family: voxtral
Streaming ASR member of the Voxtral family: processes live audio incrementally for real-time transcription and voice agents. Apache 2.0.
Listing Notes
This is the streaming-transcription member of the Voxtral family, architecturally distinct from the batch-oriented Voxtral Small 24B and Voxtral Mini Transcribe V2. Instead of requiring complete audio segments (padded to the 30-second chunks inherited from the original Whisper encoder design), this model processes audio incrementally as it arrives. The practical use case is real-time voice applications: customer-service bots that respond mid-utterance, live captioning for video streams, and dictation applications where words appear as they are spoken. For batch transcription of complete recordings (podcasts, meetings), Voxtral Mini Transcribe V2 is cheaper and higher-quality. For true real-time streaming, this is the purpose-built option.
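The difference between padded 30-second windows and incremental processing can be sketched in a few lines. This is an illustration only, not Voxtral's actual pipeline or API: the 16 kHz sample rate, the 80 ms chunk size, and the helper names `pad_to_window` and `stream_chunks` are all assumptions chosen for the example.

```python
from typing import Iterator, List

SAMPLE_RATE = 16_000    # assumption: 16 kHz mono audio
WINDOW_SECONDS = 30     # the 30-second window mentioned in the notes above
CHUNK_MS = 80           # assumption: illustrative streaming chunk size


def pad_to_window(samples: List[float]) -> List[float]:
    """Batch style: zero-pad audio up to the next multiple of 30 seconds."""
    window = SAMPLE_RATE * WINDOW_SECONDS
    padded_len = -(-len(samples) // window) * window  # ceiling division
    return samples + [0.0] * (padded_len - len(samples))


def stream_chunks(samples: List[float], chunk_ms: int = CHUNK_MS) -> Iterator[List[float]]:
    """Streaming style: yield small chunks in arrival order, with no padding."""
    step = SAMPLE_RATE * chunk_ms // 1000
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


# A 2-second utterance (silence stands in for real audio here).
utterance = [0.0] * (SAMPLE_RATE * 2)

batch_input = pad_to_window(utterance)
chunks = list(stream_chunks(utterance))

print(len(batch_input))   # samples the batch path must process
print(len(chunks))        # number of incremental chunks in the streaming path
```

In the batch path a short utterance is handled only once the whole padded window is available, while in the streaming path a (hypothetical) transcriber could begin emitting text after the first chunk arrives.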
Identity
- Creator
- Mistral AI
- Model family
- voxtral
- Release date
- 2026-02-17
Technical specs
- Parameter count
- 4B
- Context window
- 16K tokens
- Modalities
- Audio Input
- Text
- Primary capabilities
- Multilingual
- Speech-to-text
License
- License
- Apache 2.0
- Commercial use
- Allowed
- Terms
- Modification ✓
- Redistribution ✓
- Attribution ✓
Access
- Openness
- Open Weight
- Access methods
- First-party API
- Local runtime (vLLM)
- Weights download (Hugging Face)
- Cost tier
- Mixed
Tags
- audio
- speech-to-text
- streaming
- real-time
- multilingual
- open-weight
- commercial-friendly
- apache-licensed
- eu-based