Catalog entry last reviewed 91 days ago.

Voxtral Mini 4B Realtime

Model family: voxtral

Streaming ASR member of the Voxtral family: processes live audio incrementally for real-time transcription and voice agents. Licensed under Apache 2.0.

Listing Notes

This is the streaming-transcription member of the Voxtral family, architecturally distinct from the batch-oriented Voxtral Small 24B and Voxtral Mini Transcribe V2. Rather than requiring complete audio segments padded to 30-second chunks (a constraint inherited from the original Whisper encoder design), it processes audio incrementally as it arrives.

The practical use case is real-time voice applications: customer-service bots that respond mid-utterance, live captioning for video streams, and dictation where words appear as they are spoken. For batch transcription of complete recordings (podcasts, meetings), Voxtral Mini Transcribe V2 is cheaper and higher-quality. For true real-time streaming, this is the purpose-built option.
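The contrast between 30-second padding and incremental processing can be illustrated with a small sketch. It only models how audio samples are grouped before reaching a model; it does not call Voxtral or any real API. The 16 kHz sample rate, the 80 ms frame size, and the function names `pad_to_batch_window` and `stream_frames` are assumptions made for illustration and do not come from the listing.

```python
from typing import Iterator, List

SAMPLE_RATE = 16_000   # assumed sample rate in Hz (not stated in the listing)
BATCH_WINDOW_S = 30    # the 30-second chunk size mentioned in the notes
FRAME_MS = 80          # hypothetical streaming frame size


def pad_to_batch_window(samples: List[int]) -> List[int]:
    """Batch style: zero-pad audio up to a whole number of 30-second windows."""
    window = SAMPLE_RATE * BATCH_WINDOW_S
    # Ceiling division; even an empty clip occupies one full window.
    n_windows = max(1, -(-len(samples) // window))
    return list(samples) + [0] * (n_windows * window - len(samples))


def stream_frames(samples: List[int], frame_ms: int = FRAME_MS) -> Iterator[List[int]]:
    """Streaming style: yield short frames in arrival order, with no padding."""
    frame_len = SAMPLE_RATE * frame_ms // 1000
    for start in range(0, len(samples), frame_len):
        yield samples[start:start + frame_len]


if __name__ == "__main__":
    two_seconds = [0] * (2 * SAMPLE_RATE)  # silent placeholder audio
    padded = pad_to_batch_window(two_seconds)
    frames = list(stream_frames(two_seconds))
    print(len(padded) // SAMPLE_RATE)  # seconds of audio the batch path handles: 30
    print(len(frames))                 # frames the streaming path emits: 25
```

Under these assumptions, a two-second utterance occupies a full 30-second window in the batch path, while the streaming path can hand over its first frame after 80 ms of audio. That gap is why the incremental design suits the live use cases.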

Identity

Creator: Mistral AI
Model family: voxtral
Release date: 2026-02-17

Technical specs

Parameter count

4B (per the model name)

Context window

16K tokens

Modalities

Audio Input
Text

Primary capabilities

Multilingual
Speech-to-Text

License

License

Apache 2.0

Commercial use

Allowed

Terms

Modification ✓
Redistribution ✓
Attribution ✓

Access

Openness

Open Weight

Access methods

First-party API
Local runtime (vLLM)
Weights download (Hugging Face)

Cost tier

Mixed

audio
speech-to-text
streaming
real-time
multilingual
open-weight
commercial-friendly
apache-licensed
eu-based