
Verify critical details (pricing, licensing, availability) with the model's source before making business decisions.

Voxtral Mini 4B Realtime

Model family: voxtral

The streaming-ASR member of the Voxtral family: it processes live audio incrementally for real-time transcription and voice agents. Licensed under Apache 2.0.

Listing Notes

This is the streaming-transcription member of the Voxtral family, architecturally distinct from the batch-oriented Voxtral Small 24B and Voxtral Mini Transcribe V2. Instead of requiring complete audio segments (padded to 30-second chunks, a constraint inherited from the original Whisper encoder design), this model processes audio incrementally as it arrives. The practical use cases are real-time ones: customer-service voice agents that respond mid-utterance, live captioning for video streams, and dictation applications where words appear as they are spoken. For batch transcription of complete recordings (podcasts, meetings), Voxtral Mini Transcribe V2 is cheaper and higher-quality. For true real-time streaming, this is the purpose-built option.
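
The difference between the two input styles can be illustrated with a minimal Python sketch. It does not call any Voxtral or Mistral API; it only shows how much audio each approach hands to an encoder at a time. The 16 kHz sample rate is an assumption based on Whisper-style encoders, and the 80 ms frame size and the function names are hypothetical, chosen for illustration only.

```python
from typing import Iterator, List

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono, as used by Whisper-style encoders
WINDOW_SECONDS = 30    # fixed window of the batch design described above
FRAME_MS = 80          # hypothetical streaming frame size, for illustration only


def pad_to_window(samples: List[float]) -> List[float]:
    """Batch style: zero-pad the clip up to a whole number of 30-second windows."""
    window = SAMPLE_RATE * WINDOW_SECONDS
    n_windows = max(1, -(-len(samples) // window))  # ceiling division
    return samples + [0.0] * (n_windows * window - len(samples))


def iter_frames(samples: List[float]) -> Iterator[List[float]]:
    """Streaming style: yield small frames as soon as each one is available."""
    frame = SAMPLE_RATE * FRAME_MS // 1000
    for start in range(0, len(samples), frame):
        yield samples[start:start + frame]


# Two seconds of silence standing in for microphone input.
clip = [0.0] * (2 * SAMPLE_RATE)

padded = pad_to_window(clip)
frames = list(iter_frames(clip))

print(len(padded) / SAMPLE_RATE)    # seconds of audio the batch path processes
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```

Under these assumptions, a two-second utterance is padded to a full 30-second window in the batch path, while the streaming path hands it over as 25 short frames that can each be processed as soon as they arrive.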

Identity

Creator
Mistral AI
Model family
voxtral
Release date
2026-02-17

Technical specs

Parameter count
4B
Context window
16K tokens
Modalities
  • Audio Input
  • Text
Primary capabilities
  • Multilingual
  • Speech-to-Text

License

License
Apache 2.0
Commercial use
  • Allowed
Terms
  • Modification
  • Redistribution
  • Attribution

Access

Openness
  • Open Weight
Access methods
  • First-party API
  • Local runtime (vLLM)
  • Weights download (Hugging Face)
Cost tier
  • Mixed
