Mistral AI releases its first open-source AI audio model

Nigar Sultanli

16 July 2025 11:15

French AI startup Mistral has unveiled Voxtral, its first open-source audio model designed as an alternative to closed corporate systems for speech recognition and voice-driven operations.

Voxtral is Mistral’s first family of open audio models aimed at business users, described by the company as offering “truly usable speech intelligence.” Until now, developers had to choose between cheap but limited systems that struggled with transcription and understanding, and expensive closed platforms that gave them less control over deployment. Mistral positions Voxtral as closing that gap with an affordable, flexible option.

The model can transcribe up to 30 minutes of audio. Because it is built on Mistral’s Small 3.1 language model, it can also understand up to 40 minutes of audio content, allowing users to ask questions about a recording, generate summaries, or execute voice commands in real time. Voxtral supports multiple languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.
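To make the 30-minute transcription limit concrete, here is a minimal Python sketch of how a longer recording might be divided into pieces that each fit under it. The function name `plan_chunks` and the equal-split approach are assumptions made for illustration; the article does not describe how Mistral or its API handles longer files.

```python
import math

# Transcription limit quoted in the article, in minutes.
MAX_TRANSCRIBE_MINUTES = 30


def plan_chunks(total_minutes, max_minutes=MAX_TRANSCRIBE_MINUTES):
    """Split a recording into equal chunks no longer than max_minutes.

    Hypothetical helper: returns a list of (start, end) offsets in minutes.
    """
    if total_minutes <= 0:
        return []
    # Smallest number of chunks that keeps every chunk within the limit.
    count = math.ceil(total_minutes / max_minutes)
    size = total_minutes / count
    return [(round(i * size, 2), round((i + 1) * size, 2)) for i in range(count)]


if __name__ == "__main__":
    for start, end in plan_chunks(75):
        print(f"chunk from minute {start} to minute {end}")
```

Splitting into equal parts rather than fixed 30-minute blocks avoids leaving a very short final chunk; in practice, cutting at pauses in speech would likely give cleaner transcripts.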

Mistral offers two main versions: Voxtral Small with 24 billion parameters for large-scale deployments, and Voxtral Mini with 3 billion parameters optimized for local and edge devices. There is also a cheaper and faster API version called Voxtral Mini Transcribe, focused solely on transcription and promising better performance than OpenAI’s Whisper at less than half the cost.

Users can try Voxtral for free by downloading the models from Hugging Face or by testing them in Mistral’s chatbot, Le Chat. API pricing starts at $0.001 per minute.
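As a rough illustration of what the quoted starting rate means in practice, the Python sketch below estimates the cost of a given amount of audio. It assumes a flat $0.001 per minute; the function name `estimate_cost_usd` is hypothetical, and actual pricing may differ by model or tier.

```python
from decimal import Decimal

# Starting API rate quoted in the article, in US dollars per minute of audio.
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_cost_usd(audio_minutes, rate=PRICE_PER_MINUTE_USD):
    """Estimate the cost of processing audio_minutes at a flat per-minute rate.

    Hypothetical helper; Decimal is used to avoid floating-point rounding noise.
    """
    return Decimal(str(audio_minutes)) * rate


if __name__ == "__main__":
    for minutes in (60, 1000, 60 * 24 * 30):
        print(f"{minutes} minutes: ${estimate_cost_usd(minutes)}")
```

At this rate, 1,000 minutes of audio (nearly 17 hours) would cost about one dollar.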

Voxtral follows last month’s release of Magistral, Mistral’s family of reasoning models designed for step-by-step problem solving. Known as one of Europe’s leading AI firms, Mistral advocates for open-source AI models and is currently in talks to raise up to $1 billion in funding from investors.

This launch opens new opportunities in the open-source AI ecosystem for audio applications, enabling companies to build more affordable, transparent, and powerful voice-based solutions.