mlx-whisper

0.4.3 · verified Fri May 01

mlx-whisper is a fast implementation of OpenAI's Whisper model optimized for Apple Silicon (M1/M2/M3) using Apple's MLX framework. Version 0.4.3 supports transcription, language detection, and word-level timestamps with Hugging Face Hub integration. Updates are frequent, roughly weekly.

pip install mlx-whisper
error ModuleNotFoundError: No module named 'mlx'
cause MLX is not installed, or you are running on non-Apple Silicon hardware.
fix
Install MLX with pip install mlx and ensure you are on Apple Silicon (e.g., M1, M2, M3).
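A quick pre-flight check can catch this before the import fails. A minimal sketch using only the standard library (the helper name `check_apple_silicon` is illustrative, not part of mlx-whisper):

```python
import platform
import sys

def check_apple_silicon() -> bool:
    """Return True when running on macOS with an arm64 (Apple Silicon) CPU."""
    return sys.platform == "darwin" and platform.machine() == "arm64"

if not check_apple_silicon():
    print("MLX requires Apple Silicon; use another Whisper implementation here.")
```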
error ValueError: Audio file '<path>' is not a valid audio file
cause The file may be corrupted or in an unsupported format, or the path may be incorrect.
fix
Ensure the file exists and is one of the supported formats: .mp3, .wav, .m4a, .flac, .ogg. Use a proper file path string.
error mlx.core.index_error: Index 0 is out of bounds for axis 0 with size 0
cause The audio file might be empty or the model failed to load properly.
fix
Re-download the model (delete ~/.cache/huggingface/hub/models--mlx-community--whisper-* and run again) or check the audio file.
gotcha mlx-whisper only works on Apple Silicon (arm64). It will fail on Intel Macs or non-Apple hardware with cryptic import errors about missing MLX.
fix Run on an Apple Silicon machine (M1/M2/M3/M4) or use a different Whisper implementation.
gotcha The audio file path must be a local file or a URL. Some users incorrectly pass a bytes/ndarray object; transcribe expects a file path string.
fix Provide a valid file path (e.g., 'audio.mp3') or download the file first.
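If the audio only exists in memory, write it to a temporary file first and pass that path. A sketch; `bytes_to_audio_path` is a hypothetical helper:

```python
import tempfile

def bytes_to_audio_path(data: bytes, suffix: str = ".wav") -> str:
    """Write in-memory audio bytes to a temp file and return a path transcribe() accepts."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(data)
        return f.name
```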
breaking In version 0.4.0, the return type of transcribe() changed from a dict with key 'segments' to a more detailed dict with 'text', 'segments', 'language', etc. Old code accessing e.g. result['segments'][0]['text'] still works, but result['text'] is now available.
fix Update code to use the new keys. Old code is backward-compatible but you may want to use the streamlined result['text'].
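For illustration, the 0.4.x result dict has roughly this shape (the sample values below are made up, not real model output):

```python
# Illustrative shape of the dict returned by transcribe() in 0.4.x:
result = {
    "text": "hello world",
    "language": "en",
    "segments": [{"start": 0.0, "end": 1.2, "text": "hello world"}],
}

full_text = result["text"]  # streamlined access, available since 0.4.0
old_style = "".join(seg["text"] for seg in result["segments"])  # pre-0.4.0 style still works
```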
deprecated The function `mlx_whisper.transcribe()` no longer accepts `group_segments=True` (removed in 0.3.0). Pass `word_timestamps=True` for word-level timestamps.
fix Use `word_timestamps=True` instead of `group_segments=True`.
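A minimal wrapper showing the current keyword; the import is deferred so the function can be defined on any machine, and `transcribe_with_words` is an illustrative name:

```python
def transcribe_with_words(audio_path: str) -> dict:
    """Transcribe with word-level timestamps (mlx-whisper >= 0.3.0)."""
    # Deferred import: mlx is only importable on Apple Silicon.
    from mlx_whisper import transcribe
    # word_timestamps=True replaces the removed group_segments=True option.
    return transcribe(audio_path, word_timestamps=True)
```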

Transcribe an audio file (supported formats: .mp3, .wav, .m4a, .flac, .ogg).

from mlx_whisper import transcribe

# Transcribe and print the full text; result also contains 'segments' and 'language'.
result = transcribe("audio.mp3")
print(result["text"])