mlx-whisper
v0.4.3 · verified Fri May 01
mlx-whisper is a fast implementation of OpenAI's Whisper model optimized for Apple Silicon (M1/M2/M3) using Apple's MLX framework. Version 0.4.3 supports transcription, language detection, and word-level timestamps with Hugging Face Hub integration. Updates are frequent, roughly weekly.
pip install mlx-whisper

Common errors
error ModuleNotFoundError: No module named 'mlx' ↓
cause MLX is not installed or running on non-Apple Silicon hardware.
fix
Install MLX with
pip install mlx and ensure you are on Apple Silicon (e.g., M1, M2, M3).

error ValueError: Audio file '<path>' is not a valid audio file ↓
cause The file may be corrupted, unsupported format, or path is incorrect.
fix
Ensure the file exists and is one of the supported formats: .mp3, .wav, .m4a, .flac, .ogg. Use a proper file path string.
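A minimal pre-flight check for the supported formats listed above can rule out a bad path before calling transcribe(); the helper name is illustrative, not part of mlx-whisper:

```python
from pathlib import Path

# Supported extensions, per the fix above.
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file exists and has a supported audio extension.
    (Illustrative helper, not part of mlx-whisper.)"""
    p = Path(path)
    return p.is_file() and p.suffix.lower() in SUPPORTED_AUDIO
```

Call this before transcribe() to fail fast with a clear message instead of the ValueError above.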
error mlx.core.index_error: Index 0 is out of bounds for axis 0 with size 0 ↓
cause The audio file might be empty or the model failed to load properly.
fix
Re-download the model (delete ~/.cache/huggingface/hub/models--mlx-community--whisper-* and run again) or check the audio file.
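The cache cleanup in the fix above can be scripted; this sketch assumes the default Hugging Face cache layout (`~/.cache/huggingface/hub`), and the helper name is hypothetical:

```python
import shutil
from pathlib import Path

def purge_whisper_cache(hub_dir: Path) -> list:
    """Delete cached mlx-community whisper snapshots under hub_dir.

    hub_dir is normally ~/.cache/huggingface/hub (adjust if HF_HOME is set).
    Hypothetical helper; the next transcribe() call re-downloads the model.
    """
    removed = []
    for d in hub_dir.glob("models--mlx-community--whisper-*"):
        shutil.rmtree(d)          # remove the whole cached snapshot directory
        removed.append(d.name)
    return removed
```

Usage: `purge_whisper_cache(Path.home() / ".cache" / "huggingface" / "hub")`.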
Warnings
gotcha mlx-whisper only works on Apple Silicon (arm64). It will fail on Intel Macs or non-Apple hardware with cryptic import errors about missing MLX. ↓
fix Run on an Apple Silicon machine (M1/M2/M3/M4) or use a different Whisper implementation.
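A pre-flight hardware check avoids the cryptic import errors; this is a sketch using only the standard library:

```python
import platform

def is_apple_silicon() -> bool:
    """True on arm64 macOS, the only hardware mlx-whisper supports."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"
```

Gate the import on it, e.g. `if not is_apple_silicon(): raise SystemExit("mlx-whisper requires Apple Silicon")`, and fall back to another Whisper implementation elsewhere.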
gotcha The audio file path must be a local file or a URL. Some users incorrectly pass a bytes/ndarray object; transcribe expects a file path string. ↓
fix Provide a valid file path (e.g., 'audio.mp3') or download the file first.
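If the audio arrives as raw bytes (e.g., from a network request), one workaround is to spill it to a temporary file first; `ensure_audio_path` is a hypothetical helper, not part of mlx-whisper:

```python
import os
import tempfile

def ensure_audio_path(audio) -> str:
    """Coerce input to a file path string suitable for transcribe().

    transcribe() expects a path (or URL) string, so raw bytes are written
    to a temporary file. Hypothetical helper, not part of mlx-whisper.
    """
    if isinstance(audio, (str, os.PathLike)):
        return os.fspath(audio)
    if isinstance(audio, (bytes, bytearray)):
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        tmp.write(audio)
        tmp.close()
        return tmp.name
    raise TypeError("expected a file path, URL string, or raw bytes")
```

The caller is responsible for deleting the temporary file afterwards.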
breaking In version 0.4.0, the return type of transcribe() changed from a dict with key 'segments' to a more detailed dict with 'text', 'segments', 'language', etc. Old code accessing e.g. result['segments'][0]['text'] still works, but result['text'] is now available. ↓
fix Update code to use the new keys. Old code is backward-compatible but you may want to use the streamlined result['text'].
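Both access styles against a mock result dict (the shape follows the note above; segment keys such as `start`/`end` are assumptions):

```python
# Mock of the post-0.4.0 transcribe() return value. Top-level keys 'text',
# 'segments', 'language' are per the note above; segment fields like
# 'start'/'end' are assumptions for illustration.
result = {
    "text": "Hello world.",
    "language": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": "Hello world."},
    ],
}

full_text = result["text"]                      # streamlined new-style access
first_segment = result["segments"][0]["text"]   # old-style access still works
```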
deprecated The function `mlx_whisper.transcribe()` no longer accepts `group_segments=True` (removed in 0.3.0). Pass `word_timestamps=True` for word-level timestamps. ↓
fix Use `word_timestamps=True` instead of `group_segments=True`.
Imports
- load_audio
wrong from mlx_whisper.audio import load_audio
correct from mlx_whisper import load_audio
- transcribe
from mlx_whisper import transcribe
Quickstart
from mlx_whisper import transcribe
result = transcribe("audio.mp3")
print(result["text"])