mlx-whisper

0.4.3 · verified Fri May 01

mlx-whisper is a fast implementation of OpenAI's Whisper model optimized for Apple Silicon (M1/M2/M3) using Apple's MLX framework. Version 0.4.3 supports transcription, language detection, and word-level timestamps with Hugging Face Hub integration. Updates are frequent, roughly weekly.

pip install mlx-whisper
error ModuleNotFoundError: No module named 'mlx'
cause MLX is not installed, or you are running on non-Apple Silicon hardware.
fix
Install MLX with pip install mlx and ensure you are on Apple Silicon (e.g., M1, M2, M3).
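A quick pre-flight check can catch this before the import fails. A minimal sketch using only the standard library (the helper name `check_apple_silicon` is illustrative, not part of mlx-whisper):

```python
import platform
import sys

def check_apple_silicon() -> bool:
    """Return True when running on macOS with an arm64 (Apple Silicon) CPU."""
    return sys.platform == "darwin" and platform.machine() == "arm64"

if not check_apple_silicon():
    print("MLX requires Apple Silicon; use another Whisper implementation here.")
```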
error ValueError: Audio file '<path>' is not a valid audio file
cause The file may be corrupted or in an unsupported format, or the path may be incorrect.
fix
Ensure the file exists and is one of the supported formats: .mp3, .wav, .m4a, .flac, .ogg. Use a proper file path string.
error mlx.core.index_error: Index 0 is out of bounds for axis 0 with size 0
cause The audio file might be empty or the model failed to load properly.
fix
Re-download the model (delete ~/.cache/huggingface/hub/models--mlx-community--whisper-* and run again) or check the audio file.
gotcha mlx-whisper only works on Apple Silicon (arm64). It will fail on Intel Macs or non-Apple hardware with cryptic import errors about missing MLX.
fix Run on an Apple Silicon machine (M1/M2/M3/M4) or use a different Whisper implementation.
gotcha The audio file path must be a local file or a URL. Some users incorrectly pass a bytes/ndarray object; transcribe expects a file path string.
fix Provide a valid file path (e.g., 'audio.mp3') or download the file first.
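If the audio only exists in memory, write it to a temporary file first and pass that path. A sketch; `bytes_to_audio_path` is a hypothetical helper:

```python
import tempfile

def bytes_to_audio_path(data: bytes, suffix: str = ".wav") -> str:
    """Write in-memory audio bytes to a temp file and return a path transcribe() accepts."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(data)
        return f.name
```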
breaking In version 0.4.0, the return type of transcribe() changed from a dict with key 'segments' to a more detailed dict with 'text', 'segments', 'language', etc. Old code accessing e.g. result['segments'][0]['text'] still works, but result['text'] is now available.
fix Update code to use the new keys. Old code is backward-compatible but you may want to use the streamlined result['text'].
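For illustration, the 0.4.x result dict has roughly this shape (the sample values below are made up, not real model output):

```python
# Illustrative shape of the dict returned by transcribe() in 0.4.x:
result = {
    "text": "hello world",
    "language": "en",
    "segments": [{"start": 0.0, "end": 1.2, "text": "hello world"}],
}

full_text = result["text"]  # streamlined access, available since 0.4.0
old_style = "".join(seg["text"] for seg in result["segments"])  # pre-0.4.0 style still works
```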
deprecated The function `mlx_whisper.transcribe()` no longer accepts `group_segments=True` (removed in 0.3.0). Pass `word_timestamps=True` for word-level timestamps.
fix Use `word_timestamps=True` instead of `group_segments=True`.
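A minimal wrapper showing the current keyword; the import is deferred so the function can be defined on any machine, and `transcribe_with_words` is an illustrative name:

```python
def transcribe_with_words(audio_path: str) -> dict:
    """Transcribe with word-level timestamps (mlx-whisper >= 0.3.0)."""
    # Deferred import: mlx is only importable on Apple Silicon.
    from mlx_whisper import transcribe
    # word_timestamps=True replaces the removed group_segments=True option.
    return transcribe(audio_path, word_timestamps=True)
```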

Transcribe an audio file (supported formats: .mp3, .wav, .m4a, .flac, .ogg).

from mlx_whisper import transcribe

# Transcribe and print the full text; result also contains 'segments' and 'language'.
result = transcribe("audio.mp3")
print(result["text"])