whisper-timestamped

raw JSON →
1.15.9 verified Fri May 01 auth: no python

Multi-lingual Automatic Speech Recognition (ASR) based on OpenAI Whisper models, with accurate word-level timestamps, language detection confidence, Voice Activity Detection (VAD) options, and more. Current version: 1.15.9. Release cadence: irregular.

pip install whisper-timestamped
error ModuleNotFoundError: No module named 'whisper_timestamped'
cause Attempting to import after installing 'whisper-timestamped' with a typo in the import statement or missing install.
fix
Ensure you run 'pip install whisper-timestamped' and use 'import whisper_timestamped' (underscores, not hyphens).
error AttributeError: module 'whisper' has no attribute 'load_model'
cause Importing from the wrong module (e.g., 'import whisper' instead of 'import whisper_timestamped'). The standard 'whisper' package does not have the same load_model signature.
fix
Use 'from whisper_timestamped import load_model' or 'import whisper_timestamped as whisper'.
error KeyError: 'words'
cause Trying to access word timestamps from result but using standard OpenAI whisper's transcribe instead of whisper-timestamped's transcribe. The standard result does not contain 'words' keys.
fix
Use 'from whisper_timestamped import transcribe' and call transcribe with the model and audio.
error RuntimeError: No available backend for audio loading. Install soundfile or librosa.
cause Missing audio loading library. whisper-timestamped uses soundfile or librosa under the hood.
fix
Install required backend: 'pip install soundfile' or 'pip install librosa'.
gotcha The library imports as 'whisper_timestamped' (with underscore) not 'whispertimestamped'. Typing 'pip install whisper-timestamped' but importing 'whisper_timestamped' is correct. Do not import from 'whisper' as that is OpenAI's official package.
fix Use 'import whisper_timestamped' or 'import whisper_timestamped as whisper'.
deprecated The function 'transcribe_timestamped' is deprecated but still works. New code should use 'transcribe' instead.
fix Replace 'transcribe_timestamped' with 'transcribe'.
gotcha VAD (Voice Activity Detection) requires installing optional dependencies: 'pip install whisper-timestamped[all]'. Without VAD, the library uses fixed 30-second window alignment which may be less accurate.
fix Install with extra: 'pip install whisper-timestamped[all]'.
gotcha When using 'model = whisper.load_model("large")', ensure you have sufficient GPU memory (≥8GB) or use CPU with increased memory. Loading a model larger than available memory may cause silent crashes.
fix Use a smaller model like 'base' or 'small' if memory is limited.
breaking In version 1.15.0, the transcribe function signature changed: 'language' parameter now expects a string (e.g., 'en') instead of language code integer. Passing 'language=0' no longer works.
fix Use language code strings: 'language="en"', 'language="fr"', or set to None for auto-detection.

Load a Whisper model, transcribe an audio file, and print word timestamps.

import whisper_timestamped as whisper

model = whisper.load_model("base")
audio = whisper.load_audio("audio.mp3")
result = whisper.transcribe(model, audio, language="en")

# Access word-level timestamps
for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['text']}: {word['start']:.2f} - {word['end']:.2f}")