{"id":1830,"library":"faster-whisper","title":"Faster Whisper","description":"Faster Whisper is a re-implementation of OpenAI's Whisper model using CTranslate2, which allows for faster inference and reduced memory usage. It is highly optimized for CPU and GPU, supporting various compute types. The current version is 1.2.1, with an active release cadence, frequently adding new features, model support, and performance improvements.","status":"active","version":"1.2.1","language":"en","source_language":"en","source_url":"https://github.com/SYSTRAN/faster-whisper","tags":["whisper","speech-to-text","audio","transcription","ai","ctranslate2","inference"],"install":[{"cmd":"pip install faster-whisper","lang":"bash","label":"Basic installation"},{"cmd":"pip install \"faster-whisper[vad,audio]\"","lang":"bash","label":"With VAD and audio file support (PyAV); quoted so the shell does not expand the brackets"}],"dependencies":[{"reason":"Core inference engine; the installed version can affect CUDA/CPU compatibility.","package":"ctranslate2","optional":false},{"reason":"Required for decoding common audio file formats (e.g., MP3, WAV); published on PyPI as av (the PyAV project).","package":"av","optional":true},{"reason":"Required for Voice Activity Detection (VAD) via Silero-VAD.","package":"onnxruntime","optional":true}],"imports":[{"symbol":"WhisperModel","correct":"from faster_whisper import WhisperModel"}],"quickstart":{"code":"from faster_whisper import WhisperModel\nimport os\n\n# Ensure you have an audio file named 'audio.mp3' in the current directory\n# For example, download a short audio clip or record one.\n# Example: https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3\n\nmodel_size = os.environ.get('WHISPER_MODEL_SIZE', 'tiny.en')  # e.g., 'large-v3', 'medium', 'tiny.en'\n\n# Run on CPU with INT8 compute type for general compatibility\n# For GPU, change device='cuda' and compute_type='float16' if supported\nmodel = WhisperModel(model_size, device='cpu', compute_type='int8')\n\n# Transcribe the audio file\n# Replace 'audio.mp3' with the 
path to your audio file\nsegments, info = model.transcribe(\"audio.mp3\", beam_size=5)\n\nprint(f\"Detected language '{info.language}' with probability {info.language_probability:.2f}\")\n\nfor segment in segments:\n    print(f\"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}\")\n","lang":"python","description":"Demonstrates loading a Whisper model and transcribing an audio file. The model is automatically downloaded from the Hugging Face Hub if not already cached. Uses CPU by default for broad compatibility; change `device` and `compute_type` for GPU acceleration. Note that `segments` is a generator: transcription runs lazily as you iterate over it.","warnings":[{"fix":"Ensure your CUDA toolkit and CTranslate2 version are compatible. On an older CUDA toolkit, consider pinning CTranslate2 (e.g., `pip install 'ctranslate2<4.0'`; the quotes keep the shell from treating `<` as a redirection) or using a `faster-whisper` version prior to 1.0.0.","message":"Version 1.0.0 upgraded CTranslate2 to v4.0, which added support for CUDA 12. Users on older CUDA versions (e.g., CUDA 11.x) might face compatibility issues and need to downgrade CTranslate2 or use a compatible `faster-whisper` version.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"If you tuned VAD parameters against v1.1.0, review their names when upgrading: versions 1.1.1 and later restored the pre-1.1.0 parameter names, so consult the documentation for the established names.","message":"In version 1.1.0, some Voice Activity Detection (VAD) parameters were renamed, and the change was reverted in version 1.1.1. Code that set VAD parameters under v1.1.0 may therefore break when upgrading to v1.1.1 or later due to the reversion to the original names.","severity":"breaking","affected_versions":"1.1.0"},{"fix":"Upgrade to `faster-whisper` v1.1.1 or newer, which includes fixes for VAD-related OOM errors. 
Monitor memory usage, especially when enabling VAD or using batched inference, and adjust VAD parameters or batch sizes if necessary.","message":"Older versions (prior to 1.1.1) and certain VAD configurations could lead to high RAM usage and Out-Of-Memory (OOM) errors, particularly with longer audio files or larger batch sizes.","severity":"gotcha","affected_versions":"<1.1.1"},{"fix":"Upgrade to `faster-whisper` v1.2.1 or newer to ensure correct behavior of `clip_timestamps` and `suppress_tokens` (including `<|nocaptions|>`) during batched inference. Always test batched inference with your specific use case.","message":"Version 1.2.1 fixed batched-inference issues with `clip_timestamps` and the `<|nocaptions|>` token. In earlier versions, these features might not behave as expected in batched mode, potentially producing incorrect timestamp merging or token suppression.","severity":"gotcha","affected_versions":"<1.2.1"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}