{"id":3189,"library":"openai-whisper","title":"OpenAI Whisper","description":"OpenAI Whisper is a general-purpose automatic speech recognition (ASR) model developed by OpenAI. It is trained on a large dataset of diverse audio and is capable of multilingual speech recognition, speech translation, and language identification. Releases are somewhat irregular, with multiple updates typically published each year, often in dated version formats (e.g., YYYYMMDD).","status":"active","version":"20250625","language":"en","source_language":"en","source_url":"https://github.com/openai/whisper","tags":["speech-to-text","audio-processing","AI","transcription","machine-learning","deep-learning"],"install":[{"cmd":"pip install -U openai-whisper","lang":"bash","label":"Install latest stable release"}],"dependencies":[{"reason":"Required for audio processing (system-level dependency).","package":"ffmpeg","optional":false},{"reason":"Deep learning framework for model execution.","package":"torch","optional":false},{"reason":"Fast tokenizer implementation by OpenAI.","package":"tiktoken","optional":false},{"reason":"Numerical computing.","package":"numpy","optional":false},{"reason":"Progress bars.","package":"tqdm","optional":false},{"reason":"Utilities for iterables.","package":"more-itertools","optional":false},{"reason":"JIT compiler for numerical functions.","package":"numba","optional":false},{"reason":"GPU programming (Linux x86_64 only), installed for performance.","package":"triton","optional":true}],"imports":[{"symbol":"whisper","correct":"import whisper"}],"quickstart":{"code":"import os\n\nimport whisper\n\naudio_path = 'audio.wav'\n\n# If no audio file is present, synthesize a short sine-wave WAV so the example\n# is runnable end to end (requires scipy; otherwise supply your own file).\nif not os.path.exists(audio_path):\n    try:\n        import numpy as np\n        from scipy.io.wavfile import write\n\n        samplerate = 16000  # Whisper resamples input to 16 kHz\n        duration = 1.0      # seconds\n        frequency = 440.0   # A4 tone\n        t = np.linspace(0., duration, int(samplerate * duration), endpoint=False)\n        amplitude = np.iinfo(np.int16).max * 0.5\n        data = (amplitude * np.sin(2. * np.pi * frequency * t)).astype(np.int16)\n        write(audio_path, samplerate, data)\n        print(f\"Created a dummy audio file: {audio_path}\")\n    except ImportError:\n        raise SystemExit(f\"scipy not found. Please provide your own audio file at '{audio_path}'.\")\n\n# Load a Whisper model: 'tiny' or 'base' are good for quick tests;\n# 'medium' or 'large' give the best accuracy. Models download on first use.\nprint(\"Loading Whisper model...\")\nmodel = whisper.load_model(\"base\")\n\n# Transcribe the audio file (FFmpeg must be installed on the system).\nprint(f\"Transcribing {audio_path}...\")\nresult = model.transcribe(audio_path)\n\nprint(\"Transcription:\")\nprint(result[\"text\"])","lang":"python","description":"This quickstart demonstrates how to load a Whisper model and transcribe an audio file. It downloads the chosen model on the first run. Ensure FFmpeg is installed on your system for audio file processing. A dummy WAV file is generated if `scipy` is available; otherwise, provide your own audio file."},"warnings":[{"fix":"Install FFmpeg on your system. 
Verify its installation by running `ffmpeg -version` in your terminal.","message":"FFmpeg is a critical system-level dependency for `openai-whisper` to process audio files. The Python package installation does not include FFmpeg itself. You must install it separately using your operating system's package manager (e.g., `sudo apt install ffmpeg` on Debian/Ubuntu, `brew install ffmpeg` on macOS).","severity":"gotcha","affected_versions":"All versions"},{"fix":"If you intend to use OpenAI's hosted Whisper API, install the `openai` client library (`pip install openai`) and follow the API documentation. This library is for running the model locally.","message":"The `openai-whisper` library (this package) is distinct from the Whisper API offered by OpenAI (which uses the `openai` Python client library). The APIs and usage patterns are different. This registry entry pertains to the open-source `openai-whisper` library for local model execution.","severity":"breaking","affected_versions":"All versions"},{"fix":"Choose a smaller model size (e.g., 'tiny', 'base', 'small') if you have limited resources. Ensure your system meets the memory requirements, or consider optimized Whisper variants such as `faster-whisper` or `whisper.cpp` for lower resource consumption.","message":"Whisper models, especially larger ones ('medium', 'large'), require significant CPU RAM and/or GPU VRAM. The `large` model can require 10 GB or more of VRAM for inference (or a comparable amount of system RAM when running on CPU), making it impractical for systems without powerful GPUs. Running multiple instances in parallel requires proportionally more resources.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure you have a Rust compiler and C++ build tools installed. On Windows, install the 'Desktop development with C++' workload from the Visual Studio Installer. 
For macOS, install the Xcode Command Line Tools (`xcode-select --install`).","message":"Installation issues can occur, particularly if the `tiktoken` dependency fails to build from source. `tiktoken` requires a Rust compiler and associated build tools on your system. On Windows, this often means installing the Microsoft Visual C++ Build Tools.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Filter output segments by their `no_speech_prob` field or tune the `no_speech_threshold` argument of `model.transcribe()`, pre-process audio to reduce noise, or use the `initial_prompt` argument to guide the model towards specific vocabulary or style. For API users, ensure audio file sizes are within limits (typically under 25 MB for the OpenAI API).","message":"Whisper models can sometimes 'hallucinate' or produce irrelevant transcriptions, especially with silent audio segments, noisy input, or ambiguous speech. They may also struggle with specific jargon or heavy accents.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For translation tasks, use multilingual models such as `tiny`, `base`, `small`, `medium`, or `large`. The `medium` or `large` models are generally recommended for the best translation accuracy.","message":"The `turbo` model, while fast, is primarily optimized for English transcription and is not designed for translation tasks. Using `--task translate` with the `turbo` model will not yield a translation; it will return text in the original language.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}