OpenAI Whisper

20250625 · active · verified Sat Apr 11

OpenAI Whisper is a general-purpose automatic speech recognition (ASR) model trained on a large dataset of diverse audio. It performs multilingual speech recognition, speech translation, and language identification. Releases are somewhat irregular, with multiple updates typically published each year, often in dated version formats (e.g., YYYYMMDD).

Warnings

- FFmpeg is a separate system dependency required for audio decoding; `pip install` alone is not sufficient.
- Model weights are downloaded on the first call to `whisper.load_model` and cached locally (by default under `~/.cache/whisper`).
- Larger models (`medium`, `large`) are slow on CPU; a CUDA-capable GPU is strongly recommended for them.

Install

Imports

Quickstart

This quickstart loads a Whisper model and transcribes an audio file; the chosen model is downloaded on the first run. Make sure FFmpeg is installed on your system for audio decoding. The script generates a dummy audio file if `scipy` is available; otherwise, provide your own.

import whisper
import os

# Path to the audio to transcribe; a short test tone is generated below
# if the file does not already exist (requires scipy and numpy).
audio_path = 'dummy_audio.wav'
if not os.path.exists(audio_path):
    try:
        import numpy as np
        from scipy.io.wavfile import write

        samplerate = 16000   # Whisper resamples all input to 16 kHz
        duration = 1.0       # seconds
        frequency = 440.0    # A4 test tone
        t = np.linspace(0.0, duration, int(samplerate * duration), endpoint=False)
        amplitude = np.iinfo(np.int16).max * 0.5
        data = (amplitude * np.sin(2.0 * np.pi * frequency * t)).astype(np.int16)
        write(audio_path, samplerate, data)
        print(f"Created dummy audio file: {audio_path}")
    except ImportError:
        raise SystemExit(
            "scipy/numpy not found; cannot generate dummy audio. "
            f"Please provide your own file at '{audio_path}' and rerun."
        )

# Load a model: 'tiny' and 'base' are fast for quick tests;
# 'small', 'medium', and 'large' trade speed for accuracy.
# Weights are downloaded automatically on first use.
print("Loading Whisper model...")
model = whisper.load_model("base")

# Transcribe; Whisper handles decoding (via FFmpeg), resampling,
# and language detection internally.
print(f"Transcribing {audio_path}...")
result = model.transcribe(audio_path)

print("Transcription:")
print(result["text"])
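Beyond the full text, `transcribe` also returns per-segment timestamps under the `"segments"` key. A minimal sketch, using a hand-built `result` dict that mirrors the structure `model.transcribe()` returns:

```python
# Hypothetical result mirroring the dict returned by model.transcribe():
# each segment carries start/end times in seconds and its decoded text.
result = {
    "text": " Hello world.",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " Hello"},
        {"start": 1.2, "end": 2.0, "text": " world."},
    ],
}

for seg in result["segments"]:
    print(f"[{seg['start']:6.2f} -> {seg['end']:6.2f}]{seg['text']}")
```

This is handy for generating subtitles or aligning the transcript to the audio.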
