SpeechRecognition

3.16.0 · active · verified Thu Apr 09

SpeechRecognition is a Python library for speech recognition that wraps multiple engines and APIs, both online (e.g., Google Web Speech API, Google Cloud Speech, OpenAI Whisper API, AWS Transcribe, Microsoft Azure Speech, Cohere Transcribe) and offline (e.g., CMU Sphinx, Vosk, Whisper via local models). It is actively maintained, with frequent minor and patch releases; the current version is 3.16.0.

Warnings

breaking Current SpeechRecognition 3.x releases require Python 3.9 or newer. Older Python 3 versions (e.g., 3.6-3.8) and Python 2 are no longer supported.
Fix: Upgrade your Python environment to 3.9 or higher.
gotcha Many speech recognition features (e.g., microphone input, specific offline recognizers like PocketSphinx/Vosk) require additional system-level libraries (e.g., PortAudio for PyAudio, FLAC binaries) or Python packages that are not installed by default with `pip install SpeechRecognition`.
Fix: Consult the official documentation for the installation steps that match your chosen recognizer and input method (e.g., `pip install pyaudio`, system-level `portaudio` development headers, `pip install vosk`).
gotcha Commercial APIs (e.g., Google Cloud Speech, OpenAI Whisper API, AWS Transcribe, Microsoft Azure Speech, Cohere Transcribe) require credentials, which are typically passed as an argument or loaded from environment variables. These are paid services, so usage incurs costs.
Fix: Obtain credentials from the respective service provider and supply them when calling the recognition method (e.g., `recognize_google_cloud(audio, credentials_json=YOUR_KEY)` or `recognize_whisper_api(audio, api_key=os.environ.get('OPENAI_API_KEY'))`). Keyword names can differ between releases, so check the signatures in your installed version. The `recognize_google` method is free for limited use without an explicit key.
gotcha Offline recognition engines like Vosk and PocketSphinx require separate language model downloads, which can be large (hundreds of MBs to several GBs). These models are not included with the Python package installation.
Fix: Follow the documentation for your chosen offline engine to download and specify the correct model path (e.g., `model = vosk.Model('path/to/model')` for Vosk, or using the `sprc download vosk` CLI command introduced in 3.14.4).
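
One way to avoid hard-coding keys for the commercial APIs is to read them from the environment and fail early when they are absent. The following is a minimal sketch, assuming the key lives in an environment variable such as `OPENAI_API_KEY`; the helper name `require_api_key` is hypothetical and not part of SpeechRecognition.

```python
import os


def require_api_key(var_name: str) -> str:
    """Return the API key stored in the environment variable `var_name`.

    Hypothetical helper (not part of SpeechRecognition): it raises a clear
    error up front instead of letting a recognizer call fail later.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before calling a paid recognition API."
        )
    return key
```

Calling `require_api_key("OPENAI_API_KEY")` before the recognition step makes a missing key surface as a readable error rather than as a failed request.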

Install

pip install SpeechRecognition           # Basic installation
pip install SpeechRecognition pyaudio   # With microphone input support
pip install SpeechRecognition vosk      # With Vosk (offline) recognition support
pip install SpeechRecognition openai    # With OpenAI Whisper API support
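
After installing, it can help to confirm which optional backends are actually importable in the current environment. The sketch below uses only the standard library; the module names (`pyaudio`, `vosk`, `openai`) are assumed to match the packages in the install commands, and `available_features` is a hypothetical helper, not part of SpeechRecognition.

```python
import importlib.util

# Module names assumed to correspond to the optional installs listed above.
OPTIONAL_MODULES = {
    "pyaudio": "microphone input",
    "vosk": "Vosk offline recognition",
    "openai": "OpenAI Whisper API",
}


def available_features(modules=OPTIONAL_MODULES):
    """Map each feature description to True/False depending on whether
    its backing module can be located, without importing it."""
    return {
        feature: importlib.util.find_spec(name) is not None
        for name, feature in modules.items()
    }


if __name__ == "__main__":
    for feature, ok in available_features().items():
        print(f"{feature}: {'available' if ok else 'missing'}")
```

This only checks that the Python package can be found; system-level dependencies such as PortAudio or FLAC still need to be verified separately.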

Imports

Recognizer

import speech_recognition as sr
r = sr.Recognizer()

Microphone

import speech_recognition as sr
mic = sr.Microphone()

AudioFile

import speech_recognition as sr
audio_file = sr.AudioFile('path/to/file.wav')

UnknownValueError

from speech_recognition import UnknownValueError

RequestError

from speech_recognition import RequestError

Quickstart

This quickstart transcribes audio with the SpeechRecognition library in three ways. Option 1 listens on the microphone and degrades gracefully if PyAudio is not installed. Option 2 transcribes a WAV file with the free Google Web Speech API; if `dummy_audio.wav` does not exist, the script tries to create one second of silence with `pydub`, and if `pydub` is missing it skips the file-based examples, so supply your own WAV file at that path to run them. Because the generated file is silent, it will most likely hit the "could not understand" branch until you substitute a recording of real speech. Option 3 uses a commercial API (OpenAI Whisper), which requires an API key and the `openai` package.

import speech_recognition as sr
import os

r = sr.Recognizer()

# --- Option 1: Listen from Microphone (requires PyAudio and PortAudio) ---
try:
    import pyaudio  # Imported only to confirm PyAudio is available
    with sr.Microphone() as source:
        print("Say something into the microphone!")
        r.adjust_for_ambient_noise(source, duration=1) # Adjust for ambient noise
        audio = r.listen(source, timeout=5, phrase_time_limit=10)
    print("Processing microphone input...")
    text = r.recognize_google(audio)
    print(f"You said (Google Web Speech): {text}")
except sr.WaitTimeoutError:
    print("No speech detected within the timeout period for microphone.")
except sr.UnknownValueError:
    print("Google Web Speech Recognition could not understand microphone audio.")
except sr.RequestError as e:
    print(f"Could not request results from Google Web Speech service for microphone; {e}")
except ImportError:
    print("PyAudio not installed. Cannot use microphone. To enable, install with: pip install pyaudio")
except Exception as e:
    print(f"An unexpected error occurred with microphone input: {e}")

# --- Option 2: Transcribe an Audio File (e.g., using Google Web Speech API) ---
file_path = "dummy_audio.wav"
# Create a dummy WAV file for demonstration if it doesn't exist
if not os.path.exists(file_path):
    try:
        from pydub import AudioSegment
        AudioSegment.silent(duration=1000, frame_rate=16000).export(file_path, format="wav")
        print(f"\nCreated a dummy WAV file: {file_path}")
    except ImportError:
        print("\npydub not installed, cannot create dummy audio. Please provide a WAV file manually.")
        print("Skipping audio file transcription example.")
        file_path = None

if file_path:
    try:
        with sr.AudioFile(file_path) as source:
            audio = r.record(source)  # Read the entire audio file
        print(f"Transcribing '{file_path}'...")
        text = r.recognize_google(audio)
        print(f"Transcription (Google Web Speech): {text}")
    except sr.UnknownValueError:
        print(f"Google Web Speech Recognition could not understand audio from '{file_path}'.")
    except sr.RequestError as e:
        print(f"Could not request results from Google Web Speech service for '{file_path}'; {e}")
    except Exception as e:
        print(f"An error occurred with audio file transcription: {e}")

# --- Option 3: Using a Commercial API (e.g., OpenAI Whisper API) ---
# Requires 'pip install openai' and setting OPENAI_API_KEY environment variable
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
if OPENAI_API_KEY and file_path:
    print("\nAttempting transcription with OpenAI Whisper API...")
    try:
        with sr.AudioFile(file_path) as source:
            audio = r.record(source)
        text = r.recognize_whisper_api(audio, api_key=OPENAI_API_KEY)
        print(f"Transcription (OpenAI Whisper API): {text}")
    except sr.UnknownValueError:
        print(f"OpenAI Whisper API could not understand audio from '{file_path}'.")
    except sr.RequestError as e:
        print(f"Could not request results from OpenAI Whisper API service; {e}")
    except Exception as e:
        print(f"An error occurred with OpenAI Whisper API: {e}")
else:
    print("\nSkipping OpenAI Whisper API example (OPENAI_API_KEY not set or no audio file for transcription).")
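
If `pydub` is not available, a placeholder file comparable to the one the quickstart creates can be written with Python's standard `wave` module. This is a minimal sketch, not part of SpeechRecognition; the helper name `write_silent_wav` is hypothetical, and like the `pydub` version it produces silence, so it only exercises the file-reading path rather than yielding a transcript.

```python
import wave


def write_silent_wav(path, seconds=1.0, sample_rate=16000):
    """Write a mono, 16-bit PCM WAV file containing only silence."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)  # samples per second
        wav_file.writeframes(b"\x00\x00" * n_frames)  # zero samples = silence
    return path
```

Calling `write_silent_wav("dummy_audio.wav")` before the file example would give `sr.AudioFile` something to open without any third-party dependency.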
