{"id":1718,"library":"speechrecognition","title":"SpeechRecognition","description":"SpeechRecognition is a Python library for performing speech recognition. It supports various engines and APIs, both online (e.g., Google Web Speech API, Google Cloud Speech, OpenAI Whisper API, AWS Transcribe, Microsoft Azure Speech, Cohere Transcribe) and offline (e.g., CMU Sphinx, Vosk, Whisper via local models). It is actively maintained with frequent minor and patch releases, currently at version 3.16.0.","status":"active","version":"3.16.0","language":"en","source_language":"en","source_url":"https://github.com/Uberi/speech_recognition#readme","tags":["speech-to-text","voice-recognition","audio-processing","ai","ml","transcription"],"install":[{"cmd":"pip install SpeechRecognition","lang":"bash","label":"Basic installation"},{"cmd":"pip install SpeechRecognition pyaudio","lang":"bash","label":"With Microphone input support"},{"cmd":"pip install SpeechRecognition vosk","lang":"bash","label":"With Vosk (offline) recognition support"},{"cmd":"pip install SpeechRecognition openai","lang":"bash","label":"With OpenAI Whisper API support"}],"dependencies":[{"reason":"Required for real-time microphone input (sr.Microphone). Also needs system-level PortAudio library.","package":"PyAudio","optional":true},{"reason":"Required to encode audio sent to some recognizers (e.g., the Google Web Speech API). Encoder binaries are bundled for common x86 platforms; other systems need the system-level `flac` command-line tool.","package":"FLAC","optional":true},{"reason":"Required for offline CMU Sphinx recognition (Recognizer.recognize_sphinx). Needs system-level PocketSphinx library.","package":"pocketsphinx","optional":true},{"reason":"Required for offline Vosk recognition (Recognizer.recognize_vosk). 
Also needs a language model that is downloaded separately.","package":"vosk","optional":true},{"reason":"Required for OpenAI Whisper API recognition (Recognizer.recognize_whisper_api).","package":"openai","optional":true},{"reason":"Required for local Whisper model recognition (Recognizer.recognize_whisper).","package":"openai-whisper","optional":true},{"reason":"Used in the quickstart example to create a dummy audio file. Not required by SpeechRecognition itself.","package":"pydub","optional":true}],"imports":[{"symbol":"Recognizer","correct":"import speech_recognition as sr\nr = sr.Recognizer()"},{"symbol":"Microphone","correct":"import speech_recognition as sr\nmic = sr.Microphone()"},{"symbol":"AudioFile","correct":"import speech_recognition as sr\naudio_file = sr.AudioFile('path/to/file.wav')"},{"symbol":"UnknownValueError","correct":"from speech_recognition import UnknownValueError"},{"symbol":"RequestError","correct":"from speech_recognition import RequestError"}],"quickstart":{"code":"import speech_recognition as sr\nimport os\n\nr = sr.Recognizer()\n\n# --- Option 1: Listen from Microphone (requires PyAudio and PortAudio) ---\ntry:\n    import pyaudio\n    with sr.Microphone() as source:\n        print(\"Say something into the microphone!\")\n        r.adjust_for_ambient_noise(source, duration=1) # Adjust for ambient noise\n        audio = r.listen(source, timeout=5, phrase_time_limit=10)\n    print(\"Processing microphone input...\")\n    text = r.recognize_google(audio)\n    print(f\"You said (Google Web Speech): {text}\")\nexcept sr.WaitTimeoutError:\n    print(\"No speech detected within the timeout period for microphone.\")\nexcept sr.UnknownValueError:\n    print(\"Google Web Speech Recognition could not understand microphone audio.\")\nexcept sr.RequestError as e:\n    print(f\"Could not request results from Google Web Speech service for microphone; {e}\")\nexcept ImportError:\n    print(\"PyAudio not installed. Cannot use microphone. 
To enable, install with: pip install pyaudio\")\nexcept Exception as e:\n    print(f\"An unexpected error occurred with microphone input: {e}\")\n\n# --- Option 2: Transcribe an Audio File (e.g., using Google Web Speech API) ---\nfile_path = \"dummy_audio.wav\"\n# Create a dummy WAV file for demonstration if it doesn't exist\nif not os.path.exists(file_path):\n    try:\n        from pydub import AudioSegment\n        AudioSegment.silent(duration=1000, frame_rate=16000).export(file_path, format=\"wav\")\n        print(f\"\\nCreated a dummy WAV file: {file_path}\")\n    except ImportError:\n        print(\"\\npydub not installed, cannot create dummy audio. Please provide a WAV file manually.\")\n        print(\"Skipping audio file transcription example.\")\n        file_path = None\n\nif file_path:\n    try:\n        with sr.AudioFile(file_path) as source:\n            audio = r.record(source)  # Read the entire audio file\n        print(f\"Transcribing '{file_path}'...\")\n        text = r.recognize_google(audio)\n        print(f\"Transcription (Google Web Speech): {text}\")\n    except sr.UnknownValueError:\n        print(f\"Google Web Speech Recognition could not understand audio from '{file_path}'.\")\n    except sr.RequestError as e:\n        print(f\"Could not request results from Google Web Speech service for '{file_path}'; {e}\")\n    except Exception as e:\n        print(f\"An error occurred with audio file transcription: {e}\")\n\n# --- Option 3: Using a Commercial API (e.g., OpenAI Whisper API) ---\n# Requires 'pip install openai' and setting OPENAI_API_KEY environment variable\nOPENAI_API_KEY = os.environ.get(\"OPENAI_API_KEY\", \"\")\nif OPENAI_API_KEY and file_path:\n    print(\"\\nAttempting transcription with OpenAI Whisper API...\")\n    try:\n        with sr.AudioFile(file_path) as source:\n            audio = r.record(source)\n        text = r.recognize_whisper_api(audio, api_key=OPENAI_API_KEY)\n        print(f\"Transcription (OpenAI Whisper API): 
{text}\")\n    except sr.UnknownValueError:\n        print(f\"OpenAI Whisper API could not understand audio from '{file_path}'.\")\n    except sr.RequestError as e:\n        print(f\"Could not request results from OpenAI Whisper API service; {e}\")\n    except Exception as e:\n        print(f\"An error occurred with OpenAI Whisper API: {e}\")\nelse:\n    print(\"\\nSkipping OpenAI Whisper API example (OPENAI_API_KEY not set or no audio file for transcription).\")\n","lang":"python","description":"This quickstart demonstrates how to transcribe audio using the SpeechRecognition library. It includes a runnable microphone input example (with graceful degradation if PyAudio is not installed) and an example for transcribing from an audio file. For the audio file example, it attempts to create a dummy WAV file using `pydub` if available; otherwise, it expects you to supply a WAV file. Both examples use the free Google Web Speech API for transcription. A third option for using a commercial API (OpenAI Whisper) is also included, requiring an API key and additional installation."},"warnings":[{"fix":"Upgrade your Python environment to 3.9 or higher.","message":"Recent SpeechRecognition 3.x releases require Python 3.9 or newer. 
Older Python 3.x versions (e.g., 3.6-3.8) and Python 2 are no longer supported.","severity":"breaking","affected_versions":"<3.9"},{"fix":"Consult the official documentation for specific installation instructions for your chosen recognizer and input method (e.g., `pip install pyaudio`, system-level `portaudio` development headers, `pip install vosk`, etc.).","message":"Many speech recognition features (e.g., microphone input, specific offline recognizers like PocketSphinx/Vosk) require additional system-level libraries (e.g., PortAudio for PyAudio, FLAC binaries) or Python packages that are not installed by default with `pip install SpeechRecognition`.","severity":"gotcha","affected_versions":"All"},{"fix":"Obtain an API key from the respective service provider and provide it when calling the recognition method (e.g., `recognize_google_cloud(audio, credentials_json=YOUR_KEY)` or `recognize_whisper_api(audio, api_key=os.environ.get('OPENAI_API_KEY'))`). The `recognize_google` method works without an explicit key by falling back to a default key intended for personal or testing use only.","message":"Commercial APIs (e.g., Google Cloud Speech, OpenAI Whisper API, AWS Transcribe, Microsoft Azure Speech, Cohere Transcribe) require API keys, which are typically passed as an argument or loaded from environment variables. These services generally incur usage costs.","severity":"gotcha","affected_versions":"All"},{"fix":"Follow the documentation for your chosen offline engine to download and specify the correct model path (e.g., `model = vosk.Model('path/to/model')` for Vosk, or the `sprc download vosk` CLI command introduced in 3.14.4).","message":"Offline recognition engines like Vosk and PocketSphinx require separate language model downloads, which can be large (from tens of MBs to several GBs). These models are not included with the Python package installation.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}