Google Cloud Speech-to-Text Python Client
The `google-cloud-speech` Python client library provides access to the Google Cloud Speech-to-Text API, which converts audio to text using Google's neural network models and supports many languages and audio formats. Currently at version 2.38.0, the library is actively maintained, with new releases typically every one to two months.
Warnings
- breaking The Speech-to-Text V2 API is not a drop-in replacement for V1. It has a redesigned interface (its client lives in `google.cloud.speech_v2`, and requests are addressed to a recognizer resource), new features, and different pricing. Existing V1 code must be modified to use V2.
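- gotcha The most common error is `DefaultCredentialsError`, raised when the client cannot find valid credentials through Application Default Credentials. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to a service account key file, or run `gcloud auth application-default login` for local development.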
- gotcha The encoding, sample rate, and format in `RecognitionConfig` must match the actual audio. A mismatch (e.g., transcribing an MP3 file with a `LINEAR16` config) leads to transcription errors or poor results. For files stored in Google Cloud Storage, the URI must be in `gs://bucket-name/object-name` format.
- gotcha Streaming transcription for longer audio (especially for certain non-English languages) may encounter intermittent failures around the 4-minute mark due to internal streaming limits or processing complexities.
- gotcha When using streaming recognition with `interim_results=True` in the V2 API, the `responses_iterator` might block until all requests are done instead of yielding results immediately, which can be unexpected for real-time applications.
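Because `DefaultCredentialsError` is the most common failure, it can help to wrap client creation so the error says what to fix. A minimal sketch, assuming the service-account-key route; `credentials_env_hint` and `make_speech_client` are hypothetical helper names. The hint only inspects `GOOGLE_APPLICATION_CREDENTIALS`; Application Default Credentials can also come from `gcloud auth application-default login` or an attached service account, so an unset variable is only a hint, not proof of the cause.

```python
import os
from typing import Optional


def credentials_env_hint() -> Optional[str]:
    """Describe a problem with GOOGLE_APPLICATION_CREDENTIALS, or return None if it looks usable."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
    return None


def make_speech_client():
    """Create a SpeechClient, turning DefaultCredentialsError into an actionable message."""
    # Lazy imports so the hint helper above works without the client library installed.
    from google.auth.exceptions import DefaultCredentialsError
    from google.cloud import speech

    try:
        return speech.SpeechClient()
    except DefaultCredentialsError as exc:
        hint = credentials_env_hint() or "the key file exists; check that it is a valid credentials file"
        raise RuntimeError(f"No usable credentials: {hint}. Original error: {exc}") from exc
```

Calling `make_speech_client()` in place of `speech.SpeechClient()` keeps the rest of the code unchanged.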
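One possible mitigation for the streaming limit (an assumption, not documented library behavior) is to split audio into several shorter streams, each under a chosen duration budget, and open a new `streaming_recognize` call for each. The sketch uses the V1 client and only prints final results. The helper names `chunk_linear16` and `group_into_sessions`, the 100 ms chunk size, and the 200-second budget are illustrative choices. Splitting at fixed boundaries can cut a word in half; a sturdier version would split on silence or overlap adjacent sessions.

```python
from typing import List


def chunk_linear16(audio: bytes, sample_rate_hertz: int, chunk_ms: int) -> List[bytes]:
    """Split raw mono LINEAR16 samples (2 bytes each, no WAV header) into chunk_ms pieces."""
    bytes_per_chunk = sample_rate_hertz * 2 * chunk_ms // 1000
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    return [audio[i:i + bytes_per_chunk] for i in range(0, len(audio), bytes_per_chunk)]


def group_into_sessions(chunks: List[bytes], chunk_ms: int, max_session_ms: int) -> List[List[bytes]]:
    """Group chunks so each stream carries at most max_session_ms of audio (min. one chunk)."""
    per_session = max(1, max_session_ms // chunk_ms)
    return [chunks[i:i + per_session] for i in range(0, len(chunks), per_session)]


def stream_transcribe(audio: bytes, sample_rate_hertz: int = 16000, language_code: str = "en-US",
                      chunk_ms: int = 100, max_session_ms: int = 200_000) -> None:
    """Transcribe in-memory LINEAR16 audio as a series of short V1 streaming sessions."""
    from google.cloud import speech  # lazy import: the helpers above need only the stdlib

    client = speech.SpeechClient()
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate_hertz,
            language_code=language_code,
        ),
        interim_results=False,  # only final results are printed below
    )
    chunks = chunk_linear16(audio, sample_rate_hertz, chunk_ms)
    for session in group_into_sessions(chunks, chunk_ms, max_session_ms):
        # Each session opens a fresh stream, keeping its audio well under 4 minutes.
        requests = (speech.StreamingRecognizeRequest(audio_content=c) for c in session)
        responses = client.streaming_recognize(config=streaming_config, requests=requests)
        for response in responses:
            for result in response.results:
                if result.is_final:
                    print(result.alternatives[0].transcript)
```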
Install
pip install google-cloud-speech
Imports
- SpeechClient
from google.cloud import speech
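The V2 API ships in the same package under `google.cloud.speech_v2`. A minimal V2 sketch, assuming a placeholder project ID you supply: each request names a recognizer resource (`projects/{project}/locations/global/recognizers/_`, where `_` selects the default recognizer), and `auto_decoding_config` asks the service to detect the audio encoding. The `recognizer_path` helper, the `"long"` model, and the `"global"` location are assumptions to adapt.

```python
def recognizer_path(project_id: str, location: str = "global", recognizer_id: str = "_") -> str:
    """Build a V2 recognizer resource name; "_" selects the default recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


def transcribe_v2(project_id: str, audio_file_path: str) -> None:
    """Transcribe a short local file with the V2 API (synchronous recognize)."""
    # Lazy imports so recognizer_path can be used without the client library.
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    with open(audio_file_path, "rb") as f:
        content = f.read()
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],  # V2 takes a list of language codes
        model="long",
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id),
        config=config,
        content=content,
    )
    response = client.recognize(request=request)
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
```

Compared with the V1 quickstart, the V1 `language_code` string becomes a `language_codes` list and the explicit encoding can be replaced by auto-detection.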
Quickstart
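import os  # only needed for the optional in-code credentials line below
from google.cloud import speech
# Credentials: the client uses Application Default Credentials, typically the service
# account key file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable:
#   export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
# For local testing you can set it in code instead (not recommended for production):
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/keyfile.json"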
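def transcribe_audio(audio_file_path):
    """Transcribe a short local audio file with the synchronous recognize method."""
    client = speech.SpeechClient()
    # Read the audio bytes and send them inline with the request.
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    # The config must match the audio: 16-bit PCM (LINEAR16), 16000 Hz, US English.
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    try:
        response = client.recognize(config=config, audio=audio)
        # Each result covers a consecutive segment of the audio; print its top alternative.
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")
    except Exception as e:
        print(f"An error occurred: {e}")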
# Example usage (replace "test.wav" with your own audio file)
if __name__ == "__main__":
    # Prerequisites:
    #   - GOOGLE_APPLICATION_CREDENTIALS points to a service account key file
    #     (or credentials are passed to the client explicitly).
    #   - test.wav exists in the current directory and is LINEAR16 (16-bit PCM), 16000 Hz, mono.
    # To create a one-second silent test file (this only checks the setup; use a real
    # recording to get a meaningful transcript):
    #   import numpy as np
    #   from scipy.io.wavfile import write as write_wav
    #   write_wav("test.wav", 16000, np.zeros(16000, dtype=np.int16))
    audio_test_file = "test.wav"
    print(f"Attempting to transcribe: {audio_test_file}")
    print("Ensure GOOGLE_APPLICATION_CREDENTIALS is set and the file exists and is LINEAR16, 16000 Hz, mono.")
    transcribe_audio(audio_test_file)
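Since a mismatch between the file and `RecognitionConfig` is a common cause of bad results, it can help to verify the file before calling `transcribe_audio`. A minimal sketch using only the standard library `wave` module; the name `check_wav_for_linear16` is hypothetical. It checks the three properties the quickstart config assumes (16-bit samples, 16000 Hz, mono). `wave` reads only uncompressed PCM WAV, so an MP3 or other compressed file is reported as unreadable rather than inspected.

```python
import wave
from typing import List


def check_wav_for_linear16(path: str, expected_rate: int = 16000) -> List[str]:
    """List mismatches between a WAV file and a LINEAR16 / expected_rate / mono config."""
    try:
        wav = wave.open(path, "rb")
    except (wave.Error, EOFError) as exc:
        # Raised for non-WAV data (e.g. an MP3) or WAV variants the wave module cannot parse.
        return [f"not a PCM WAV file the wave module can read: {exc}"]
    with wav:
        problems = []
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {wav.getsampwidth() * 8}-bit, expected 16-bit")
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    return problems
```

An empty list means the file matches the quickstart config; otherwise, convert the file or change `encoding`, `sample_rate_hertz`, or the channel settings to match it.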
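Audio stored in Cloud Storage is referenced by URI instead of being sent inline. Synchronous `recognize` handles only about one minute of audio, so longer files go through `long_running_recognize`, which returns an operation to wait on. A minimal sketch with the same config as the quickstart; the `gcs_uri` helper, the bucket and object names, and the 300-second timeout are illustrative assumptions.

```python
def gcs_uri(bucket: str, object_name: str) -> str:
    """Build a gs://bucket-name/object-name URI from a bare bucket name and object path."""
    bucket = bucket.strip()
    object_name = object_name.lstrip("/")
    # Bucket names cannot contain "/" or ":", so this rejects full URLs passed by mistake.
    if not bucket or "/" in bucket or ":" in bucket:
        raise ValueError(f"expected a bare bucket name, got {bucket!r}")
    if not object_name:
        raise ValueError("object name must not be empty")
    return f"gs://{bucket}/{object_name}"


def transcribe_gcs(uri: str, timeout_seconds: int = 300) -> None:
    """Transcribe a LINEAR16, 16000 Hz file in Cloud Storage with a long-running request."""
    from google.cloud import speech  # lazy import: gcs_uri needs no dependencies

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout_seconds)  # blocks until done or timeout
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
```

For example, `transcribe_gcs(gcs_uri("my-bucket", "audio/test.wav"))`, where `my-bucket` is a hypothetical bucket.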