Google Cloud Speech-to-Text Python Client

2.38.0 · active · verified Sat Mar 28

The `google-cloud-speech` Python client library wraps the Google Cloud Speech-to-Text API, which converts audio to text using neural network models and supports many languages and audio formats. The library is currently at version 2.38.0 and is actively maintained, with releases typically arriving every one to two months.

Warnings

Install

Imports

Quickstart

This quickstart transcribes a local audio file with the Google Cloud Speech-to-Text client library. It covers client instantiation, reading audio content, configuring recognition settings, and processing the transcription response. Before running it, make sure you have a Google Cloud project with the Speech-to-Text API enabled and that the `GOOGLE_APPLICATION_CREDENTIALS` environment variable points to a service account key file with appropriate permissions.

import os
from google.cloud import speech

# Set the path to your service account key file
# This is typically done via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
# For local testing, you might set it in code (not recommended for production).
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/keyfile.json"

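def transcribe_audio(audio_file_path):
    # The client picks up credentials from the environment.
    client = speech.SpeechClient()

    # Read the whole file into memory; synchronous recognition is meant for short clips.
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    # These settings must match the actual audio: 16-bit PCM sampled at 16000 Hz.
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    try:
        response = client.recognize(config=config, audio=audio)
        # Each result covers a consecutive portion of the audio;
        # the first alternative is the most likely transcript.
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")
    except Exception as e:
        print(f"An error occurred: {e}")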

# Example usage (replace with the path to your own audio file)
if __name__ == "__main__":
    # This example assumes a 'test.wav' file exists in the current directory
    # and is a LINEAR16 (16-bit PCM), 16000 Hz, mono WAV file.
    # To create a one-second silent file for a smoke test:
    # import numpy as np
    # from scipy.io.wavfile import write as write_wav
    # write_wav('test.wav', 16000, np.zeros(16000, dtype=np.int16))
    #
    # You must also have a service account key file and set the
    # GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to it,
    # or pass credentials explicitly.
    # e.g., export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
    audio_test_file = "test.wav"
    print(f"Attempting to transcribe: {audio_test_file}")
    print("Ensure GOOGLE_APPLICATION_CREDENTIALS is set and the file exists and is LINEAR16, 16000 Hz, mono.")
    transcribe_audio(audio_test_file)
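
The quickstart expects a 16-bit PCM, 16000 Hz, mono WAV file. If NumPy and SciPy are not installed, a file in that format can also be produced with Python's standard-library `wave` module. The following is a minimal sketch, not part of the `google-cloud-speech` library: the helper names `write_silent_wav` and `check_wav_format` are hypothetical and introduced here only for illustration. The second helper reads the WAV header and reports any mismatch with the quickstart's recognition settings before an API call is made.

```python
import wave


def write_silent_wav(path, seconds=1, sample_rate=16000):
    """Write a mono, 16-bit PCM WAV file that contains only silence."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)  # samples per second
        # One zero-valued 16-bit sample per frame.
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


def check_wav_format(path, sample_rate=16000):
    """Return a list of mismatches with the quickstart's assumed audio format."""
    problems = []
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()
        frame_rate = wav_file.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {8 * sample_width}-bit")
    if frame_rate != sample_rate:
        problems.append(f"expected {sample_rate} Hz, found {frame_rate} Hz")
    return problems


if __name__ == "__main__":
    write_silent_wav("test.wav")
    issues = check_wav_format("test.wav")
    print(issues if issues else "test.wav matches the quickstart settings")
```

A silent file is only useful as a smoke test of credentials and configuration; it should not be expected to yield any transcript text. For a real recording, an empty list from `check_wav_format` suggests the header agrees with `sample_rate_hertz=16000` and `LINEAR16` as used in the quickstart.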
