Google Cloud Text-to-Speech
The `google-cloud-texttospeech` client library for Python wraps the Google Cloud Text-to-Speech API, which converts text into natural-sounding speech and supports a wide range of voices, languages, and customization options. As of version 2.35.0, the library is actively maintained and updated frequently as part of the broader Google Cloud Python client libraries.
Warnings
- breaking: Google Cloud Text-to-Speech voices are periodically updated or replaced. Voice names already used in your application might be deprecated or behave differently, potentially causing unexpected audio output or errors.
- gotcha: Incorrect authentication setup is a common issue. If `GOOGLE_APPLICATION_CREDENTIALS` is not set and Application Default Credentials (ADC) are not otherwise configured, the client will fail to connect to the API.
- gotcha: Using an invalid `language_code`, `name`, or `ssml_gender` in `VoiceSelectionParams` can lead to `InvalidArgument` errors.
- gotcha: The minimum supported Python version for `google-cloud-texttospeech` is `3.9`. Using older Python versions can lead to installation issues or runtime errors.
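The authentication and Python-version warnings above can be checked before any client is created. The following is a minimal sketch using only the standard library; `preflight_problems` and `MIN_PYTHON` are hypothetical names introduced here for illustration and are not part of `google-cloud-texttospeech`. It deliberately does not flag an unset `GOOGLE_APPLICATION_CREDENTIALS`, because ADC configured through `gcloud auth application-default login` may still apply.

```python
import os
import sys

# Minimum version named in the warning above.
MIN_PYTHON = (3, 9)


def preflight_problems(env=None, version_info=None):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper for illustration, not part of the client library.
    `env` and `version_info` default to the real process values and can be
    overridden for testing.
    """
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info

    problems = []

    # Compare only (major, minor) against the stated minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.9 or newer is required")

    # Only a set-but-broken path is reported; an unset variable may be fine
    # when ADC has been configured some other way.
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if cred_path and not os.path.isfile(cred_path):
        problems.append(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {cred_path}"
        )

    return problems
```

Calling `preflight_problems()` with no arguments inspects the current interpreter and environment; a non-empty result can be printed or raised before the first API call.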
Install
pip install google-cloud-texttospeech
Imports
- TextToSpeechClient
from google.cloud import texttospeech
- enums and types
from google.cloud import texttospeech_v1 as texttospeech
Quickstart
import os

from google.cloud import texttospeech_v1 as texttospeech


def synthesize_text(text):
    """Synthesizes speech from the input string of text.

    Ensure GOOGLE_APPLICATION_CREDENTIALS environment variable is set
    or authenticate using `gcloud auth application-default login`.
    """
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Set the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(text=text)

    # Build the voice request: select the language code ('en-US')
    # and the SSML voice gender ('NEUTRAL')
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )

    # Select the type of audio file you want returned
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Perform the text-to-speech request
    response = client.synthesize_speech(
        input=synthesis_input,
        voice=voice,
        audio_config=audio_config,
    )

    # The response's audio_content is binary. Write it to a file.
    output_filename = "output.mp3"
    with open(output_filename, "wb") as out:
        out.write(response.audio_content)
    print(f'Audio content written to file "{output_filename}"')


if __name__ == "__main__":
    # For local development, set the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    # For example: os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/your/key.json'
    # Or use `gcloud auth application-default login`
    if not os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'):
        print("Warning: GOOGLE_APPLICATION_CREDENTIALS not set. Assuming gcloud ADC is configured.")

    text_to_synthesize = "Hello, world! This is a test of the Google Cloud Text-to-Speech API."
    synthesize_text(text_to_synthesize)
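Because voice names can be retired and an invalid `name` may produce an `InvalidArgument` error, it can help to confirm that a voice is still listed before passing it to `VoiceSelectionParams`. The sketch below assumes a client object exposing `list_voices(language_code=...)`, as `TextToSpeechClient` does, whose response carries a `voices` list of objects with a `name` attribute. The helper names `available_voice_names` and `pick_voice_name` are hypothetical and not part of the library.

```python
def available_voice_names(client, language_code):
    """Return the set of voice names the API currently reports for a language.

    `client` is expected to behave like texttospeech.TextToSpeechClient:
    list_voices(language_code=...) returns an object with a `voices` list.
    """
    response = client.list_voices(language_code=language_code)
    return {voice.name for voice in response.voices}


def pick_voice_name(client, language_code, preferred_name):
    """Return preferred_name if it is still listed, otherwise None.

    Returning None lets the caller fall back to selecting by language_code
    and ssml_gender only, as the quickstart does.
    """
    names = available_voice_names(client, language_code)
    return preferred_name if preferred_name in names else None
```

With a real client, `pick_voice_name(texttospeech.TextToSpeechClient(), "en-US", preferred_name)` makes one `list_voices` request; caching the result avoids repeating that call on every synthesis.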