Azure Cognitive Services Speech SDK for Python

1.49.0 · active · verified Sat Apr 11

The Microsoft Cognitive Services Speech SDK for Python (current version 1.49.0) adds speech-to-text, text-to-speech, and speech translation to Python applications. It supports both real-time and non-real-time scenarios across multiple platforms. The library is actively maintained and updated frequently.

Warnings

Install

pip install azure-cognitiveservices-speech

Imports

import azure.cognitiveservices.speech as speechsdk

Quickstart

This quickstart demonstrates basic text-to-speech with a neural voice. It initializes a SpeechConfig from a subscription key and region (read from environment variables), creates a SpeechSynthesizer, and plays each line of text the user enters through the default speaker. Set the 'SPEECH_KEY' and 'SPEECH_REGION' environment variables before running.

import os
import azure.cognitiveservices.speech as speechsdk

# This example requires the environment variables "SPEECH_KEY" and "SPEECH_REGION".
# SPEECH_KEY is the key of your Speech resource; SPEECH_REGION is its service
# region, for example "westus" or "eastus".
speech_key = os.environ.get('SPEECH_KEY', '')
speech_region = os.environ.get('SPEECH_REGION', '') # e.g., 'westus'

if not speech_key or not speech_region:
    print("Please set the SPEECH_KEY and SPEECH_REGION environment variables.")
    raise SystemExit(1)  # stop with a non-zero exit status

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)

# The neural multilingual voice can speak different languages based on the input text.
speech_config.speech_synthesis_voice_name = 'en-US-AvaMultilingualNeural'

audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

print("Enter some text that you want to speak (type 'exit' to quit) >")
while True:
    text = input()
    if text.lower() == 'exit':
        break

    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print(f"Speech synthesized for text: [{text}]")
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print(f"Speech synthesis canceled: {cancellation_details.reason}")
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print(f"Error details: {cancellation_details.error_details}")
            print("Did you set the speech resource key and region environment variables correctly?")
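
The environment check at the top of the quickstart can also be factored into a small helper that reports exactly which variable is missing. The following is a minimal sketch, not part of the Speech SDK: the names `SpeechSettings` and `load_speech_settings` are hypothetical and introduced here only for illustration. It uses the standard library alone, so it runs without the SDK installed.

```python
import os
from typing import Mapping, NamedTuple, Optional


class SpeechSettings(NamedTuple):
    """Hypothetical container for the two values the quickstart needs."""
    key: str
    region: str


def load_speech_settings(env: Optional[Mapping[str, str]] = None) -> SpeechSettings:
    """Read SPEECH_KEY and SPEECH_REGION, raising if either is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    source = os.environ if env is None else env
    # Treat whitespace-only values the same as unset ones.
    values = {
        name: source.get(name, "").strip()
        for name in ("SPEECH_KEY", "SPEECH_REGION")
    }
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return SpeechSettings(key=values["SPEECH_KEY"], region=values["SPEECH_REGION"])
```

Assuming the rest of the quickstart is unchanged, the result could be passed on as `speechsdk.SpeechConfig(subscription=settings.key, region=settings.region)`. Raising an exception instead of printing and exiting lets the caller decide how to report the problem.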
