{"id":2876,"library":"azure-cognitiveservices-speech","title":"Azure Cognitive Services Speech SDK for Python","description":"The Microsoft Cognitive Services Speech SDK for Python (current version 1.49.0) adds speech-to-text, text-to-speech, and speech translation to Python applications. It supports both real-time and non-real-time scenarios across multiple platforms and is updated frequently.","status":"active","version":"1.49.0","language":"en","source_language":"en","source_url":"https://github.com/Azure-Samples/cognitive-services-speech-sdk","tags":["Azure","Speech","AI","Cognitive Services","Text-to-Speech","Speech-to-Text","Speech Translation"],"install":[{"cmd":"pip install azure-cognitiveservices-speech","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required on Windows for the underlying native SDK components.","package":"Microsoft Visual C++ Redistributable for Visual Studio 2015-2022","optional":true},{"reason":"Required on certain Linux distributions (e.g., Ubuntu, Debian) for SSL and audio functionality.","package":"libssl1.0.0 or libssl1.0.2, libasound2","optional":true}],"imports":[{"symbol":"speechsdk","correct":"import azure.cognitiveservices.speech as speechsdk"},{"symbol":"SpeechConfig","correct":"speechsdk.SpeechConfig"},{"symbol":"AudioOutputConfig","correct":"speechsdk.audio.AudioOutputConfig"},{"symbol":"SpeechSynthesizer","correct":"speechsdk.SpeechSynthesizer"},{"symbol":"SpeechRecognizer","correct":"speechsdk.SpeechRecognizer"},{"symbol":"ResultReason","correct":"speechsdk.ResultReason"}],"quickstart":{"code":"import os\nimport azure.cognitiveservices.speech as speechsdk\n\n# This example requires environment variables named \"SPEECH_KEY\" and \"SPEECH_REGION\"\n# (or \"ENDPOINT\" for custom endpoints) to be set.\n# Replace with your own subscription key 
and service region. Example: \"westus\", \"eastus\"\nspeech_key = os.environ.get('SPEECH_KEY', '')\nspeech_region = os.environ.get('SPEECH_REGION', '') # e.g., 'westus'\n\nif not speech_key or not speech_region:\n    print(\"Please set the SPEECH_KEY and SPEECH_REGION environment variables.\")\n    # Stop with a non-zero status; the built-in exit() helper is not guaranteed outside interactive sessions.\n    raise SystemExit(1)\n\nspeech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)\n\n# The neural multilingual voice can speak different languages based on the input text.\nspeech_config.speech_synthesis_voice_name = 'en-US-AvaMultilingualNeural'\n\naudio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)\nspeech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)\n\nprint(\"Enter some text that you want to speak (type 'exit' to quit) >\")\nwhile True:\n    text = input()\n    if text.lower() == 'exit':\n        break\n\n    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()\n\n    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:\n        print(f\"Speech synthesized for text: [{text}]\")\n    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:\n        cancellation_details = speech_synthesis_result.cancellation_details\n        print(f\"Speech synthesis canceled: {cancellation_details.reason}\")\n        if cancellation_details.reason == speechsdk.CancellationReason.Error:\n            if cancellation_details.error_details:\n                print(f\"Error details: {cancellation_details.error_details}\")\n            print(\"Did you set the speech resource key and region environment variables correctly?\")","lang":"python","description":"This quickstart demonstrates basic text-to-speech functionality using a neural voice. It initializes the SpeechConfig with an API key and region (retrieved from environment variables), creates a SpeechSynthesizer, and then synthesizes user-provided text to the default speaker. 
Ensure 'SPEECH_KEY' and 'SPEECH_REGION' environment variables are set before running."},"warnings":[{"fix":"Update `speech_config.speech_synthesis_voice_name` to use a supported neural voice (e.g., `en-US-AvaMultilingualNeural`). Review the latest documentation for available neural voices.","message":"Standard text-to-speech voices were retired on August 31, 2024. Applications using these voices must migrate to neural voices to avoid service disruption.","severity":"breaking","affected_versions":"<= 1.47.0 (prior to August 2024)"},{"fix":"Migrate to alternative services or patterns for intent and speaker recognition as described in Microsoft's documentation and sample repositories.","message":"Support for Intent Recognition and Speaker Recognition has been removed due to service retirement.","severity":"breaking","affected_versions":"All versions (service retirement)"},{"fix":"Allow outbound connections to `*.cognitiveservices.azure.com` and to the regional Speech hosts (e.g., `westus.tts.speech.microsoft.com`). If you are behind a proxy, call `set_proxy()` on your `speechsdk.SpeechConfig` instance. Verify that the Speech resource key, region, and endpoint match your Azure deployment. Check `result.reason` and `cancellation_details` after each call and log them.","message":"Network connectivity issues, including firewalls, proxies, and incorrect endpoint configurations, are common. The SDK typically reports them as a `Canceled` result rather than raising an exception, so failures go unnoticed unless the result is inspected.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Download and install the appropriate Visual C++ Redistributable package for your platform from the Microsoft website. A system restart might be required.","message":"On Windows, the Speech SDK requires the Microsoft Visual C++ Redistributable for Visual Studio 2015-2022 to be installed.","severity":"gotcha","affected_versions":"All versions on Windows"},{"fix":"Implement retry logic for asynchronous operations. Consider breaking down large text inputs into smaller chunks. 
Optimize SSML structure and monitor performance in your deployment environment.","message":"Latency issues, especially with large SSML inputs, certain neural voices, or the free (F0) pricing tier, can lead to partial audio output or an 'Internal Server Error' caused by timeouts.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Double-check the `SPEECH_KEY` and `SPEECH_REGION` (or `ENDPOINT`) values. Ensure they are current and correspond to your Azure Speech resource. Avoid hardcoding credentials; use environment variables or a secure key management system.","message":"Authentication failures often stem from incorrect API keys, expired tokens, or a mismatch between the region or endpoint specified in code and the actual Azure resource deployment.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}