Amazon Transcribe Streaming SDK for Python

0.6.4 verified Thu Apr 16 auth: no python deprecated

The `amazon-transcribe` library is an asynchronous Python SDK for the Amazon Transcribe Streaming service, which converts audio to text in real time. As of version 0.6.4, the SDK is considered experimental, is no longer actively developed, and is not officially supported by AWS for new projects. Updates are infrequent, and the project states that it makes no commitment to ongoing support.

pip install amazon-transcribe
error BadRequestException: Your stream is too big.
cause The audio data stream provided to Amazon Transcribe exceeds the service's size limits or contains malformed audio data.
fix
Break your audio stream into smaller, appropriately sized chunks. Ensure the audio format (16-bit PCM) and sample rate (e.g., 16000 Hz) match the parameters passed to `start_stream_transcription`.
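The chunking step can be sketched as follows, assuming the audio is already raw 16-bit mono PCM held in memory. The helper name `iter_pcm_chunks` and the 100 ms default are illustrative choices, not part of the SDK.

```python
from typing import Iterator


def iter_pcm_chunks(
    pcm: bytes,
    sample_rate_hz: int = 16000,
    chunk_ms: int = 100,
    bytes_per_sample: int = 2,  # 16-bit PCM
) -> Iterator[bytes]:
    """Yield fixed-duration slices of mono PCM, aligned to whole samples."""
    if sample_rate_hz <= 0 or chunk_ms <= 0:
        raise ValueError("sample_rate_hz and chunk_ms must be positive")
    samples_per_chunk = max(1, sample_rate_hz * chunk_ms // 1000)
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of 16 kHz, 16-bit silence splits into ten 3200-byte chunks.
chunks = list(iter_pcm_chunks(b"\x00" * 32000))
print(len(chunks), len(chunks[0]))
```

Sizing chunks by duration rather than by a fixed byte count keeps the chunk length consistent when the sample rate changes.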
error LimitExceededException
cause Your client has exceeded one of the Amazon Transcribe limits, typically the audio length limit for a streaming session.
fix
Review the Amazon Transcribe service quotas. If the audio length limit is the cause, split your audio input into smaller, separate streaming sessions; if rate limits are hit, reduce the frequency of requests. The general-purpose AWS SDKs typically retry throttling errors automatically, but verify whether that applies to this SDK before relying on it.
error InternalFailureException
cause A problem occurred internally while Amazon Transcribe was processing the audio, leading to termination of processing.
fix
This often indicates a transient server-side issue. Implement robust retry logic with exponential backoff. If the problem persists, gather detailed logs and contact AWS Support.
error HTTP/2 stream is abnormally aborted in mid-communication with result code 2
cause This error can stem from various issues including incorrect audio format, mismatched sample rate, an unusually small chunk size, or underlying network connectivity problems between your client and Amazon Transcribe.
fix
Verify that your PCM audio is 16-bit and that the `media_sample_rate_hz` parameter matches your audio source. Try increasing the audio chunk size (e.g., to 32 KB per chunk). Check the stability of your network connection. Handle errors explicitly, especially those raised while sending the first audio chunks.
error UnrecognizedClientException: The security token included in the request is invalid. (or similar credential errors)
cause The AWS credentials (access key, secret key, session token) used by the SDK are invalid, expired, or not configured correctly.
fix
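Ensure the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables (plus AWS_SESSION_TOKEN for temporary credentials) are set correctly, or that your AWS CLI profile (~/.aws/credentials) is valid and accessible. Also confirm that the region you pass to `TranscribeStreamingClient`, for example from AWS_REGION, is the one you intend. If using temporary credentials, ensure they haven't expired.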
breaking This SDK is experimental, no longer actively developed, and is not recommended for new projects. It is provided as-is without a support commitment and is being replaced by other AWS solutions.
fix For new projects, consider using the official AWS SDKs (e.g., Boto3 for batch transcription or other language-specific streaming SDKs/approaches) as recommended by AWS documentation. For existing projects, proceed with caution and pin strict dependencies.
gotcha The standard AWS SDK for Python (Boto3) does NOT support Amazon Transcribe streaming. This `amazon-transcribe` SDK is specifically designed for streaming.
fix Ensure you are using the `amazon-transcribe` SDK for streaming transcription. Boto3 is suitable for batch transcription jobs.
gotcha In rare cases, the SDK can cause high CPU usage.
fix Monitor CPU usage and refer to GitHub issues (e.g., #109, #84) for potential workarounds if encountered.
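One lightweight way to monitor CPU usage from inside the process is to compare CPU time with wall-clock time over a stretch of work. The sketch below uses only the standard library; the `cpu_monitor` name is hypothetical, and the `time.sleep` call stands in for a period of streaming.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def cpu_monitor() -> Iterator[Dict[str, float]]:
    """Measure process CPU time against wall time for the enclosed block."""
    stats: Dict[str, float] = {}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    try:
        yield stats
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        stats["wall_s"] = wall
        stats["cpu_s"] = cpu
        # Values near or above 1.0 mean at least one core was busy throughout.
        stats["utilization"] = cpu / wall if wall > 0 else 0.0


with cpu_monitor() as stats:
    time.sleep(0.2)  # stand-in for a stretch of streaming work

print(f"CPU utilization: {stats['utilization']:.0%}")
```

A mostly idle stream that reports sustained utilization close to 100% matches the symptom described in this gotcha.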
gotcha The `awscrt` dependency, built on C libraries, may require manual compilation on non-standard operating systems if precompiled wheels are not available.
fix Ensure your operating system is supported by `awscrt` precompiled wheels, or be prepared to compile the AWS Common Runtime libraries manually.
gotcha Amazon Transcribe supports only one audio stream per streaming session. Attempting to send multiple streams within a single session causes the transcription request to fail.
fix Design your application to handle a single audio stream per `start_stream_transcription` call and associated `output_stream` handler.

This quickstart demonstrates how to open an asynchronous connection to Amazon Transcribe Streaming, send a simulated audio stream, and process the real-time transcription results. It defines a custom event handler that prints the final (non-partial) transcription segments. The simulated audio is silence, so expect little or no transcript output until you substitute a real audio source. Set your AWS credentials before running; the script reads the region from the AWS_REGION environment variable and falls back to us-east-1.

import asyncio
import os

# NOTE: aiofile is not a dependency of this SDK, but examples commonly use it
# for asynchronous file reads. Install with `pip install aiofile`.
# This minimal example simulates an audio stream instead.
# import aiofile

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent

# Credentials are resolved outside this script: export AWS_ACCESS_KEY_ID,
# AWS_SECRET_ACCESS_KEY and, for temporary credentials, AWS_SESSION_TOKEN.
# In a real application, consider using AWS CLI config or IAM roles.

class MyEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        results = transcript_event.transcript.results
        for result in results:
            if not result.is_partial:
                for alternative in result.alternatives:
                    print(f"[Transcription]: {alternative.transcript}")

async def basic_transcribe_stream(client: TranscribeStreamingClient):
    # Simulate a stream of audio bytes (replace with a real audio source)
    async def get_audio_stream():
        # A real application would read from a microphone or file (e.g., using aiofile).
        # Here we send a few 1 KB chunks of zero bytes, which is silence in 16-bit PCM.
        for _ in range(5):
            yield b'\x00' * 1024
            await asyncio.sleep(0.1)
        print("\n--- Audio stream ended ---")

    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=16000,
        media_encoding="pcm",
    )

    async def write_chunks():
        # Forward each chunk as an audio event, then close the input stream
        # so the service can finish and handle_events() can return.
        async for chunk in get_audio_stream():
            await stream.input_stream.send_audio_event(audio_chunk=chunk)
        await stream.input_stream.end_stream()

    # Instantiate our handler and process events while audio is being sent
    handler = MyEventHandler(stream.output_stream)
    await asyncio.gather(write_chunks(), handler.handle_events())

async def main():
    # Read the region from AWS_REGION, falling back to us-east-1
    aws_region = os.environ.get('AWS_REGION', 'us-east-1')
    print(f"Connecting to Amazon Transcribe in region: {aws_region}")
    client = TranscribeStreamingClient(region=aws_region)
    await basic_transcribe_stream(client)

if __name__ == "__main__":
    # Do not inject placeholder credentials here: fake keys fail with
    # UnrecognizedClientException and, when set as environment variables,
    # can take precedence over a valid profile in ~/.aws/credentials.
    # Configure real credentials (environment variables, AWS CLI profile,
    # or an IAM role) and optionally AWS_REGION before running.
    asyncio.run(main())