Deepgram Python SDK
The official Python SDK for Deepgram's automated speech recognition, text-to-speech, and language understanding APIs. It lets developers integrate Deepgram's speech and language AI models into their applications. The library is actively maintained with frequent releases; the current version is 6.1.1.
Warnings
- breaking Version 6.0.0 introduced significant breaking changes, including a complete overhaul of WebSocket clients. Hand-rolled WebSocket code from v5 has been replaced by fully generated clients for Listen v1/v2, Speak v1, and Agent v1.
- breaking The `send_media()` method for WebSocket clients now accepts only raw `bytes` for audio data. Control messages (keep-alive, finalize, flush) are sent through dedicated methods (`send_keep_alive()`, `send_finalize()`, `send_flush()`), which replace the generic `send_control({'type': '...'})` pattern.
- breaking The type system in v6 has shifted to domain-specific imports. Types are now imported from their respective feature namespaces (e.g., `deepgram.listen.v1.types`, `deepgram.agent.v1.types`) instead of a shared 'barrel' module.
- breaking The SageMaker transport functionality has been extracted into a separate package, `deepgram-sagemaker`. It is no longer part of the core `deepgram-sdk`.
- gotcha Authentication credentials are resolved in a fixed order. Explicit `access_token` or `api_key` parameters passed to `DeepgramClient` take the highest precedence. Without them, the SDK falls back to environment variables, where `DEEPGRAM_TOKEN` takes precedence over `DEEPGRAM_API_KEY`.
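The credential precedence in the last warning can be sketched as a small stand-alone function. This is an illustration of the rule only, not the SDK's own code. The function name `resolve_credentials` is hypothetical, and two details are assumptions: that an explicit `access_token` is checked before an explicit `api_key`, and that `DEEPGRAM_TOKEN` holds an access token rather than an API key.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    access_token: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (kind, value) following the documented precedence.

    Hypothetical helper for illustration; not part of deepgram-sdk.
    """
    env = os.environ if env is None else env
    # 1. Explicit arguments win over anything in the environment.
    #    (Checking access_token before api_key is an assumption.)
    if access_token:
        return ("access_token", access_token)
    if api_key:
        return ("api_key", api_key)
    # 2. DEEPGRAM_TOKEN is consulted before DEEPGRAM_API_KEY.
    if env.get("DEEPGRAM_TOKEN"):
        return ("access_token", env["DEEPGRAM_TOKEN"])
    if env.get("DEEPGRAM_API_KEY"):
        return ("api_key", env["DEEPGRAM_API_KEY"])
    raise ValueError("No Deepgram credentials found")
```

When both environment variables are set and no explicit argument is given, this sketch returns the `DEEPGRAM_TOKEN` value, which is the case most likely to surprise someone who has only updated `DEEPGRAM_API_KEY`.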
Install
pip install deepgram-sdk
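Because v6 changed the WebSocket API, it can be useful to confirm which major version is installed before relying on the v6 method names. The following is a minimal sketch using only the standard library; the helper names `parse_major` and `installed_major` are hypothetical, and the distribution name is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_major(version_string: str) -> int:
    """Extract the leading major number from a version such as '6.1.1'."""
    return int(version_string.split(".", 1)[0])


def installed_major(dist_name: str = "deepgram-sdk") -> Optional[int]:
    """Return the installed major version, or None if the package is absent."""
    try:
        return parse_major(version(dist_name))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_major()
    if major is None:
        print("deepgram-sdk is not installed")
    elif major < 6:
        print(f"deepgram-sdk {major}.x found; the v6 WebSocket methods are not available")
    else:
        print(f"deepgram-sdk {major}.x found")
```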
Imports
- DeepgramClient
from deepgram import DeepgramClient
- EventType
from deepgram.core.events import EventType
- ListenV2Options
from deepgram.listen.v2.options import ListenV2Options
Quickstart
import os
import threading
import time

from deepgram import DeepgramClient
from deepgram.core.events import EventType

# The SDK reads credentials from the environment (DEEPGRAM_TOKEN or DEEPGRAM_API_KEY),
# so the client is created without arguments once one of them is set.
if not (os.environ.get("DEEPGRAM_TOKEN") or os.environ.get("DEEPGRAM_API_KEY")):
    raise ValueError("DEEPGRAM_API_KEY or DEEPGRAM_TOKEN environment variable not set.")

client = DeepgramClient()

# This example uses the synchronous client for simplicity; async methods are also available.
# NOTE: the connect() call, the one-argument handler signatures, and start_listening()
# follow the generated-client style and are assumptions here; verify them against
# the reference for the SDK version you have installed.


def main():
    try:
        # Connect to the real-time Listen API. Other query options (punctuate,
        # diarize, smart_format) are passed as keyword arguments in the same way.
        with client.listen.v1.connect(model="nova-2", language="en-US") as connection:

            def on_message(message):
                # Only "Results" messages carry transcripts; print the final ones.
                if getattr(message, "type", None) != "Results":
                    return
                if getattr(message, "speech_final", False):
                    print(f"Transcript: {message.channel.alternatives[0].transcript}")

            # Register event handlers
            connection.on(EventType.OPEN, lambda _: print("Connection opened."))
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.CLOSE, lambda _: print("Connection closed."))
            connection.on(EventType.ERROR, lambda error: print(f"Error: {error}"))

            # start_listening() blocks while it dispatches events, so run it in a
            # background thread and keep this thread free for sending audio.
            listener = threading.Thread(target=connection.start_listening, daemon=True)
            listener.start()

            # In a real application, you would continuously send bytes from an audio
            # source (a microphone or an audio file):
            # connection.send_media(audio_chunk_bytes)
            print("No audio is sent in this example; in a real app, send actual audio bytes.")

            time.sleep(5)  # Keep the connection open for a bit to simulate processing
        # Leaving the `with` block closes the connection.
    except Exception as e:
        print(f"Could not open connection: {e}")


if __name__ == "__main__":
    main()
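The quickstart leaves the actual audio source as a comment. As a rough sketch of that step, the helper below reads a binary stream in fixed-size chunks, passes each chunk to `send_media()` as raw `bytes`, and then calls `send_finalize()`, in line with the warnings above. The names `iter_chunks`, `stream_audio`, and `RecordingConnection` are hypothetical; `RecordingConnection` is a stand-in that only records calls so the example runs without a network connection or the SDK installed.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple


def iter_chunks(source: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive byte chunks from a binary stream until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_audio(connection, source: BinaryIO, chunk_size: int = 4096) -> int:
    """Send every chunk with send_media(), then call send_finalize().

    Returns the number of chunks sent.
    """
    sent = 0
    for chunk in iter_chunks(source, chunk_size):
        connection.send_media(chunk)  # raw bytes only, per the v6 warning
        sent += 1
    connection.send_finalize()  # dedicated control method instead of send_control(...)
    return sent


class RecordingConnection:
    """Hypothetical stand-in for a socket client; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int]] = []

    def send_media(self, data: bytes) -> None:
        if not isinstance(data, bytes):
            raise TypeError("send_media expects bytes")
        self.calls.append(("media", len(data)))

    def send_finalize(self) -> None:
        self.calls.append(("finalize", 0))


fake = RecordingConnection()
count = stream_audio(fake, io.BytesIO(b"\x00" * 10000), chunk_size=4096)
print(count, fake.calls)
```

With a real connection, `source` would typically be a file opened in binary mode (`open(path, "rb")`), and the chunk size would be chosen to match the audio encoding and the latency you want.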