Groq Python SDK
Official Python SDK for the GroqCloud API. OpenAI-compatible interface for ultra-low-latency LLM inference on Groq LPU hardware. Model IDs change frequently: models are deprecated and replaced with no versioned aliases.
Warnings
- breaking The pre-release groq.cloud.core API is fully removed and the ChatCompletion class no longer exists. Any code from early-2024 tutorials is broken.
- breaking Models are deprecated and removed with no versioned aliases. gemma-7b-it and mixtral-8x7b-32768 have been removed; llama-guard-3-8b is decommissioned. Hardcoded model IDs break without warning.
- breaking max_tokens is deprecated in favor of max_completion_tokens. Still works but may be removed.
- breaking functions and function_call parameters are deprecated in favor of tools and tool_choice respectively.
- breaking The exclude_domains and include_domains parameters are deprecated for agentic tooling. Use the search_settings parameter instead.
- gotcha n parameter (number of completions) only supports n=1. Passing any other value returns a 400 error.
- gotcha logprobs, presence_penalty, and frequency_penalty are listed in the API but not supported by any current models. Passing them does not error but has no effect.
- gotcha Preview models can be discontinued at short notice. Do not use in production.
- gotcha Rate limits are per-model and vary significantly. Free tier limits are very low. 429s happen frequently in dev without a paid plan.
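Given how often 429s show up on the free tier, a small retry wrapper with exponential backoff is a common pattern. Below is a minimal sketch; the helper is generic, and the `groq.RateLimitError` exception name mentioned in the comments is an assumption based on the SDK's OpenAI-style error hierarchy (verify against your installed version).

```python
import random
import time

def with_backoff(call, *, retries=5, base=1.0, retry_on=(Exception,)):
    # Retry `call` with exponential backoff plus jitter.
    # In real use, retry_on would be (groq.RateLimitError,) -- the
    # exception class name is assumed, not confirmed by this doc.
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # sleep ~1s, ~2s, ~4s, ... (scaled by `base`) with jitter
            time.sleep(base * (2 ** attempt + random.random()))
```

Usage would look like `with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`. The client may also take a `max_retries` option for built-in retries (again, assumed from the OpenAI-compatible client design), but an explicit wrapper gives you control over timing.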
Install
- pip install groq
- uv add groq
Imports
- Groq
from groq import Groq
- AsyncGroq
from groq import AsyncGroq
- aiohttp backend
from groq import DefaultAioHttpClient
Quickstart
import os
from groq import Groq
client = Groq(api_key=os.environ['GROQ_API_KEY'])
response = client.chat.completions.create(
    model='llama-3.3-70b-versatile',
    messages=[{'role': 'user', 'content': 'Hello'}],
)
print(response.choices[0].message.content)
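Since model IDs are removed without versioned aliases, one way to harden the quickstart is to resolve the model from a preference list at startup instead of hardcoding a single ID. A minimal sketch, assuming the set of available IDs comes from a models-list endpoint with an OpenAI-compatible response shape (e.g. `{m.id for m in client.models.list().data}` -- verify locally):

```python
def pick_model(preferred, available_ids):
    # Return the first model in `preferred` that the API still serves.
    # `available_ids` is a set of model ID strings, e.g. built from a
    # models-list call (response shape assumed, not confirmed here).
    for model in preferred:
        if model in available_ids:
            return model
    raise RuntimeError(f'none of {preferred} are currently available')
```

Then the quickstart's `model=` argument becomes something like `pick_model(['llama-3.3-70b-versatile', 'llama-3.1-8b-instant'], ids)`, failing loudly at startup instead of at the first request after a deprecation.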