Cerebras Cloud SDK

Version 1.67.0 · verified Tue May 12 · auth: no · python install: verified · quickstart: verified

Official Python SDK for the Cerebras Cloud inference API, providing access to ultra-fast LLM inference on Cerebras Wafer-Scale Engine hardware. OpenAI-compatible API surface; generated with Stainless. Current version: 1.67.0 (Mar 2026). Requires Python 3.9+. Note: this is separate from cerebras-sdk on PyPI, which is Cerebras's hardware kernel development tool and a completely different product.

pip install cerebras-cloud-sdk
error ModuleNotFoundError: No module named 'cerebras.cloud.sdk'
cause The `cerebras-cloud-sdk` Python package is not installed in the current environment.
fix
Install the SDK using pip: pip install cerebras-cloud-sdk
error cerebras.cloud.sdk.AuthenticationError: Invalid API key
cause The Cerebras API client was initialized without a valid API key, or the provided key is incorrect or expired.
fix
Ensure your CEREBRAS_API_KEY environment variable is set correctly, or pass api_key='YOUR_API_KEY' to the Cerebras client constructor.
error cerebras.cloud.sdk.APIConnectionError
cause The Cerebras API client failed to establish a connection to the API endpoint, possibly due to network issues, a timeout, or an incorrect base URL.
fix
Verify your network connectivity, check the configured base_url for the client, and ensure no firewalls are blocking the connection.
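For transient connection failures, an application-level retry with exponential backoff is a common complement to the checks above. The sketch below uses only the standard library; in real code the `retryable` tuple would name `cerebras.cloud.sdk.APIConnectionError` (the stdlib `ConnectionError` stands in here):

```python
import time


def with_retries(call, attempts=3, base_delay=0.5, retryable=(ConnectionError,)):
    """Call `call()`, retrying listed transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))
```

Usage: `with_retries(lambda: client.chat.completions.create(...))`. Keep `attempts` small; persistent failures usually indicate a configuration problem rather than a transient one.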
error cerebras.cloud.sdk.NotFoundError: 404 Not Found
cause The API request targeted a resource that does not exist, such as an incorrect model name or an invalid endpoint path.
fix
Review the API documentation to ensure the model name is correct and all request parameters and paths are valid. Inspect the error's status_code and response properties for more details.
error AttributeError: 'ChatCompletionCreateParams' object has no attribute 'max_tokens'
cause Because the SDK's API surface is OpenAI-compatible, developers often pass OpenAI parameter names that the Cerebras SDK does not accept under the same name, e.g. `max_tokens` where the Cerebras API expects `max_completion_tokens`.
fix
Consult the cerebras-cloud-sdk documentation for the correct parameter names (e.g., max_completion_tokens), or pass non-standard parameters via the extra_body argument.
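If request parameters are assembled generically (for example from config files), a small rename shim keeps OpenAI-style dicts compatible. The target name `max_completion_tokens` below is an assumption to verify against the current Cerebras API reference:

```python
def remap_params(params: dict, renames: dict) -> dict:
    """Return a copy of `params` with keys renamed per `renames`; other keys pass through."""
    return {renames.get(key, key): value for key, value in params.items()}


# Assumed mapping -- confirm against the Cerebras API reference before relying on it.
OPENAI_TO_CEREBRAS = {"max_tokens": "max_completion_tokens"}
```

Usage: `client.chat.completions.create(**remap_params(cfg, OPENAI_TO_CEREBRAS))`.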
breaking cerebras-sdk on PyPI is a completely different package — it is Cerebras's hardware kernel development SDK for WSE systems. Do not confuse with cerebras-cloud-sdk for cloud inference API.
fix pip install cerebras-cloud-sdk for cloud inference. cerebras-sdk is for hardware kernel development.
gotcha The SDK sends TCP warming requests to /v1/tcp_warming when a client is constructed, to reduce time-to-first-token. This creates network traffic on every client construction. Disable with warm_tcp_connection=False if you must reconstruct clients frequently.
fix client = Cerebras(api_key=..., warm_tcp_connection=False); better, reuse a single client instance rather than reconstructing.
gotcha Reconstructing the Cerebras client instance repeatedly causes poor performance due to repeated TCP warming. Construct once and reuse.
fix Create a module-level singleton client. Do not instantiate Cerebras() inside request handlers or loops.
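One way to enforce the singleton is `functools.lru_cache`. In this sketch a stand-in class replaces the real `Cerebras` constructor so it runs without the SDK installed:

```python
from functools import lru_cache


class _ClientStandIn:
    """Stand-in for cerebras.cloud.sdk.Cerebras; construction is the expensive step."""


@lru_cache(maxsize=1)
def get_client():
    # Constructed once per process; every later call returns the same instance,
    # so TCP warming happens at most once.
    return _ClientStandIn()  # real code: Cerebras(api_key=...)
```

Request handlers then call `get_client()` instead of constructing `Cerebras()` themselves.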
gotcha Requires Python 3.9+. Will fail to install on Python 3.8 with no clear error message.
fix Use Python 3.9 or higher.
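Since the failure mode on 3.8 is opaque, a fail-fast guard at startup gives a clearer message. A minimal sketch:

```python
import sys


def require_python(minimum=(3, 9)):
    """Exit with a clear message if the interpreter is older than `minimum`."""
    if sys.version_info < minimum:
        raise SystemExit(
            f"cerebras-cloud-sdk requires Python {minimum[0]}.{minimum[1]}+, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
```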
gotcha LLM code generators with little training data on Cerebras tend to hallucinate an OpenAI-style base_url override pattern. Cerebras ships its own SDK; do not use the openai package with a base_url pointed at Cerebras.
fix Use cerebras-cloud-sdk directly, not openai with base_url='https://api.cerebras.ai'.
breaking The Cerebras client requires an API key: pass api_key=... to the constructor or set the CEREBRAS_API_KEY environment variable. Without one, client initialization fails.
fix Set the CEREBRAS_API_KEY environment variable (e.g., `export CEREBRAS_API_KEY='your_api_key'`) or pass `api_key='your_api_key'` when instantiating the client: `client = Cerebras(api_key='your_api_key')`.
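A fail-fast resolver makes the missing-key case explicit before the client is constructed. This sketch is stdlib-only; the constructor call in the trailing comment is illustrative:

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the key from the argument or CEREBRAS_API_KEY, or fail with a clear message."""
    key = explicit_key or os.environ.get("CEREBRAS_API_KEY")
    if not key:
        raise RuntimeError(
            "No Cerebras API key: pass api_key=... or set CEREBRAS_API_KEY"
        )
    return key


# client = Cerebras(api_key=resolve_api_key())
```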
python | os / libc     | wheel | install | import | disk
3.9    | alpine (musl) | -     | -       | 0.65s  | 33.1M
3.9    | slim (glibc)  | -     | -       | 0.59s  | 33M
3.10   | alpine (musl) | -     | -       | 0.70s  | 34.1M
3.10   | slim (glibc)  | -     | -       | 0.54s  | 34M
3.11   | alpine (musl) | -     | -       | 1.00s  | 37.1M
3.11   | slim (glibc)  | -     | -       | 0.83s  | 37M
3.12   | alpine (musl) | -     | -       | 1.13s  | 28.6M
3.12   | slim (glibc)  | -     | -       | 1.11s  | 28M
3.13   | alpine (musl) | -     | -       | 1.02s  | 28.2M
3.13   | slim (glibc)  | -     | -       | 1.01s  | 28M

Minimal Cerebras inference call using cerebras-cloud-sdk 1.x.

# pip install cerebras-cloud-sdk
import os

from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)

response = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is fast inference?'}
    ]
)
print(response.choices[0].message.content)