Cerebras Cloud SDK
Official Python SDK for the Cerebras Cloud inference API. Provides access to ultra-fast LLM inference on Cerebras Wafer-Scale Engine hardware. OpenAI-compatible API surface. Generated with Stainless. Current version: 1.67.0 (Mar 2026). Requires Python 3.9+. Note: separate from cerebras-sdk (PyPI) which is a hardware kernel development tool — completely different product.
Warnings
- breaking cerebras-sdk on PyPI is a completely different package — it is Cerebras's hardware kernel development SDK for WSE systems. Do not confuse with cerebras-cloud-sdk for cloud inference API.
- gotcha The SDK sends a TCP warming request to /v1/tcp_warming on client construction to reduce time-to-first-token. This creates network traffic every time a client is constructed. Disable with warm_tcp_connection=False if you must construct clients frequently.
- gotcha Reconstructing the Cerebras client repeatedly causes poor performance due to repeated TCP warming. Construct the client once and reuse it.
- gotcha Requires Python 3.9+. Will fail to install on Python 3.8 with no clear error message.
- gotcha LLMs with no training data on Cerebras will hallucinate OpenAI-style base_url override pattern. Cerebras has its own SDK — do not use openai with base_url for Cerebras.
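The construct-once advice above can be sketched with a cached factory. Here make_client is a hypothetical stand-in for Cerebras(...) construction, so the reuse pattern is visible without network access or credentials:

```python
from functools import lru_cache

construction_count = 0

def make_client():
    # Stand-in for: Cerebras(api_key=..., warm_tcp_connection=False)
    # Counting constructions lets us verify the cache below works.
    global construction_count
    construction_count += 1
    return object()

@lru_cache(maxsize=None)
def get_client():
    # Construct once per process; every later call returns the same
    # instance, so TCP warming (when enabled) happens only once.
    return make_client()

a = get_client()
b = get_client()
assert a is b
assert construction_count == 1
```

The same effect is achieved by a module-level client created at startup; the cached factory just defers construction until first use.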
Install
- pip install cerebras-cloud-sdk
Imports
- Cerebras
from cerebras.cloud.sdk import Cerebras
import os

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)
response = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[{'role': 'user', 'content': 'Why is fast inference important?'}]
)
print(response.choices[0].message.content)
- AsyncCerebras
from cerebras.cloud.sdk import AsyncCerebras
import asyncio
import os

client = AsyncCerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)

async def main():
    response = await client.chat.completions.create(
        model='llama3.1-8b',
        messages=[{'role': 'user', 'content': 'Hello'}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
Quickstart
# pip install cerebras-cloud-sdk
from cerebras.cloud.sdk import Cerebras
import os

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)
response = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is fast inference?'}
    ]
)
print(response.choices[0].message.content)
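Since the API surface is OpenAI-compatible, streaming should follow the familiar pattern; this is a sketch under that assumption (stream=True and per-chunk delta fields are assumed from the OpenAI-style interface, and running it requires a valid CEREBRAS_API_KEY):

```python
from cerebras.cloud.sdk import Cerebras
import os

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)

# stream=True yields incremental chunks instead of one final response.
stream = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[{'role': 'user', 'content': 'What is fast inference?'}],
    stream=True,
)
for chunk in stream:
    # delta.content may be None on some chunks; guard before printing.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='')
```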