Cerebras Cloud SDK
Official Python SDK for the Cerebras Cloud inference API. Provides access to ultra-fast LLM inference on Cerebras Wafer-Scale Engine hardware. OpenAI-compatible API surface. Generated with Stainless. Current version: 1.67.0 (Mar 2026). Requires Python 3.9+. Note: separate from cerebras-sdk (PyPI) which is a hardware kernel development tool — completely different product.
Warnings
- breaking cerebras-sdk on PyPI is a completely different package — it is Cerebras's hardware kernel development SDK for WSE systems. Do not confuse with cerebras-cloud-sdk for cloud inference API.
- gotcha The SDK sends a TCP warming request to /v1/tcp_warming on client construction to reduce time-to-first-token. This creates network traffic every time a client is constructed. Disable with warm_tcp_connection=False if you must construct clients frequently.
- gotcha Reconstructing the Cerebras client repeatedly causes poor performance due to repeated TCP warming. Construct the client once and reuse it.
- gotcha Requires Python 3.9+. Will fail to install on Python 3.8 with no clear error message.
- gotcha LLMs with no training data on Cerebras will hallucinate OpenAI-style base_url override pattern. Cerebras has its own SDK — do not use openai with base_url for Cerebras.
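The construct-once advice above can be sketched with a cached factory. Here make_client is a hypothetical stand-in for Cerebras(...) construction, so the reuse pattern is visible without network access or credentials:

```python
from functools import lru_cache

construction_count = 0

def make_client():
    # Stand-in for: Cerebras(api_key=..., warm_tcp_connection=False)
    # Counting constructions lets us verify the cache below works.
    global construction_count
    construction_count += 1
    return object()

@lru_cache(maxsize=None)
def get_client():
    # Construct once per process; every later call returns the same
    # instance, so TCP warming (when enabled) happens only once.
    return make_client()

a = get_client()
b = get_client()
assert a is b
assert construction_count == 1
```

The same effect is achieved by a module-level client created at startup; the cached factory just defers construction until first use.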
Install
- pip install cerebras-cloud-sdk
Imports
- Cerebras
from cerebras.cloud.sdk import Cerebras
import os

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)
response = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[{'role': 'user', 'content': 'Why is fast inference important?'}]
)
print(response.choices[0].message.content)
- AsyncCerebras
from cerebras.cloud.sdk import AsyncCerebras
import asyncio
import os

client = AsyncCerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)

async def main():
    response = await client.chat.completions.create(
        model='llama3.1-8b',
        messages=[{'role': 'user', 'content': 'Hello'}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
Quickstart
# pip install cerebras-cloud-sdk
from cerebras.cloud.sdk import Cerebras
import os

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)
response = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is fast inference?'}
    ]
)
print(response.choices[0].message.content)
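Since the API surface is OpenAI-compatible, streaming should follow the familiar pattern; this is a sketch under that assumption (stream=True and per-chunk delta fields are assumed from the OpenAI-style interface, and running it requires a valid CEREBRAS_API_KEY):

```python
from cerebras.cloud.sdk import Cerebras
import os

client = Cerebras(
    api_key=os.environ.get('CEREBRAS_API_KEY')
)

# stream=True yields incremental chunks instead of one final response.
stream = client.chat.completions.create(
    model='llama3.1-8b',
    messages=[{'role': 'user', 'content': 'What is fast inference?'}],
    stream=True,
)
for chunk in stream:
    # delta.content may be None on some chunks; guard before printing.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='')
```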