Ollama Python Library
Official Python client for Ollama — the local LLM runtime. Wraps the Ollama REST API with native Python types. Requires Ollama to be installed and running locally (or pointed at https://ollama.com for cloud models). Not an inference provider — it's a runtime client.
Warnings
- breaking pip install ollama installs only the Python client — not the Ollama runtime. If the runtime is not installed and running, all calls raise ConnectionError immediately.
- breaking Models must be explicitly pulled before use. Calling chat() with a model that hasn't been pulled raises a ResponseError with status_code 404.
- breaking Tool calling with stream=True is incomplete. When tools are provided, Ollama buffers the full response and returns it as a single block even with stream=True set — true incremental streaming does not occur.
- gotcha pip install ollama-python installs a different, unofficial third-party package with a completely different API (ModelManagementAPI, GenerateAPI classes). It is not the official client.
- gotcha The Ollama Python library defaults to http://localhost:11434. To connect to a remote Ollama instance, you must pass host explicitly via Client(host='http://<remote>:11434'). The OLLAMA_HOST env var is respected by the runtime but NOT automatically picked up by the Python client.
- gotcha The OpenAI-compatible endpoint is at http://localhost:11434/v1 — but this is a separate REST interface, not the native ollama Python library. Using openai SDK against this endpoint requires api_key='ollama' (any non-empty string) as a placeholder.
- gotcha Cloud models (e.g. gpt-oss:120b-cloud, deepseek-v3.1:671b-cloud) require ollama signin and ollama pull before use. They are routed through ollama.com and require an API key when accessed via the cloud API endpoint directly.
Install
pip install ollama
Imports
- chat

from ollama import chat
response = chat(model='gemma3', messages=[{'role': 'user', 'content': 'Hello'}])
print(response.message.content)

- Client (remote/custom host)

from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Hello'}])
Quickstart
from ollama import chat
response = chat(
model='gemma3',
messages=[{'role': 'user', 'content': 'Why is the sky blue?'}]
)
print(response.message.content)