ChromaDB

1.5.5 verified Tue May 12 auth: no python install: verified quickstart: stale

Open-source embedded vector database for AI applications. Runs in-process (EphemeralClient, PersistentClient) or client-server mode (HttpClient). Handles embedding storage, metadata filtering, and similarity search. Supports pluggable embedding functions. Core backend rewritten in Rust in 1.x; also ships a lightweight HTTP-only client as the separate chromadb-client package.

pip install chromadb
error ModuleNotFoundError: No module named 'chromadb'
cause The `chromadb` package is not installed in the active Python environment or there's a virtual environment misconfiguration, especially when used with other libraries like LangChain.
fix
Ensure chromadb is installed in your current environment: pip install chromadb or poetry add chromadb (if using Poetry). If using a virtual environment, activate it before installing.
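To confirm which interpreter is active and whether it can see the package, a small standard-library check can help. This is a minimal sketch; the helper name describe_environment is illustrative and not part of chromadb.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "chromadb") -> dict:
    """Report which interpreter is running and whether it can find module_name."""
    # find_spec locates a top-level module without importing it
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        # sys.prefix differs from sys.base_prefix inside a virtual environment
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If module_found is False, running python -m pip install chromadb with the interpreter printed above makes pip target that same environment.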
error ValueError: You must provide an embedding function to compute embeddings.
cause When using `chromadb` without a default embedding function (e.g., with `chromadb-client` or in a fresh installation without optional dependencies for a default embedder), you must explicitly specify an embedding function when adding documents to a collection.
fix
Provide an embedding function (e.g., from chromadb.utils.embedding_functions or a custom one) when creating the collection or adding documents.
import chromadb
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key='YOUR_API_KEY', # Replace with your actual OpenAI API key
    model_name='text-embedding-ada-002'
)

client = chromadb.Client()
collection = client.create_collection(
    name='my_collection',
    embedding_function=openai_ef # Provide the embedding function here
)
collection.add(documents=['Hello world'], ids=['doc1'])
error RuntimeError: Chroma is running in http-only client mode, and can only be run with 'chromadb.api.fastapi.FastAPI' as the chroma_api_impl.
cause This error typically occurs when there's a conflict between installed `chromadb` packages (e.g., `chromadb` and `chromadb-client`) or when attempting to use a local client (like `PersistentClient` or `EphemeralClient`) in an environment configured to only allow the HTTP-only client, often due to an existing server instance or environment variables.
fix
If you intend to run a local client, ensure only the chromadb package is installed and there are no conflicting CHROMA_API_IMPL environment variables. If you intend to connect to a remote Chroma server, use chromadb.HttpClient and ensure the server is running. A common solution is to reinstall chromadb in a clean environment.
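One way to spot the package conflict described above is to ask the standard library which of the two distributions are installed. This is a sketch under the assumption that the conflict comes from having both installed; the helper names are hypothetical.

```python
from importlib import metadata


def installed_chroma_distributions(names=("chromadb", "chromadb-client")) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def has_conflict(found: dict) -> bool:
    """Return True when more than one of the checked distributions is installed."""
    return sum(version is not None for version in found.values()) > 1


if __name__ == "__main__":
    found = installed_chroma_distributions()
    print(found)
    if has_conflict(found):
        print("Both packages are installed; keep only the one you need.")
```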
error AttributeError: 'Chroma' object has no attribute 'persist'
cause In newer versions of `chromadb` (specifically since 0.4.x) and its LangChain integration, the `.persist()` method is no longer needed or available because data is automatically persisted to disk when `persist_directory` is specified during client initialization.
fix
Remove the .persist() call from your code. If you initialize Chroma with a persist_directory, data will be saved automatically.
# Old code (with .persist())
# vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
# vectordb.persist() # Remove this line

# Corrected code
vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
# Data is automatically persisted if persist_directory is provided
error ValueError: Expected metadata to be a string, number, boolean, SparseVector, or nullable.
cause ChromaDB has strict requirements for metadata, only supporting primitive types (strings, numbers, booleans, or `SparseVector`) and `null`. This error occurs when you try to add documents with metadata containing nested objects or arrays.
fix
Flatten your metadata dictionary to ensure all values are primitive types. Remove any nested dictionaries or lists from the metadata before adding documents to the Chroma collection.
# Example of incorrect metadata
# metadata = {'source': 'doc1', 'details': {'author': 'John Doe', 'date': '2023-01-01'}}

# Corrected metadata
metadata = {'source': 'doc1', 'author': 'John Doe', 'date': '2023-01-01'}
collection.add(documents=['My document content'], metadatas=[metadata], ids=['id1'])
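When metadata arrives nested from an upstream source, the flattening can be automated. The helper below is a hypothetical sketch, not a chromadb API: it joins nested keys with a dot and serialises lists to JSON strings, both of which are conventions chosen for this example.

```python
import json


def flatten_metadata(metadata: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Return a copy of metadata whose values are all str, int, float, bool or None."""
    flat = {}
    for key, value in metadata.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, prefixing keys with the parent key
            flat.update(flatten_metadata(value, full_key, sep))
        elif isinstance(value, (list, tuple)):
            # Sequences are stored as a JSON string (an assumption of this sketch)
            flat[full_key] = json.dumps(list(value))
        elif value is None or isinstance(value, (str, int, float, bool)):
            flat[full_key] = value
        else:
            # Anything else (dates, custom objects) falls back to its string form
            flat[full_key] = str(value)
    return flat


nested = {
    "source": "doc1",
    "details": {"author": "John Doe", "date": "2023-01-01"},
    "tags": ["a", "b"],
}
print(flatten_metadata(nested))
```

The flattened dictionary can then be passed in metadatas=[...] as in the corrected example above.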
breaking chromadb.Client(Settings(...)) removed in 0.4.0. Enormous volume of tutorials, LangChain/LlamaIndex integration examples, and LLM-generated code still uses it. Raises AttributeError or TypeError on import.
fix Replace with chromadb.EphemeralClient() (in-memory), chromadb.PersistentClient(path=...) (disk), or chromadb.HttpClient(host=..., port=...) (server). Old chroma_db_impl="duckdb+parquet" setting is gone entirely.
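The three replacement constructors can be wrapped in one small factory so that calling code selects a mode by name. This is a sketch; make_client and its mode strings are hypothetical, while the three chromadb constructors are the ones named in the fix.

```python
def make_client(mode: str = "ephemeral", path: str = "./chroma_data",
                host: str = "localhost", port: int = 8000):
    """Return a Chroma client for the requested mode."""
    if mode not in ("ephemeral", "persistent", "http"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Imported here so that an invalid mode is rejected before the package loads
    import chromadb

    if mode == "ephemeral":
        return chromadb.EphemeralClient()           # in-memory, lost on exit
    if mode == "persistent":
        return chromadb.PersistentClient(path=path)  # stored on disk under path
    return chromadb.HttpClient(host=host, port=port)  # talks to a running server
```

The default path, host, and port values are placeholders for illustration.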
breaking Database migrations between Chroma versions are irreversible. Upgrading the chromadb package upgrades on-disk data format. Downgrading after upgrade causes data loss or corruption.
fix Back up PersistentClient data directory before upgrading. Use chroma utils migrate CLI if available for the version transition. Pin version in production: pip install chromadb==X.Y.Z.
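The backup step can be scripted with the standard library. This is a minimal sketch, assuming no process is writing to the data directory while it is copied; backup_data_dir and the timestamped naming scheme are illustrative choices.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy data_dir into backup_root under a timestamped name and return the copy."""
    source = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"no such data directory: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates missing parent directories and fails if target already exists
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Demonstration against a throwaway directory rather than a real database
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "chroma_data"
        demo.mkdir()
        (demo / "data.bin").write_text("placeholder")
        print(backup_data_dir(str(demo), str(Path(tmp) / "backups")))
```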
breaking Server CORS and auth configuration moved from environment variables to a YAML config file in the 1.x Rust-backed server. Environment variables like CHROMA_SERVER_CORS_ALLOW_ORIGINS and CHROMA_SERVER_AUTH_CREDENTIALS no longer work.
fix Migrate server configuration to a chroma.yaml config file. See docs.trychroma.com/docs/overview/migration for the full config file schema.
gotcha Default embedding function downloads ~200MB of model weights (all-MiniLM-L6-v2 via onnxruntime) on first call. First add() or query() call in a new environment hangs while downloading. No progress indicator.
fix Pass embedding_function=None and provide embeddings= directly, or pre-download by calling the embedding function once explicitly before serving traffic. Use chromadb-client package if you never need local embedding.
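Supplying vectors directly might look like the sketch below. The wrapper add_precomputed is hypothetical; it only validates shapes and then forwards to collection.add(), so any object exposing that method works.

```python
def add_precomputed(collection, ids, documents, embeddings):
    """Add documents with caller-supplied vectors so no embedding model is loaded."""
    if not (len(ids) == len(documents) == len(embeddings)):
        raise ValueError("ids, documents and embeddings must have the same length")
    # All vectors in one collection are expected to share a single dimension
    if len({len(vector) for vector in embeddings}) > 1:
        raise ValueError("all embeddings must have the same dimension")
    collection.add(ids=ids, documents=documents, embeddings=embeddings)
```

A collection created with client.get_or_create_collection("my_docs", embedding_function=None) can be passed as the first argument. Queries against it would then need query_embeddings= rather than query_texts=, since no embedding function is attached.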
gotcha PersistentClient does not support concurrent access from multiple processes. SQLite-backed storage uses file locking. Multiple processes writing to the same path cause database corruption or blocked writes.
fix For multi-process workloads, run chroma run --path ... as a server and connect all clients via HttpClient.
gotcha collection.query() where= filter uses a specific operator syntax ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or). Plain dict equality {"key": "value"} is not valid; it must be {"key": {"$eq": "value"}}. Raises ValueError silently in old versions, error in new.
fix Use explicit operator syntax for all metadata filters: where={"source": {"$eq": "arxiv"}} not where={"source": "arxiv"}.
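Building filters through small helpers keeps the operator syntax explicit and catches misspelled operators early. This is a sketch; condition and all_of are hypothetical names, and returning a single condition unwrapped (instead of a one-element $and) is a design choice of this example.

```python
COMPARISON_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}


def condition(key: str, operator: str, value) -> dict:
    """Build one explicit metadata condition such as {"source": {"$eq": "arxiv"}}."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return {key: {operator: value}}


def all_of(*conditions: dict) -> dict:
    """Combine conditions with $and; a single condition is returned unchanged."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": list(conditions)}


where = all_of(
    condition("source", "$eq", "arxiv"),
    condition("year", "$gte", 2020),
)
print(where)
```

The resulting dictionary is passed as collection.query(..., where=where).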
gotcha Telemetry is enabled by default (sends anonymized usage data to PostHog). Runs on every client init.
fix Disable with: chromadb.EphemeralClient(settings=Settings(anonymized_telemetry=False)) or set environment variable ANONYMIZED_TELEMETRY=False.
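Both options from the fix are shown together below, including the import that the Settings form needs. This is a sketch; telemetry_free_client is a hypothetical helper name, and setting the variable before chromadb is imported is a precaution taken here rather than a documented requirement.

```python
import os

# Option 1: environment variable, set before chromadb is imported
os.environ["ANONYMIZED_TELEMETRY"] = "False"


def telemetry_free_client():
    """Option 2: pass the setting explicitly when creating the client."""
    import chromadb
    from chromadb.config import Settings

    return chromadb.EphemeralClient(settings=Settings(anonymized_telemetry=False))
```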
breaking Installation fails on Alpine Linux (musl-based distributions) due to missing C/C++ build tools and runtime libraries required by the Rust backend. Specifically, `libgcc_s.so.1` is not found and a `cc` linker is missing, leading to `subprocess.CalledProcessError` during package metadata preparation.
fix On Alpine Linux, install the necessary build tools and C++ standard library: `apk add build-base libstdc++` before attempting to install chromadb.
pip install chromadb-client
python os / libc variant status wheel install import disk
3.10 alpine (musl) chromadb build_error - - - -
3.10 alpine (musl) chromadb-client wheel - 2.49s 231.5M
3.10 alpine (musl) chromadb build_error - - - -
3.10 alpine (musl) chromadb - - - -
3.10 alpine (musl) chromadb-client wheel - 2.57s 231.5M
3.10 alpine (musl) chromadb-client - - 4.33s 230.9M
3.10 slim (glibc) chromadb wheel 36.3s 1.79s 410M
3.10 slim (glibc) chromadb-client wheel 15.9s 1.58s 292M
3.10 slim (glibc) chromadb wheel 35.7s 1.67s 410M
3.10 slim (glibc) chromadb - - 3.10s 406M
3.10 slim (glibc) chromadb-client wheel 16.0s 1.63s 292M
3.10 slim (glibc) chromadb-client - - 3.04s 292M
3.11 alpine (musl) chromadb build_error - - - -
3.11 alpine (musl) chromadb-client wheel - 3.82s 249.0M
3.11 alpine (musl) chromadb build_error - - - -
3.11 alpine (musl) chromadb - - - -
3.11 alpine (musl) chromadb-client wheel - 3.79s 249.0M
3.11 alpine (musl) chromadb-client - - 5.95s 248.3M
3.11 slim (glibc) chromadb wheel 28.0s 2.73s 372M
3.11 slim (glibc) chromadb-client wheel 13.8s 2.56s 309M
3.11 slim (glibc) chromadb wheel 31.3s 2.63s 372M
3.11 slim (glibc) chromadb - - 4.64s 372M
3.11 slim (glibc) chromadb-client wheel 14.1s 2.65s 309M
3.11 slim (glibc) chromadb-client - - 4.51s 309M
3.12 alpine (musl) chromadb build_error - - - -
3.12 alpine (musl) chromadb-client wheel - 3.54s 235.5M
3.12 alpine (musl) chromadb build_error - - - -
3.12 alpine (musl) chromadb - - - -
3.12 alpine (musl) chromadb-client wheel - 3.51s 235.5M
3.12 alpine (musl) chromadb-client - - 6.01s 234.7M
3.12 slim (glibc) chromadb wheel 31.3s 3.45s 361M
3.12 slim (glibc) chromadb-client wheel 11.8s 3.03s 296M
3.12 slim (glibc) chromadb wheel 26.9s 3.16s 361M
3.12 slim (glibc) chromadb - - 5.67s 361M
3.12 slim (glibc) chromadb-client wheel 12.9s 3.07s 296M
3.12 slim (glibc) chromadb-client - - 4.98s 295M
3.13 alpine (musl) chromadb build_error - - - -
3.13 alpine (musl) chromadb-client wheel - 3.34s 231.8M
3.13 alpine (musl) chromadb build_error - - - -
3.13 alpine (musl) chromadb - - - -
3.13 alpine (musl) chromadb-client wheel - 3.36s 231.8M
3.13 alpine (musl) chromadb-client - - 4.77s 231.1M
3.13 slim (glibc) chromadb wheel 26.8s 3.21s 359M
3.13 slim (glibc) chromadb-client wheel 13.2s 3.03s 294M
3.13 slim (glibc) chromadb wheel 28.9s 3.16s 359M
3.13 slim (glibc) chromadb - - 5.25s 360M
3.13 slim (glibc) chromadb-client wheel 12.5s 3.01s 294M
3.13 slim (glibc) chromadb-client - - 5.12s 293M
3.9 alpine (musl) chromadb build_error - - - -
3.9 alpine (musl) chromadb-client wheel - 2.33s 239.9M
3.9 alpine (musl) chromadb build_error - - - -
3.9 alpine (musl) chromadb - - - -
3.9 alpine (musl) chromadb-client wheel - 2.34s 239.9M
3.9 alpine (musl) chromadb-client - - 4.04s 239.2M
3.9 slim (glibc) chromadb wheel 42.3s 2.11s 407M
3.9 slim (glibc) chromadb-client wheel 19.4s 1.98s 305M
3.9 slim (glibc) chromadb wheel 41.4s 2.16s 407M
3.9 slim (glibc) chromadb - - 2.94s 404M
3.9 slim (glibc) chromadb-client wheel 19.7s 1.90s 305M
3.9 slim (glibc) chromadb-client - - 2.79s 304M

get_or_create_collection() is idempotent and preferred over create_collection() for most use cases. Python 3.9+ required — chromadb's telemetry dependency (posthog) fails silently on 3.8 with a misleading TypeError.

import sys

if sys.version_info < (3, 9):
    raise RuntimeError("chromadb requires Python 3.9+. Current: " + sys.version)

import chromadb

# In-memory (prototyping)
client = chromadb.EphemeralClient()

# Persistent (local dev)
# client = chromadb.PersistentClient(path="/path/to/db")

collection = client.get_or_create_collection("my_docs")

collection.add(
    documents=["This is doc one", "This is doc two"],
    ids=["id1", "id2"],
)

results = collection.query(
    query_texts=["find something"],
    n_results=2,
)
print(results)