ChromaDB
Open-source embedded vector database for AI applications. Runs in-process (EphemeralClient, PersistentClient) or client-server mode (HttpClient). Handles embedding storage, metadata filtering, and similarity search. Supports pluggable embedding functions. Core backend rewritten in Rust in 1.x; also ships a lightweight HTTP-only client as the separate chromadb-client package.
Common errors
- ModuleNotFoundError: No module named 'chromadb'
  cause The `chromadb` package is not installed in the active Python environment, or the virtual environment is misconfigured. This is common when Chroma is used through other libraries such as LangChain.
  fix Install `chromadb` in the current environment with `pip install chromadb` (or `poetry add chromadb` if using Poetry). If using a virtual environment, activate it before installing.
- ValueError: You must provide an embedding function to compute embeddings.
  cause When `chromadb` has no default embedding function available (e.g., with `chromadb-client`, or in a fresh installation without the optional dependencies for a default embedder), you must specify an embedding function explicitly when adding documents to a collection.
  fix Provide an embedding function (e.g., from `chromadb.utils.embedding_functions`, or a custom one) when creating the collection or adding documents.
  ```python
  import chromadb
  from chromadb.utils import embedding_functions

  openai_ef = embedding_functions.OpenAIEmbeddingFunction(
      api_key='YOUR_API_KEY',  # Replace with your actual OpenAI API key
      model_name='text-embedding-ada-002'
  )

  client = chromadb.Client()
  collection = client.create_collection(
      name='my_collection',
      embedding_function=openai_ef  # Provide the embedding function here
  )
  collection.add(documents=['Hello world'], ids=['doc1'])
  ```
- RuntimeError: Chroma is running in http-only client mode, and can only be run with 'chromadb.api.fastapi.FastAPI' as the chroma_api_impl.
  cause This error typically occurs when installed packages conflict (e.g., both `chromadb` and `chromadb-client` are present), or when a local client (`PersistentClient` or `EphemeralClient`) is used in an environment configured to allow only the HTTP-only client, often because of an existing server instance or environment variables.
  fix To run a local client, make sure only the `chromadb` package is installed and no conflicting `CHROMA_API_IMPL` environment variable is set. To connect to a remote Chroma server, use `chromadb.HttpClient` and make sure the server is running. Reinstalling `chromadb` in a clean environment commonly resolves the conflict.
- AttributeError: 'Chroma' object has no attribute 'persist'
  cause In newer versions of `chromadb` (since 0.4.x) and its LangChain integration, the `.persist()` method is no longer needed or available, because data is persisted to disk automatically when `persist_directory` is specified at initialization.
  fix Remove the `.persist()` call from your code. If you initialize `Chroma` with a `persist_directory`, data is saved automatically.
  ```python
  # Old code (with .persist())
  # vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
  # vectordb.persist()  # Remove this line

  # Corrected code
  vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
  # Data is automatically persisted if persist_directory is provided
  ```
- ValueError: Expected metadata to be a string, number, boolean, SparseVector, or nullable.
  cause ChromaDB has strict requirements for metadata values: only primitive types (strings, numbers, booleans), `SparseVector`, and `null` are supported. This error occurs when you add documents whose metadata contains nested objects or arrays.
  fix Flatten the metadata dictionary so that every value is a primitive type. Remove any nested dictionaries or lists before adding documents to the collection.
  ```python
  # Example of incorrect metadata
  # metadata = {'source': 'doc1', 'details': {'author': 'John Doe', 'date': '2023-01-01'}}

  # Corrected metadata
  metadata = {'source': 'doc1', 'author': 'John Doe', 'date': '2023-01-01'}
  collection.add(documents=['My document content'], metadatas=[metadata], ids=['id1'])
  ```
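
The metadata flattening described in the last fix can be automated with a small helper. The following is a minimal sketch, not part of Chroma's API: `flatten_metadata`, the dotted key separator, and the policy of joining list values into a comma-separated string are all assumptions chosen for illustration.

```python
from typing import Any, Dict


def flatten_metadata(meta: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Return a copy of `meta` with nested dicts flattened into dotted keys.

    Hypothetical helper: the key naming and list handling are illustrative choices.
    """
    flat: Dict[str, Any] = {}
    for key, value in meta.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent key.
            flat.update(flatten_metadata(value, full_key, sep))
        elif isinstance(value, (list, tuple)):
            # Assumed policy: arrays become one comma-separated string.
            flat[full_key] = ", ".join(str(item) for item in value)
        elif value is None or isinstance(value, (str, int, float, bool)):
            # Primitive values (and null) are kept unchanged.
            flat[full_key] = value
        else:
            # Anything else (dates, custom objects) is stored as its string form.
            flat[full_key] = str(value)
    return flat


nested = {"source": "doc1", "details": {"author": "John Doe", "date": "2023-01-01"}}
print(flatten_metadata(nested))
```

The flattened dictionary can then be passed in `metadatas=[...]` to `collection.add()`. Whether dotted keys or a different naming scheme suits your filters is a design choice left to the caller.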
Warnings
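- breaking The legacy chromadb.Client(Settings(chroma_db_impl=..., persist_directory=...)) configuration was removed in 0.4.0. A large volume of tutorials, LangChain/LlamaIndex integration examples, and LLM-generated code still uses it. Running that code raises an error (reported as AttributeError or TypeError) instead of creating a client.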
- breaking Database migrations between Chroma versions are irreversible. Upgrading the chromadb package upgrades on-disk data format. Downgrading after upgrade causes data loss or corruption.
- breaking Server CORS and auth configuration moved from environment variables to a YAML config file in the 1.x Rust-backed server. Environment variables like CHROMA_SERVER_CORS_ALLOW_ORIGINS and CHROMA_SERVER_AUTH_CREDENTIALS no longer work.
- gotcha Default embedding function downloads ~200MB of model weights (all-MiniLM-L6-v2 via onnxruntime) on first call. First add() or query() call in a new environment hangs while downloading. No progress indicator.
- gotcha PersistentClient does not support concurrent access from multiple processes. SQLite-backed storage uses file locking. Multiple processes writing to the same path cause database corruption or blocked writes.
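- gotcha collection.query() where= filter uses a specific operator syntax ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or). The explicit form of an equality filter is {"key": {"$eq": "value"}}; a single {"key": "value"} pair is shorthand for it, but a dict with several top-level keys is not valid, and multiple conditions must be combined under one $and or $or. Malformed filters failed silently in old versions and raise ValueError in new ones.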
- gotcha Telemetry is enabled by default (sends anonymized usage data to PostHog). Runs on every client init.
- breaking Installation fails on Alpine Linux (musl-based distributions) due to missing C/C++ build tools and runtime libraries required by the Rust backend. Specifically, `libgcc_s.so.1` is not found and a `cc` linker is missing, leading to `subprocess.CalledProcessError` during package metadata preparation.
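
The where= operator syntax from the gotcha above can be made harder to get wrong by building filters through small helpers. This is a sketch in plain Python: `condition` and `combine` are hypothetical names, not Chroma functions, and the `year` metadata key is an assumed example field.

```python
from typing import Any, Dict

Where = Dict[str, Any]

# Comparison operators named in the where= gotcha.
_COMPARISON_OPS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}
_LOGICAL_OPS = {"$and", "$or"}


def condition(key: str, op: str, value: Any) -> Where:
    """Build one explicit condition, e.g. {"year": {"$gte": 2020}}."""
    if op not in _COMPARISON_OPS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    return {key: {op: value}}


def combine(logical_op: str, *clauses: Where) -> Where:
    """Join clauses under $and or $or; a lone clause is returned unwrapped."""
    if logical_op not in _LOGICAL_OPS:
        raise ValueError(f"unsupported logical operator: {logical_op!r}")
    if not clauses:
        raise ValueError("at least one clause is required")
    if len(clauses) == 1:
        return clauses[0]
    return {logical_op: list(clauses)}


# Hypothetical filter: documents from 'doc1' with an assumed numeric 'year' field.
where = combine(
    "$and",
    condition("source", "$eq", "doc1"),
    condition("year", "$gte", 2020),
)
print(where)
```

The resulting dictionary is what would be passed as `where=` to `collection.query()` or `collection.get()`.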
Install
- pip install chromadb
- pip install chromadb-client
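
Because having both packages installed is one reported cause of the "http-only client mode" error, it can help to check which distributions are present before debugging further. The sketch below uses only the standard library; `installed_version` and `chroma_install_report` are hypothetical helper names, and treating "both installed" as a conflict is an assumption based on the error description in this document.

```python
from importlib import metadata
from typing import Dict, Optional, Union


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def chroma_install_report() -> Dict[str, Union[str, bool, None]]:
    """Summarize which Chroma distributions are installed in this environment."""
    full = installed_version("chromadb")
    thin = installed_version("chromadb-client")
    return {
        "chromadb": full,
        "chromadb-client": thin,
        # Assumption: both packages side by side is treated as a potential conflict.
        "conflict": full is not None and thin is not None,
    }


print(chroma_install_report())
```

If the report shows a conflict, uninstalling one of the two packages (or starting from a clean environment) is the remedy suggested under Common errors.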
Imports
- EphemeralClient / PersistentClient / HttpClient
wrong (legacy configuration, removed in 0.4.0)
import chromadb
from chromadb.config import Settings
client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory="/db"))
correct
import chromadb
client = chromadb.EphemeralClient()  # in-memory
client = chromadb.PersistentClient(path="/db")  # disk
client = chromadb.HttpClient(host="localhost", port=8000)  # server
Quickstart
import sys

if sys.version_info < (3, 9):
    raise RuntimeError("chromadb requires Python 3.9+. Current: " + sys.version)

import chromadb

# In-memory (prototyping)
client = chromadb.EphemeralClient()

# Persistent (local dev)
# client = chromadb.PersistentClient(path="/path/to/db")

collection = client.get_or_create_collection("my_docs")

collection.add(
    documents=["This is doc one", "This is doc two"],
    ids=["id1", "id2"],
)

results = collection.query(
    query_texts=["find something"],
    n_results=2,
)
print(results)
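
The dictionary returned by `collection.query()` holds one inner list per query text under keys such as `ids`, `documents`, `metadatas`, and `distances`. A minimal sketch of turning that into per-hit records follows; `unpack_query_results` is a hypothetical helper, and the sample dictionary is hand-written for illustration (its distances are made-up numbers, not real output).

```python
from typing import Any, Dict, List


def _column(results: Dict[str, Any], key: str, query_index: int, size: int) -> List[Any]:
    """Return one query's values for `key`, or Nones if the field was not included."""
    column = results.get(key)
    if column is None:
        return [None] * size
    return column[query_index]


def unpack_query_results(results: Dict[str, Any], query_index: int = 0) -> List[Dict[str, Any]]:
    """Convert the nested-list query result into a list of per-hit dictionaries."""
    ids = results["ids"][query_index]
    size = len(ids)
    documents = _column(results, "documents", query_index, size)
    metadatas = _column(results, "metadatas", query_index, size)
    distances = _column(results, "distances", query_index, size)
    return [
        {"id": i, "document": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in zip(ids, documents, metadatas, distances)
    ]


# Hand-written sample shaped like a query result (values are illustrative only).
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["This is doc one", "This is doc two"]],
    "metadatas": None,
    "distances": [[0.5, 0.75]],
}
for hit in unpack_query_results(sample):
    print(hit["id"], hit["distance"], hit["document"])
```

With several entries in `query_texts`, pass the matching `query_index` to read the hits for each query in turn.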