Chroma Vector Store for LlamaIndex
The `llama-index-vector-stores-chroma` library integrates LlamaIndex with ChromaDB, an open-source vector database. It lets you store and query document embeddings in a ChromaDB collection, supporting in-memory, persistent, and client-server setups. The integration is actively maintained (current version `0.5.5` at the time of writing) and follows the regular release cadence of the broader LlamaIndex ecosystem for new features and bug fixes.
Common errors
- ModuleNotFoundError: No module named 'llama_index.vector_stores'
  Cause: The `llama-index-vector-stores-chroma` package is not installed, or `chromadb` (a peer dependency) is missing. Since LlamaIndex v0.10, integration packages are published separately from the core.
  Fix: Run `pip install llama-index-vector-stores-chroma chromadb`. Ensure your import statement is `from llama_index.vector_stores.chroma import ChromaVectorStore`.
- ERROR: Cannot install llama-index-cli because these package versions have conflicting dependencies. The conflict is caused by: llama-index-vector-stores-chroma X.Y.Z depends on onnxruntime<2.0.0 and >=1.17.0
  Cause: A dependency conflict with `onnxruntime`, often arising when `chromadb` is installed alongside other packages with different `onnxruntime` requirements.
  Fix: Try `pip install --upgrade pip`, then `pip install --no-cache-dir llama-index-vector-stores-chroma chromadb`. If the conflict persists and you use Anaconda/Miniconda, install `onnxruntime` from `conda-forge` first (`conda install -c conda-forge onnxruntime`), then run `pip install`.
- ImportError: cannot import name 'VectorStoreIndex' from 'llama_index' (unknown location)
  Cause: Breaking change in LlamaIndex v0.10+, where core components moved to `llama_index.core`.
  Fix: Change the import from `from llama_index import VectorStoreIndex` to `from llama_index.core import VectorStoreIndex`.
- Expected IDs to be a non-empty list, got 0 IDs
  Cause: ChromaDB's validation rejects empty ID lists. The error surfaces when `ChromaVectorStore.get_nodes` is called with an empty `node_ids` list, which is passed down to ChromaDB's internal `_get` method.
  Fix: Ensure the `node_ids` argument passed to `ChromaVectorStore.get_nodes` (or any underlying ChromaDB call) contains at least one ID, or use filters instead if you want to retrieve nodes without specifying IDs.
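For the last error, a small wrapper can short-circuit empty ID lists before they reach ChromaDB. This is a defensive sketch; `safe_get_nodes` is a hypothetical helper, not part of the library:

```python
def safe_get_nodes(vector_store, node_ids=None):
    """Return [] for an empty or missing node_ids list instead of letting
    ChromaDB raise 'Expected IDs to be a non-empty list, got 0 IDs'."""
    if not node_ids:
        return []
    return vector_store.get_nodes(node_ids=node_ids)
```

Calling `safe_get_nodes(store, [])` then yields an empty result rather than an exception, which is usually the intent when an upstream filter matched nothing.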
Warnings
- breaking With LlamaIndex v0.10 and later, the library underwent a major packaging refactor. Integration packages like `llama-index-vector-stores-chroma` are now separate from `llama-index-core`. This changes import paths and requires explicit installation of integration packages.
- gotcha The `chromadb` package is a peer dependency of `llama-index-vector-stores-chroma` and must be installed separately. Failing to install `chromadb` will result in `ModuleNotFoundError` even if `llama-index-vector-stores-chroma` is installed.
- gotcha Version conflicts, particularly involving `onnxruntime`, can occur between `llama-index-vector-stores-chroma` and other installed packages (especially `chromadb`). This often manifests during installation.
- deprecated Older LlamaIndex concepts such as `GPTSimpleVectorIndex`, `GPTVectorStoreIndex`, `ServiceContext`, and `LLMPredictor` have been deprecated or removed; `ServiceContext` and `LLMPredictor` are superseded by the global `Settings` object, and `GPTVectorStoreIndex` by `VectorStoreIndex`.
- gotcha Calling `ChromaVectorStore.get_nodes` with an empty list (`[]`) for `node_ids` can raise an `Expected IDs to be a non-empty list` error due to validation changes in `chromadb`.
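Because the missing-peer-dependency gotcha above only surfaces at import time, a quick preflight check with the standard library can confirm that both distributions are installed. A sketch; the helper name `check_install` is ours:

```python
from importlib import metadata

def check_install(packages=("llama-index-vector-stores-chroma", "chromadb")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None
    return found
```

Any `None` value in the result means a `pip install` is still needed before the imports below will work.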
Install
pip install llama-index-vector-stores-chroma chromadb
Imports
- ChromaVectorStore
  from llama_index.vector_stores.chroma import ChromaVectorStore  # LlamaIndex v0.10+
  # Legacy path (pre-v0.10, no longer valid):
  # from llama_index.vector_stores import ChromaVectorStore
Quickstart
import os
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # pip install llama-index-embeddings-huggingface
from llama_index.vector_stores.chroma import ChromaVectorStore
# --- Configuration ---
# For a fully local setup, use a local LLM and Embedding model.
# If using OpenAI, uncomment and set API key:
# from llama_index.llms.openai import OpenAI
# os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
# Settings.llm = OpenAI()
# Using a local embedding model for demonstration without external API keys
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# 1. Create a Chroma client and collection
# Use EphemeralClient for in-memory, or PersistentClient for disk storage
chroma_client = chromadb.EphemeralClient() # For persistent storage: chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.create_collection("my_documents_collection")
# 2. Set up the ChromaVectorStore
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# 3. Create a StorageContext and link the vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# 4. Load documents (e.g., from a 'data' directory)
# Create a dummy file for demonstration if 'data' doesn't exist
os.makedirs("data", exist_ok=True)
with open("data/example.txt", "w") as f:
    f.write("The quick brown fox jumps over the lazy dog. LlamaIndex is great.")
documents = SimpleDirectoryReader("data").load_data()
# 5. Create a VectorStoreIndex from documents
# embed_model is implicitly used from Settings.embed_model
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
# 6. Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")
print(response.response)