Chroma Vector Store for LlamaIndex

0.5.5 · active · verified Thu Apr 16

The `llama-index-vector-stores-chroma` library provides an integration between LlamaIndex and ChromaDB, an open-source vector database. It enables users to store and query document embeddings within a ChromaDB collection, supporting various modes like in-memory, persistent, and client-server setups. This integration is actively maintained, with the current version being `0.5.5`, and typically follows the continuous release cadence of the broader LlamaIndex ecosystem for new features and bug fixes.

Install

pip install llama-index-vector-stores-chroma

Imports

import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
from llama_index.vector_stores.chroma import ChromaVectorStore

Quickstart

This quickstart demonstrates how to set up `ChromaVectorStore` with LlamaIndex using an in-memory ChromaDB client: load documents, index them into the Chroma-backed vector store, and run a basic query. It uses `HuggingFaceEmbedding` as a local embedding model via `Settings.embed_model`. Note that the query engine still needs an LLM, which defaults to OpenAI unless you set `Settings.llm` to a local model.

import os
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# --- Configuration ---
# For a fully local setup, use a local LLM and Embedding model.
# If using OpenAI, uncomment and set API key:
# from llama_index.llms.openai import OpenAI
# os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
# Settings.llm = OpenAI()

# Using a local embedding model for demonstration without external API keys
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 1. Create a Chroma client and collection
# Use EphemeralClient for in-memory, or PersistentClient for disk storage
chroma_client = chromadb.EphemeralClient() # For persistent storage: chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_documents_collection")  # idempotent across reruns

# 2. Set up the ChromaVectorStore
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# 3. Create a StorageContext and link the vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 4. Load documents (e.g., from a 'data' directory)
# Create a dummy file for demonstration if 'data' doesn't exist
os.makedirs("data", exist_ok=True)
with open("data/example.txt", "w") as f:
    f.write("The quick brown fox jumps over the lazy dog. LlamaIndex is great.")

documents = SimpleDirectoryReader("data").load_data()

# 5. Create a VectorStoreIndex from documents
# embed_model is implicitly used from Settings.embed_model
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# 6. Query the index (the query engine uses Settings.llm, which defaults to OpenAI)
query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")

print(response.response)
