langchain-chroma Integration
langchain-chroma is an integration package connecting Chroma, an AI-native open-source vector database, with the LangChain framework. It enables developers to leverage Chroma for tasks such as semantic search, Retrieval-Augmented Generation (RAG), and other LLM applications. Currently at version 1.1.0, it is actively developed and maintained as part of LangChain's partner integrations, with releases often aligned with the broader LangChain ecosystem.
Warnings
- breaking With the release of LangChain v0.1.0 and subsequent modularization, core components and integrations like Chroma moved to separate packages. Old import paths (e.g., `from langchain.vectorstores import Chroma`) are deprecated and will lead to `ImportError` or unexpected behavior.
- gotcha When connecting to a remote ChromaDB server, ensure the `chromadb` client version installed (as a dependency of `langchain-chroma`) is compatible with the server's version. Incompatible client/server versions, especially with major updates to ChromaDB (e.g., v1.x based on Rust), can lead to connection errors or unexpected behavior.
- gotcha Earlier versions of `langchain-chroma` (e.g., 0.2.2) had strict `numpy` dependency constraints (`numpy>=1.26.2,<2.0.0`) which could conflict with other packages requiring newer `numpy` versions (`>=2.0.0`), causing installation failures.
- gotcha When calling `update_documents`, some older versions of `langchain-chroma` raised a `ValueError` for documents without metadata, because an empty list was passed to the Chroma client internally instead of `None`.
Install
- pip
pip install -qU langchain-chroma chromadb langchain-openai langchain-text-splitters
Imports
- Chroma
from langchain_chroma import Chroma
- OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings
- RecursiveCharacterTextSplitter
from langchain_text_splitters import RecursiveCharacterTextSplitter
- Document
from langchain_core.documents import Document
Quickstart
import os
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Read the OpenAI API key from the environment; replace the placeholder if it is unset
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
# Sample documents
raw_documents = [
    "The quick brown fox jumps over the lazy dog.",
    "The cat sat on the mat.",
    "Chroma is an open-source vector database.",
    "LangChain provides tools for building LLM applications.",
    "RAG combines retrieval and generation for better answers.",
]
# 1. Split documents (optional but good practice for RAG)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = [Document(page_content=d) for d in raw_documents]
split_documents = text_splitter.split_documents(documents)
# 2. Initialize embeddings
embeddings = OpenAIEmbeddings()
# 3. Create a Chroma vector store (in-memory for this example)
# For persistence, pass a 'persist_directory' argument: persist_directory="./chroma_db"
vector_store = Chroma(
    collection_name="my_documents_collection",
    embedding_function=embeddings,
)
# Add documents to the vector store
vector_store.add_documents(split_documents)
# 4. Perform a similarity search
query = "What is Chroma?"
results = vector_store.similarity_search(query, k=1)
print(f"Query: {query}")
for doc in results:
    print(f"- Found document: {doc.page_content}")