langchain-chroma Integration

1.1.0 · active · verified Sat Apr 11

langchain-chroma is an integration package connecting Chroma, an AI-native open-source vector database, with the LangChain framework. It enables developers to leverage Chroma for tasks such as semantic search, Retrieval-Augmented Generation (RAG), and other LLM applications. Currently at version 1.1.0, it is actively developed and maintained as part of LangChain's partner integrations, with releases often aligned with the broader LangChain ecosystem.

Install

pip install langchain-chroma

Imports

from langchain_chroma import Chroma

Quickstart

This quickstart demonstrates how to set up a Chroma vector store with LangChain using OpenAI embeddings. It covers document loading, splitting, vector store initialization, and a similarity search. For local persistence, pass a `persist_directory` argument when initializing Chroma.

import os
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Provide your OpenAI API key (the placeholder is used only if the
# environment variable is not already set)
os.environ.setdefault("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")

# Sample documents
raw_documents = [
    "The quick brown fox jumps over the lazy dog.",
    "The cat sat on the mat.",
    "Chroma is an open-source vector database.",
    "LangChain provides tools for building LLM applications.",
    "RAG combines retrieval and generation for better answers."
]

# 1. Split documents (optional but good practice for RAG)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = [Document(page_content=d) for d in raw_documents]
split_documents = text_splitter.split_documents(documents)

# 2. Initialize embeddings
embeddings = OpenAIEmbeddings()

# 3. Create a Chroma vector store (in-memory for this example)
# For persistence, pass a 'persist_directory' argument: persist_directory="./chroma_db"
vector_store = Chroma(
    collection_name="my_documents_collection",
    embedding_function=embeddings,
)

# Add documents to the vector store
vector_store.add_documents(split_documents)

# 4. Perform a similarity search
query = "What is Chroma?"
results = vector_store.similarity_search(query, k=1)

print(f"Query: {query}")
for doc in results:
    print(f"- Found document: {doc.page_content}")
