MongoDB integration for LangChain
This package provides integrations for MongoDB products within the LangChain ecosystem, including VectorStore, DocumentLoader, and ChatMessageHistory capabilities. The current version is 0.11.0. The package is actively maintained and updated frequently to track LangChain's evolving architecture and MongoDB's features.
Warnings
- breaking: LangChain's ecosystem split moved many integrations, including MongoDB, out of `langchain` and `langchain-community` into dedicated packages such as `langchain-mongodb`. Older import paths are deprecated or raise `ModuleNotFoundError`.
- gotcha: `MongoDBAtlasVectorSearch` requires a pre-configured Atlas Search index (type 'Vector Search') on your MongoDB collection. If the `index_name` specified in your code does not exist or is misconfigured, operations will fail.
- gotcha: All vector store operations (adding documents, performing similarity searches) require an instantiated embedding model. Omitting it or passing an incorrectly configured model leads to errors.
- gotcha: The MongoDB connection URI (`MONGODB_ATLAS_CLUSTER_URI`) must be correctly formatted, especially for Atlas clusters (e.g., `mongodb+srv://user:pass@cluster-name.mongodb.net/`). An incorrect scheme or missing credentials results in connection failures.
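The index and URI gotchas above can be caught before any database call. The following is a minimal sketch, not part of `langchain-mongodb`: `vector_index_definition` and `check_mongodb_uri` are hypothetical helpers, the `embedding` field path is assumed to match the vector store's default embedding field, and 1536 dimensions is taken from the quickstart's placeholder vectors. The URI check is a rough heuristic built on the standard library, not a full connection-string parser.

```python
import json
from urllib.parse import urlsplit


def vector_index_definition(num_dimensions, path="embedding", similarity="cosine"):
    """Build an Atlas Vector Search index definition as a plain dict.

    The field path "embedding" is an assumption; it must match the field
    where your vectors are actually stored.
    """
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def check_mongodb_uri(uri):
    """Return a list of likely problems in a MongoDB URI (empty if none found)."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    # Heuristic only: Atlas SRV URIs usually carry a username and password.
    if parts.scheme == "mongodb+srv" and parts.username is None:
        problems.append("missing credentials")
    return problems


# JSON that can be pasted into the Atlas UI when creating the index
print(json.dumps(vector_index_definition(1536), indent=2))
print(check_mongodb_uri("mongodb+srv://user:pass@cluster-name.mongodb.net/"))
```

The `numDimensions` value has to equal the length of the vectors your embedding model returns, otherwise queries against the index will not match the stored documents.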
Install
- pip install langchain-mongodb pymongo
- pip install langchain-mongodb pymongo langchain-openai
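After installing, it can help to confirm that the modules are importable in the active environment before running anything else. This is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the module names are the import names of the packages listed above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names differ from pip names: langchain-mongodb -> langchain_mongodb
required = ["langchain_mongodb", "pymongo"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Add `langchain_openai` to `required` if you installed the second variant.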
Imports
- MongoDBAtlasVectorSearch
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
- MongoDBLoader
from langchain_mongodb import MongoDBLoader
- MongoDBChatMessageHistory
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
Quickstart
import os
from pymongo import MongoClient
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
# NOTE: Replace DummyEmbeddings with a real embedding model (e.g., OpenAIEmbeddings).
# The placeholder keeps the example runnable without extra API keys, but every
# text gets the same vector, so search rankings are not meaningful.
class DummyEmbeddings:
    def embed_documents(self, texts):
        # Return one fixed-size vector per input text
        return [[0.1] * 1536 for _ in texts]
    def embed_query(self, text):
        # Return a fixed-size vector for a single query
        return [0.1] * 1536
# Environment variables for MongoDB connection
MONGODB_ATLAS_CLUSTER_URI = os.environ.get(
    "MONGODB_ATLAS_CLUSTER_URI", "mongodb://localhost:27017/"
)
MONGODB_DATABASE = os.environ.get("MONGODB_DATABASE", "langchain_db")
MONGODB_COLLECTION = os.environ.get("MONGODB_COLLECTION", "vector_collection")
# Initialize MongoDB client and collection
client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)
collection = client[MONGODB_DATABASE][MONGODB_COLLECTION]
# Initialize embedding model. To use OpenAI instead of the placeholder:
# from langchain_openai import OpenAIEmbeddings
# embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))
embeddings = DummyEmbeddings()
# Initialize MongoDB Atlas Vector Search
# Ensure 'default' index exists in MongoDB Atlas on the specified collection
vector_search = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=embeddings,
    index_name="default",  # The name of your Atlas Vector Search index
)
# Add documents to the vector store
docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A group of cats is called a clowder.",
    "Python is a high-level, interpreted programming language.",
]
vector_search.add_texts(docs)
print(f"Added {len(docs)} documents to MongoDB Atlas Vector Search.")
# Perform a similarity search
query = "animals running"
results = vector_search.similarity_search(query, k=1)
print(f"Similarity search results for '{query}':")
for res in results:
    print(f"- {res.page_content}")
# Clean up (optional) - remove added documents
# collection.delete_many({"text": {"$in": docs}})
# print("Cleaned up documents.")
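Because `DummyEmbeddings` returns the same vector for every input, the similarity search in the quickstart cannot distinguish between documents. If you want a placeholder that still needs no API key but ranks by word overlap, a hashed bag-of-words embedding is one option. The sketch below is an illustration only: `HashingEmbeddings` and `dot` are hypothetical names, the class follows the same two-method shape as `DummyEmbeddings`, and it is not a semantic model.

```python
import hashlib
import math
import re


class HashingEmbeddings:
    """Toy bag-of-words embedding (hypothetical placeholder, not semantic)."""

    def __init__(self, dimensions=1536):
        self.dimensions = dimensions

    def _embed(self, text):
        vector = [0.0] * self.dimensions
        # Map each lowercase token to a bucket via a stable hash
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[index] += 1.0
        # L2-normalize so a dot product equals cosine similarity
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts):
        return [self._embed(text) for text in texts]

    def embed_query(self, text):
        return self._embed(text)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A group of cats is called a clowder.",
    "Python is a high-level, interpreted programming language.",
]
embeddings = HashingEmbeddings()
query_vector = embeddings.embed_query("group of cats")
scores = [dot(query_vector, v) for v in embeddings.embed_documents(docs)]
print("Best local match:", docs[scores.index(max(scores))])
```

To try it in the quickstart, pass `embedding=HashingEmbeddings()` instead of `DummyEmbeddings()`. The vector length stays at 1536, so an index sized for the placeholder vectors would still apply.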