LangChain Milvus Integration
langchain-milvus is an integration package that connects Milvus, an open-source vector database, with LangChain, a framework for building applications with large language models (LLMs). It provides vector storage and retrieval for AI applications such as semantic search and retrieval-augmented generation (RAG). The library is actively maintained: minor versions add features and may include breaking changes, while patch versions carry bug fixes and improvements.
Warnings
- breaking Version 0.3.0 introduced a significant refactor, replacing the Milvus ORM (Object-Relational Mapping) with the lower-level Milvus Client API. If your code directly interacted with internal ORM specifics, it might require updates.
- breaking Version 0.2.2 updated internal dependencies to support `langchain-core` 1.0.0. Ensure your `langchain-core` installation is compatible (preferably 1.0.0 or newer) to avoid potential dependency conflicts or unexpected behavior.
- deprecated The `Milvus` class was deprecated in `langchain` version 0.2.0 and moved to its dedicated integration package, `langchain-milvus`. Using the old import path (`from langchain.vectorstores import Milvus`) will raise warnings or errors in newer LangChain versions.
- gotcha Connecting to Milvus: Milvus Lite (local file-based) is suitable for development and small datasets, requiring a URI like `./milvus_example.db`. For production or large-scale data, a full Milvus server (Docker/Kubernetes) is recommended, requiring a server URI (e.g., `http://localhost:19530`). Features like `partition_key` for multi-tenancy are often only available with a Milvus server.
- gotcha The reranker functionality was refactored in version 0.3.3 to adapt to Milvus 2.6. If you have custom reranker implementations, they might need adjustments to align with the new function signature or expected behavior.
- gotcha With LangChain 1.0 and above, `langchain-core` is often treated as a peer dependency. This means you need to explicitly manage the versions of `langchain`, `langchain-core`, and `langchain-milvus` to ensure compatibility and avoid dependency conflicts, especially when upgrading.
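The version-related warnings above are easier to act on once you can see what is actually installed. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is just the three distributions named in the warnings. It reports versions only and does not decide whether a given combination is compatible.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["langchain", "langchain-core", "langchain-milvus"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this before and after an upgrade gives a quick view of which of the three packages changed.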
Install
-
pip install -qU langchain-milvus milvus-lite langchain-openai
Imports
- Milvus
from langchain_milvus import Milvus
- MilvusCollectionHybridSearchRetriever
from langchain_milvus import MilvusCollectionHybridSearchRetriever
Quickstart
import os
from langchain_milvus import Milvus
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
# OPENAI_API_KEY should already be set in your environment.
# The "sk-..." fallback below is only a placeholder; replace it with your actual key if not using env vars.
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "sk-...")
# Initialize embeddings
embeddings = OpenAIEmbeddings()
# Connect to Milvus Lite (local file) or a Milvus server.
# For Milvus Lite, use a local file path as the URI. E.g., "./milvus_example.db"
# For a Milvus server, use its URI, e.g., "http://localhost:19530"
MILVUS_URI = "./milvus_example.db"
# Create a Milvus vector store
vector_store = Milvus(
    embedding_function=embeddings,
    collection_name="my_langchain_documents",
    connection_args={"uri": MILVUS_URI},
    auto_id=True,  # Milvus 2.2.0 or later supports auto-generated IDs
)
# Add documents
documents = [
    Document(page_content="The quick brown fox jumps over the lazy dog.", metadata={"source": "lorem"}),
    Document(page_content="Milvus is an open-source vector database designed for AI applications.", metadata={"source": "milvus_docs"}),
    Document(page_content="LangChain is a framework for developing applications with LLMs.", metadata={"source": "langchain_docs"}),
]
vector_store.add_documents(documents)
# Perform a similarity search
query = "What is LangChain used for?"
results = vector_store.similarity_search(query, k=1)
print("Similarity search results:")
for doc in results:
    print(f"- Content: {doc.page_content[:60]}... Metadata: {doc.metadata}")
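The quickstart hard-codes `MILVUS_URI`. One way to switch between Milvus Lite and a Milvus server without editing code is to read the URI from the environment. The sketch below is illustrative only: the environment variable name `MILVUS_URI` and the helpers `is_server_uri` and `build_connection_args` are hypothetical, and the rule "an http/https scheme means a server, anything else is treated as a local Milvus Lite file" is an assumption based on the two URI forms shown above.

```python
import os
from urllib.parse import urlparse


def is_server_uri(uri: str) -> bool:
    """True when the URI has an http/https scheme, i.e. points at a Milvus server."""
    return urlparse(uri).scheme in ("http", "https")


def build_connection_args(default_uri: str = "./milvus_example.db") -> dict:
    """Build the connection_args dict, preferring the MILVUS_URI variable if set."""
    # MILVUS_URI is a hypothetical variable name chosen for this sketch.
    uri = os.environ.get("MILVUS_URI", default_uri)
    return {"uri": uri}


if __name__ == "__main__":
    args = build_connection_args()
    target = "Milvus server" if is_server_uri(args["uri"]) else "Milvus Lite file"
    print(f"Connecting to {target}: {args['uri']}")
```

The result can be passed as `connection_args=build_connection_args()` in the `Milvus(...)` call from the quickstart.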