LangChain Pinecone Integration
langchain-pinecone is an integration package that connects LangChain applications with Pinecone, a managed vector database. It handles storing, retrieving, and managing vector embeddings to power AI search, recommendation, and generative AI features within the LangChain ecosystem. As of this writing the package is at version 0.2.13; it follows the release cadence of the broader LangChain ecosystem and is updated frequently.
Warnings
- breaking `pinecone-client` v3.0.0 changed how the Pinecone client is initialized: the global `pinecone.init()` function was deprecated in favor of instantiating the `pinecone.Pinecone` class directly.
- gotcha A mismatch between the embedding model's output dimension and the Pinecone index's dimension causes upserts to fail. For example, if the index was created with dimension 384 (matching `all-MiniLM-L6-v2`) and you insert vectors from OpenAI `text-embedding-ada-002` (dimension 1536), Pinecone will reject them with an error.
- gotcha Incorrect or missing Pinecone API key or environment configuration. This is a common setup issue that leads to authentication or connection errors.
- gotcha When using `PineconeVectorStore.from_existing_index()`, the specified Pinecone index must already exist. If it does not, this method will raise an error.
- deprecated Older versions of LangChain (prior to v0.1.x) exposed the integration directly as `langchain.vectorstores.Pinecone`. For LangChain v0.1.x and newer, use the dedicated `langchain-pinecone` package instead.
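The dimension gotcha above can be caught before any upsert reaches Pinecone. A minimal pure-Python sketch of such a guard (the `ensure_dimension` helper is hypothetical, not part of the library):

```python
def ensure_dimension(index_dimension: int, vectors: list[list[float]]) -> None:
    """Raise early if any embedding does not match the index's dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != index_dimension:
            raise ValueError(
                f"Vector {i} has dimension {len(vec)}, "
                f"but the index expects {index_dimension}."
            )

# A 384-dim index (e.g. for all-MiniLM-L6-v2 embeddings) accepts 384-dim
# vectors but rejects 1536-dim OpenAI vectors:
ensure_dimension(384, [[0.1] * 384])      # passes silently
try:
    ensure_dimension(384, [[0.1] * 1536])  # mismatched dimension
except ValueError as e:
    print(e)
```

Running a check like this client-side turns a confusing server error into an immediate, descriptive exception.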
Install
pip install -U langchain-pinecone pinecone-client langchain-openai
Imports
- PineconeVectorStore
from langchain_pinecone import PineconeVectorStore
- OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings
- Pinecone
from pinecone import Pinecone
Quickstart
import os
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from pinecone import Pinecone, ServerlessSpec
# --- Configuration (replace with your actual keys) ---
# It's recommended to set these as environment variables.
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY", "YOUR_PINECONE_API_KEY")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
if PINECONE_API_KEY == "YOUR_PINECONE_API_KEY" or OPENAI_API_KEY == "YOUR_OPENAI_API_KEY":
    print("Warning: Please set PINECONE_API_KEY and OPENAI_API_KEY environment variables.")
    print("Quickstart will likely fail due to missing credentials.")
index_name = "my-langchain-test-index"
dimension = 1536  # OpenAI text-embedding-ada-002 embedding dimension
metric = "cosine"
# --- Initialize Pinecone Client (pinecone-client v3.x) ---
# In v3 the client takes only the API key; the old `environment` argument
# is replaced by the index spec passed to create_index below.
try:
    pc = Pinecone(api_key=PINECONE_API_KEY)
except Exception as e:
    print(f"Error initializing Pinecone client: {e}")
    raise SystemExit(1)
# --- Create/Connect to Pinecone Index ---
if index_name not in pc.list_indexes().names():
    print(f"Creating Pinecone index '{index_name}'...")
    pc.create_index(
        name=index_name,
        dimension=dimension,
        metric=metric,
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),  # adjust cloud/region as needed
    )
    print(f"Index '{index_name}' created.")
else:
    print(f"Connecting to existing Pinecone index '{index_name}'.")
# --- Initialize Embeddings Model ---
embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
# --- Prepare Documents ---
documents = [
    Document(page_content="The quick brown fox jumps over the lazy dog."),
    Document(page_content="A computer is an electronic device that processes data."),
    Document(page_content="LangChain is a framework for developing applications powered by language models."),
    Document(page_content="Pinecone is a vector database for building AI applications."),
]
# --- Create or Connect to the Vector Store from Documents ---
# This method handles embedding and upserting the documents.
print("Adding documents to Pinecone vector store...")
vectorstore = PineconeVectorStore.from_documents(
    documents, embeddings, index_name=index_name
)
print("Documents added.")
# --- Perform a Similarity Search ---
query = "What is LangChain?"
print(f"\nPerforming similarity search for: '{query}'")
results = vectorstore.similarity_search(query, k=1)
print("\nSearch Results:")
for doc in results:
    print(f"- Content: {doc.page_content}")
# --- Optional: Clean up ---
# print(f"\nDeleting index '{index_name}' for cleanup...")
# pc.delete_index(index_name)
# print(f"Index '{index_name}' deleted.")
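For larger corpora, embedding and upserting every document in one call can hit request-size limits. One option is to split the documents into fixed-size batches and add each batch separately; a minimal pure-Python sketch (the `batched` helper is illustrative, not part of the library):

```python
from typing import Iterable, List


def batched(items: List, batch_size: int) -> Iterable[List]:
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage with the Quickstart's `vectorstore` and `documents`:
# for batch in batched(documents, 100):
#     vectorstore.add_documents(batch)

# The helper itself is plain Python and easy to verify:
print([len(b) for b in batched(list(range(5)), 2)])  # → [2, 2, 1]
```

Batch sizes in the low hundreds are a common starting point; tune based on document size and observed request latency.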