Pinecone Text Client
The Pinecone Text Client (`pinecone-text`) is a Python package of text utilities for generating sparse, dense, and hybrid vector embeddings. It is designed to work with Pinecone's vector database for sparse-dense (hybrid) semantic search. The package is in public preview ('Beta'); the latest release is 0.11.0. Releases are infrequent and focus on feature additions and improvements during the beta phase.
Common errors
- KeyError: 'OPENAI_API_KEY'
cause The `OpenAIEncoder` requires the `OPENAI_API_KEY` environment variable to be set, but it was not found.
fix Set the environment variable: `export OPENAI_API_KEY='your_api_key'` in your shell, or `os.environ['OPENAI_API_KEY'] = 'your_api_key'` in your Python script before initializing `OpenAIEncoder`.
- ModuleNotFoundError: No module named 'pinecone_client'
cause You are trying to import `pinecone_client`, which is the old name for the main Pinecone Python SDK. The package was renamed to `pinecone`.
fix Update your imports from `pinecone_client` to `pinecone`. Also ensure `pinecone` is installed (`pip install pinecone`) and remove the old package if it is still present (`pip uninstall pinecone-client`).
- RuntimeError: 'Unsupported Python version for SPLADE/Sentence Transformers'
cause You are attempting to use SPLADE or Sentence Transformer encoders with Python 3.12, which currently has known compatibility issues due to PyTorch.
fix Downgrade your Python environment to version 3.9, 3.10, or 3.11 if you need to use these specific encoders. Monitor the `pinecone-text` release notes for updates on Python 3.12 compatibility.
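The missing API key and the unsupported Python version can both be detected before any encoder is constructed. The following is a minimal sketch using only the standard library. `missing_env_vars` and `torch_encoders_supported` are hypothetical helper names, not part of `pinecone-text`, and the version range simply mirrors the fix described above.

```python
import os
import sys


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


def torch_encoders_supported(version_info=None):
    """Return True for Python 3.9 to 3.11 (assumed range, taken from the fix above)."""
    version_info = sys.version_info if version_info is None else version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) in {(3, 9), (3, 10), (3, 11)}


if __name__ == "__main__":
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print(f"Set these environment variables first: {', '.join(missing)}")
    if not torch_encoders_supported():
        print("SPLADE and Sentence Transformer encoders may not work on this Python version.")
```

Running such a check at startup turns a failure deep inside an encoder call into a clear message at the top of the script.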
Warnings
- gotcha The `pinecone-text` library is currently in 'public preview' ('Beta'). This means its API or behavior may change in future updates.
- breaking The main Pinecone Python SDK was renamed from `pinecone-client` to `pinecone` in version 5.1.0. If you are migrating or have existing projects, ensure you update your dependencies to `pinecone` to get the latest features and avoid conflicts.
- gotcha The `BM25Encoder` currently supports only static document frequency. This means precomputed document frequency values are fixed and do not dynamically update when new documents are added to a collection.
- breaking Using SPLADE and Sentence Transformer models with `pinecone-text` is not currently supported on Python 3.12 due to compatibility issues with PyTorch.
- gotcha When using `OpenAIEncoder` or `AzureOpenAIEncoder`, the corresponding API key (e.g., `OPENAI_API_KEY`) must be set as an environment variable before the encoder is initialized.
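To illustrate the static document frequency caveat above, here is a conceptual sketch in plain Python. It is not the `BM25Encoder` implementation; `StaticDocFreq` and its methods are hypothetical names chosen for illustration. It shows only that counts gathered at fit time stay fixed until the corpus is fitted again.

```python
from collections import Counter


class StaticDocFreq:
    """Toy model of document frequencies that are frozen after fit()."""

    def __init__(self):
        self.doc_freq = Counter()
        self.n_docs = 0

    def fit(self, corpus):
        # Recompute everything from scratch; earlier statistics are discarded.
        self.doc_freq = Counter()
        self.n_docs = len(corpus)
        for doc in corpus:
            # Count each term once per document.
            self.doc_freq.update(set(doc.lower().split()))
        return self

    def frequency(self, term):
        # Reads the frozen counts; documents seen after fit() are not reflected.
        return self.doc_freq[term.lower()]


corpus = ["the quick brown fox", "the lazy dog"]
stats = StaticDocFreq().fit(corpus)
before = stats.frequency("fox")

# A new document arrives, but nothing updates until fit() is called again.
corpus.append("a red fox")
unchanged = stats.frequency("fox")
after_refit = stats.fit(corpus).frequency("fox")
print(before, unchanged, after_refit)
```

The practical consequence is that term statistics need to be recomputed on the updated corpus whenever the collection changes enough to matter.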
Install
- pip install pinecone-text
Imports
- OpenAIEncoder
from pinecone_text.dense import OpenAIEncoder
- AzureOpenAIEncoder
from pinecone_text.dense import AzureOpenAIEncoder
- SpladeEncoder
from pinecone_text.sparse import SpladeEncoder
- BM25Encoder
from pinecone_text.sparse import BM25Encoder
- hybrid_convex_scale
from pinecone_text.hybrid import hybrid_convex_scale
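As a rough illustration of what convex scaling of a hybrid query means, the sketch below weights a dense vector by `alpha` and a sparse vector by `1 - alpha` in plain Python. `convex_scale_sketch` is a hypothetical stand-in written for this example, and the `indices`/`values` dictionary layout for the sparse vector is an assumption; consult the `hybrid_convex_scale` docstring for the library's actual signature and behavior.

```python
def convex_scale_sketch(dense, sparse, alpha):
    """Scale dense values by alpha and sparse values by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense_query = [1.0, 2.0, 4.0]
sparse_query = {"indices": [3, 17], "values": [2.0, 4.0]}

# alpha = 1.0 keeps only the dense signal, alpha = 0.0 only the sparse signal.
scaled_dense, scaled_sparse = convex_scale_sketch(dense_query, sparse_query, alpha=0.75)
print(scaled_dense)
print(scaled_sparse)
```

A single `alpha` makes the trade-off between semantic (dense) and keyword (sparse) matching easy to tune per query.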
Quickstart
import os

from pinecone_text.dense import OpenAIEncoder

# Ensure OPENAI_API_KEY is set in your environment.
# For quick testing, you can uncomment and set it directly, but prefer environment variables.
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

if not os.environ.get("OPENAI_API_KEY"):
    print("Error: OPENAI_API_KEY environment variable not set.")
    print("Please set it or uncomment the line above for testing.")
else:
    try:
        encoder = OpenAIEncoder()  # Defaults to 'text-embedding-3-small'

        documents = [
            "The quick brown fox jumps over the lazy dog",
            "Artificial intelligence is transforming industries",
        ]
        queries = [
            "Who jumped over the lazy dog?",
            "What is AI doing?",
        ]

        # Each call returns one dense vector per input string.
        document_vectors = encoder.encode_documents(documents)
        query_vectors = encoder.encode_queries(queries)

        print(f"Encoded document 1 vector (first 5 elements): {document_vectors[0][:5]}...")
        print(f"Encoded query 1 vector (first 5 elements): {query_vectors[0][:5]}...")
    except Exception as e:
        print(f"An error occurred during encoding: {e}")