LlamaIndex Postgres Vector Store
The `llama-index-vector-stores-postgres` library provides an integration for LlamaIndex, allowing users to leverage PostgreSQL with the `pgvector` extension as a robust and scalable vector store. At the time of writing it is at version 0.8.1; like other LlamaIndex integration packages, it is actively developed and released frequently.
Warnings
- breaking LlamaIndex v0.10.0 introduced a significant refactor, splitting the main library into `llama-index-core` and numerous integration packages. Namespace imports are generally preserved (e.g., `from llama_index.vector_stores.postgres import PGVectorStore`), but the `ServiceContext` object was deprecated in favor of the global `Settings` object or direct parameter passing for configuring LLMs, embeddings, and related components.
- gotcha The PostgreSQL database must have the `pgvector` extension installed and enabled. Without this, the `PGVectorStore` will not function, and you may encounter SQL errors related to unknown vector types or functions.
- gotcha `PGVectorStore` may attempt to create a new schema in the database, which can lead to `psycopg2.errors.InsufficientPrivilege` errors if the connected database user lacks schema creation permissions.
- gotcha When persisting other index types (like `KeywordTableIndex`) alongside vector data in PostgreSQL, simply providing `PGVectorStore` to `StorageContext` is insufficient for their persistence. These other index types require a separate `IndexStore`.
Install
pip install llama-index-vector-stores-postgres llama-index psycopg2-binary
Imports
- PGVectorStore
from llama_index.vector_stores.postgres import PGVectorStore
- VectorStoreIndex
from llama_index.core import VectorStoreIndex
- SimpleDirectoryReader
from llama_index.core import SimpleDirectoryReader
- StorageContext
from llama_index.core import StorageContext
Quickstart
import os
import psycopg2
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.postgres import PGVectorStore
# --- Database Setup (Example for local PostgreSQL) ---
# Ensure PostgreSQL is running and 'pgvector' extension is enabled.
# Example: CREATE EXTENSION IF NOT EXISTS vector; in your database.
DB_HOST = os.environ.get('PG_HOST', 'localhost')
DB_PORT = os.environ.get('PG_PORT', '5432')
DB_USER = os.environ.get('PG_USER', 'postgres')
DB_PASSWORD = os.environ.get('PG_PASSWORD', 'password')
DB_NAME = os.environ.get('PG_DATABASE', 'llama_db')
TABLE_NAME = os.environ.get('PG_TABLE_NAME', 'llamaindex_documents')
EMBED_DIM = int(os.environ.get('EMBED_DIM', '1536')) # e.g., for OpenAI embeddings
connection_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}"
# Ensure the database exists and pgvector extension is enabled
try:
    conn = psycopg2.connect(connection_string)
    conn.autocommit = True  # DDL such as CREATE DATABASE cannot run inside a transaction
    with conn.cursor() as c:
        # WITH (FORCE) requires PostgreSQL 13+; it terminates existing connections
        c.execute(f"DROP DATABASE IF EXISTS {DB_NAME} WITH (FORCE);")
        c.execute(f"CREATE DATABASE {DB_NAME};")
    conn.close()
    print(f"Database '{DB_NAME}' created/recreated.")
except Exception as e:
    print(f"Could not connect or create database: {e}")
    print("Please ensure PostgreSQL is running and the connection details are correct.")
    raise SystemExit(1)
# Connect to the specific database and enable the pgvector extension
db_connection_string = f"{connection_string}/{DB_NAME}"
conn = psycopg2.connect(db_connection_string)
conn.autocommit = True
with conn.cursor() as c:
    c.execute("CREATE EXTENSION IF NOT EXISTS vector;")
conn.close()
# --- LlamaIndex Usage ---
# The default LLM and embedding model are OpenAI's, so export OPENAI_API_KEY
# before running (or configure alternatives via Settings).
# 1. Create a dummy document for indexing
os.makedirs("data", exist_ok=True)
with open("data/test_document.txt", "w") as f:
    f.write("The quick brown fox jumps over the lazy dog. This is a test document for LlamaIndex and Postgres.")
documents = SimpleDirectoryReader("data").load_data()
# 2. Initialize the PGVectorStore
vector_store = PGVectorStore.from_params(
    database=DB_NAME,
    host=DB_HOST,
    password=DB_PASSWORD,
    port=int(DB_PORT),
    user=DB_USER,
    table_name=TABLE_NAME,
    embed_dim=EMBED_DIM,
)
# 3. Create StorageContext and VectorStoreIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
# 4. Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the fox do?")
print(f"Response: {response}")
# Clean up
# Note: to drop the vector table, connect to DB_NAME itself (not 'postgres'),
# and note that PGVectorStore prefixes table names with 'data_'.
# A full teardown might involve:
# conn = psycopg2.connect(f"{connection_string}/{DB_NAME}")
# conn.autocommit = True
# with conn.cursor() as c:
#     c.execute(f"DROP TABLE IF EXISTS data_{TABLE_NAME} CASCADE;")
# conn.close()
# Alternatively, drop the whole database from a separate connection (e.g. to 'postgres').