LlamaIndex Postgres Vector Store

0.8.1 · active · verified Wed Apr 15

The `llama-index-vector-stores-postgres` library provides an integration for LlamaIndex that lets you use PostgreSQL with the `pgvector` extension as a robust, scalable vector store. The package is currently at version 0.8.1 and, like most LlamaIndex integration packages, is actively developed and released frequently.

Warnings

The quickstart below drops and recreates the target database on every run (`DROP DATABASE IF EXISTS ... WITH (FORCE)`, which requires PostgreSQL 13+). Do not point it at a database containing data you want to keep.

Install

pip install llama-index-vector-stores-postgres psycopg2-binary

Imports

import os
import psycopg2
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.postgres import PGVectorStore

Quickstart

This quickstart shows how to initialize `PGVectorStore`, plug it into LlamaIndex's `VectorStoreIndex`, and run a simple query. It includes basic PostgreSQL setup (creating the database and enabling the `pgvector` extension) and reads sensitive connection details from environment variables. Ensure a PostgreSQL server is running and the `pgvector` extension is installed before you begin.

import os
import psycopg2
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.postgres import PGVectorStore

# --- Database Setup (Example for local PostgreSQL) ---
# Ensure PostgreSQL is running and 'pgvector' extension is enabled.
# Example: CREATE EXTENSION IF NOT EXISTS vector; in your database.

DB_HOST = os.environ.get('PG_HOST', 'localhost')
DB_PORT = os.environ.get('PG_PORT', '5432')
DB_USER = os.environ.get('PG_USER', 'postgres')
DB_PASSWORD = os.environ.get('PG_PASSWORD', 'password')
DB_NAME = os.environ.get('PG_DATABASE', 'llama_db')
TABLE_NAME = os.environ.get('PG_TABLE_NAME', 'llamaindex_documents')
EMBED_DIM = int(os.environ.get('EMBED_DIM', '1536')) # 1536 matches OpenAI's text-embedding-ada-002 / text-embedding-3-small

# psycopg2 expects a plain postgresql:// DSN; the "+psycopg2" driver suffix is SQLAlchemy-specific.
connection_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}"

# Recreate the database from scratch. Connect to the default 'postgres'
# maintenance database: you cannot drop a database you are connected to.
try:
    conn = psycopg2.connect(f"{connection_string}/postgres")
    conn.autocommit = True
    with conn.cursor() as c:
        # WITH (FORCE) terminates other sessions; requires PostgreSQL 13+.
        c.execute(f"DROP DATABASE IF EXISTS {DB_NAME} WITH (FORCE);")
        c.execute(f"CREATE DATABASE {DB_NAME};")
    conn.close()
    print(f"Database '{DB_NAME}' created/recreated.")
except Exception as e:
    print(f"Could not connect or create database: {e}")
    print("Please ensure PostgreSQL is running and connection details are correct.")
    exit(1)

# --- LlamaIndex Usage ---
# 1. Create a dummy document for indexing
os.makedirs("data", exist_ok=True)
with open("data/test_document.txt", "w") as f:
    f.write("The quick brown fox jumps over the lazy dog. This is a test document for LlamaIndex and Postgres.")

documents = SimpleDirectoryReader("data").load_data()

# 2. Initialize the PGVectorStore
vector_store = PGVectorStore.from_params(
    database=DB_NAME,
    host=DB_HOST,
    password=DB_PASSWORD,
    port=int(DB_PORT),
    user=DB_USER,
    table_name=TABLE_NAME,
    embed_dim=EMBED_DIM,
)

# 3. Create StorageContext and VectorStoreIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

# 4. Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the fox do?")
print(f"Response: {response}")

# Clean up (optional)
# The vector table lives inside DB_NAME, so connect there to drop it.
# Note: PGVectorStore creates the table with a "data_" prefix.
# conn = psycopg2.connect(f"{connection_string}/{DB_NAME}")
# conn.autocommit = True
# with conn.cursor() as c:
#     c.execute(f"DROP TABLE IF EXISTS data_{TABLE_NAME} CASCADE;")
# conn.close()
# To drop the whole database instead, connect to the 'postgres' database
# and run DROP DATABASE (as in the setup above).
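One pitfall with building the DSN via an f-string, as the quickstart does: a password containing URL-reserved characters (`@`, `:`, `/`) will break the URL. A minimal sketch using only the standard library's `quote_plus`; the `build_dsn` helper here is hypothetical, not part of the library:

```python
from urllib.parse import quote_plus

def build_dsn(user, password, host, port, database=None):
    """Build a postgresql:// DSN, percent-encoding the credentials."""
    dsn = f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"
    if database:
        dsn += f"/{database}"
    return dsn

# A password containing '@' would otherwise be misparsed as part of the host.
print(build_dsn("postgres", "p@ss:word", "localhost", "5432", "llama_db"))
```

The encoded DSN can be passed to `psycopg2.connect` or to `PGVectorStore` setup code in place of the raw f-string.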

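The `embed_dim` passed to `PGVectorStore.from_params` must match the dimensionality of the embedding model you use, since it fixes the width of the underlying `vector(N)` column. A small sanity-check sketch; the dimension table covers a few well-known OpenAI models at their default dimensions, and `check_embed_dim` is a hypothetical helper, not part of the library:

```python
# Default output dimensions of some well-known OpenAI embedding models.
KNOWN_EMBED_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_embed_dim(model_name, embed_dim):
    """Return True if embed_dim matches the model's known default dimension.

    Unknown models pass by default, since we cannot verify them here.
    """
    expected = KNOWN_EMBED_DIMS.get(model_name)
    return expected is None or expected == embed_dim

print(check_embed_dim("text-embedding-ada-002", 1536))  # True
print(check_embed_dim("text-embedding-3-large", 1536))  # False
```

Running such a check before initializing the store fails fast, instead of surfacing a dimension mismatch only at insert time.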