pgvector

Version: 0.8.2 | Verified: Tue May 12 | Auth: no | Python install: stale | Quickstart: stale

Open-source PostgreSQL extension for vector similarity search. It has two components: (1) the server-side Postgres extension (written in C, compiled and installed into Postgres), and (2) the Python client package 'pgvector' on PyPI, which provides adapter/ORM integrations for psycopg2, psycopg3, asyncpg, SQLAlchemy, Django, SQLModel, and Peewee. The extension name in SQL is 'vector' (CREATE EXTENSION vector), not 'pgvector'. Maintained by Andrew Kane. Current extension version: 0.8.2 (security release fixing CVE-2026-3172). Python client: 0.4.2.

pip install pgvector
error ModuleNotFoundError: No module named 'pgvector'
cause The Python client package 'pgvector' is not installed in the environment where the code is being run.
fix
Install the pgvector Python package into the same interpreter that runs your code: python -m pip install pgvector
error ERROR: type "vector" does not exist
cause The PostgreSQL 'vector' extension has not been created in the database or the current user does not have access to it, preventing the use of the `vector` data type.
fix
Connect to your PostgreSQL database and execute the SQL command: CREATE EXTENSION IF NOT EXISTS vector; Ensure you are connected to the correct database and have sufficient permissions.
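A minimal sketch of enabling the extension from Python and confirming it took effect. It assumes any DB-API 2.0 connection (for example one from psycopg2.connect); ensure_vector_extension is a hypothetical helper name, while pg_available_extensions is a standard Postgres catalog view.

```python
# The SQL name of the extension is 'vector', not 'pgvector'.
ENABLE_SQL = "CREATE EXTENSION IF NOT EXISTS vector"

# pg_available_extensions lists extensions whose files are installed on the server;
# installed_version stays NULL until CREATE EXTENSION runs in the current database.
STATUS_SQL = (
    "SELECT default_version, installed_version "
    "FROM pg_available_extensions WHERE name = 'vector'"
)


def ensure_vector_extension(conn):
    """Hypothetical helper: enable 'vector' in the connected database, return its version.

    Accepts any DB-API 2.0 connection (e.g. from psycopg2.connect).
    """
    cur = conn.cursor()
    try:
        cur.execute(STATUS_SQL)
        if cur.fetchone() is None:
            # Server-side files are missing: install the OS/Docker package first.
            raise RuntimeError("the vector extension is not installed on this server")
        cur.execute(ENABLE_SQL)  # fails here if the role lacks sufficient privileges
        conn.commit()
        cur.execute(STATUS_SQL)
        return cur.fetchone()[1]  # installed_version
    finally:
        cur.close()
```

Extensions are enabled per database, so run this against every database that needs vector columns, e.g. `ensure_vector_extension(psycopg2.connect("dbname=mydb user=postgres"))`.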
error operator does not exist: vector <-> double precision[]
cause The right-hand operand reached Postgres as a double precision[] array, which is how drivers send a plain Python list. pgvector casts arrays to vector only on assignment (e.g. INSERT), not implicitly inside operator expressions such as `<->` (L2 distance), so no matching operator is found. The same error can also appear if the schema holding the vector extension is not on the active search_path.
fix
Ensure the input array is explicitly cast to vector in your SQL query, for example: SELECT * FROM items ORDER BY embedding <-> %s::vector LIMIT 1;. If using a Python client, ensure you are passing a compatible type or register the vector type with the driver (e.g., pgvector.psycopg.register_vector(conn) for psycopg).
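A sketch of the explicit-cast route, which needs no driver adapter at all. It assumes pgvector's text input format ('[1,2,3]') and the items table with its embedding column from the quickstart on this page; to_vector_literal is a hypothetical helper name.

```python
import math
from typing import Iterable

# Explicit ::vector cast: without it, a Python list is sent as double precision[]
# and Postgres finds no matching <-> operator.
KNN_SQL = "SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT 5"


def to_vector_literal(values: Iterable[float]) -> str:
    """Hypothetical helper: format numbers as pgvector text input, e.g. '[1.0,2.0]'."""
    floats = [float(v) for v in values]
    if not floats:
        raise ValueError("vector must have at least 1 dimension")
    for v in floats:
        # pgvector rejects NaN and infinite elements, so fail early on the client.
        if not math.isfinite(v):
            raise ValueError(f"non-finite element not allowed in vector: {v}")
    return "[" + ",".join(repr(v) for v in floats) + "]"
```

With a psycopg2 cursor: `cur.execute(KNN_SQL, (to_vector_literal([1, 1, 1]),))`.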
error ModuleNotFoundError: No module named 'pgvector.sqlalchemy'; 'pgvector' is not a package
cause The suffix "'pgvector' is not a package" means Python resolved the name pgvector to a single module rather than the installed package, so it cannot contain the sqlalchemy submodule. The usual cause is a local file named pgvector.py (for example, a script saved under that name) that comes earlier on sys.path than site-packages. It also surfaces in applications using langchain_postgres, which performs this import internally.
fix
Rename or remove the local pgvector.py, then import the SQLAlchemy type from its documented location: from pgvector.sqlalchemy import Vector. If using LangChain, ensure the langchain-postgres package and a compatible pgvector version are installed.
breaking CVE-2026-3172: Buffer overflow with parallel HNSW index builds in versions 0.6.0–0.8.1. Can leak sensitive data from other relations or crash the database server. Fixed in 0.8.2.
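fix Upgrade to pgvector 0.8.2 immediately: install the new version using the same method as the original installation, then run ALTER EXTENSION vector UPDATE; in each database that uses it.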
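To check whether a database is still on an affected release, read extversion from the pg_extension catalog and compare it with the range in the CVE entry. A sketch; the helper names are hypothetical and it assumes plain dotted numeric version strings. Note that extversion is the version recorded in the catalog, which changes only after ALTER EXTENSION vector UPDATE runs.

```python
# Affected range from the CVE entry: 0.6.0 through 0.8.1; fixed in 0.8.2.
FIRST_AFFECTED = (0, 6, 0)
FIXED_IN = (0, 8, 2)

VERSION_SQL = "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
UPDATE_SQL = "ALTER EXTENSION vector UPDATE"  # run after installing the new binaries


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '0.8.1' into a comparable tuple (0, 8, 1)."""
    return tuple(int(part) for part in text.split("."))


def is_affected(extversion: str) -> bool:
    """True if the installed extension version falls inside the vulnerable range."""
    return FIRST_AFFECTED <= parse_version(extversion) < FIXED_IN
```

Typical use: execute VERSION_SQL, pass the single returned value to is_affected, and if it returns True, upgrade the server package before running UPDATE_SQL in each database.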
breaking Illegal instruction crashes (SIGILL) when pgvector is compiled with -march=native on one CPU architecture and run on another. Occurs on managed cloud Postgres (Azure Flexible Server, some GCP instances) after upgrading to 0.8.0+.
fix Report to your cloud provider. If self-hosting, compile on the same CPU architecture as the runtime, or build a portable binary with make OPTFLAGS="". Cannot be worked around from the client side.
breaking LangChain's langchain-postgres package requires psycopg3 (package name: psycopg). Connection strings must use postgresql+psycopg:// not postgresql+psycopg2://. Mixing drivers causes driver-not-found errors.
fix pip install "psycopg[binary]" (quoted so shells such as zsh do not expand the brackets). Use connection string postgresql+psycopg://user:pass@host/db.
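If the connection string comes from configuration written for psycopg2, rewriting the scheme prefix is usually all that is needed. A sketch; to_psycopg3_url is a hypothetical helper, and it changes only the scheme, leaving credentials, host, and database untouched.

```python
PSYCOPG3_SCHEME = "postgresql+psycopg://"

# Schemes that select psycopg2 in SQLAlchemy (plain postgresql:// defaults to it),
# plus postgres://, common in hosting env vars but rejected by SQLAlchemy 1.4+.
OTHER_SCHEMES = ("postgresql+psycopg2://", "postgresql://", "postgres://")


def to_psycopg3_url(url: str) -> str:
    """Hypothetical helper: rewrite a Postgres URL to use the psycopg 3 driver."""
    if url.startswith(PSYCOPG3_SCHEME):
        return url  # already correct; leave unchanged
    for scheme in OTHER_SCHEMES:
        if url.startswith(scheme):
            return PSYCOPG3_SCHEME + url[len(scheme):]
    raise ValueError(f"not a PostgreSQL URL: {url!r}")
```

Pass the result wherever langchain-postgres expects a connection string.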
breaking Postgres 17.0–17.2 causes the MSVC link error 'unresolved external symbol float_to_shortest_decimal_bufn' when building pgvector from source on Windows.
fix Upgrade to Postgres 17.3+.
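Before building from source, the target server version can be read from `pg_config --version`. A sketch; the function names are hypothetical, and it assumes the pg_config found on PATH belongs to the Postgres installation you are building against.

```python
import re
import shutil
import subprocess


def parse_pg_config_version(output: str) -> tuple:
    """Extract (major, minor) from pg_config output such as 'PostgreSQL 17.2'."""
    match = re.search(r"PostgreSQL (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized pg_config output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def has_pg17_link_bug(version: tuple) -> bool:
    """17.0 through 17.2 hit the link error; 17.3+ builds cleanly."""
    return version[0] == 17 and version[1] <= 2


def check_build_target() -> str:
    """Run pg_config and report whether the server version can build pgvector."""
    exe = shutil.which("pg_config")
    if exe is None:
        return "pg_config not found on PATH"
    out = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True).stdout
    major, minor = parse_pg_config_version(out)
    if has_pg17_link_bug((major, minor)):
        return f"PostgreSQL {major}.{minor}: upgrade to 17.3+ before building pgvector"
    return f"PostgreSQL {major}.{minor}: OK"
```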
gotcha The SQL extension name is 'vector', not 'pgvector'. CREATE EXTENSION pgvector raises 'extension not found'. This is a consistent source of confusion.
fix Always use: CREATE EXTENSION IF NOT EXISTS vector;
gotcha register_vector(conn) must be called on every new connection; the registration lives on that connection only and is not persisted. If it is skipped, vector columns come back as raw strings instead of NumPy arrays. No error is raised, so the wrong behavior is silent.
fix Call register_vector(conn) immediately after connecting (e.g. right after psycopg2.connect()). With a connection pool, call it in the pool's per-connection setup hook, such as the configure callback of psycopg_pool.ConnectionPool (psycopg 3).
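gotcha HNSW and IVFFlat indexes are used only for queries of the form ORDER BY <distance> LIMIT k. Without ORDER BY + LIMIT, or when the distance operator does not match the index's operator class (e.g. <=> against a vector_l2_ops index), Postgres falls back to a sequential scan. Such queries still return exact results, but at O(n) cost.
fix Always write vector searches as ORDER BY embedding <-> $1 LIMIT k, using the operator that matches the index: <-> for vector_l2_ops, <=> for vector_cosine_ops, <#> for vector_ip_ops. Without LIMIT, the index is not used.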
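One way to confirm that a query actually uses the ANN index is to run it under EXPLAIN and look for an Index Scan node. A sketch assuming a psycopg2 cursor (client-side parameter interpolation) and the items table from the quickstart; the index name and helper names are hypothetical. On a small table the planner may still prefer a sequential scan even with the right query shape, so check against realistic data volumes.

```python
# Hypothetical index name; vector_l2_ops is the operator class that serves <->.
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS items_embedding_hnsw "
    "ON items USING hnsw (embedding vector_l2_ops)"
)

# The query shape an HNSW/IVFFlat index can serve: ORDER BY <distance> LIMIT k.
KNN_SQL = "SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT %s"


def plan_uses_vector_index(plan_lines) -> bool:
    """True if EXPLAIN output contains an Index Scan and no Seq Scan node."""
    plan = "\n".join(plan_lines)
    return "Index Scan" in plan and "Seq Scan" not in plan


def knn_uses_index(cur, query_literal: str, k: int = 5) -> bool:
    """Run EXPLAIN on the k-NN query with a psycopg2 cursor and inspect the plan."""
    cur.execute("EXPLAIN " + KNN_SQL, (query_literal, k))
    # Each EXPLAIN row is a 1-tuple holding one line of the plan text.
    return plan_uses_vector_index(row[0] for row in cur.fetchall())
```

After executing CREATE_INDEX_SQL once, `knn_uses_index(cur, "[1,1,1]")` reports whether the planner picked the index.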
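gotcha pgvector's <=> operator returns cosine distance (1 - cosine similarity), which ranges over [0, 2]: 0 = same direction, 1 = orthogonal, 2 = opposite. Thresholds from other libraries that report cosine similarity (range [-1, 1], often [0, 1] in practice) must be converted, not reused as-is.
fix Express cosine thresholds as distances in [0, 2]: distance_threshold = 1 - similarity_threshold. To get similarity back in SQL, use 1 - (embedding <=> $1).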
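The distance/similarity relationship can be checked in plain Python. A sketch: cosine_distance mirrors what <=> computes for non-zero vectors, and similarity_to_distance converts a similarity cutoff into a distance bound; both names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Same quantity as pgvector's <=>: 1 - cosine similarity, in [0, 2].

    Zero vectors have no direction; this sketch raises ZeroDivisionError for them.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_to_distance(min_similarity: float) -> float:
    """Convert 'keep if similarity >= s' into 'keep if <=> distance <= 1 - s'."""
    return 1.0 - min_similarity
```

For example, a cutoff of cosine similarity 0.8 becomes a `<=>` distance bound of about 0.2.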
gotcha IVFFlat index must be built AFTER data is loaded. Creating the index on an empty table and then inserting data results in a near-useless index (lists are not representative of the data distribution).
fix Load all or most data first, then run CREATE INDEX. For ongoing ingestion, periodically rebuild the index (REINDEX) or use HNSW, which handles incremental inserts better.
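The pgvector README gives a rule of thumb for sizing an IVFFlat index once the data is loaded: lists = rows / 1000 up to 1M rows and sqrt(rows) beyond that, with ivfflat.probes starting around sqrt(lists). A sketch of that arithmetic; the function names and the items/embedding names are hypothetical, and the results are starting points to tune, not optimal values.

```python
import math


def ivfflat_lists(row_count: int) -> int:
    """Starting value for lists: rows / 1000 up to 1M rows, sqrt(rows) above."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)


def ivfflat_probes(lists: int) -> int:
    """Starting value for the ivfflat.probes setting: about sqrt(lists)."""
    return max(1, math.isqrt(lists))


def ivfflat_index_sql(row_count: int) -> str:
    """Build the CREATE INDEX statement; run it only after the bulk load."""
    lists = ivfflat_lists(row_count)
    return (
        "CREATE INDEX items_embedding_ivfflat ON items "
        f"USING ivfflat (embedding vector_l2_ops) WITH (lists = {lists})"
    )
```

Count rows after loading (SELECT count(*) FROM items), build the index from that count, and set probes per session with SET ivfflat.probes = N.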
sudo apt install postgresql-17-pgvector
brew install pgvector
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=pass pgvector/pgvector:pg17
git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git && cd pgvector && make && make install
Python  OS / libc      Status  Wheel  Install  Import  Disk
3.9     alpine (musl)  -       -      -        -       -
3.9     slim (glibc)   -       -      -        -       -
3.10    alpine (musl)  -       -      -        -       -
3.10    slim (glibc)   -       -      -        -       -
3.11    alpine (musl)  -       -      -        -       -
3.11    slim (glibc)   -       -      -        -       -
3.12    alpine (musl)  -       -      -        -       -
3.12    slim (glibc)   -       -      -        -       -
3.13    alpine (musl)  -       -      -        -       -
3.13    slim (glibc)   -       -      -        -       -

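register_vector(conn) must be called after connecting: it registers the custom 'vector' type with psycopg2 so that vector columns are returned as NumPy arrays. Without it, vectors are returned as strings. The extension must already be enabled server-side with CREATE EXTENSION vector, because register_vector looks up the type in the database.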

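# Requires: pip install pgvector psycopg2-binary numpy
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=mydb user=postgres")

# Step 1: enable the extension (once per database; needs sufficient privileges).
# It must exist before register_vector(), which looks up the 'vector' type.
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()

# Step 2: register the vector type on this connection (REQUIRED for every connection)
register_vector(conn)

cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))")

# Insert a vector; NumPy arrays are adapted automatically after registration
cur.execute("INSERT INTO items (embedding) VALUES (%s)", (np.array([1.0, 2.0, 3.0], dtype=np.float32),))
conn.commit()

# Nearest neighbours by L2 distance (<->); ORDER BY + LIMIT lets an ANN index be used
query = np.array([1.0, 1.0, 1.0], dtype=np.float32)
cur.execute("SELECT id, embedding FROM items ORDER BY embedding <-> %s LIMIT 5", (query,))
for item_id, embedding in cur.fetchall():
    print(item_id, embedding)  # embedding comes back as a numpy.ndarray

cur.close()
conn.close()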