Apache Airflow Provider for pgvector
An Apache Airflow provider package that integrates pgvector, enabling vector similarity search operations in Airflow via hooks and operators. Current version is 1.7.1, released under the Airflow provider maintenance cadence.
pip install apache-airflow-providers-pgvector

Common errors
error airflow.exceptions.AirflowException: The conn_type `pgvector` is not supported
cause The Airflow connection was created with conn_type='pgvector'; the provider does not register its own connection type.
fix
Change the connection's conn_type to 'postgres' and set the host, port, and database (schema) accordingly.
error ModuleNotFoundError: No module named 'airflow.providers.pgvector.hooks.pgvector_hook'
cause Importing from a '_hook'-suffixed module path that does not exist in this provider.
fix
Update the import to: from airflow.providers.pgvector.hooks.pgvector import PgVectorHook
error psycopg2.errors.UndefinedObject: type "vector" does not exist
cause The pgvector extension is not installed in the target PostgreSQL database.
fix
Run CREATE EXTENSION IF NOT EXISTS vector; in your PostgreSQL database with superuser privileges.
Warnings
breaking The provider's modules are 'airflow.providers.pgvector.hooks.pgvector' and 'airflow.providers.pgvector.operators.pgvector'; '_hook'/'_operator'-suffixed paths do not resolve.
fix Update imports to use the paths shown in the imports section.
gotcha The provider requires the pgvector extension to be installed in PostgreSQL; simply installing the Python package is insufficient. ↓
fix Ensure your PostgreSQL instance has the pgvector extension installed (e.g., CREATE EXTENSION vector;).
gotcha Connection configuration: the hook reuses Airflow's standard Postgres connection (conn_type='postgres'); there is no dedicated 'pgvector' connection type, and using one causes connection failures.
fix Set up a Postgres connection in Airflow (conn_type='postgres') with correct host, schema, login, password, and port, and reference it by its conn_id.
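As a concrete sketch: Airflow can also pick up connections from environment variables of the form AIRFLOW_CONN_<CONN_ID> holding a connection URI, where the URI scheme plays the role of the conn_type. The connection ID, host, and credentials below are placeholders, not values the provider requires.

```python
from urllib.parse import urlsplit

# Hypothetical connection ID and credentials, for illustration only.
conn_id = "pgvector_default"
uri = "postgres://app_user:app_pass@db.example.com:5432/vectors"

# Airflow reads this variable name at runtime: AIRFLOW_CONN_<CONN_ID in upper case>.
env_var = f"AIRFLOW_CONN_{conn_id.upper()}"

# The URI scheme maps to the conn_type: 'postgres', not 'pgvector'.
parts = urlsplit(uri)
print(env_var)         # AIRFLOW_CONN_PGVECTOR_DEFAULT
print(parts.scheme)    # postgres
print(parts.hostname)  # db.example.com
```

Exporting that variable (e.g. in the scheduler's environment) makes the connection available without touching the Airflow UI or metadata database.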
Install
pip install apache-airflow[pgvector]

Imports
- PgVectorHook
from airflow.providers.pgvector.hooks.pgvector import PgVectorHook
- PgVectorIngestOperator
from airflow.providers.pgvector.operators.pgvector import PgVectorIngestOperator
Quickstart
from datetime import datetime

from airflow import DAG
from airflow.providers.pgvector.operators.pgvector import PgVectorIngestOperator

with DAG(
    dag_id='pgvector_demo',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_table = PgVectorIngestOperator(
        task_id='create_table',
        conn_id='postgres_default',  # ID of an Airflow Postgres connection
        sql="CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))",
    )
    insert = PgVectorIngestOperator(
        task_id='insert_vector',
        conn_id='postgres_default',
        sql="INSERT INTO items (embedding) VALUES ('[1,2,3]'::vector)",
    )
    create_table >> insert
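Once rows are in place, similarity search is plain SQL run through the same connection. A minimal sketch of building a pgvector literal and a nearest-neighbour query follows; the to_vector_literal helper is hypothetical (not part of the provider), and <-> is pgvector's Euclidean-distance operator.

```python
def to_vector_literal(values):
    # Render a Python sequence as a pgvector text literal, e.g. '[1.0,2.0,3.0]'.
    return "[" + ",".join(str(float(v)) for v in values) + "]"

query_vec = to_vector_literal([3, 1, 2])

# Pass this SQL to the operator or hook; the target database must have
# the pgvector extension enabled for the <-> operator to exist.
sql = (
    "SELECT id FROM items "
    f"ORDER BY embedding <-> '{query_vec}'::vector LIMIT 5"
)
print(query_vec)  # [3.0,1.0,2.0]
```

pgvector also provides <#> (negative inner product) and <=> (cosine distance) for the same ORDER BY ... LIMIT pattern.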