ConnectorX
ConnectorX is a high-performance Python library for loading data from databases into dataframes (Pandas, Polars, Apache Arrow). Written in Rust, it bypasses Python's GIL, offering significant speedups for data-intensive operations. The current version is 0.4.5, and it has an active development cycle with frequent minor releases.
Warnings
- breaking ConnectorX now requires Python 3.10 or higher. Users on older Python versions (3.9 and below) will not be able to install or run the latest versions.
- breaking Error handling for missing module dependencies has changed. Previously, a `ValueError` might have been raised; it is now `ModuleNotFoundError`.
- gotcha ConnectorX's core functionality relies on optional dependencies (pandas, polars, pyarrow) based on the `return_type` parameter. If you request `return_type="pandas"` without `pandas` installed, it will fail.
- gotcha Internal changes related to Arrow libraries (e.g., removal of `arrow2` in v0.4.2, and subsequent `pyarrow` bumps) may affect advanced users who rely on specific Arrow version compatibility or internal structures. This can lead to unexpected type conversions or performance regressions if not carefully managed with existing Arrow-based workflows.
- gotcha ConnectorX's connection strings are strict and database-specific. Minor syntax errors (e.g., missing slashes, an incorrect port, invalid parameters) prevent the connection and often surface only as generic connection failures rather than specific error messages.
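Because malformed connection strings tend to fail with generic errors, it can help to sanity-check the string before handing it to ConnectorX. The following is a minimal sketch using only the standard library; `check_connection_string` is a hypothetical helper, not part of ConnectorX, and it assumes a server-style URL of the form `scheme://user:password@host:port/database` (file-based URLs such as SQLite paths would need different rules).

```python
from urllib.parse import urlsplit


def check_connection_string(conn):
    """Return a list of problems found in a URL-style connection string.

    Hypothetical pre-flight helper; it only inspects the URL shape and
    does not contact the database.
    """
    problems = []
    parts = urlsplit(conn)
    if not parts.scheme:
        problems.append("missing scheme (for example 'postgresql://')")
    if not parts.netloc:
        problems.append("missing '//' or host after the scheme")
    try:
        parts.port  # accessing .port raises ValueError for a non-numeric port
    except ValueError:
        problems.append("port is not a valid number")
    if parts.scheme and parts.netloc and not parts.path.strip("/"):
        problems.append("missing database name after the host")
    return problems


for candidate in (
    "postgresql://user:password@host:5432/database",
    "postgresql:/user:password@host:5432/database",
    "postgresql://user:password@host:54x2/database",
):
    print(candidate, "->", check_connection_string(candidate) or "looks well-formed")
```

A clean result only means the string parses as a URL; credentials, database-specific parameters, and reachability are still checked by the database itself.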
Install
- pip install connectorx
- pip install "connectorx[pandas]"  # For pandas return_type
  pip install "connectorx[polars]"  # For polars return_type
  pip install "connectorx[arrow]"   # For arrow return_type
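Since each `return_type` needs its matching dataframe library to be installed, a small pre-flight check can report a missing package before any query runs. This is a sketch under stated assumptions: `missing_dependency` and the `REQUIRED_MODULE` mapping are hypothetical names, and the mapping covers only the three return types mentioned in this document.

```python
import importlib.util

# Assumed mapping from return_type to the package that must be importable.
# Only the return types named in this document are listed.
REQUIRED_MODULE = {
    "pandas": "pandas",
    "polars": "polars",
    "arrow": "pyarrow",
}


def missing_dependency(return_type):
    """Return the name of the missing package for return_type, or None."""
    module = REQUIRED_MODULE.get(return_type)
    if module is None:
        raise ValueError(f"unknown return_type: {return_type!r}")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module) is None:
        return module
    return None


for rt in REQUIRED_MODULE:
    missing = missing_dependency(rt)
    if missing:
        print(f'return_type="{rt}" needs the "{missing}" package installed')
    else:
        print(f'return_type="{rt}" is ready to use')
```

The check uses `importlib.util.find_spec` so that nothing is actually imported; it only tells you whether the package can be found in the current environment.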
Imports
- read_sql
from connectorx import read_sql
Quickstart
import os
import sys

import connectorx as cx

# Example PostgreSQL connection string.
# Replace with your actual database connection string.
PLACEHOLDER = 'postgresql://user:password@host:5432/database'
DB_CONNECTION_STRING = os.environ.get('CX_DB_CONNECTION_STRING', PLACEHOLDER)

if DB_CONNECTION_STRING == PLACEHOLDER:
    print("Warning: Please set CX_DB_CONNECTION_STRING environment variable for a real database connection.")
    print("Using a dummy connection string, this example might not run without a database.")
    # For a runnable dummy, you might use SQLite in-memory, but ConnectorX doesn't directly support that.
    # The primary use case is external DBs.
    sys.exit(1)  # Exit if not configured, as it won't connect without a real string

query = "SELECT id, name FROM my_table WHERE id < 10"

try:
    # return_type="pandas" requires pandas to be installed.
    df = cx.read_sql(DB_CONNECTION_STRING, query, return_type="pandas")
    print("Successfully read data:")
    print(df.head())
except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure your database connection string and query are correct, and the database is accessible.")
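One way to avoid hand-typed syntax errors in the connection string is to assemble it from its parts. The sketch below is illustrative only: `build_postgres_url` is a hypothetical helper, and it assumes the database driver decodes percent-encoded credentials, as URL-style connection strings commonly do. Verify this against your own database before relying on it.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, database):
    """Assemble a postgresql:// URL, percent-encoding the credentials.

    Hypothetical helper; characters such as '@' or '/' in the password
    would otherwise be read as URL delimiters.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{int(port)}/{database}"
    )


# Placeholder values matching the Quickstart's dummy connection string.
print(build_postgres_url("user", "password", "host", 5432, "database"))
```

The resulting string can be passed as the first argument to `cx.read_sql` in place of `DB_CONNECTION_STRING`.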