chDB
chDB is an in-process OLAP SQL engine powered by ClickHouse that lets you embed an analytical database directly in Python applications. It runs SQL queries over various data formats (Parquet, CSV, JSON, Pandas DataFrames) without a separate database server. The current version is 4.1.6, and the project is actively developed with frequent releases.
Warnings
- gotcha chDB is an in-process engine and shares memory with your application. A query that processes a large dataset (e.g., aggregating 10 GB on an 8 GB RAM machine) can exhaust available memory and crash the entire Python process, not just the failing query, as would happen with a separate database server.
- gotcha chDB operates in a single process and lacks built-in authentication, multi-tenancy, or fine-grained resource limits per user. This makes it unsuitable for multi-user applications or highly concurrent environments where resource isolation and access control are critical.
- gotcha When using the DataStore (Pandas-compatible API) with chained operations in chDB v4.x, each intermediate step can materialize a new DataFrame in memory. This can lead to higher memory consumption than anticipated for large datasets, potentially negating some performance benefits.
- breaking In version 4.1.0, the `chdb` package was decoupled from `chdb-core`. The change is mainly about packaging, but code that depends on `chdb-core` internals may break.
- gotcha Versions prior to 4.1.0 could crash when the Python process exited, particularly when persistent sessions were in use.
- gotcha An issue in versions prior to 4.1.4 could lead to a broken module after upgrading, due to a missing `chdb/__init__.py` file.
Install
pip install chdb
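Given the version-specific warnings above, a quick check of the installed release can help. This is a minimal sketch using only the standard library's `importlib.metadata`; `parse_release`, `is_at_least`, and `MINIMUM` are hypothetical helper names, and 4.1.4 is chosen because it is the earliest release that the warnings treat as fixed for both the exit crash and the missing `chdb/__init__.py`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Return the leading numeric components, e.g. '4.1.6.dev0' -> (4, 1, 6)."""
    m = re.match(r"\d+(?:\.\d+)*", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(p) for p in m.group(0).split("."))


def is_at_least(ver: str, minimum: tuple) -> bool:
    """Compare release tuples after padding the shorter one with zeros."""
    cur = parse_release(ver)
    width = max(len(cur), len(minimum))
    cur = cur + (0,) * (width - len(cur))
    low = tuple(minimum) + (0,) * (width - len(minimum))
    return cur >= low


# Hypothetical threshold: first release past both version-specific gotchas.
MINIMUM = (4, 1, 4)

try:
    installed = version("chdb")
    if is_at_least(installed, MINIMUM):
        print(f"chdb {installed} OK")
    else:
        print(f"chdb {installed} predates 4.1.4; consider: pip install -U chdb")
except PackageNotFoundError:
    print("chdb is not installed; run: pip install chdb")
```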
Imports
- chdb
import chdb
- Session
from chdb import session as chs
- dbapi
import chdb.dbapi as dbapi
- DataStore
from chdb.datastore import DataStore
Quickstart
import chdb
import pandas as pd
# Run a simple SQL query and get results as a Pandas DataFrame.
# Note: version() returns the embedded ClickHouse engine version, not the chdb package version.
result_df = chdb.query("SELECT 1 AS id, 'Hello chDB!' AS message, version() AS engine_version", "DataFrame")
print("Query Result (DataFrame):\n", result_df)
# Query an existing Pandas DataFrame directly
data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
mypandas_df = pd.DataFrame(data)
sql_on_df = "SELECT col1, upper(col2) FROM python(mypandas_df) WHERE col1 > 1"
queried_df_from_pandas = chdb.query(sql_on_df, "DataFrame")
print("\nQuery Result from Pandas DataFrame (DataFrame):\n", queried_df_from_pandas)