chDB-core: In-process OLAP SQL Engine
chDB-core is an in-process OLAP SQL engine that embeds the analytical capabilities of ClickHouse directly within Python applications. It lets users execute high-performance SQL queries without a separate ClickHouse server installation or any network communication. The project is under active development; version 26.1.0 is the current stable release.
Common errors
-
Program received signal SIGSEGV, Segmentation fault.
cause: Often caused by out-of-memory conditions when processing large datasets within the application's process, or by memory management conflicts, especially with `jemalloc`.
fix: Reduce the data volume processed by the query, utilize streaming APIs, or increase available RAM. If using `jemalloc`, try disabling it (`-DENABLE_JEMALLOC=0` during build) or using `LD_PRELOAD` to ensure consistent memory allocation across C++ and Python boundaries.
-
AttributeError: module 'chdb' has no attribute 'DataStore'
cause: You installed `chdb-core` but are trying to use the higher-level, Pandas-compatible DataStore API, which is now part of the separate `chdb` package.
fix: If you intend to use the DataStore API, install the `chdb` package: `pip install chdb`. If you only need the core SQL engine, refactor your code to use `chdb.query()` or the `chdb.dbapi` module directly.
-
Error in Python UDF: Function 'my_udf' failed to execute.
cause: Typically arises from an incorrect UDF implementation, such as attempting to maintain state across calls, incorrect argument types (expecting non-strings), or missing imports within the UDF scope.
fix: Ensure the UDF is stateless. Explicitly convert string inputs to the desired types (e.g., `int()`, `float()`) at the start of the function. Move any necessary `import` statements (e.g., `import json`) *inside* the UDF function body. Review UDF best practices in the documentation.
Warnings
- breaking The `chdb` project has been split into two packages: `chdb-core` (this package) and `chdb`. `chdb-core` provides the low-level C++ engine and SQL interfaces (`query`, `dbapi`). If you were previously using `chdb` for its higher-level Pandas-compatible DataStore API, you must now explicitly install `chdb` (i.e., `pip install chdb`) instead of, or in addition to, `chdb-core`.
- gotcha As an in-process OLAP engine, chDB-core shares the application's memory space. Queries that process or aggregate very large datasets (e.g., 10 GB on a machine with 8 GB of RAM) can cause out-of-memory errors and crash the application, unlike a standalone ClickHouse server, which might spill to disk. Similarly, long-running queries can block the entire application process.
- gotcha User-Defined Functions (UDFs) must be stateless and pure Python functions. All required modules for a UDF must be imported *inside* the UDF function itself, not at the module level. Input arguments are always treated as strings, and the default return type is also a string, requiring explicit type conversions within the UDF for numerical operations.
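The UDF rules above (stateless, imports inside the body, string inputs, string output) can be sketched as follows. This is a minimal illustration, not a definitive implementation: `area_udf` is a hypothetical function name, calling `chdb_udf()` with no arguments assumes the string default return type described above, and the no-op fallback decorator is a stand-in added here only so the function can be exercised directly when chdb is not installed.

```python
try:
    from chdb import query
    from chdb.udf import chdb_udf
except ImportError:
    query = None

    def chdb_udf(return_type="String"):
        """Stand-in (not part of chdb): returns the function unchanged."""
        return lambda func: func


@chdb_udf()
def area_udf(width, height):
    # Imports live inside the function body, not at module level.
    import math

    # Arguments arrive as strings, so convert them explicitly first.
    w = float(width)
    h = float(height)

    # No state is kept between calls; the result goes back as a string.
    return str(math.floor(w * h))


# Direct call with string inputs, mirroring what the engine passes in.
print(area_udf("2.5", "4"))

# Assumption: once decorated, the UDF is callable from SQL by its name.
if query is not None:
    print(query("SELECT area_udf('2.5', '4')"))
```

Calling the function directly with string arguments is a quick way to check the type conversions before involving the SQL engine.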
Install
-
pip install chdb-core
Imports
- query
import chdb
chdb.query(...)
- dbapi
import chdb.dbapi as dbapi
conn = dbapi.connect(...)
- chdb_udf
from chdb import chdb_udf
from chdb.udf import chdb_udf
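The `dbapi` import listed above follows the familiar Python DB-API pattern of connection, cursor, execute, and fetch. The sketch below assumes that `connect()` can be called without arguments and that `fetchone()` returns a tuple-like row; the import guard mirrors the Quickstart's pandas check and is there only so the snippet degrades gracefully when chdb is missing.

```python
try:
    import chdb.dbapi as dbapi
except ImportError:
    dbapi = None

row = None
if dbapi is not None:
    # Assumption: connect() with no arguments opens a default session.
    conn = dbapi.connect()
    cur = conn.cursor()
    try:
        cur.execute("SELECT 1 AS one, 'chdb' AS name")
        row = cur.fetchone()  # first result row
    finally:
        # Release the cursor and connection even if the query fails.
        cur.close()
        conn.close()
    print(row)
else:
    print("chdb not installed. Skipping dbapi example.")
```

Closing the cursor and connection in a `finally` block keeps engine resources from lingering inside the application's own process.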
Quickstart
import chdb
# Execute a simple SQL query
result = chdb.query("SELECT 'Hello, chDB-core!' as message, version() as chdb_version", "Pretty")
print(result)
# Example with DataFrame output (requires pandas to be installed)
try:
    import pandas as pd
    df = chdb.query("SELECT number, number * 2 AS double FROM numbers(5)", "DataFrame")
    print(df)
    print(type(df))
except ImportError:
    print("Pandas not installed. Skipping DataFrame example.")