{"id":8008,"library":"chdb-core","title":"chDB-core: In-process OLAP SQL Engine","description":"chDB-core is an in-process OLAP SQL Engine, embedding the powerful analytical capabilities of ClickHouse directly within Python applications. It allows users to execute high-performance SQL queries without the need for a separate ClickHouse server installation or network communication. The library maintains an active development and release cadence, with version 26.1.0 being the current stable release.","status":"active","version":"26.1.0","language":"en","source_language":"en","source_url":"https://github.com/chdb-io/chdb-core","tags":["olap","sql","clickhouse","embedded-database","data-analytics","in-process","python-dbapi"],"install":[{"cmd":"pip install chdb-core","lang":"bash","label":"Install chdb-core"}],"dependencies":[{"reason":"Required runtime environment","package":"python","optional":false},{"reason":"For DataFrame input/output formats and higher-level DataStore API (if 'chdb' package is also used)","package":"pandas","optional":true},{"reason":"For Apache Arrow input/output formats","package":"pyarrow","optional":true}],"imports":[{"note":"The primary query function is directly available under the 'chdb' namespace after installing 'chdb-core'.","symbol":"query","correct":"import chdb\nchdb.query(...)"},{"note":"For Python DB-API 2.0 compliant connections and cursors.","symbol":"dbapi","correct":"import chdb.dbapi as dbapi\nconn = dbapi.connect(...)"},{"note":"User-Defined Functions (UDFs) are located in the 'chdb.udf' submodule.","wrong":"from chdb import chdb_udf","symbol":"chdb_udf","correct":"from chdb.udf import chdb_udf"}],"quickstart":{"code":"import chdb\n\n# Execute a simple SQL query\nresult = chdb.query(\"SELECT 'Hello, chDB-core!' 
as message, version() as chdb_version\", \"Pretty\")\nprint(result)\n\n# Example with DataFrame output (requires pandas to be installed)\ntry:\n    import pandas as pd\n    df = chdb.query(\"SELECT number, number * 2 AS double FROM numbers(5)\", \"DataFrame\")\n    print(df)\n    print(type(df))\nexcept ImportError:\n    print(\"Pandas not installed. Skipping DataFrame example.\")","lang":"python","description":"This quickstart demonstrates how to perform a basic SQL query using the `chdb.query` function and how to get results in different formats, including a Pandas DataFrame. The `chdb-core` package exposes its primary functionalities via the `chdb` module."},"warnings":[{"fix":"Ensure you install the correct package: `pip install chdb-core` for core SQL engine, or `pip install chdb` for DataStore API built on `chdb-core`.","message":"The `chdb` project has been split into two packages: `chdb-core` (this package) and `chdb`. `chdb-core` provides the low-level C++ engine and SQL interfaces (`query`, `dbapi`). If you were previously using `chdb` for its higher-level Pandas-compatible DataStore API, you must now explicitly install `chdb` (i.e., `pip install chdb`) instead of, or in addition to, `chdb-core`.","severity":"breaking","affected_versions":"Introduced in v26.1.0 and subsequent versions."},{"fix":"Optimize queries to reduce memory footprint, process data in chunks (streaming), or consider using a separate ClickHouse server for extremely large or long-running tasks. Monitor memory usage carefully for in-process deployments.","message":"As an in-process OLAP engine, chDB-core shares the application's memory space. Running queries that process or aggregate very large datasets (e.g., 10GB on an 8GB RAM machine) can lead to out-of-memory errors and application crashes, unlike a standalone ClickHouse server which might spill to disk. 
Similarly, long-running queries can block the entire application process.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Design UDFs as pure functions without side effects. For example, `def my_udf(arg1, arg2): import json; return str(json.loads(arg1)['key'])`. Explicitly cast string inputs to appropriate types (e.g., `int(lhs)`, `float(rhs)`) and cast the result back to a string if no return type is specified.","message":"User-Defined Functions (UDFs) must be stateless, pure Python functions. All modules a UDF needs must be imported *inside* the UDF function itself, not at the module level. Input arguments are always passed as strings, and the default return type is also a string, so numerical operations require explicit type conversions within the UDF.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Reduce the data volume processed by the query, utilize streaming APIs, or increase available RAM. If using `jemalloc`, try disabling it (`-DENABLE_JEMALLOC=0` during build) or using `LD_PRELOAD` to ensure consistent memory allocation across C++ and Python boundaries.","cause":"Often caused by out-of-memory conditions when processing large datasets within the application's process, or by memory management conflicts, especially with `jemalloc`.","error":"Program received signal SIGSEGV, Segmentation fault."},{"fix":"If you intend to use the DataStore API, install the `chdb` package: `pip install chdb`. If you only need the core SQL engine, refactor your code to use `chdb.query()` or the `chdb.dbapi` module directly.","cause":"You installed `chdb-core` but are trying to use the higher-level, Pandas-compatible DataStore API, which is now part of the separate `chdb` package.","error":"AttributeError: module 'chdb' has no attribute 'DataStore'"},{"fix":"Ensure the UDF is stateless. 
Explicitly convert string inputs to the desired types (e.g., `int()`, `float()`) at the start of the function. Move any necessary `import` statements (e.g., `import json`) *inside* the UDF function body. Review UDF best practices in the documentation.","cause":"Typically arises from an incorrect UDF implementation, such as maintaining state across calls, expecting non-string arguments, or omitting imports inside the UDF body.","error":"Error in Python UDF: Function 'my_udf' failed to execute."}]}