{"id":1960,"library":"chdb","title":"chDB","description":"chDB is an in-process OLAP SQL engine powered by ClickHouse that embeds an analytical database directly in a Python application. It runs SQL queries over data formats such as Parquet, CSV, JSON, and Pandas DataFrames without a separate database server. The current version is 4.1.6, and the project is actively developed with frequent releases.","status":"active","version":"4.1.6","language":"en","source_language":"en","source_url":"https://github.com/chdb-io/chdb","tags":["database","olap","sql","clickhouse","in-process","data-analytics","embedded"],"install":[{"cmd":"pip install chdb","lang":"bash","label":"Install chDB"}],"dependencies":[{"reason":"chDB builds on chdb-core, which provides the underlying ClickHouse engine. `pip install chdb` installs it automatically, but knowing about it can help when debugging or in advanced scenarios.","package":"chdb-core","optional":false},{"reason":"Recommended for working with Pandas DataFrames, both as query inputs and as result outputs. Essential for the DataStore API.","package":"pandas","optional":true},{"reason":"Recommended for efficient data exchange with Apache Arrow, especially when working with columnar data formats and DataFrame outputs.","package":"pyarrow","optional":true}],"imports":[{"symbol":"chdb","correct":"import chdb"},{"note":"Used for stateful sessions to maintain database state across queries.","symbol":"Session","correct":"from chdb import session as chs"},{"note":"For using chDB with the Python DB-API 2.0 interface.","symbol":"dbapi","correct":"import chdb.dbapi as dbapi"},{"note":"`import chdb.datastore as pd` is often suggested for a Pandas-like API, but importing `DataStore` directly, or calling `chdb.query`, is the primary way to use the core engine features. 
The `as pd` pattern aims for a drop-in replacement, which might mask `chdb`'s distinct behaviors.","wrong":"import chdb.datastore as pd","symbol":"DataStore","correct":"from chdb.datastore import DataStore"}],"quickstart":{"code":"import chdb\nimport pandas as pd\n\n# Run a simple SQL query and get results as a Pandas DataFrame\nresult_df = chdb.query(\"SELECT 1 as id, 'Hello chDB!' as message, version() as chdb_version\", \"DataFrame\")\nprint(\"Query Result (DataFrame):\\n\", result_df)\n\n# Query an existing Pandas DataFrame directly\ndata = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}\nmypandas_df = pd.DataFrame(data)\nsql_on_df = \"SELECT col1, upper(col2) FROM python(mypandas_df) WHERE col1 > 1\"\nqueried_df_from_pandas = chdb.query(sql_on_df, \"DataFrame\")\nprint(\"\\nQuery Result from Pandas DataFrame (DataFrame):\\n\", queried_df_from_pandas)","lang":"python","description":"This quickstart demonstrates how to execute a basic SQL query using `chdb.query` and receive the results directly as a Pandas DataFrame. It also shows how to query an existing Pandas DataFrame using ClickHouse SQL syntax via the `python(df_name)` table function."},"warnings":[{"fix":"Monitor memory usage for complex queries. For extremely large datasets, consider pre-processing or using a full ClickHouse server. Optimize SQL queries to reduce memory footprint where possible.","message":"chDB is an in-process engine and shares memory with your application. Running complex queries that process large datasets (e.g., aggregating 10GB on an 8GB RAM machine) can lead to out-of-memory crashes for the entire Python process, unlike server-side databases that can spill to disk.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Design your application with chDB as a single-user, embedded analytical tool. 
Implement access control and resource management at the application layer if necessary, or opt for a full ClickHouse server for multi-tenant scenarios.","message":"chDB runs in a single process and lacks built-in authentication, multi-tenancy, and fine-grained per-user resource limits. This makes it unsuitable for multi-user applications or highly concurrent environments where resource isolation and access control are critical.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be mindful of chained DataFrame operations. For large datasets, prefer operations that avoid intermediate materialization, or rewrite complex chains as SQL queries where possible.","message":"When using the DataStore (Pandas-compatible API) with chained operations in chDB v4.x, each intermediate step can materialize a new DataFrame in memory. For large datasets this can consume more memory than anticipated and may negate some of the performance benefits.","severity":"gotcha","affected_versions":"4.0.0 and later"},{"fix":"Review your code for any direct references to `chdb-core` components. Ensure your environment correctly resolves dependencies after upgrading to 4.1.0 or later.","message":"In version 4.1.0, the `chdb` package was decoupled from `chdb-core`. 
This was primarily an architectural change to packaging, but users with deep integrations or code that relied on internal structures related to `chdb-core` might experience breaking changes.","severity":"breaking","affected_versions":"Prior to 4.1.0"},{"fix":"Upgrade to chDB 4.1.0 or newer, which fixes the exit-related crashes.","message":"Versions prior to 4.1.0 could crash when the Python process exited, particularly with persistent sessions.","severity":"gotcha","affected_versions":"Prior to 4.1.0"},{"fix":"Use chDB 4.1.4 or newer to avoid module import failures after package upgrades.","message":"In versions prior to 4.1.4, a missing `chdb/__init__.py` file could leave the module broken after an upgrade.","severity":"gotcha","affected_versions":"Prior to 4.1.4"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}