DuckDB in-process database
DuckDB is an in-process SQL OLAP (Online Analytical Processing) database management system designed for fast analytical queries directly within a Python application. It runs without a separate server and integrates with Python data libraries such as Pandas and Polars. It is actively maintained with frequent releases (version 1.5.1 at the time of writing) and focuses on efficient handling of large datasets.
Warnings
- breaking: Python 3.9 is no longer supported as of DuckDB Python v1.5.0. Users still on Python 3.9 will encounter errors.
- breaking: The `duckdb.typing` and `duckdb.functional` modules were removed in v1.5.0 after being deprecated in v1.4.0.
- deprecated: The methods `fetch_arrow_table()` and `fetch_record_batch()` on connections and relations have been deprecated.
- gotcha: DuckDB's persistent storage format was not stable across major/minor versions prior to v1.0. Opening a database file written by one of those older versions after upgrading DuckDB can raise an `IOException`.
- gotcha: The `column` parameter in relational API functions (e.g., `min`, `max`, `sum`) was renamed to `expression`, reflecting that it accepts expressions and not just column names. Code that passes `column=` as a keyword argument needs updating.
- deprecated: The lambda arrow syntax `x -> x + 1` in SQL queries is deprecated in v1.5.0 and emits a warning.
Install
pip install duckdb
Imports
- duckdb
import duckdb
- duckdb.sqltypes
from duckdb import sqltypes
- duckdb.func
from duckdb import func
Quickstart
import duckdb
# Connect to an in-memory database (data is lost after session)
con = duckdb.connect(database=':memory:')
# Execute a SQL query and show results
con.sql("SELECT 42 AS answer").show()  # show() prints the result and returns None
# Create a table and insert data
con.execute("CREATE TABLE my_table (id INTEGER, name VARCHAR)")
con.execute("INSERT INTO my_table VALUES (1, 'Alice'), (2, 'Bob')")
# Query the table and fetch results as a Pandas DataFrame
df_result = con.sql("SELECT * FROM my_table WHERE id = 1").df()
print(df_result)
# Example of using the default global in-memory database
df_global = duckdb.sql("SELECT 'Hello, DuckDB!' AS message").df()
print(df_global)