DuckDB in-process database

1.5.1 · active · verified Sun Mar 29

DuckDB is an in-process SQL OLAP (Online Analytical Processing) database management system designed for fast analytical queries directly within your Python application. It runs without a separate server and integrates with Python data libraries such as Pandas and Polars. It is actively maintained with frequent releases, currently at version 1.5.1, and focuses on efficient handling of large datasets.

Warnings

Install

Imports

Quickstart

This quickstart connects to an in-memory DuckDB database, executes SQL queries, inserts data, and retrieves results as a Pandas DataFrame. It also uses the default global in-memory database directly via `duckdb.sql()` for quick operations. For persistent storage, pass a file path to `duckdb.connect()`.

import duckdb

# Connect to an in-memory database (data is lost when the connection closes)
con = duckdb.connect(database=':memory:')

# Execute a SQL query and print the result
# (show() prints the relation and returns None, so there is nothing to assign)
con.sql("SELECT 42 AS answer").show()

# Create a table and insert data
con.execute("CREATE TABLE my_table (id INTEGER, name VARCHAR)")
con.execute("INSERT INTO my_table VALUES (1, 'Alice'), (2, 'Bob')")

# Query the table and fetch results as a Pandas DataFrame
df_result = con.sql("SELECT * FROM my_table WHERE id = 1").df()
print(df_result)

# Example of using the default global in-memory database
df_global = duckdb.sql("SELECT 'Hello, DuckDB!' AS message").df()
print(df_global)
