Ibis: The Portable Python Dataframe Library

12.0.0 · active · verified Sat Apr 11

Ibis is a portable Python dataframe library for building and executing data operations on many backends, including SQL databases, data warehouses, and data lakes. Its dataframe API compiles expressions into each backend's native query language, so you can iterate locally and deploy remotely by changing a single line of code, typically the one that creates the connection. The current version is 12.0.0, and the project is actively maintained with frequent releases.

Warnings

Install

Imports

Quickstart

This quickstart shows Ibis's core lazy evaluation pattern. It connects to an in-memory DuckDB backend, loads an example dataset, and defines a lazy dataframe expression that calculates the average bill length per penguin species. Nothing is computed until the expression is executed, at which point the results are returned as a pandas DataFrame.

import ibis

# Connect to an in-memory DuckDB database (default backend)
con = ibis.duckdb.connect(':memory:')

# Load the 'penguins' example dataset as an Ibis table
t = ibis.examples.penguins.fetch()

# Create a table in the connected database
con.create_table('penguins', t.to_pyarrow(), overwrite=True)

# Get a table expression from the connection
table = con.table('penguins')

# Perform a lazy computation: group by species and calculate mean bill length
result_expr = table.group_by('species').agg(avg_bill_length=table.bill_length_mm.mean())

# Execute the expression and fetch results into a pandas DataFrame
df = result_expr.to_pandas()

print(df)
