dagster-duckdb-pandas

0.29.3 · verified Fri May 01 · auth: no · python

Dagster library for storing pandas DataFrames in DuckDB through an I/O manager. Current version 0.29.3; requires Python >=3.10,<3.15. Part of the Dagster ecosystem: integration libraries use 0.x version numbers and are released alongside the corresponding Dagster core versions. Cadence: monthly releases alongside Dagster core.
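
The interpreter requirement can be checked before installing. A minimal standard-library sketch; the helper name `python_supported` is hypothetical, and the bounds are copied from the requirement stated for 0.29.3, so other releases may declare different ones.

```python
import sys

# Bounds copied from the stated requirement for 0.29.3: >=3.10,<3.15.
# Other releases may declare different bounds.
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 15)


def python_supported(version_info=None) -> bool:
    """Return True if (major, minor) lies in [MIN_VERSION, MAX_EXCLUSIVE)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported:", python_supported())
```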

pip install dagster-duckdb-pandas
error ModuleNotFoundError: No module named 'dagster_duckdb_pandas'
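cause dagster-duckdb-pandas is not installed, or it was installed into a different environment than the one running Dagster.
fix
Run pip install dagster-duckdb-pandas with the same interpreter that runs Dagster (for example, python -m pip install dagster-duckdb-pandas).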
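
To confirm which interpreter can see the package, a small standard-library check helps. This is a sketch: `module_available` is a hypothetical helper built on `importlib.util.find_spec`, and note that the import name uses underscores (`dagster_duckdb_pandas`) while the pip distribution name uses hyphens.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Distribution: dagster-duckdb-pandas; import name: dagster_duckdb_pandas.
    if module_available("dagster_duckdb_pandas"):
        print("dagster_duckdb_pandas is importable from", sys.executable)
    else:
        print("dagster_duckdb_pandas not found for", sys.executable)
        print(f"Install with: {sys.executable} -m pip install dagster-duckdb-pandas")
```

Printing `sys.executable` makes a "wrong environment" mismatch visible: if it differs from the interpreter your Dagster process uses, install into that one instead.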
error The `database` argument is not a generic connection parameter: it expects a DuckDB database file path or `:memory:`.
cause Misunderstanding of the `database` parameter; users may try to pass a connection string or an existing connection object.
fix
Instantiate with `DuckDBPandasIOManager(database='my_database.duckdb')`, or `database=':memory:'` for an in-memory database. A DuckDB `:memory:` database exists only for the lifetime of the connection that opened it, so it suits tests rather than data that must persist between runs.
breaking Upgrading from earlier dagster-duckdb 0.x releases to 0.23 or later changes the I/O manager import path. Import `DuckDBPandasIOManager` from `dagster_duckdb_pandas` instead of using the legacy imports from `dagster_duckdb`.
fix Change import to `from dagster_duckdb_pandas import DuckDBPandasIOManager` and update resource configuration.
gotcha DuckDBPandasIOManager expects a `database` argument (string path) instead of `conn` or `connection`. Using a pre-existing connection object will fail.
fix Provide the database file path (e.g., `DuckDBPandasIOManager(database='path/to/db.duckdb')`).
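
Since `database` takes a path string rather than a connection, one option is to normalize the value before constructing the I/O manager. The helper `database_arg` below is hypothetical (it is not part of dagster-duckdb-pandas) and only encodes the rule stated above: a string path or `:memory:` passes through, a `pathlib.Path` is converted to `str`, and anything else, such as an open DuckDB connection, is rejected early with a clear message.

```python
from pathlib import Path
from typing import Union


def database_arg(value: Union[str, Path]) -> str:
    """Return `value` as the plain string the `database` parameter expects.

    Hypothetical helper: str values (file paths or ':memory:') pass through,
    pathlib.Path values are converted with str(), and anything else (for
    example an open duckdb connection object) raises TypeError.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, Path):
        return str(value)
    raise TypeError(
        "database expects a DuckDB file path or ':memory:', "
        f"got {type(value).__name__}"
    )
```

Usage would look like `DuckDBPandasIOManager(database=database_arg(Path("my_db.duckdb")))`, so a connection object fails at configuration time with an explicit message rather than later inside a run.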
deprecated The `mode` parameter (e.g., `'overwrite'`, `'append'`) has been deprecated in favor of per-asset metadata or explicit overwrite strategies.
fix Use asset metadata or custom I/O manager configurations instead of the `mode` parameter.

Initialize a DuckDB I/O manager that stores pandas DataFrames in a local DuckDB database.

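import pandas as pd
from dagster import Definitions, asset
from dagster_duckdb_pandas import DuckDBPandasIOManager


@asset
def my_table() -> pd.DataFrame:
    # The I/O manager writes the returned DataFrame to a DuckDB table named after the asset.
    return pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})


# Assets use the "io_manager" resource key by default, so their outputs are
# stored in (and inputs loaded from) the DuckDB file my_db.duckdb.
defs = Definitions(
    assets=[my_table],
    resources={"io_manager": DuckDBPandasIOManager(database="my_db.duckdb")},
)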