dagster-duckdb-pandas
0.29.3 | verified Fri May 01 | auth: no | python
Dagster library for storing pandas DataFrames in DuckDB. Current version: 0.29.3; requires Python >=3.10,<3.15. Part of the Dagster ecosystem: releases track Dagster core, with libraries versioned 0.x. Cadence: monthly, alongside Dagster core.
pip install dagster-duckdb-pandas
Common errors
error ModuleNotFoundError: No module named 'dagster_duckdb_pandas'
cause dagster-duckdb-pandas is not installed, or is installed in a different environment from the one running your code.
fix Run `pip install dagster-duckdb-pandas` in the environment that runs Dagster.
error The `database` argument refers to a different database connection parameter. The `database` parameter expects a DuckDB database file path or `:memory:`.
cause The `database` parameter was misused, for example by passing a connection string or an existing connection object.
fix Instantiate with `DuckDBPandasIOManager(database='my_database.duckdb')`, or use `database=':memory:'` for an in-memory database.
Warnings
breaking Upgrading from dagster-duckdb 0.x to 0.23+ changes I/O manager import paths. Use `DuckDBPandasIOManager` from `dagster_duckdb_pandas` instead of legacy imports from `dagster_duckdb`.
fix Change the import to `from dagster_duckdb_pandas import DuckDBPandasIOManager` and update the resource configuration.
gotcha `DuckDBPandasIOManager` expects a `database` argument (a string path), not `conn` or `connection`. Passing a pre-existing connection object will fail.
fix Provide the database file path (e.g., `DuckDBPandasIOManager(database='path/to/db.duckdb')`).
deprecated The `mode` parameter (e.g., `'overwrite'`, `'append'`) has been deprecated in favor of per-asset metadata or explicit overwrite strategies.
fix Use asset metadata or custom I/O manager configurations instead of the `mode` parameter.
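The `database` gotcha above can be caught before the I/O manager is constructed. The following is a minimal sketch, not part of the library: `normalize_database_arg` is a hypothetical helper name, and the `"://"` check is only an assumed heuristic for spotting connection strings.

```python
from pathlib import Path


def normalize_database_arg(database) -> str:
    """Hypothetical helper: return a string usable as the `database` argument."""
    # Connection objects and other non-path values are rejected up front.
    if not isinstance(database, (str, Path)):
        raise TypeError(
            f"database must be a file path or ':memory:', got {type(database).__name__}"
        )
    text = str(database)
    if text == ":memory:":
        return text
    # Assumed heuristic: a URL-style connection string is not a DuckDB file path.
    if "://" in text:
        raise ValueError(f"expected a file path, got a connection string: {text!r}")
    return text


# Usage, assuming dagster-duckdb-pandas is installed:
# from dagster_duckdb_pandas import DuckDBPandasIOManager
# io_manager = DuckDBPandasIOManager(database=normalize_database_arg("my_database.duckdb"))
```

Failing early with a `TypeError` or `ValueError` keeps the mistake close to the configuration code instead of surfacing later during materialization.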
Imports
- DuckDBPandasIOManager
wrong from dagster_duckdb import DuckDBIOManager
correct from dagster_duckdb_pandas import DuckDBPandasIOManager
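When the correct import still raises the `ModuleNotFoundError` listed under Common errors, the package is often installed for a different interpreter. A small standard-library check along these lines can confirm that; `is_installed` and `report` are illustrative names, not part of Dagster.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def report(module_name: str = "dagster_duckdb_pandas") -> str:
    """Describe whether the module is visible to the running interpreter."""
    status = "found" if is_installed(module_name) else "NOT found"
    return f"{module_name}: {status} for interpreter {sys.executable}"


print(report())
```

If the module is reported as not found, installing with that same interpreter (`python -m pip install dagster-duckdb-pandas`) targets the right environment.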
Quickstart
import pandas as pd
from dagster import Definitions, asset
from dagster_duckdb_pandas import DuckDBPandasIOManager

@asset
def my_table() -> pd.DataFrame:
    # The returned DataFrame is stored in DuckDB as a table named after the asset.
    return pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

# Bind the I/O manager to the default "io_manager" resource key.
resources = {"io_manager": DuckDBPandasIOManager(database="my_db.duckdb")}
defs = Definitions(assets=[my_table], resources=resources)
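Once the asset has been materialized, the stored table can be inspected directly with the `duckdb` Python package. This is a sketch under assumptions: `read_table` is a hypothetical helper, and it assumes the table lands in the `public` schema, which is expected when the asset has no key prefix and no schema is configured.

```python
import re

# Plain SQL identifier: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def read_table(database: str, table: str, schema: str = "public"):
    """Hypothetical helper: return schema.table as a pandas DataFrame."""
    # Table and schema names cannot be bound as SQL parameters, so validate
    # them before interpolating them into the query text.
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")

    # Imported lazily so the validation above works without duckdb installed.
    import duckdb

    con = duckdb.connect(database, read_only=True)
    try:
        return con.execute(f'SELECT * FROM "{schema}"."{table}"').df()
    finally:
        con.close()


# Usage, after materializing the Quickstart asset:
# print(read_table("my_db.duckdb", "my_table"))
```

Opening the file with `read_only=True` avoids accidental writes while checking results.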