{"id":8057,"library":"dagster-duckdb","title":"Dagster DuckDB Integration","description":"The `dagster-duckdb` library provides dedicated ops, resources, and IO managers for integrating DuckDB databases with Dagster data pipelines. It lets users read from and write to DuckDB, manage database connections, and persist assets. Its releases are tightly coupled with the Dagster core framework's releases, so each version is meant to be installed alongside a matching `dagster` version. Current version is 0.29.0.","status":"active","version":"0.29.0","language":"en","source_language":"en","source_url":"https://github.com/dagster-io/dagster/tree/master/python_modules/libraries/dagster-duckdb","tags":["dagster","duckdb","etl","data-pipeline","data-orchestration","database","analytics"],"install":[{"cmd":"pip install dagster-duckdb","lang":"bash","label":"Install dagster-duckdb"}],"dependencies":[{"reason":"Core framework for orchestration. Must be compatible version.","package":"dagster","optional":false}],"imports":[{"note":"The resource was previously in an experimental module but is now a standard part of the `dagster_duckdb` package.","wrong":"from dagster.experimental import DuckDBResource","symbol":"DuckDBResource","correct":"from dagster_duckdb import DuckDBResource"},{"symbol":"duckdb_io_manager","correct":"from dagster_duckdb import duckdb_io_manager"}],"quickstart":{"code":"import os\nimport tempfile\nimport pandas as pd\nfrom dagster import Definitions, ScheduleDefinition, asset, define_asset_job\nfrom dagster_duckdb import DuckDBResource, duckdb_io_manager\n\n# Use a temporary directory for the DuckDB database file so the example is runnable.\n# mkdtemp() does not delete the directory; in production, use a persistent path.\ndb_temp_dir = tempfile.mkdtemp()\ndb_file_path = os.path.join(db_temp_dir, \"my_dagster_db.duckdb\")\n\n@asset\ndef my_duckdb_asset(duckdb: DuckDBResource):\n    \"\"\"\n    An asset that uses DuckDBResource to execute SQL 
directly,\n    creating and populating a table.\n    \"\"\"\n    with duckdb.get_connection() as conn:\n        conn.execute(\"CREATE TABLE IF NOT EXISTS my_data (id INTEGER, name TEXT)\")\n        conn.execute(\"INSERT INTO my_data VALUES (1, 'Alice'), (2, 'Bob')\")\n    print(f\"Table 'my_data' created and populated in {db_file_path}\")\n\n@asset(key=\"io_manager_output_table\")\ndef another_asset_for_io_manager() -> pd.DataFrame:\n    \"\"\"\n    An asset whose output (a Pandas DataFrame) is materialized by the\n    `duckdb_io_manager` into a DuckDB table named 'io_manager_output_table'.\n    \"\"\"\n    return pd.DataFrame({\"col_a\": [10, 20], \"col_b\": [\"x\", \"y\"]})\n\ndefs = Definitions(\n    assets=[\n        my_duckdb_asset,\n        another_asset_for_io_manager\n    ],\n    resources={\n        \"duckdb\": DuckDBResource(database=db_file_path),\n        \"io_manager\": duckdb_io_manager.configured({\"database\": db_file_path})\n    },\n    schedules=[\n        ScheduleDefinition(\n            # Asset job that materializes only my_duckdb_asset.\n            job=define_asset_job(\"my_duckdb_job\", selection=[my_duckdb_asset]),\n            cron_schedule=\"0 0 * * *\", # daily at midnight\n        )\n    ]\n)\n\n# To run this example:\n# 1. Save this code to a file (e.g., `my_repo.py`).\n# 2. Run `dagster dev -f my_repo.py` in your terminal.\n# 3. Navigate to the Dagster UI (typically http://localhost:3000).\n# 4. Launch `my_duckdb_job`, or materialize the `io_manager_output_table` asset.\n# 5. After running, you can inspect the DuckDB file at `db_file_path`.","lang":"python","description":"This quickstart demonstrates how to build a Dagster `Definitions` object with DuckDB integration. It includes an asset using `DuckDBResource` for direct SQL execution and another asset whose output (a Pandas DataFrame) is automatically materialized into a DuckDB table by the `duckdb_io_manager`. The database file lives in a temporary directory so the example runs without setup; the directory is not removed automatically. 
To execute, save this code, then run `dagster dev -f your_file.py` and trigger a run from the Dagster UI."},"warnings":[{"fix":"Install matching releases of `dagster` and `dagster-duckdb`. Check the Dagster release notes for specific version pairings.","message":"Dagster integration libraries such as `dagster-duckdb` are versioned in lockstep with the core `dagster` package. For example, `dagster-duckdb==0.29.0` is designed to work with `dagster==1.13.0`. Using mismatched versions can lead to unexpected behavior or runtime errors.","severity":"breaking","affected_versions":"<0.29.0"},{"fix":"Point `database` at a file path inside an existing, writable directory, or use `:memory:` for an in-memory database. Consider building a full path with `os.path.join` and creating the directory beforehand.","message":"The `database` setting for both `DuckDBResource` and `duckdb_io_manager` determines where DuckDB reads and writes. An invalid path (e.g., a non-existent directory or insufficient permissions) will cause runtime errors when Dagster tries to connect or write to the database.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure assets that use `duckdb_io_manager` for materialization return a compatible data structure (e.g., `pandas.DataFrame`). Consult the documentation for supported types.","message":"The `duckdb_io_manager` expects assets to return data structures it knows how to serialize into a DuckDB table (e.g., Pandas DataFrames, Polars DataFrames, PyArrow Tables). Returning arbitrary Python objects will result in an error or unexpected serialization.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure your Python environment is within the supported range (Python 3.10, 3.11, 3.12, 3.13, 3.14). Consider using a virtual environment (e.g., `venv` or `conda`) to manage Python versions.","message":"The `requires_python` range for `dagster-duckdb==0.29.0` is `>=3.10, <3.15`. 
Using Python versions outside this range (e.g., Python 3.9 or 3.15+) may lead to installation failures or runtime incompatibilities.","severity":"gotcha","affected_versions":"0.29.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install dagster-duckdb` to install the library.","cause":"The `dagster-duckdb` package has not been installed in the active Python environment.","error":"ModuleNotFoundError: No module named 'dagster_duckdb'"},{"fix":"Set `database` to a file path (e.g., `DuckDBResource(database='path/to/my_db.duckdb')`) or to `:memory:` for an in-memory database.","cause":"The `DuckDBResource` or `duckdb_io_manager` was configured without specifying the `database` path.","error":"dagster._core.errors.DagsterInvalidConfigError: Missing required config field 'database'"},{"fix":"Ensure the `duckdb` resource is defined in your `Definitions` object and passed to the job containing the asset, for example: `Definitions(assets=[my_asset], resources={'duckdb': DuckDBResource(...)})`.","cause":"An asset or op requires a `DuckDBResource` (e.g., via `@asset(required_resource_keys={'duckdb'})` or a typed `duckdb: DuckDBResource` parameter), but no resource with the key `duckdb` was included in the `Definitions` object or job definition.","error":"dagster._core.errors.DagsterInvalidDefinitionError: Asset 'my_asset' requires resource 'duckdb', but it was not provided to the job."}]}