{"id":7125,"library":"dagster-gcp-pandas","title":"Dagster GCP Pandas","description":"The `dagster-gcp-pandas` library provides an I/O manager and type handler for storing Pandas DataFrames returned by Dagster assets as BigQuery tables and loading them back as DataFrames. It builds on the BigQuery integration in `dagster-gcp`. This library is part of the `dagster` ecosystem and is released in lockstep with the core `dagster` library.","status":"active","version":"0.29.0","language":"en","source_language":"en","source_url":"https://github.com/dagster-io/dagster/tree/master/python_modules/libraries/dagster-gcp-pandas","tags":["dagster","gcp","pandas","io-manager","bigquery","data-engineering","cloud"],"install":[{"cmd":"pip install dagster-gcp-pandas","lang":"bash","label":"Install dagster-gcp-pandas"}],"dependencies":[{"reason":"Core Dagster framework, required for defining assets and resources.","package":"dagster"},{"reason":"Provides the base BigQuery I/O manager and resources that dagster-gcp-pandas builds upon.","package":"dagster-gcp"},{"reason":"The core data structure (DataFrame) that this library manages.","package":"pandas"},{"reason":"Python client for BigQuery, used for the underlying load and query jobs; normally installed transitively through dagster-gcp.","package":"google-cloud-bigquery"}],"imports":[{"symbol":"BigQueryPandasIOManager","correct":"from dagster_gcp_pandas import BigQueryPandasIOManager"},{"note":"Not needed by the I/O manager itself. Import `BigQueryResource` from `dagster-gcp` when you want to run BigQuery queries directly inside an asset or op.","symbol":"BigQueryResource","correct":"from dagster_gcp import BigQueryResource"}],"quickstart":{"code":"import os\n\nimport pandas as pd\nfrom dagster import Definitions, asset\nfrom dagster_gcp_pandas import BigQueryPandasIOManager\n\n\n@asset\ndef my_pandas_dataframe_asset() -> pd.DataFrame:\n    \"\"\"Produces a Pandas DataFrame.\"\"\"\n    return pd.DataFrame({\"value\": [1, 2, 3], \"label\": [\"A\", \"B\", \"C\"]})\n\n\n# Configure the BigQueryPandasIOManager to store DataFrames as BigQuery tables.\n# Set the GCP_PROJECT_ID and BIGQUERY_DATASET environment variables, or replace the placeholder defaults.\n# You also need appropriate GCP credentials configured (e.g., GOOGLE_APPLICATION_CREDENTIALS).\nbigquery_io_manager = BigQueryPandasIOManager(\n    project=os.environ.get(\"GCP_PROJECT_ID\", \"your-gcp-project-id\"),\n    dataset=os.environ.get(\"BIGQUERY_DATASET\", \"dagster_assets\"),\n)\n\ndefs = Definitions(\n    assets=[my_pandas_dataframe_asset],\n    resources={\"io_manager\": bigquery_io_manager},\n)\n\n# To run:\n# 1. Save this code as a Python file (e.g., my_project/repo.py)\n# 2. Set the environment variables:\n#    export GCP_PROJECT_ID=\"your-actual-project-id\"\n#    export BIGQUERY_DATASET=\"your_dataset\"\n# 3. Ensure GCP credentials are set up (e.g., using `gcloud auth application-default login`\n#    or the `GOOGLE_APPLICATION_CREDENTIALS` environment variable pointing to a service account key).\n# 4. Execute: `dagster dev -f my_project/repo.py`\n# 5. Navigate to the Dagster UI (usually http://localhost:3000) and materialize the asset.","lang":"python","description":"This quickstart defines an asset that produces a Pandas DataFrame and uses `BigQueryPandasIOManager` to store it as a BigQuery table named after the asset. It requires a GCP project with BigQuery enabled and proper GCP authentication."},"warnings":[{"fix":"Install `dagster` and `dagster-gcp-pandas` from the same Dagster release. Core `dagster` uses 1.x.y version numbers while integration libraries use 0.x.y numbers, so the numbers themselves do not match; refer to the Dagster release notes for the correct library version mapping.","message":"Dagster libraries, including `dagster-gcp-pandas`, are versioned in lockstep with the core `dagster` library. Installing mismatched versions (e.g., `dagster==1.0.0` with `dagster-gcp-pandas==0.15.0`) can lead to dependency conflicts at install time or import errors at runtime.","severity":"breaking","affected_versions":"All"},{"fix":"Ensure the environment where Dagster runs has access to GCP credentials (e.g., the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, default credentials for GCE instances, or `gcloud auth application-default login`). The service account or user needs permission to run BigQuery jobs in the project and to create and read tables in the target dataset (e.g., the `BigQuery Job User` and `BigQuery Data Editor` roles).","message":"Proper Google Cloud Platform (GCP) authentication and permissions are required for `dagster-gcp-pandas` to interact with BigQuery. Missing credentials or insufficient permissions result in authentication failures or 403 `Forbidden` errors.","severity":"gotcha","affected_versions":"All"},{"fix":"Set `dataset` explicitly when initializing `BigQueryPandasIOManager`, e.g., `BigQueryPandasIOManager(project=\"my-project\", dataset=\"my_dataset\")`.","message":"Each asset is stored as a BigQuery table named after the last component of its asset key. The dataset comes from the I/O manager's `dataset` setting or from the asset itself (a `schema` metadata entry or a `key_prefix`); if none of these is given, the dataset `public` is used.","severity":"gotcha","affected_versions":"All"},{"fix":"Double-check the `project` and `dataset` values in your `BigQueryPandasIOManager` configuration. Ensure the project ID is correct and the dataset matches where the tables are expected to be stored and retrieved.","message":"Misconfiguring the `project` or `dataset` parameters can lead to tables not being found when loading, or being written to unexpected datasets in BigQuery.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install dagster-gcp-pandas`, which should pull in `dagster-gcp` as a dependency. If it is still missing, run `pip install dagster-gcp` explicitly.","cause":"The `dagster-gcp` library, which `dagster-gcp-pandas` depends on for core BigQuery functionality, is not installed.","error":"ModuleNotFoundError: No module named 'dagster_gcp'"},{"fix":"Grant the appropriate IAM roles (e.g., `BigQuery Job User` on the project and `BigQuery Data Editor` on the dataset) to the service account or user. Verify that GCP credentials are correctly configured in the execution environment.","cause":"The authenticated GCP identity (service account or user) does not have the permissions needed to run BigQuery jobs or to read from or write to the target dataset.","error":"google.api_core.exceptions.Forbidden: 403 POST ... Access Denied: ... User does not have bigquery.jobs.create permission ..."},{"fix":"Ensure the upstream asset has been materialized at least once. Verify that the `project`, `dataset`, and `location` configured in your `BigQueryPandasIOManager` match where the table was actually written.","cause":"The `BigQueryPandasIOManager` attempted to load an asset whose table does not exist in the configured project and dataset, or the configuration points at the wrong location.","error":"google.api_core.exceptions.NotFound: 404 Not found: Table your-gcp-project-id:dagster_assets.my_pandas_dataframe_asset was not found in location US"}]}