{"id":2038,"library":"fugue","title":"Fugue: Abstraction Layer for Distributed Computing","description":"Fugue is a Python library that provides a unified interface for defining data workflows, so the same code can run on Pandas, Spark, Dask, Ray, and other engines without changes. It is designed to make data pipelines more portable and testable. The current version is 0.9.7, and the project is actively maintained with frequent patch releases.","status":"active","version":"0.9.7","language":"en","source_language":"en","source_url":"https://github.com/fugue-project/fugue","tags":["distributed computing","data transformation","workflow","pandas","spark","ray","dask","etl"],"install":[{"cmd":"pip install fugue","lang":"bash","label":"Base installation"},{"cmd":"pip install 'fugue[spark]'","lang":"bash","label":"For Spark integration"},{"cmd":"pip install 'fugue[dask]'","lang":"bash","label":"For Dask integration"},{"cmd":"pip install 'fugue[ray]'","lang":"bash","label":"For Ray integration"},{"cmd":"pip install 'fugue[sql]'","lang":"bash","label":"For FugueSQL features"}],"dependencies":[{"reason":"Core dependency for local DataFrame handling; some Pandas versions are incompatible with some Fugue releases.","package":"pandas","optional":false},{"reason":"Required for Spark backend functionality.","package":"pyspark","optional":true},{"reason":"Required for Dask backend functionality.","package":"dask","optional":true},{"reason":"Required for Ray backend functionality.","package":"ray","optional":true}],"imports":[{"symbol":"FugueWorkflow","correct":"from fugue import FugueWorkflow"},{"note":"The function is named `transform`; it is also exposed as `fugue.api.transform`. Fugue does not export a `fugue_transform` symbol.","wrong":"from fugue.api import fugue_transform","symbol":"transform","correct":"from fugue import transform"},{"note":"The `DataFrame` class is defined in `fugue.dataframe` and re-exported from the top-level `fugue` package. `fugue.collections` does not provide a `dataframe` module.","wrong":"from fugue.collections.dataframe import DataFrame","symbol":"DataFrame","correct":"from fugue.dataframe import DataFrame"}],"quickstart":{"code":"from fugue import FugueWorkflow\nimport pandas as pd\n\ndef map_to_string(df: pd.DataFrame) -> pd.DataFrame:\n    # Add a string copy of the numeric `value` column.\n    return df.assign(value_str=df['value'].astype(str))\n\ndag = FugueWorkflow()\ndf = dag.df(pd.DataFrame({\"id\": [1, 2], \"value\": [10, 20]}))\nresult = df.transform(map_to_string, schema=\"*,value_str:str\")\nresult.show()\ndag.run()  # Nothing is computed until run() is called.","lang":"python","description":"This quickstart defines a simple Fugue workflow. It creates a DataFrame from Pandas data, applies a transformation function (`map_to_string`) that adds a string copy of a numeric column, and then executes the workflow with `dag.run()`, which prints the result to the console. The `schema` argument to `transform` declares the output schema explicitly; `*` stands for all input columns."},"warnings":[{"fix":"Ensure `fugue[sql]` is installed: `pip install 'fugue[sql]'`.","message":"FugueSQL and its related functions and dependencies were moved to an optional `[sql]` extra in version 0.9.0. Users who rely on these features without installing the extra will encounter `ImportError`.","severity":"breaking","affected_versions":">=0.9.0"},{"fix":"Upgrade your Python environment to 3.10 or newer.","message":"Fugue has strict Python version requirements. From 0.9.0 onwards, it generally requires Python 3.10 or higher. Using older Python versions will lead to installation and runtime errors.","severity":"gotcha","affected_versions":">=0.9.0"},{"fix":"Install Fugue with the extra for your engine (e.g., `pip install 'fugue[spark]'`) so that compatible dependency versions are pulled in. If issues persist, upgrade or downgrade Pandas to a version within Fugue's tested range, or check Fugue's release notes for specific constraints.","message":"Several patch releases addressed Pandas compatibility (e.g., pinning to Pandas<3, then removing the pin). Users may hit errors if their Pandas version is incompatible with the installed Fugue version, especially when using Pandas 2.x.","severity":"gotcha","affected_versions":"0.9.4 - 0.9.5 (and potentially others)"},{"fix":"Import the class with `from fugue import DataFrame` or `from fugue.dataframe import DataFrame`.","message":"The `DataFrame` class lives in `fugue.dataframe` and is re-exported from the top-level `fugue` package. Importing it from `fugue.collections.dataframe` raises `ModuleNotFoundError`, because `fugue.collections` has no `dataframe` module.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Keep Fugue updated to the latest patch version alongside your distributed engine. If specific compatibility issues arise, check the Fugue GitHub issues or documentation for known conflicts.","message":"Compatibility with distributed engines (Spark, Ray, Dask) is continuously updated. Older Fugue versions might not work with the latest versions of these engines, and vice versa. Users should monitor Fugue release notes for compatibility fixes.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}