{"id":5073,"library":"substrait","title":"Substrait Python Bindings","description":"The `substrait` Python package provides an interface for interacting with the Substrait specification, a cross-language intermediate representation for data compute operations. It allows users to construct, manipulate, and serialize/deserialize Substrait Plans in Python. The package is explicitly described as experimental and still under active development, and it is not an execution engine. The current version is 0.29.0, released in March 2026. The core Substrait specification has a frequent release cadence that includes breaking changes.","status":"active","version":"0.29.0","language":"en","source_language":"en","source_url":"https://github.com/substrait-io/substrait-python","tags":["data transformation","relational algebra","intermediate representation","specification","protobuf","data processing","query planning"],"install":[{"cmd":"pip install substrait","lang":"bash","label":"PyPI"},{"cmd":"conda install -c conda-forge python-substrait","lang":"bash","label":"Conda-Forge"}],"dependencies":[{"reason":"Runtime environment requirement","package":"python","optional":false}],"imports":[{"note":"The primary module for accessing Substrait Plan classes.","symbol":"proto","correct":"from substrait import proto"}],"quickstart":{"code":"from substrait import proto\n\n# Example: Create a simple Substrait Plan equivalent to SELECT first_name FROM people\nplan = proto.Plan(\n    relations=[\n        proto.PlanRel(\n            root=proto.RelRoot(\n                names=[\"first_name\"],\n                input=proto.Rel(\n                    read=proto.ReadRel(\n                        named_table=proto.ReadRel.NamedTable(names=[\"people\"]),\n                        base_schema=proto.NamedStruct(\n                            names=[\"first_name\", \"surname\"],\n                            struct=proto.Type.Struct(\n                                types=[\n                                 
   proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED)),\n                                    proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED))\n                                ]\n                            )\n                        ),\n                        # Emit only column 0 (first_name) so the output matches the root names\n                        projection=proto.Expression.MaskExpression(\n                            select=proto.Expression.MaskExpression.StructSelect(\n                                struct_items=[proto.Expression.MaskExpression.StructItem(field=0)]\n                            )\n                        )\n                    )\n                )\n            )\n        )\n    ]\n)\n\nprint(plan)\nserialized_plan = plan.SerializeToString()\nprint(f\"Serialized plan length: {len(serialized_plan)} bytes\")\n\n# To consume a plan from bytes:\n# loaded_plan = proto.Plan()\n# loaded_plan.ParseFromString(serialized_plan)\n# print(loaded_plan)\n\n# To load a plan from JSON (assuming you have a JSON string 'json_plan_str'):\n# from google.protobuf import json_format\n# json_plan_str = \"{... your JSON plan ...}\"\n# loaded_plan_from_json = json_format.Parse(json_plan_str, proto.Plan())\n# print(loaded_plan_from_json)","lang":"python","description":"This example constructs a simple Substrait Plan programmatically with the `substrait.proto` module and serializes it to bytes. The commented-out lines show how to load a plan from bytes or from a JSON string. The plan represents a 'SELECT first_name FROM people' query with a defined schema for the 'people' table."},"warnings":[{"fix":"Always pin to exact versions (`substrait==X.Y.Z`) in production environments and regularly review release notes for updates. Be prepared for breaking changes.","message":"The `substrait` Python package is explicitly marked as 'experimental' and 'still under development'. Its API and behavior may change frequently without strictly following semantic versioning for minor releases, which can cause unexpected breakages.","severity":"gotcha","affected_versions":"All 0.x.x versions"},{"fix":"Do not expect `substrait` to execute queries directly. 
Integrate it with a compatible Substrait consumer for execution.","message":"This library is *not* an execution engine for Substrait plans. Its primary purpose is to provide a Python interface for *producing* and *consuming* Substrait plans, which are then meant to be executed by external Substrait-compliant data compute engines (e.g., DataFusion, DuckDB).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Monitor the official Substrait specification's breaking change policy and changelogs, as well as the `substrait-python` release notes. Update dependencies and code to align with the latest spec and library versions.","message":"The underlying Substrait specification itself undergoes breaking changes, and the Python bindings are tightly coupled to this specification. For example, a significant URI to URN migration occurred in 2025 across the Substrait ecosystem. Such changes in the spec will lead to corresponding breaking changes in the Python library.","severity":"breaking","affected_versions":"Potentially all 0.x.x versions, in particular releases around the 2025 URI to URN migration."},{"fix":"Ensure that all components in your Substrait pipeline (Python library, external producers/consumers, validators) are compatible with a consistent Substrait specification version. Refer to the `substrait-validator` documentation for version compatibility matrices if using the validator.","message":"Compatibility with other Substrait tools (producers, consumers, validators) can be complex due to the evolving nature of the Substrait specification. Different versions of consumers or validators may support only specific ranges of the Substrait spec, which the Python library reflects.","severity":"gotcha","affected_versions":"All 0.x.x versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}