Substrait Python Bindings

0.29.0 · active · verified Sun Apr 12

The `substrait` Python package provides an interface to the Substrait specification, a cross-language intermediate representation for data compute operations. It lets users construct, manipulate, serialize, and deserialize Substrait plans in Python. The package describes itself as experimental and under active development, and it is not an execution engine. The current version is 0.29.0, released in March 2026. The core Substrait specification is released frequently, and its releases can include breaking changes.

Warnings

Install

Imports

Quickstart

This example constructs a simple Substrait plan programmatically with the `substrait.proto` module and serializes it to bytes. The commented-out lines at the end sketch how to load a plan back from bytes or from a JSON string. The plan represents a 'SELECT first_name FROM people' query with a defined schema for the 'people' table.

from substrait import proto

# Example: Create a simple Substrait Plan equivalent to SELECT first_name FROM people
plan = proto.Plan(
    relations=[
        proto.PlanRel(
            root=proto.RelRoot(
                names=["first_name"],
                input=proto.Rel(
                    read=proto.ReadRel(
                        named_table=proto.ReadRel.NamedTable(names=["people"]),
                        base_schema=proto.NamedStruct(
                            names=["first_name", "surname"],
                            struct=proto.Type.Struct(
                                types=[
                                    proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED)),
                                    proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED))
                                ]
                            )
                        )
                    )
                )
            )
        )
    ]
)

print(plan)
serialized_plan = plan.SerializeToString()
print(f"Serialized plan length: {len(serialized_plan)} bytes")

# To consume a plan from bytes:
# loaded_plan = proto.Plan()
# loaded_plan.ParseFromString(serialized_plan)
# print(loaded_plan)

# To load a plan from JSON (assuming you have a JSON string 'json_plan_str'):
# from google.protobuf import json_format
# json_plan_str = "{... your JSON plan ...}"
# loaded_plan_from_json = json_format.Parse(json_plan_str, proto.Plan())
# print(loaded_plan_from_json)
