PyStarburst

0.11.0 · active · verified Thu Apr 16

PyStarburst provides a Python DataFrame API for querying and transforming data directly within Starburst Galaxy and Starburst Enterprise Platform (SEP) clusters. It lets data engineers and developers build transformation pipelines and data applications in familiar Python syntax without downloading data locally. The library is actively maintained: version 0.11.0 was released in February 2026, and new releases appear roughly every few months.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to establish a connection to a Starburst cluster using PyStarburst and execute a basic SQL query. It uses environment variables for sensitive connection parameters. You will need to replace placeholder values with your actual Starburst Galaxy or SEP cluster details.

import os
from pystarburst import Session
from trino.auth import BasicAuthentication

# Replace with your Starburst cluster details from Partner Connect
host = os.environ.get('STARBURST_HOST', 'your-starburst-host.trino.galaxy.starburst.io')
port = int(os.environ.get('STARBURST_PORT', '443'))
user = os.environ.get('STARBURST_USER', 'your-user@example.com')
password = os.environ.get('STARBURST_PASSWORD', 'your_password')
catalog = os.environ.get('STARBURST_CATALOG', 'sample') # e.g., 'hive', 'iceberg'
schema = os.environ.get('STARBURST_SCHEMA', 'burstbank') # e.g., 'default'

db_parameters = {
    "host": host,
    "port": port,
    "http_scheme": "https",
    "catalog": catalog,
    "schema": schema,
    "auth": BasicAuthentication(user, password)
}

session = None
try:
    session = Session.builder.configs(db_parameters).create()
    print("Successfully connected to Starburst!")

    # Example: Querying a table.
    # show() prints the rows and returns None, so keep the DataFrame
    # in its own variable and call show() separately.
    df = session.sql("SELECT * FROM system.runtime.nodes")
    df.show()
    print("Query executed successfully.")

    # Example: Creating a DataFrame and applying a simple transformation
    # df_nation = session.table("nation") # Assuming 'nation' table exists in 'sample.burstbank'
    # df_filtered = df_nation.filter(df_nation.col("regionkey") == 0)
    # df_filtered.show()

finally:
    # Close the session only if it was actually created.
    if session is not None:
        session.close()
        print("Session closed.")
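
The quickstart falls back to placeholder values when an environment variable is unset, which can hide a misconfigured environment until the connection fails. The sketch below is one possible alternative that fails fast instead. The helper name load_connection_settings and the REQUIRED_VARS tuple are hypothetical and not part of PyStarburst; only the STARBURST_* variable names and the defaults for port, catalog, and schema come from the quickstart above.

```python
import os
from typing import Mapping, Optional

# Variable names mirror the quickstart; this helper is hypothetical
# and is not part of the PyStarburst API.
REQUIRED_VARS = ("STARBURST_HOST", "STARBURST_USER", "STARBURST_PASSWORD")


def load_connection_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read STARBURST_* variables and fail fast if a required one is missing."""
    env = os.environ if env is None else env

    # Collect every missing or empty required variable so the error
    # reports all of them at once.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )

    return {
        "host": env["STARBURST_HOST"],
        # Optional variables keep the same defaults as the quickstart.
        "port": int(env.get("STARBURST_PORT", "443")),
        "user": env["STARBURST_USER"],
        "password": env["STARBURST_PASSWORD"],
        "catalog": env.get("STARBURST_CATALOG", "sample"),
        "schema": env.get("STARBURST_SCHEMA", "burstbank"),
    }
```

The returned values could then populate db_parameters, with user and password passed to BasicAuthentication as in the quickstart. Accepting an optional mapping instead of always reading os.environ keeps the helper easy to exercise without touching the real environment.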
