Python Client for Apache Livy

0.8.0 · active · verified Sun Apr 12

pylivy is a Python client for Apache Livy, an open-source REST interface for interacting with Spark. It lets you run code remotely on a Spark cluster through Livy and supports both interactive sessions and batch jobs. The current version is 0.8.0, released in January 2021; new releases appear to be made on an as-needed basis.
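The sketch below gives a rough idea of the Livy REST calls that a client like pylivy wraps for an interactive session. It is not pylivy's implementation. It targets the Livy REST API directly with the standard library, and it only builds the requests without sending them. The helper name `livy_request` is hypothetical, and the session and statement ids (`0`) are placeholders that a real server would return.

```python
import json
import urllib.request
from typing import Optional


def livy_request(base_url: str, method: str, path: str,
                 body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against a Livy server.

    Hypothetical helper for illustration only; not part of pylivy.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


BASE = "http://localhost:8998"  # Livy's default port

# 1. Start an interactive PySpark session.
create = livy_request(BASE, "POST", "/sessions", {"kind": "pyspark"})
# 2. Poll the session until its state becomes "idle" (id 0 is a placeholder).
poll = livy_request(BASE, "GET", "/sessions/0")
# 3. Submit code as a statement, then poll it until its state is "available".
submit = livy_request(BASE, "POST", "/sessions/0/statements", {"code": "1 + 1"})
result = livy_request(BASE, "GET", "/sessions/0/statements/0")
# 4. Delete the session when finished.
close = livy_request(BASE, "DELETE", "/sessions/0")

for req in (create, poll, submit, result, close):
    print(req.get_method(), req.full_url)
```

Because statements run asynchronously on the cluster, a client has to poll until the output is ready. pylivy's `session.run` hides this loop behind a single call.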

Install

pip install -U livy

Imports

from livy import LivySession

Quickstart

This quickstart creates an interactive Livy session, runs PySpark code on the remote cluster, and downloads the result as a local pandas DataFrame. The Livy server URL and the optional credentials are read from environment variables so that they are not hard-coded in the script.

import os
from livy import LivySession
from requests.auth import HTTPBasicAuth

# Configure the Livy server URL and optional HTTP basic authentication
LIVY_URL = os.environ.get('LIVY_SERVER_URL', 'http://localhost:8998')
LIVY_USERNAME = os.environ.get('LIVY_USERNAME')  # None if unset
LIVY_PASSWORD = os.environ.get('LIVY_PASSWORD')  # None if unset

# Only send credentials when both are configured; otherwise connect without auth
auth = HTTPBasicAuth(LIVY_USERNAME, LIVY_PASSWORD) if LIVY_USERNAME and LIVY_PASSWORD else None

try:
    with LivySession.create(LIVY_URL, auth=auth) as session:
        print(f"Livy session {session.session_id} created successfully.")

        # Run some Spark code on the remote cluster
        session.run("df = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['id', 'name'])")
        session.run("filtered_df = df.filter(df.name == 'Bob')")

        # Download the remote Spark DataFrame as a local pandas DataFrame
        local_df = session.download("filtered_df")
        print("Downloaded DataFrame:")
        print(local_df)

except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure a Livy server is running and accessible at the specified URL.")
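The quickstart sends credentials only when they are actually configured. That selection logic can be isolated into a small function that is easy to test. The sketch below is an assumption, not part of pylivy: `load_livy_settings` is a hypothetical name, and it returns a plain `(username, password)` pair that you would still wrap in `requests.auth.HTTPBasicAuth` as the quickstart does.

```python
import os
from typing import Mapping, Optional, Tuple


def load_livy_settings(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (livy_url, credentials) read from environment variables.

    Hypothetical helper: credentials are None unless both LIVY_USERNAME and
    LIVY_PASSWORD are set, so no placeholder password is ever sent.
    """
    url = env.get("LIVY_SERVER_URL", "http://localhost:8998")
    username = env.get("LIVY_USERNAME")
    password = env.get("LIVY_PASSWORD")
    credentials = (username, password) if username and password else None
    return url, credentials


# Example with an explicit mapping instead of the real environment:
url, creds = load_livy_settings({"LIVY_SERVER_URL": "http://livy.example:8998"})
print(url, creds)  # http://livy.example:8998 None
```

Passing the mapping as a parameter keeps the function free of global state. In the quickstart, `auth = HTTPBasicAuth(*creds) if creds else None` would then take the place of the inline expression.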
