{"id":4978,"library":"livy","title":"Python Client for Apache Livy","description":"pylivy is a Python client for Apache Livy, an open-source REST interface for interacting with Spark. It enables remote code execution on a Spark cluster, supporting both interactive and batch sessions. The current version is 0.8.0, released in January 2021; new releases appear on an as-needed basis rather than on a fixed schedule.","status":"active","version":"0.8.0","language":"en","source_language":"en","source_url":"https://github.com/acroz/pylivy","tags":["apache livy","spark","etl","data processing","remote execution","pyspark"],"install":[{"cmd":"pip install livy","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Used internally for making HTTP requests to the Livy server.","package":"requests","optional":false}],"imports":[{"note":"The PyPI package is named 'livy', not 'pylivy', despite the GitHub repository name.","wrong":"from pylivy import LivySession","symbol":"LivySession","correct":"from livy import LivySession"},{"symbol":"LivyBatch","correct":"from livy import LivyBatch"}],"quickstart":{"code":"import os\nfrom livy import LivySession\nfrom requests.auth import HTTPBasicAuth\n\n# Configure Livy server URL and optional authentication\nLIVY_URL = os.environ.get('LIVY_SERVER_URL', 'http://localhost:8998')\n# No default username: leaving LIVY_USERNAME unset disables authentication\nLIVY_USERNAME = os.environ.get('LIVY_USERNAME')\nLIVY_PASSWORD = os.environ.get('LIVY_PASSWORD', '')\n\nauth = HTTPBasicAuth(LIVY_USERNAME, LIVY_PASSWORD) if LIVY_USERNAME else None\n\ntry:\n    with LivySession.create(LIVY_URL, auth=auth) as session:\n        print(f\"Livy session {session.session_id} created successfully.\")\n\n        # Run some Spark code on the remote cluster\n        session.run(\"df = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['id', 'name'])\")\n        session.run(\"filtered_df = df.filter(df.name == 'Bob')\")\n\n        # Retrieve the result (e.g., as a pandas DataFrame)\n        local_df = 
session.download(\"filtered_df\")\n        print(\"Downloaded DataFrame:\")\n        print(local_df)\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Ensure a Livy server is running and accessible at the specified URL.\")","lang":"python","description":"This quickstart demonstrates how to create an interactive Livy session, run PySpark code remotely, and download results. It uses environment variables for the Livy server URL and authentication credentials for security."},"warnings":[{"fix":"Use `pip install livy` and `from livy import LivySession`.","message":"The Python package name on PyPI is `livy`, but the GitHub repository and project are often referred to as `pylivy`. Ensure you use `pip install livy` for installation and `from livy import ...` for imports.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use `LivySession.create(url, ...)` instead of `LivySession(url, ...)`.","message":"The `LivySession.create()` method is the recommended way to initialize a session, rather than directly instantiating `LivySession()`. While older documentation or examples might show direct instantiation, `create()` handles session setup and waiting for readiness more robustly.","severity":"gotcha","affected_versions":"0.7.0+"},{"fix":"For large datasets, use Spark's capabilities to write results to distributed storage instead of `session.download()`.","message":"When using `session.download()` to retrieve DataFrames, be aware that the entire DataFrame is collected and transferred to the client. This can lead to out-of-memory issues or slow performance for very large datasets. Consider processing large datasets on Spark and writing results to a shared storage (e.g., S3, HDFS) for efficient access.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade Python to version 3.6 or newer.","message":"Python 3.6 or later is required. 
Earlier Python versions are not supported.","severity":"gotcha","affected_versions":"<3.6"},{"fix":"Enable HTTPS on the Livy server. Pass the `auth` or `requests_session` parameter to `LivySession.create()` or `LivyBatch.create()` to authenticate.","message":"For production environments, always secure your Apache Livy server with HTTPS and configure proper authentication. The `pylivy` client accepts `requests`-compatible auth objects (e.g., `HTTPBasicAuth`) or a custom `requests.Session` object for secure communication.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}