Hadoop YARN API Client
A Python client for interacting with the Hadoop® YARN API. It provides programmatic access to YARN components like ResourceManager, ApplicationMaster, HistoryServer, and NodeManager. The library is actively maintained, with a focus on supporting recent Python and Hadoop YARN versions, and has a steady release cadence for minor improvements and bug fixes.
Warnings
- breaking Python 2.7 support was officially dropped in version 1.0.3. Code written for Python 2.7 will likely fail with syntax errors or missing features.
- breaking Version 1.0.0 introduced a major API cleanup. The `ResourceManager`, `ApplicationMaster`, `HistoryServer`, and `NodeManager` constructors no longer accept separate `address` and `port` parameters. Instead, they take complete endpoint URLs that include the scheme and port (e.g., `'http://localhost:8088'`). `ResourceManager` takes a list of such URLs (e.g., `['http://localhost:8088']`) so that it can support HA.
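When migrating code written against pre-1.0.0 releases, the old address/port pairs have to be turned into full URLs. The following is a minimal sketch that assumes plain HTTP unless told otherwise; `to_endpoint` and `to_endpoints` are hypothetical helpers, not part of `yarn_api_client`.

```python
from typing import Iterable, List, Tuple


def to_endpoint(address: str, port: int, https: bool = False) -> str:
    """Build a full endpoint URL from a pre-1.0.0 style (address, port) pair.

    Hypothetical helper for illustration only; not part of yarn_api_client.
    """
    scheme = "https" if https else "http"
    return f"{scheme}://{address}:{port}"


def to_endpoints(pairs: Iterable[Tuple[str, int]], https: bool = False) -> List[str]:
    """Convert several (address, port) pairs, e.g. for an HA ResourceManager list."""
    return [to_endpoint(address, port, https) for address, port in pairs]


# Old style (pre-1.0.0): separate address and port arguments.
# New style (1.0.0+): full URLs, as a list for ResourceManager.
print(to_endpoint("localhost", 8088))  # http://localhost:8088
print(to_endpoints([("rm1.example.com", 8088), ("rm2.example.com", 8088)]))
# ['http://rm1.example.com:8088', 'http://rm2.example.com:8088']
```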
- gotcha When using YARN in High Availability (HA) mode, pass the `ResourceManager` constructor a list of all ResourceManager endpoints, both active and standby. The client checks the list and connects to whichever RM is currently active.
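The YARN REST API reports each ResourceManager's HA role in the `haState` field of `GET /ws/v1/cluster/info` (`ACTIVE` or `STANDBY`). The sketch below illustrates the general idea of picking the active RM from a list. It is not the library's own implementation: `pick_active_endpoint` and `fetch_cluster_info` are hypothetical names, and it assumes a standby RM answers `/ws/v1/cluster/info` itself rather than redirecting. The fetcher is injectable so the selection logic can be exercised without a cluster.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional


def fetch_cluster_info(endpoint: str, timeout: float = 5.0) -> Dict:
    """GET <endpoint>/ws/v1/cluster/info and return the parsed JSON body."""
    url = endpoint.rstrip("/") + "/ws/v1/cluster/info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def pick_active_endpoint(
    endpoints: List[str],
    fetch: Callable[[str], Dict] = fetch_cluster_info,
) -> Optional[str]:
    """Return the first endpoint whose clusterInfo.haState is ACTIVE, else None.

    Hypothetical helper for illustration; not part of yarn_api_client.
    """
    for endpoint in endpoints:
        try:
            info = fetch(endpoint).get("clusterInfo", {})
        except Exception:
            # Unreachable or misbehaving RM: move on to the next candidate
            continue
        if info.get("haState") == "ACTIVE":
            return endpoint
    return None
```

Against a real HA pair this would be called as `pick_active_endpoint(['http://rm1.example.com:8088', 'http://rm2.example.com:8088'])`.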
- gotcha The library can automatically discover Hadoop configuration by checking `YARN_CONF_DIR` or `HADOOP_CONF_DIR` environment variables. If these are set, explicit endpoints provided in the constructor might be overridden or interact unexpectedly with discovered configurations.
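Roughly, configuration discovery means reading `yarn-site.xml` from the configuration directory. The sketch below looks up the standard non-HA property `yarn.resourcemanager.webapp.address` under `$YARN_CONF_DIR` or `$HADOOP_CONF_DIR`. The precedence order, the `http://` scheme, and the helper names are assumptions for illustration, not the library's exact lookup logic, and it ignores HA (`yarn.resourcemanager.ha.rm-ids`) and HTTPS-specific properties.

```python
import os
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def read_hadoop_property(xml_path: str, name: str) -> Optional[str]:
    """Return the <value> of property `name` from a Hadoop *-site.xml file, or None."""
    root = ET.parse(xml_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


def find_rm_webapp_address(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look for yarn.resourcemanager.webapp.address in yarn-site.xml.

    Hypothetical helper. Assumed precedence: YARN_CONF_DIR first, then
    HADOOP_CONF_DIR. Returns an http:// URL, or None if nothing usable is found.
    """
    for var in ("YARN_CONF_DIR", "HADOOP_CONF_DIR"):
        conf_dir = env.get(var)
        if not conf_dir:
            continue
        site = os.path.join(conf_dir, "yarn-site.xml")
        if not os.path.isfile(site):
            continue
        address = read_hadoop_property(site, "yarn.resourcemanager.webapp.address")
        if address:
            return f"http://{address}"
    return None
```

Checking what such a lookup returns in your environment is a quick way to see whether discovered settings could conflict with endpoints you pass explicitly.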
- gotcha Older YARN deployments or certain API calls might return empty JSON responses, which could cause parsing errors in the client. Version 1.0.2 improved handling of such cases.
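Code that calls the REST API directly, or that must run against older releases, can apply the same defensive pattern: treat an empty body as an empty object, and treat a `null` collection (for example `{"apps": null}` when no applications match) as an empty list. A minimal sketch; `parse_body` and `extract_list` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List


def parse_body(text: str) -> Dict[str, Any]:
    """Parse a response body, treating an empty or whitespace-only body as {}."""
    if not text or not text.strip():
        return {}
    data = json.loads(text)
    # A non-object JSON value (e.g. null) is also normalised to {}
    return data if isinstance(data, dict) else {}


def extract_list(data: Dict[str, Any], outer: str, inner: str) -> List[Dict[str, Any]]:
    """Return data[outer][inner] as a list, tolerating missing keys and nulls.

    e.g. extract_list(data, "apps", "app") for /ws/v1/cluster/apps
    """
    container = data.get(outer) or {}
    items = container.get(inner) or []
    # Wrap a single object so callers can always iterate
    return items if isinstance(items, list) else [items]
```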
Install
pip install yarn-api-client
Imports
- ResourceManager
from yarn_api_client.resource_manager import ResourceManager
- ApplicationMaster
from yarn_api_client.application_master import ApplicationMaster
- HistoryServer
from yarn_api_client.history_server import HistoryServer
- NodeManager
from yarn_api_client.node_manager import NodeManager
Quickstart
import os
from yarn_api_client.resource_manager import ResourceManager

# Configure the YARN ResourceManager endpoint(s) via YARN_RM_ENDPOINTS,
# a comma-separated list of URLs. Example for HA:
#   export YARN_RM_ENDPOINTS='http://rm1.example.com:8088,http://rm2.example.com:8088'
# Endpoints can also be discovered automatically if YARN_CONF_DIR or HADOOP_CONF_DIR
# points to a valid Hadoop configuration.
# For local testing, this example falls back to http://localhost:8088.
raw_endpoints = os.environ.get('YARN_RM_ENDPOINTS', '')
rm_endpoints = [e.strip() for e in raw_endpoints.split(',') if e.strip()]
if not rm_endpoints:
    print("Warning: YARN_RM_ENDPOINTS is not set. Using 'http://localhost:8088' as default.")
    rm_endpoints = ['http://localhost:8088']

print(f"Attempting to connect to YARN ResourceManager at: {rm_endpoints}")

try:
    resource_manager = ResourceManager(rm_endpoints)
    # Fetch cluster applications; the parsed JSON body is exposed as .data
    applications_response = resource_manager.cluster_applications()
    # YARN returns {"apps": {"app": [...]}}, or {"apps": null} when there are none
    apps = (applications_response.data.get('apps') or {}).get('app', [])
    if apps:
        print(f"Found {len(apps)} applications.")
        for app in apps[:3]:  # Print the first 3 apps
            print(f"  Application ID: {app['id']}, Name: {app['name']}, State: {app['state']}")
    else:
        print("No applications found on the YARN cluster.")
except Exception as e:
    print(f"Error connecting to YARN ResourceManager or fetching applications: {e}")
    print("Please ensure a YARN ResourceManager is running and accessible at the configured endpoint(s).")
    print("You can set the YARN_RM_ENDPOINTS environment variable, e.g., export YARN_RM_ENDPOINTS='http://your-rm-host:8088'")
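Building on the quickstart, the application records returned by `cluster_applications()` can be summarised client-side, for example by counting applications per state. A minimal sketch: `count_by_state` is a hypothetical helper, and the sample records only mimic the shape of `/ws/v1/cluster/apps` entries rather than coming from a real cluster.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_state(apps: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count applications per YARN state (RUNNING, FINISHED, ...)."""
    return dict(Counter(app.get("state", "UNKNOWN") for app in apps))


# Sample records shaped like the entries under ['apps']['app'] in a real response
sample_apps = [
    {"id": "application_1_0001", "name": "etl", "state": "FINISHED"},
    {"id": "application_1_0002", "name": "ml", "state": "RUNNING"},
    {"id": "application_1_0003", "name": "report", "state": "FINISHED"},
]
print(count_by_state(sample_apps))  # {'FINISHED': 2, 'RUNNING': 1}
```

Applied to the list found under `['apps']['app']` of a live response, the same helper gives a quick per-state overview of the cluster.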