Python Client for YTsaurus
ytsaurus-client is the official Python client library for YTsaurus, a scalable, fault-tolerant open-source big data platform for distributed storage and processing. It provides a Python-friendly interface for running operations, reading and writing cluster data, and interacting with distributed file systems, MapReduce, and NoSQL key-value storage. The library is actively maintained with frequent releases, currently at version 0.13.48.
Warnings
- breaking: Loading snapshots that contain 'list nodes' can crash master servers. This is a server-side breaking change, but it can affect client operations.
- breaking: On newly deployed clusters, tablet resource accounting now defaults to per-bundle instead of per-account. This is a server-side change that may affect resource management and billing if it is not accounted for.
- gotcha: The YSON C++ bindings (`ytsaurus-yson`) are not supported on Windows and require Rosetta 2 emulation on Apple M1/M2 platforms.
- gotcha: Mixing installation methods for `ytsaurus-client` and `ytsaurus-yson` (e.g., pip and system packages) can lead to hard-to-diagnose problems.
- gotcha: Dynamic tables have strict limits on value size (16 MB per cell), row length (128-512 MB for an entire row across versions), number of values per row (1024), and rows per query (e.g., 100,000 for inserts, 1 million for selects).
- gotcha: Small chunks in static tables (under 100 MB; aim for an average of about 512 MB) can significantly increase master server load and slow down data reads.
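The dynamic table limits listed above can be checked on the client side before rows are sent. The following is a minimal sketch in plain Python, not part of ytsaurus-client: the helper names (`approx_size`, `find_limit_violations`, `split_into_batches`) are hypothetical, the constants simply restate the figures from the warning, and the size estimate only measures `str` and `bytes` values. Actual limits may differ per cluster or version.

```python
# Limits quoted in the warning above; treat them as illustrative defaults,
# not authoritative values for any particular cluster.
MAX_VALUE_BYTES = 16 * 1024 * 1024   # 16 MB per cell
MAX_VALUES_PER_ROW = 1024
MAX_ROWS_PER_INSERT = 100_000


def approx_size(value):
    """Rough byte size of a value; only str and bytes are actually measured."""
    if isinstance(value, bytes):
        return len(value)
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    return 8  # assumption: numbers, booleans and None are small fixed-size values


def find_limit_violations(rows):
    """Return a list of human-readable problems found in `rows` (a list of dicts)."""
    problems = []
    if len(rows) > MAX_ROWS_PER_INSERT:
        problems.append(f"batch has {len(rows)} rows, limit is {MAX_ROWS_PER_INSERT}")
    for index, row in enumerate(rows):
        if len(row) > MAX_VALUES_PER_ROW:
            problems.append(f"row {index} has {len(row)} values, limit is {MAX_VALUES_PER_ROW}")
        for column, value in row.items():
            size = approx_size(value)
            if size > MAX_VALUE_BYTES:
                problems.append(f"row {index}, column {column!r}: {size} bytes exceeds {MAX_VALUE_BYTES}")
    return problems


def split_into_batches(rows, batch_size=MAX_ROWS_PER_INSERT):
    """Yield consecutive slices of `rows`, each no longer than `batch_size`."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

A caller could run `find_limit_violations` on each batch produced by `split_into_batches` and only submit batches that come back with an empty list.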
Install
- pip install ytsaurus-client
- pip install ytsaurus-yson
- pip install ytsaurus-client-yc-auth
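Because mixing installation methods can cause subtle problems, it may help to check which of these packages pip actually knows about. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical. A package installed through a system package manager may not appear here, which is itself a hint that installation methods are mixed.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
PACKAGES = ("ytsaurus-client", "ytsaurus-yson", "ytsaurus-client-yc-auth")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if no metadata is found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```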
Imports
- YtClient
import yt.wrapper as yt
client = yt.YtClient(...)
- yt.wrapper
import yt.wrapper as ytw
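As a rough illustration of constructing a client explicitly, the sketch below collects the proxy and token from the `YT_PROXY` and `YT_TOKEN` environment variables and passes them to `YtClient`. The helper names `connection_settings` and `make_client` are hypothetical, and `yt.wrapper` is imported lazily so the settings helper can be used even where the library is not installed.

```python
import os


def connection_settings(environ=os.environ):
    """Collect proxy and token from YT_PROXY / YT_TOKEN, failing early if the proxy is missing."""
    proxy = environ.get("YT_PROXY")
    if not proxy:
        raise ValueError("YT_PROXY is not set")
    # The token may be absent, for example on a local test cluster without authentication.
    return {"proxy": proxy, "token": environ.get("YT_TOKEN")}


def make_client(environ=os.environ):
    """Build a YtClient from the environment; requires ytsaurus-client to be installed."""
    from yt.wrapper import YtClient  # lazy import keeps connection_settings dependency-free
    return YtClient(**connection_settings(environ))
```

Failing early on a missing proxy tends to produce a clearer message than the connection error that would otherwise surface on the first request.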
Quickstart
import os
import yt.wrapper as yt
from yt.common import YtError

# The client reads its connection settings from environment variables.
# Set them in your shell, or uncomment and adapt the lines below:
# os.environ['YT_PROXY'] = 'your-yt-cluster-proxy'
# os.environ['YT_TOKEN'] = 'your-oauth-token'
# Alternatively, pass the settings explicitly:
# client = yt.YtClient(proxy='localhost:8000', token='your-token')
client = yt.YtClient(config=yt.default_config.get_config_from_env())

try:
    # List the Cypress root directory, requesting the 'type' attribute of each node.
    # This requires 'read' permission on '/'.
    root_content = client.list("/", attributes=["type"])
    first_items = [item.attributes.get('type', 'unknown') + ' ' + str(item) for item in root_content[:5]]
    print(f"First 5 items in root directory: {first_items}")

    # Get an attribute of a system node.
    node_type = client.get("//sys/@type")
    print(f"Type of //sys: {node_type}")
except YtError as e:
    print(f"YTsaurus Error: {e.message}")
    print("Please ensure YT_PROXY and YT_TOKEN environment variables are correctly set and you have access to the cluster.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
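Beyond listing nodes, a common next step is writing rows to a static table and reading them back. The sketch below assumes a client object that exposes `write_table` and `read_table`, as `YtClient` does; the helper name `round_trip` and the table path in the usage note are hypothetical. It also assumes the client's default table format works in your environment, which may require the `ytsaurus-yson` bindings.

```python
def round_trip(client, path, rows):
    """Write `rows` (an iterable of dicts) to the table at `path` and read them back as a list."""
    client.write_table(path, rows)
    # read_table returns an iterator over rows; materialize it for convenience.
    return list(client.read_table(path))
```

With the `client` from the quickstart, a call might look like `round_trip(client, "//tmp/quickstart_demo", [{"id": 1, "name": "a"}])`, wrapped in the same `YtError` handling as above. Choose a path where you have write permission.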