{"id":4867,"library":"ytsaurus-client","title":"Python Client for YTsaurus","description":"ytsaurus-client is the official Python client library for YTsaurus, a scalable and fault-tolerant open-source big data platform for distributed storage and processing. It provides a Python-friendly mechanism for running operations, reading/writing data to the cluster, and interacting with distributed file systems, MapReduce, and NoSQL key-value storage. The library is actively maintained with frequent releases, currently at version 0.13.48.","status":"active","version":"0.13.48","language":"en","source_language":"en","source_url":"https://github.com/ytsaurus/ytsaurus","tags":["data-processing","distributed-system","big-data","mapreduce","client-library","cloud"],"install":[{"cmd":"pip install ytsaurus-client","lang":"bash","label":"Core client library"},{"cmd":"pip install ytsaurus-yson","lang":"bash","label":"Optional YSON bindings (C++ for performance)"},{"cmd":"pip install ytsaurus-client-yc-auth","lang":"bash","label":"Optional for Yandex Cloud authentication"}],"dependencies":[{"reason":"Provides C++ bindings for YSON format for improved performance; optional and platform-dependent.","package":"ytsaurus-yson","optional":true},{"reason":"Required for authentication when connecting to YTsaurus clusters in Yandex Cloud.","package":"ytsaurus-client-yc-auth","optional":true}],"imports":[{"note":"`YtClient` is defined in the `yt.wrapper` module; the top-level `yt` package does not expose it. Use `YtClient` to create independent client instances with explicit configuration instead of relying on the module-level global functions.","wrong":"import yt\nclient = yt.YtClient(...)","symbol":"YtClient","correct":"import yt.wrapper as yt\nclient = yt.YtClient(...)"},{"note":"The YTsaurus documentation conventionally imports the wrapper as `import yt.wrapper as yt`. Module-level functions such as `yt.list` and `yt.get` then operate on the global client configuration.","wrong":"from yt import wrapper","symbol":"yt.wrapper","correct":"import yt.wrapper as yt"}],"quickstart":{"code":"import os\nimport yt.wrapper as yt\nfrom 
yt.common import YtError\n\n# Connection settings are read from environment variables:\n#   YT_PROXY: address of the cluster HTTP proxy, e.g. 'localhost:8000'\n#   YT_TOKEN: OAuth token used for authentication\n# Literal values can be passed to YtClient(proxy=..., token=...) instead.\nclient = yt.YtClient(\n    proxy=os.environ.get('YT_PROXY'),\n    token=os.environ.get('YT_TOKEN'),\n)\n\ntry:\n    # List the Cypress root ('/'), requesting the 'type' attribute of each child.\n    # This requires 'read' permission on the root node.\n    root_content = client.list(\"/\", attributes=[\"type\"])\n    print(f\"First 5 items in root directory: {[item.attributes.get('type', 'unknown') + ' ' + str(item) for item in root_content[:5]]}\")\n\n    # Read the 'type' attribute of a system node\n    node_type = client.get(\"//sys/@type\")\n    print(f\"Type of //sys: {node_type}\")\n\nexcept YtError as e:\n    print(f\"YTsaurus Error: {e.message}\")\n    print(\"Please ensure YT_PROXY and YT_TOKEN environment variables are correctly set and you have access to the cluster.\")\nexcept Exception as e:\n    print(f\"An unexpected error occurred: {e}\")","lang":"python","description":"This quickstart initializes a YTsaurus client and performs two basic operations: listing the Cypress root and reading a node attribute. It assumes the `YT_PROXY` and `YT_TOKEN` environment variables hold the cluster proxy address and the authentication token. For a local setup, you might need `ytsaurus-local` and direct configuration."},"warnings":[{"fix":"Ensure YTsaurus cluster administrators have applied the recommended mitigation by disabling `alert_on_list_node_load` in the dynamic configuration. 
List nodes are slated for complete removal in future major versions.","message":"Loading snapshots containing 'list nodes' can cause master-server crashes. This is a server-side breaking change, but can impact client operations.","severity":"breaking","affected_versions":"YTsaurus Server versions 25.3.0 and later. Mitigation: disable `alert_on_list_node_load` in dynamic config."},{"fix":"Review your resource accounting and capacity planning, especially if you manage multiple tablet bundles or accounts on new YTsaurus deployments. Existing clusters are not affected by default.","message":"Tablet resource accounting changed default behavior from per-account to per-bundle for newly deployed clusters. This is a server-side change that may affect resource management and billing if not accounted for.","severity":"breaking","affected_versions":"YTsaurus Server versions 25.3.0 and later (only for newly deployed clusters)."},{"fix":"On Windows, YSON will be handled by the slower pure-Python implementation. On Apple Silicon, use Rosetta 2 to run an x86_64 Python environment or be prepared for pure-Python YSON processing. Performance-critical workloads might require Linux environments.","message":"YSON C++ bindings (`ytsaurus-yson`) are not supported on Windows and require Rosetta 2 emulation for Apple M1/M2 platforms.","severity":"gotcha","affected_versions":"All versions on Windows and Apple Silicon without Rosetta 2."},{"fix":"Always install both `ytsaurus-client` and `ytsaurus-yson` using a consistent method, preferably `pip` within a virtual environment. Avoid using `sudo pip install` if possible.","message":"Mixing installation methods for `ytsaurus-client` and `ytsaurus-yson` (e.g., pip and system packages) can lead to hard-to-diagnose problems.","severity":"gotcha","affected_versions":"All versions."},{"fix":"Design your data schema and queries to respect these limits. 
For frequent delta writes in aggregation columns, consider setting `@merge_rows_on_flush=%true` and configuring TTL deletion to manage versions efficiently.","message":"Dynamic tables enforce strict limits on value size (16 MB per cell), row length (128-512 MB for an entire row across versions), number of values per row (1024), and rows per query (e.g., 100,000 for inserts, 1 million for selects).","severity":"gotcha","affected_versions":"All versions."},{"fix":"For tables with many small chunks, use `yt merge --src //your/table --dst //your/table --spec '{combine_chunks=true;mode=<mode>}'` to combine them into larger chunks. In Python, `auto_merge_output={action=merge}` can be set in the client configuration to merge the chunks of output tables automatically.","message":"Small chunks in static tables (under 100 MB; aim for an average of about 512 MB) can significantly increase master server load and slow down data reads.","severity":"gotcha","affected_versions":"All versions."}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}