Hugging Face Xet

1.4.2 verified Tue May 12 auth: no python install: stale quickstart: stale

hf-xet is the Python client library for Xet storage on the Hugging Face Hub. It enables chunk-based, deduplicated transfer of large files, such as machine learning models and datasets, to and from the Hub. The library is implemented in Rust, serves primarily as a backend for `huggingface_hub`, and is not typically intended for direct use. The project is actively developed, and releases often coincide with or support major updates to `huggingface_hub`.

pip install hf-xet
error ERROR: Could not find a version that satisfies the requirement puccinialin (from hf-xet)
cause This error typically occurs when `hf-xet` is installed on an outdated Python version. `puccinialin`, a dependency pulled in during the install, requires Python 3.9 or newer, so pip cannot find a compatible release on older interpreters.
fix
Upgrade your Python environment to version 3.9 or higher and reinstall huggingface_hub and hf-xet in a new virtual environment: python3.10 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install "huggingface_hub[hf_xet]"
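Before recreating the environment, it can help to confirm which interpreter is actually in use. The following is a minimal sketch, assuming the 3.9 minimum stated above; `MIN_PYTHON` and `python_is_supported` are hypothetical names chosen for illustration.

```python
import sys

# Minimum version taken from the fix above (an assumption of this sketch).
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the minimum")
    else:
        print(f"Python {current} is too old; create a virtual environment with 3.9+")
```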
error Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
cause This is a warning, not an error, indicating that the `huggingface_hub` library can utilize `hf-xet` for optimized, deduplicated large file transfers but the `hf-xet` package is not present in your environment.
fix
Install the hf-xet package to enable optimized Xet storage functionality: pip install "huggingface_hub[hf_xet]"
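To see whether the warning applies to the current environment, one option is to check for the package with the standard library only. This is a sketch rather than an official diagnostic; `xet_status` is a hypothetical helper name.

```python
import importlib.util
from importlib import metadata


def xet_status():
    """Report whether hf_xet is importable and which hf-xet version is installed."""
    importable = importlib.util.find_spec("hf_xet") is not None
    try:
        version = metadata.version("hf-xet")
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": importable, "version": version}


if __name__ == "__main__":
    print(xet_status())
```

If `importable` is `False`, `huggingface_hub` falls back to regular HTTP download as described in the warning.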
error error: linker `cc` not found
cause This error occurs when `hf-xet` (a Rust-based library) is built during installation and no C compiler (such as `gcc` or `clang`) is installed or discoverable on your system's PATH. Rust needs the compiler's linker to build Python extensions.
fix
Install a C compiler (e.g., build-essential on Debian/Ubuntu, Xcode Command Line Tools on macOS) and ensure it's accessible in your system's PATH. For Debian/Ubuntu: sudo apt-get update && sudo apt-get install build-essential.
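A quick way to confirm that a compiler is discoverable on PATH is sketched below. The candidate names (`cc`, `gcc`, `clang`) are assumptions based on the error message and the cause above, and `find_c_compiler` is a hypothetical helper.

```python
import shutil


def find_c_compiler(candidates=("cc", "gcc", "clang")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    compiler = find_c_compiler()
    if compiler:
        print(f"C compiler found at {compiler}")
    else:
        print("No C compiler on PATH; install one before building from source")
```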
gotcha hf-xet is an underlying dependency for `huggingface_hub` and is not designed for direct user interaction or import in typical Python code. All functionalities, including optimized large file transfers, are exposed through the `huggingface_hub` API.
fix Interact with Hugging Face Hub repositories via the `huggingface_hub` library (e.g., `huggingface_hub.snapshot_download`, `huggingface_hub.upload_file`). Ensure `huggingface_hub` version is >=0.32.0 for automatic `hf-xet` integration.
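The version requirement above can be checked without importing the library. The sketch below uses a deliberately simplified comparison: it looks only at the leading numeric release segment, so a pre-release such as `0.32.0rc1` is treated like `0.32.0`. `release_tuple`, `hub_supports_xet`, and `installed_hub_version` are hypothetical names.

```python
import re
from importlib import metadata

# Minimum huggingface_hub version cited above for automatic hf-xet integration.
MIN_HUB = (0, 32, 0)


def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.32.0rc1' -> (0, 32, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        return ()
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions such as '1.0' so tuple comparison behaves as expected.
    return parts + (0,) * (len(MIN_HUB) - len(parts))


def hub_supports_xet(version):
    """Return True when the version string is at least MIN_HUB."""
    return release_tuple(version) >= MIN_HUB


def installed_hub_version():
    """Return the installed huggingface_hub version string, or None."""
    try:
        return metadata.version("huggingface_hub")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_hub_version()
    if found is None:
        print("huggingface_hub is not installed")
    else:
        print(f"huggingface_hub {found}: meets minimum = {hub_supports_xet(found)}")
```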
breaking Installing `hf-xet` via `pip install --no-binary hf-xet` or in non-standard environments (like Termux) could fail due to `maturin` (Rust-Python binding) build issues, particularly with older versions (e.g., 1.0.3). This might manifest as 'Failed to normalize python source path `python`' errors.
fix Prefer installing `hf-xet` via pre-built wheels (`pip install hf-xet`). If building from source is necessary and issues arise, check `huggingface/xet-core` GitHub issues for workarounds or try upgrading `maturin` and Python toolchains. For Termux, local cloning and installation might be required.
gotcha While `hf-xet` aims to improve performance, some users have reported occasional 503 errors, silent hangs at 90-99%, or issues resuming downloads with `xet` compared to conventional `hub` downloads, particularly in early versions or specific network conditions.
fix Adaptive concurrency has been enabled to address some stability issues. For persistent download issues, environment variables like `HF_XET_HIGH_PERFORMANCE=1`, `HF_XET_CHUNK_CACHE_SIZE_BYTES=0`, and specific cache path settings might help, or consider using `huggingface_hub.snapshot_download` with a valid `HF_TOKEN`.
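The environment variables named above can be set from Python as well as from the shell. The sketch below assumes they should be in place before `huggingface_hub` is imported, and it does not guarantee that they resolve any particular download issue. `XET_TUNING` and `apply_xet_tuning` are hypothetical names.

```python
import os

# Values taken from the fix above; whether they help depends on your setup.
XET_TUNING = {
    "HF_XET_HIGH_PERFORMANCE": "1",
    "HF_XET_CHUNK_CACHE_SIZE_BYTES": "0",
}


def apply_xet_tuning(env=os.environ, overrides=XET_TUNING):
    """Set each variable unless it is already set; return the values in effect."""
    for name, value in overrides.items():
        env.setdefault(name, value)
    return {name: env[name] for name in overrides}


if __name__ == "__main__":
    # Assumption: call this before importing huggingface_hub.
    print(apply_xet_tuning())
```

Using `setdefault` keeps any value already exported in the shell, so the script does not silently override an operator's choice.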
breaking The `huggingface_hub` library (which depends on `hf-xet`) updated its minimum Python version to 3.9 from 3.8 starting with `huggingface_hub` v1.0. While `hf-xet` itself lists `>=3.8`, using it with a modern `huggingface_hub` version implies the higher Python requirement.
fix Ensure your Python environment is running Python 3.9 or newer when working with recent versions of `huggingface_hub` and its dependencies, including `hf-xet`.
breaking A `ModuleNotFoundError` for `huggingface_hub` means the package is not installed or not accessible in the current Python interpreter. `huggingface_hub` is the fundamental dependency for interacting with the Hugging Face Hub.
fix Install `huggingface_hub` with `pip install huggingface_hub` before importing or using it, and verify that the correct Python environment is active when running your script.
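A `ModuleNotFoundError` often comes from running a different interpreter than the one `pip` installed into. The sketch below prints the active interpreter and where a package would be loaded from; `describe_environment` is a hypothetical helper.

```python
import importlib.util
import sys


def describe_environment(package="huggingface_hub"):
    """Return the active interpreter path and the package's location, if any."""
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "package": package,
        "location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    info = describe_environment()
    print(f"Interpreter: {info['interpreter']}")
    if info["location"] is None:
        print(f"{info['package']} is not visible to this interpreter")
    else:
        print(f"{info['package']} loads from {info['location']}")
```

If the interpreter path is not inside the virtual environment you expected, activate that environment or install with `python -m pip install huggingface_hub` using the same interpreter.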
python os / libc status wheel install import disk
3.10 alpine (musl) - - - -
3.10 slim (glibc) - - - -
3.11 alpine (musl) - - - -
3.11 slim (glibc) - - - -
3.12 alpine (musl) - - - -
3.12 slim (glibc) - - - -
3.13 alpine (musl) - - - -
3.13 slim (glibc) - - - -
3.9 alpine (musl) - - - -
3.9 slim (glibc) - - - -

This quickstart demonstrates how `hf-xet` is utilized implicitly by the `huggingface_hub` library for efficient large file transfers. While `hf-xet` is not directly imported, operations like `snapshot_download` or `upload_file` from `huggingface_hub` will leverage `hf-xet`'s chunk-based deduplication and optimized transfer protocols for Xet-enabled repositories.

import importlib.util
import os

from huggingface_hub import snapshot_download

# Optional: set a Hugging Face token in the HF_TOKEN environment variable.
# For example: os.environ['HF_TOKEN'] = 'hf_YOUR_TOKEN_HERE'
# You can generate a token at: https://huggingface.co/settings/tokens

# hf-xet is used implicitly by huggingface_hub for large file transfers, but only
# when the hf_xet package is installed and the repository is Xet-enabled.
# This example downloads a small file; the benefits of Xet become apparent
# with very large models or datasets that leverage its chunk-based deduplication.
xet_installed = importlib.util.find_spec("hf_xet") is not None

try:
    # Download through huggingface_hub; it decides internally whether to use hf-xet.
    model_path = snapshot_download(
        repo_id="google/fnet-tokenizer",  # Example repo ID; substitute any repo you can access
        allow_patterns=["tokenizer.json"],
        local_dir="./fnet-tokenizer-local",
        token=os.environ.get("HF_TOKEN"),  # None when HF_TOKEN is not set
    )
    print(f"Model downloaded to: {model_path}")
    print(f"hf_xet installed (eligible for Xet transfers): {xet_installed}")
except Exception as e:
    print(f"An error occurred during download: {e}")
    print("Check that the repo ID exists, that you have an active internet connection, and that HF_TOKEN is valid if the repo is private or requires auth.")