Hugging Face Xet
hf-xet is the Python client library for Xet storage on the Hugging Face Hub. It transfers large files, such as machine learning models and datasets, to and from the Hub using chunk-based deduplication. The library is implemented in Rust and serves primarily as a backend for `huggingface_hub`; it is not typically intended for direct use. The project is actively developed, with releases often coinciding with or supporting major updates to `huggingface_hub`.
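To make "chunk-based deduplication" concrete, here is a toy sketch of the general idea: content is split into chunks, each chunk is keyed by its hash, and only chunks not already in the store are transferred. The fixed chunk size, the SHA-256 keys, and the names `upload` and `store` are all assumptions chosen for illustration; this page does not describe the chunking or hashing scheme hf-xet actually uses.

```python
import hashlib
from typing import Dict


def upload(data: bytes, store: Dict[str, bytes], chunk_size: int = 4) -> int:
    """Toy dedup: store only chunks not seen before; return how many were new.

    Illustrative only. Fixed-size chunks and SHA-256 keys are assumptions,
    not a description of the real Xet client.
    """
    new_chunks = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = chunk  # only unseen chunks are "transferred"
            new_chunks += 1
    return new_chunks
```

In this sketch, uploading a second file that shares most of its chunks with an earlier one stores only the chunks that differ, which is where the savings on large, incrementally updated files come from.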
Warnings
- gotcha hf-xet is an underlying dependency for `huggingface_hub` and is not designed for direct user interaction or import in typical Python code. All functionalities, including optimized large file transfers, are exposed through the `huggingface_hub` API.
- breaking Building `hf-xet` from source (e.g., `pip install --no-binary hf-xet hf-xet`) or installing it in non-standard environments (like Termux) can fail due to `maturin` (Rust-Python binding) build issues, particularly with older versions (e.g., 1.0.3). This may manifest as 'Failed to normalize python source path `python`' errors.
- gotcha While `hf-xet` aims to improve performance, some users have reported occasional 503 errors, silent hangs at 90-99%, or problems resuming downloads over Xet that they did not see with conventional Hub downloads, particularly in early versions or under specific network conditions.
- breaking The `huggingface_hub` library (which depends on `hf-xet`) updated its minimum Python version to 3.9 from 3.8 starting with `huggingface_hub` v1.0. While `hf-xet` itself lists `>=3.8`, using it with a modern `huggingface_hub` version implies the higher Python requirement.
Install
- pip install hf-xet
Imports
- No direct imports for end-users
hf-xet is primarily an internal dependency of huggingface_hub. Users interact through huggingface_hub's API.
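Since hf-xet is not meant to be imported directly, one way to confirm it is present is to query the installed distribution metadata instead. The sketch below uses only the standard library; the helper names `installed_version` and `environment_report` are hypothetical, and the Python 3.9 floor is taken from the warning above about `huggingface_hub` v1.0.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> dict:
    """Summarize the pieces relevant to Xet transfers without importing hf_xet."""
    return {
        # huggingface_hub v1.0+ needs Python 3.9 or newer (see Warnings)
        "python_ok": sys.version_info >= (3, 9),
        "hf_xet": installed_version("hf-xet"),
        "huggingface_hub": installed_version("huggingface_hub"),
    }
```

Calling `environment_report()` returns `None` for any package that is not installed, which makes it easy to spot a missing `hf-xet` before starting a large transfer.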
Quickstart
import os
from huggingface_hub import snapshot_download
# Ensure you have a Hugging Face token set as an environment variable (HF_TOKEN)
# For example: os.environ['HF_TOKEN'] = 'hf_YOUR_TOKEN_HERE'
# You can generate a token at: https://huggingface.co/settings/tokens
# hf-xet is used implicitly by huggingface_hub for large file transfers.
# This example downloads a small model, but the benefits of Xet become apparent
# with very large models or datasets that leverage its chunk-based deduplication.
try:
    # Download through huggingface_hub, which uses hf-xet internally
    # for Xet-enabled repositories when hf-xet is installed.
    model_path = snapshot_download(
        repo_id="google/fnet-tokenizer",  # A small, example repo
        allow_patterns=["tokenizer.json"],
        local_dir="./fnet-tokenizer-local",
        token=os.environ.get("HF_TOKEN"),  # Pass the token if set as an env var
    )
    print(f"Model downloaded to: {model_path}")
    print("If the repo is Xet-enabled and hf-xet is installed, the transfer went through hf-xet.")
except Exception as e:
    print(f"An error occurred during download: {e}")
    print("Please ensure you have an active internet connection and a valid Hugging Face token if the repo is private or requires auth.")
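The warnings above mention occasional 503 errors and stalled transfers. One common mitigation is to retry the call with exponential backoff. The helper below is a generic sketch, not part of `huggingface_hub` or `hf-xet`; the name `with_retries` and its parameters are hypothetical, and whether retrying helps in a given failure mode is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff.

    Hypothetical helper for illustration. The delay doubles after each
    failure: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

It could wrap the quickstart call as `with_retries(lambda: snapshot_download(repo_id="google/fnet-tokenizer"))`. Catching every exception keeps the sketch short; in practice you would likely retry only on network or server errors.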