Hugging Face Xet

1.4.2 · active · verified Sat Mar 28

hf-xet is a Python package that provides the client for Xet storage on the Hugging Face Hub. It transfers large files, such as machine learning models and datasets, to and from the Hub using chunk-based deduplication. The library is implemented in Rust and serves primarily as a backend for `huggingface_hub`; it is not typically intended for direct use. The project is actively developed, with releases often coinciding with or supporting major updates to `huggingface_hub`.
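To make the idea of chunk-based deduplication concrete, here is a minimal toy sketch. It is not hf-xet's actual algorithm or API: the helper names `split_chunks` and `upload`, the boundary rule, and the in-memory `store` dictionary are all hypothetical and exist only for illustration. The sketch cuts data into content-defined chunks, keys each chunk by its SHA-256 hash, and stores only chunks it has not seen before.

```python
import hashlib
import random


def split_chunks(data: bytes, mask: int = 0x3F, min_size: int = 16) -> list[bytes]:
    """Toy content-defined chunking (hypothetical rule, not hf-xet's).

    A boundary is placed where the low bits of a simple rolling value are
    zero, so boundaries depend on nearby content rather than on offsets.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Store only unseen chunks; return the file's chunk keys and the new-chunk count."""
    keys, new = [], 0
    for chunk in split_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = chunk
            new += 1
        keys.append(key)
    return keys, new


# Two versions of a "file": v2 is v1 with a few bytes inserted in the middle.
rng = random.Random(0)
v1 = bytes(rng.getrandbits(8) for _ in range(20_000))
v2 = v1[:10_000] + b"a few inserted bytes" + v1[10_000:]

store: dict[str, bytes] = {}
keys_v1, new_v1 = upload(v1, store)
keys_v2, new_v2 = upload(v2, store)

print(f"v1: {len(keys_v1)} chunks, {new_v1} stored")
print(f"v2: {len(keys_v2)} chunks, {new_v2} stored (the rest were already present)")
```

Because chunk boundaries follow content rather than fixed offsets, chunks before the edit are unchanged, and boundaries after it tend to line up again, so the second version adds only a fraction of its chunks to the store.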

Warnings

Install

Imports

Quickstart

This quickstart shows how `huggingface_hub` uses `hf-xet` implicitly for large file transfers. `hf-xet` is not imported directly; instead, operations such as `snapshot_download` or `upload_file` from `huggingface_hub` use its chunk-based deduplication and optimized transfer protocols for Xet-enabled repositories.

import os
from huggingface_hub import snapshot_download

# For private or gated repos, set a Hugging Face token in the HF_TOKEN
# environment variable, for example: os.environ['HF_TOKEN'] = 'hf_YOUR_TOKEN_HERE'
# You can generate a token at: https://huggingface.co/settings/tokens

# huggingface_hub uses hf-xet implicitly for transfers from Xet-enabled repositories.
# This example downloads a single small file; the benefits of Xet show up with
# very large models or datasets, where chunk-based deduplication saves the most.

try:
    # Downloading a model using huggingface_hub, which leverages hf-xet internally
    # for Xet-enabled repositories.
    model_path = snapshot_download(
        repo_id="google/fnet-tokenizer", # A small, example repo
        allow_patterns=["tokenizer.json"],
        local_dir="./fnet-tokenizer-local",
        token=os.environ.get('HF_TOKEN')  # None if HF_TOKEN is not set
    )
    print(f"Files downloaded to: {model_path}")
    print("If the repo is Xet-enabled and hf-xet is installed, the transfer used the Xet backend.")
except Exception as e:
    print(f"An error occurred during download: {e}")
    print("Please ensure you have an active internet connection and a valid Hugging Face token if the repo is private or requires auth.")
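Since `hf-xet` works behind the scenes, it can be useful to confirm that it is present in the current environment. The following is a minimal sketch using only the standard library; the helper name `xet_backend_status` is hypothetical, and the import name `hf_xet` is an assumption about how the `hf-xet` distribution exposes its module.

```python
import importlib.util
from importlib import metadata


def xet_backend_status() -> dict:
    """Report whether hf-xet appears installed, without importing it.

    Assumes the distribution `hf-xet` installs a module named `hf_xet`.
    """
    status = {"hf_xet_importable": importlib.util.find_spec("hf_xet") is not None}
    for dist in ("hf-xet", "huggingface_hub"):
        try:
            status[dist] = metadata.version(dist)  # installed version string
        except metadata.PackageNotFoundError:
            status[dist] = None  # distribution not installed
    return status


print(xet_backend_status())
```

A `None` version for `hf-xet` suggests that the package is not installed, in which case the Xet transfer path described above would not be available to `huggingface_hub` in this environment.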
