PySpark Hugging Face Data Source

library 2.1.0 · python
✓ verified May 24, 2026

pyspark-huggingface is a Spark Data Source for reading 🤗 Hugging Face Datasets as Spark DataFrames. It streams datasets from the Hub, applies projection (column) and predicate (row) filters, and saves Spark DataFrames back to Hugging Face as Parquet files with fast, deduplicated uploads. It authenticates via `huggingface-cli login` or an access token. It is compatible with Spark 4 (with auto-import) and backports the functionality to Spark 3.5, 3.4, and 3.3. The current version is 2.1.0, and the library is actively maintained.
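As a rough illustration of the read and write paths described above, the sketch below wraps the data source in three small helpers. The helper names (`hf_read_options`, `read_hf_dataset`, `write_hf_dataset`) are hypothetical and not part of the library. The format name `"huggingface"` and the option names `split`, `config`, `columns`, and `filters` are recalled from the project's README and should be checked against version 2.1.0. The encoding of `filters` as a Python-literal list of tuples is likewise an assumption.

```python
import json


def hf_read_options(split=None, config=None, columns=None, filters=None):
    """Build string-valued reader options (hypothetical helper, not library API).

    Option names are assumed from the project's README; verify them
    against the installed version before relying on them.
    """
    opts = {}
    if split is not None:
        opts["split"] = split          # e.g. "train" or "test"
    if config is not None:
        opts["config"] = config        # dataset subset/configuration name
    if columns is not None:
        # Projection: assumed to be passed as a JSON list of column names.
        opts["columns"] = json.dumps(list(columns))
    if filters is not None:
        # Predicate: assumed to be a Python-literal list of
        # (column, operator, value) tuples.
        opts["filters"] = repr([tuple(f) for f in filters])
    return opts


def read_hf_dataset(spark, repo_id, **kwargs):
    """Load a Hub dataset as a Spark DataFrame using an existing SparkSession."""
    # An explicit import registers the data source; the source text says
    # Spark 4 auto-imports it, in which case this line should be harmless.
    import pyspark_huggingface  # noqa: F401

    reader = spark.read.format("huggingface").options(**hf_read_options(**kwargs))
    return reader.load(repo_id)


def write_hf_dataset(df, repo_id, mode="overwrite"):
    """Save a Spark DataFrame to the Hub; requires prior authentication."""
    import pyspark_huggingface  # noqa: F401

    df.write.format("huggingface").mode(mode).save(repo_id)
```

With a running `SparkSession` named `spark` and a prior `huggingface-cli login`, a call such as `read_hf_dataset(spark, "stanfordnlp/imdb", split="test", columns=["text"])` would be expected to return a DataFrame containing only the `text` column of the test split. The repository ID here is only an example of a public dataset.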
