warc3-wet-clueweb09

library 0.2.5 ·python
verified May 25, 2026

A Python library for parsing ARC and WARC files, with fixes and optimizations for ClueWeb09 WET (Web Extracted Text) files. It provides an interface for iterating over the records in these compressed archives. The current version is 0.2.5; as a pre-1.0 release, its API may still change. It is maintained on an as-needed basis.
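To make "iterating over records in a compressed archive" concrete, here is a minimal standard-library sketch of the idea. It does not use this package or reproduce its API: `iter_warc_records` and `make_record` are hypothetical helpers written for illustration, the two sample records are made up, and the parser handles only the basic WARC layout (version line, headers, blank line, `Content-Length` bytes of payload) with none of the ClueWeb09-specific fixes.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in a binary WARC stream.

    Hypothetical helper for illustration only; not part of the library.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected version line: {line!r}")

        # Read "Name: value" header lines until the blank line that ends them.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes, so blank lines
        # inside the extracted text do not confuse the parser.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(uri, text):
    """Build one WET-style record as bytes (made-up sample data)."""
    body = text.encode("utf-8")
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: conversion\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two sample records, gzip-compressed in memory to stand in for a .warc.gz file.
raw = make_record("http://example.com/a", "first page text") + make_record(
    "http://example.com/b", "second page text"
)
compressed = gzip.compress(raw)

with gzip.open(io.BytesIO(compressed), "rb") as stream:
    records = list(iter_warc_records(stream))

for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

Reading the payload by `Content-Length` rather than scanning for a delimiter is the design choice that matters here, since extracted text can itself contain blank lines. For real ClueWeb09 files, prefer the library's own reader, which exists to handle the format quirks this sketch ignores.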
