warc3-wet-clueweb09

library 0.2.5 · python

✓ verified May 25, 2026

A Python library for efficiently parsing and working with ARC and WARC files, with fixes and optimizations specific to ClueWeb09 WET (Web Extracted Text) files. It provides an interface for iterating over the records in these compressed archives. The current version is 0.2.5; as a pre-1.0 release, its API may still change. It is maintained on an as-needed basis.
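To make "iterating over records" concrete, here is a minimal standard-library sketch of what reading a gzip-compressed WET stream involves. It does not use this package and is not its API: the names `iter_warc_records` and `make_record` are hypothetical, the sample data is built in memory, and the parser assumes well-formed records with a correct `Content-Length`. It makes no attempt at the ClueWeb09-specific fixes the library provides.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in a binary WARC/WET stream.

    Hypothetical helper for illustration only; not part of any library.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")

        # Read "Name: value" header lines until the blank line that ends them.
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        length = int(headers.get("content-length", 0))
        payload = stream.read(length)
        yield headers, payload


def make_record(uri, text):
    """Build one well-formed sample record (hypothetical test data)."""
    body = text.encode("utf-8")
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: conversion\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two sample records, gzip-compressed in memory to stand in for a .wet.gz file.
raw = make_record("http://example.com/a", "hello world") + make_record(
    "http://example.com/b", "second page"
)
compressed = gzip.compress(raw)

with gzip.open(io.BytesIO(compressed), "rb") as stream:
    records = [
        (headers["warc-target-uri"], payload.decode("utf-8"))
        for headers, payload in iter_warc_records(stream)
    ]

for uri, text in records:
    print(uri, "->", text)
```

Header names are lowercased in the sketch so lookups do not depend on how a given archive capitalizes them. Real archives can deviate from the format in ways this parser would reject, which is the kind of problem a dedicated library is meant to absorb.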