{"id":5825,"library":"warc3-wet","title":"warc3-wet","description":"warc3-wet is a Python library for working with ARC and WARC (Web ARChive) files, which are formats for storing web crawls. It is a fork of the original `warc` repository, updated for Python 3 compatibility and to handle issues with specific datasets such as ClueWeb09. The current version is 0.2.5, released on July 17, 2024; releases are infrequent and focus on maintenance and compatibility updates.","status":"active","version":"0.2.5","language":"en","source_language":"en","source_url":"https://github.com/Willian-Zhang/warc3","tags":["web archiving","WARC","WET","ARC","data processing"],"install":[{"cmd":"pip install warc3-wet","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"Despite the package name `warc3-wet`, the primary module to import for functionality is `warc`, mirroring the original library's interface. Importing directly from `warc3_wet` is not the intended public API.","wrong":"from warc3_wet import warc","symbol":"warc","correct":"import warc"}],"quickstart":{"code":"import os\n\nimport warc\n\n# For demonstration, create a small dummy WARC/WET file if it doesn't exist.\n# In a real scenario, you would open an actual WARC/WET file instead.\nif not os.path.exists(\"test.warc.wet\"):\n    # newline=\"\" keeps text mode from translating the explicit CRLF line endings\n    with open(\"test.warc.wet\", \"w\", newline=\"\") as f:\n        f.write(\"WARC/1.0\\r\\n\")\n        f.write(\"WARC-Type: warcinfo\\r\\n\")\n        f.write(\"WARC-Date: 2023-01-01T12:00:00Z\\r\\n\")\n        f.write(\"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\\r\\n\")\n        f.write(\"Content-Length: 0\\r\\n\")\n        f.write(\"\\r\\n\")  # blank line ends the header block\n        f.write(\"\\r\\n\\r\\n\")  # two CRLFs terminate the record (its block is empty)\n        f.write(\"WARC/1.0\\r\\n\")\n        f.write(\"WARC-Type: response\\r\\n\")\n        f.write(\"WARC-Target-URI: http://example.com/\\r\\n\")\n        f.write(\"WARC-Date: 2023-01-01T12:00:01Z\\r\\n\")\n        
f.write(\"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000002>\\r\\n\")\n        # Content-Length counts the bytes of the block that follows the blank line\n        f.write(\"Content-Length: 50\\r\\n\")\n        f.write(\"Content-Type: application/http; msgtype=response\\r\\n\")\n        f.write(\"\\r\\n\")\n        f.write(\"HTTP/1.1 200 OK\\r\\n\")\n        f.write(\"Content-Length: 11\\r\\n\")\n        f.write(\"\\r\\n\")\n        f.write(\"Hello World\")\n        f.write(\"\\r\\n\\r\\n\")  # two CRLFs terminate the record\n\nwith warc.open(\"test.warc.wet\") as f:\n    for record in f:\n        if 'WARC-Target-URI' in record and 'Content-Length' in record:\n            print(f\"URI: {record['WARC-Target-URI']}, Length: {record['Content-Length']}\")\n\n# Clean up the dummy file\nos.remove(\"test.warc.wet\")","lang":"python","description":"This quickstart demonstrates how to open and iterate through records in a WARC or WET file. It creates a minimal dummy WARC/WET file so the example can run immediately, then iterates over it and prints the target URI and content length of each record that has a `WARC-Target-URI` header."},"warnings":[{"fix":"Always use `import warc` to access the library's functionality, consistent with the original `warc` library interface.","message":"Despite the package name `warc3-wet`, the module to import is `warc`. Users accustomed to the PyPI package name might incorrectly attempt `import warc3_wet` or `from warc3_wet import warc`, which is not the correct public API usage.","severity":"gotcha","affected_versions":"0.2.x and earlier"},{"fix":"Ensure your environment is Python 3. Review and update any code expecting Python 2-specific behaviors or dependencies when migrating from the original `warc` library.","message":"This library is a Python 3 port and fork of an older, 'now dead' Python 2 `warc` library. 
While the interface is largely unchanged, Python 2 applications written against the original `warc` library are not guaranteed to work as-is, so Python 2 codebases must be migrated to Python 3.","severity":"breaking","affected_versions":"All versions (compared to original Python 2 warc library)"},{"fix":"Always follow the installation instructions provided on the `warc3-wet` PyPI page or GitHub README (`pip install warc3-wet`). Consult `warc.readthedocs.org` for API usage, but verify installation and package-specific details against `warc3-wet`'s own distribution.","message":"The official documentation for `warc3-wet` points to `http://warc.readthedocs.org/`, which is the documentation for the *original* `warc` library. While the interface is stated to be largely unchanged, be aware that any installation instructions in that external documentation may not apply to `warc3-wet`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}