Trafilatura
Trafilatura is a Python package and command-line tool for gathering text and metadata from the web. It covers crawling, scraping, and main-content extraction from web pages, and supports output formats including CSV, JSON, HTML, Markdown, TXT, and XML. The library is actively maintained with frequent releases and offers extraction, navigation, and deduplication features.
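As a rough illustration of the extraction step described above, the sketch below passes an HTML string to `trafilatura.extract` and asks for a chosen output format. The sample HTML and the wrapper name `extract_main_text` are made up for this example and are not part of the library. The result may be `None` when the input is too short or has no recognizable main content.

```python
from typing import Optional

# Small inline page so the sketch runs without network access (made-up content).
SAMPLE_HTML = """<html><head><title>Sample page</title></head><body>
<nav>Home | About | Contact</nav>
<article><h1>Sample page</h1>
<p>This paragraph stands in for the main article text of a web page. It is
deliberately a few sentences long, because very short documents may be
rejected by the extractor as having no usable content.</p>
<p>A second paragraph adds more body text so that the main content clearly
outweighs the navigation bar and the footer around it.</p>
</article>
<footer>Copyright notice</footer></body></html>"""


def extract_main_text(html: str, output_format: str = "txt") -> Optional[str]:
    """Hypothetical wrapper: return the main content of an HTML document.

    Returns None if the extractor finds nothing usable.
    """
    # Imported here so this module still loads where trafilatura is not installed.
    import trafilatura

    return trafilatura.extract(html, output_format=output_format, with_metadata=True)


if __name__ == "__main__":
    # Request JSON output; "txt", "xml", and "csv" are other accepted values.
    print(extract_main_text(SAMPLE_HTML, output_format="json"))
```

For a live page, `trafilatura.fetch_url(url)` can supply the HTML string in place of `SAMPLE_HTML`.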
Resources
homepage trafilatura.readthedocs.io
API endpoints
full doc /v1/registry/trafilatura
install /v1/registry/trafilatura/install
compatibility /v1/registry/trafilatura/compatibility
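The three paths above share one pattern, which can be sketched as a small URL builder. The listing does not name the host that serves these paths, so `https://registry.example` below is a placeholder assumption, and `registry_endpoints` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Placeholder host: the listing gives only paths, not the server that hosts them.
ASSUMED_BASE_URL = "https://registry.example"


def registry_endpoints(package: str, base_url: str = ASSUMED_BASE_URL) -> dict:
    """Build the three endpoint URLs listed for a package (hypothetical helper)."""
    root = f"/v1/registry/{package}"
    return {
        "full doc": urljoin(base_url, root),
        "install": urljoin(base_url, f"{root}/install"),
        "compatibility": urljoin(base_url, f"{root}/compatibility"),
    }


if __name__ == "__main__":
    # Print one "label url" line per endpoint.
    for label, url in registry_endpoints("trafilatura").items():
        print(label, url)
```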