Trafilatura

library 2.0.0 · python
verified May 20, 2026

Trafilatura is a Python library and command-line tool for gathering text and metadata from the web. It handles crawling, scraping, and main-content extraction from web pages, and can write results as CSV, JSON, HTML, Markdown, TXT, or XML. The library is actively maintained with frequent releases and also provides navigation and deduplication features.

total hits 15
actors 5 distinct systems
last hit 2d ago GPTBot
GPTBot 6 · Script 5 · Search engines 2

top countries 🇺🇸 United States · 🇮🇳 India · 🇩🇪 Germany · 🇨🇦 Canada