Trafilatura

library · 2.0.0 · python

✓ verified May 20, 2026

http-networking data

Trafilatura is a Python library and command-line tool for gathering text and metadata from the web. It crawls sites, scrapes pages, and extracts their main content, with output available as CSV, JSON, HTML, Markdown, TXT, or XML. The library is actively maintained with frequent releases and provides extraction, navigation, and deduplication features.
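As a rough illustration of the extraction step described above, the sketch below passes an HTML string to `trafilatura.extract` and requests two of the listed output formats. The sample page and the `extract_main_text` helper are made up for this example and are not part of the library. The import sits inside the helper only so the snippet loads when the package is not installed. What gets extracted depends on the page and on the library's settings, so no expected output is shown.

```python
from typing import Optional

# Hypothetical sample page, invented for this example.
SAMPLE_HTML = """
<html>
  <head><title>Harvest report</title></head>
  <body>
    <nav>Home | About | Contact</nav>
    <article>
      <h1>Harvest report</h1>
      <p>The orchard produced a larger crop than in the previous season,
      mostly because the spring frost arrived later than usual.</p>
      <p>Pickers started in early September and finished within three weeks,
      and most of the fruit was sold to local markets.</p>
    </article>
    <footer>Copyright notice and unrelated links</footer>
  </body>
</html>
"""


def extract_main_text(html: str, output_format: str = "txt") -> Optional[str]:
    """Return the main content of an HTML document, or None if nothing is found.

    Hypothetical helper: a thin wrapper around trafilatura.extract.
    """
    import trafilatura  # third-party package: pip install trafilatura

    # with_metadata=True asks for title, date, and similar fields when available.
    return trafilatura.extract(html, output_format=output_format, with_metadata=True)


if __name__ == "__main__":
    # Plain-text output.
    print(extract_main_text(SAMPLE_HTML))
    # JSON output, returned as a string.
    print(extract_main_text(SAMPLE_HTML, output_format="json"))
```

For a live page, `trafilatura.fetch_url(url)` can download the HTML first, and its result can be passed to the same helper.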

Traffic · last 30 days ↑167% vs prev 7d · indexed Thu Apr 09 · updated Sun May 24

total hits 15

actors 5 distinct systems

last hit 2d ago (GPTBot)

GPTBot 6

Script 5

Search engines 2

top countries 🇺🇸 United States · 🇮🇳 India · 🇩🇪 Germany · 🇨🇦 Canada

Resources

github github.com/adbar/trafilatura

package pypi.org/project/trafilatura/

homepage trafilatura.readthedocs.io

API endpoints

full doc /v1/registry/trafilatura

install /v1/registry/trafilatura/install

compatibility /v1/registry/trafilatura/compatibility