news-please: News Crawler and Extractor

library 1.6.16 · python
verified May 25, 2026

news-please is an open-source Python library for crawling news websites and extracting structured information from articles. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles. It also provides an API for programmatic use within Python applications and supports extracting articles from the commoncrawl.org news archive. The project is actively maintained, with regular releases; the current version is 1.6.16.
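
As a rough illustration of the programmatic API, the sketch below fetches one article with `NewsPlease.from_url` and reduces it to a few fields. The helper names `fetch_article` and `summarize`, the `preview_chars` parameter, and the stand-in demo object are assumptions made for this example and are not part of the library. The attribute names (`title`, `authors`, `language`, `maintext`) follow the article object as commonly documented, but should be checked against the installed version.

```python
from types import SimpleNamespace


def fetch_article(url):
    """Download and extract one article via news-please (needs network access)."""
    # Imported lazily so the rest of this sketch runs without news-please installed.
    from newsplease import NewsPlease
    return NewsPlease.from_url(url)


def summarize(article, preview_chars=80):
    """Hypothetical helper: reduce an extracted article to a few common fields."""
    # Fields can be missing or None when extraction fails, so read them defensively.
    text = getattr(article, "maintext", None) or ""
    return {
        "title": getattr(article, "title", None),
        "authors": list(getattr(article, "authors", None) or []),
        "language": getattr(article, "language", None),
        "preview": text[:preview_chars],
    }


if __name__ == "__main__":
    # Stand-in object so the example runs offline;
    # for real use, replace it with fetch_article("<article URL>").
    demo = SimpleNamespace(
        title="Example headline",
        authors=["A. Writer"],
        language="en",
        maintext="Body text of the article.",
    )
    print(summarize(demo))
```

Keeping the network call in its own function separates crawling from post-processing, so the field handling can be tried out without hitting a live site.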
