Breadability

0.1.20 verified Mon Apr 27 auth: no python maintenance

A Python port of the Readability HTML parser (originally by Arc90) for extracting readable content from HTML pages. Current version: 0.1.20. The library is in maintenance mode with infrequent releases.

pip install breadability
error ModuleNotFoundError: No module named 'breadability'
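cause Library not installed, or installed into a different Python environment than the one running your code.
fix
Run: pip install breadability (or python -m pip install breadability, using the same interpreter that runs your code)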
error AttributeError: module 'breadability' has no attribute 'readable'
cause Accessing 'breadability.readable' after a bare 'import breadability'. breadability is a package, and its 'readable' submodule is not loaded until it is imported explicitly.
fix
Use: from breadability.readable import Article
error TypeError: __init__() takes 1 positional argument but 2 were given
cause Constructing Article with something other than the HTML document, most often a URL. Article parses an HTML string and does not fetch pages itself.
fix
Fetch HTML first: article = Article(requests.get(url).text)
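For the ModuleNotFoundError entry above, it can help to confirm which interpreter is running and whether that interpreter can locate the package. The following is a minimal sketch using only the standard library; 'is_installed' is a hypothetical helper name, not part of breadability.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter path shows which environment pip needs to install into.
print("interpreter:", sys.executable)
print("breadability importable:", is_installed("breadability"))
```

If the second line prints False, installing with that same interpreter (python -m pip install breadability) targets the right environment.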
deprecated The library is no longer actively maintained; consider using readability-lxml or trafilatura instead.
fix Switch to a better-maintained alternative: readability-lxml (pip install readability-lxml) or trafilatura (pip install trafilatura).
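As a rough illustration of what a migration to readability-lxml might look like, the sketch below wraps its Document class in a hypothetical helper, 'summarize_with_readability'. The sample HTML is made up for the example, and the helper returns None when readability-lxml is not installed, so the snippet runs either way.

```python
def summarize_with_readability(html):
    """Return (title, main_content_html) via readability-lxml, or None if it is missing."""
    try:
        # The readability-lxml distribution installs a module named "readability".
        from readability import Document
    except ImportError:
        return None
    doc = Document(html)
    return doc.title(), doc.summary()


# Made-up sample page, long enough for the extractor to find a main block.
sample = (
    "<html><head><title>Hello</title></head><body><article><p>"
    + "Readable text. " * 40
    + "</p></article></body></html>"
)
print(summarize_with_readability(sample))
```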
gotcha Import paths differ from many tutorials. The main class 'Article' is in the 'breadability.readable' submodule, not directly in 'breadability'.
fix Use 'from breadability.readable import Article'.
gotcha The library expects HTML string input, not a URL. You must fetch the page yourself (e.g., with requests).
fix Use requests.get(url).text to get the HTML, then pass it to Article().
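The two gotchas above can be folded into one small guard that accepts either a URL or an HTML string and always hands HTML to Article. This is a sketch under stated assumptions, not part of breadability: 'looks_like_url' and 'extract_readable' are hypothetical helper names, and the 'url' keyword and 'readable' attribute are assumed from the usage example in the project README.

```python
from urllib.parse import urlparse


def looks_like_url(text):
    """Return True when text parses as an absolute http(s) URL rather than markup."""
    candidate = text.strip()
    # Markup starts with "<"; a bare URL contains no whitespace.
    if candidate.startswith("<") or any(ch.isspace() for ch in candidate):
        return False
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def extract_readable(source, timeout=10):
    """Accept a URL or an HTML string and return the extracted readable HTML."""
    # Imported lazily so looks_like_url works without the third-party packages.
    import requests
    from breadability.readable import Article

    url = None
    html = source
    if looks_like_url(source):
        url = source
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
        html = response.text
    # Assumption: Article accepts url= and exposes .readable, as in the README example.
    return Article(html, url=url).readable
```

Calling extract_readable('https://example.com/article') would fetch the page first, while extract_readable(html_string) skips the network entirely.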

Fetch a URL and extract the readable article content.

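import requests
from breadability.readable import Article

url = 'https://example.com/article'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page

# Article takes the HTML string; passing url= follows the project README's example.
article = Article(response.text, url=url)
# The README example reads the extracted article HTML from the 'readable' attribute.
print(article.readable)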