Breadability
0.1.20 | verified Mon Apr 27 | auth: no | python | maintenance
A Python port of the Readability HTML parser (originally by Arc90) for extracting readable content from HTML pages. Current version: 0.1.20. The library is in maintenance mode with infrequent releases.
pip install breadability
Common errors
error ModuleNotFoundError: No module named 'breadability'
cause The library is not installed, or it is installed in a different environment from the one running the code.
fix Run: pip install breadability
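To tell a missing install apart from a wrong environment, it can help to ask the running interpreter directly. The sketch below uses only the standard library; the helper name is_installed is illustrative and not part of breadability.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is running, then whether it can see the package.
print("interpreter:", sys.executable)
print("breadability importable:", is_installed("breadability"))
```

If this prints False, running the install through the same interpreter (python -m pip install breadability) usually targets the right environment.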
error AttributeError: module 'breadability' has no attribute 'readable'
cause Accessing breadability.readable after a bare "import breadability". breadability is a package, and its readable submodule is not loaded until it is imported explicitly.
fix Use: from breadability.readable import Article
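The mechanism behind this error can be reproduced without breadability. The sketch below builds a throwaway package on disk (the name demo_pkg_breadability_note and its empty __init__.py are assumptions made for illustration) and shows that a submodule only becomes an attribute of its package once it has been imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg_breadability_note"  # hypothetical package name for the demo


def demo_submodule_loading():
    """Return (has_attr_before, has_attr_after) for the demo package's submodule."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PKG
        pkg_dir.mkdir()
        # An empty __init__.py: the package does not import its submodule itself.
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "readable.py").write_text("class Article:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(PKG)
            before = hasattr(pkg, "readable")  # submodule not loaded yet
            importlib.import_module(PKG + ".readable")
            after = hasattr(pkg, "readable")  # the import bound it on the package
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith(PKG)]:
                del sys.modules[name]
    return before, after


print(demo_submodule_loading())
```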
error TypeError: __init__() takes 1 positional argument but 2 were given
cause Passing a URL instead of an HTML string to the Article constructor.
fix Fetch the HTML first: article = Article(requests.get(url).text)
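A small guard can catch this mistake before the parser sees the input. This is a minimal sketch: looks_like_url and ensure_html are hypothetical helpers, not part of breadability, and the heuristic only recognizes bare http(s) URLs.

```python
from urllib.parse import urlparse


def looks_like_url(text: str) -> bool:
    """Return True when text is a bare http(s) URL rather than HTML markup."""
    candidate = text.strip()
    # Markup contains '<'; a bare URL contains neither '<' nor whitespace.
    if not candidate or "<" in candidate or any(ch.isspace() for ch in candidate):
        return False
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def ensure_html(text: str) -> str:
    """Raise early if a URL was passed where an HTML string is expected."""
    if looks_like_url(text):
        raise ValueError("Got a URL; fetch the page and pass its HTML instead.")
    return text


# Intended use (requires breadability):
#   article = Article(ensure_html(html_text))
```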
Warnings
deprecated The library is no longer actively maintained; consider using readability-lxml or trafilatura instead.
fix Switch to a better-maintained alternative: readability-lxml (pip install readability-lxml) or trafilatura.
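For orientation, a rough equivalent using readability-lxml might look like the sketch below. It assumes readability-lxml is installed and relies on its Document class with the title() and summary() methods; the wrapper extract_with_readability_lxml is a hypothetical name and returns None when the package is unavailable.

```python
def extract_with_readability_lxml(html: str):
    """Return (title, cleaned_html) via readability-lxml, or None if it is missing."""
    try:
        # The readability-lxml distribution installs the 'readability' module.
        from readability import Document
    except ImportError:
        return None
    doc = Document(html)
    # title() gives the page title; summary() gives the cleaned article HTML.
    return doc.title(), doc.summary()
```

As with breadability, the input is an HTML string that has already been fetched, not a URL.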
gotcha Import paths differ from many tutorials. The main class 'Article' is in the 'breadability.readable' submodule, not directly in 'breadability'.
fix Use 'from breadability.readable import Article'.
gotcha The library expects an HTML string as input, not a URL. You must fetch the page yourself (e.g., with requests).
fix Use requests.get(url).text to get the HTML, then pass it to Article().
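If adding requests as a dependency is undesirable, the fetch step can also be done with the standard library. The sketch below is one way to do it; fetch_html is a hypothetical helper, and the UTF-8 fallback for pages that declare no charset is an assumption.

```python
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body decoded to a string."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Intended use (requires breadability and network access):
#   article = Article(fetch_html("https://example.com/article"))
```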
Imports
- readable
wrong: from breadability import Article
correct: from breadability.readable import Article
- readable
wrong: from breadability import readability_filter
correct: from breadability.readable import readability_filter
Quickstart
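import requests
from breadability.readable import Article

url = 'https://example.com/article'
# Fetch the page yourself; Article expects HTML text, not a URL.
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page

article = Article(response.text)
print(article.title)
print(article.content)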