html-text

library · 0.7.1 · python

✓ verified May 21, 2026

html-text is a Python library that extracts clean, readable plain text from HTML. It goes beyond naive text extraction by dropping invisible, non-text content such as inline styles, JavaScript, and comments. It also normalizes whitespace and can optionally add newlines after block-level elements (e.g., headers, paragraphs), so the output more closely resembles what a browser renders. This makes the result suitable as input for text classification or other natural language processing. The current version is 0.7.1, and the project is under active development.

indexed Sun Apr 12 · updated Wed May 27

Resources

github github.com/zytedata/html-text

package pypi.org/project/html-text/

API endpoints

full doc /v1/registry/html-text

install /v1/registry/html-text/install

compatibility /v1/registry/html-text/compatibility