inscriptis

2.7.1 · active · verified Sat Apr 11

Inscriptis is a Python-based HTML-to-text conversion library, command-line client, and Web service (v2.7.1). It produces high-quality, layout-aware text representations of HTML content, supports nested tables and a subset of CSS, and can optionally emit annotated output. The library is actively maintained, with regular releases that add support for new Python versions and new features.


Install

pip install inscriptis

Imports

from inscriptis import get_text

Quickstart

Convert HTML from a URL to plain text, preserving layout and structure. The example fetches content from 'https://www.informationscience.ch' and prints its text representation.

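import urllib.request

from inscriptis import get_text

url = "https://www.informationscience.ch"

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        html_content = response.read().decode(charset, errors="replace")
except (OSError, LookupError) as e:  # URLError/timeouts are OSErrors; LookupError = unknown charset
    # Fall back to a minimal error page so get_text() still receives HTML.
    html_content = f"<html><body><p>Error fetching URL: {e}</p></body></html>"

# Convert the HTML into a layout-aware plain-text representation.
text = get_text(html_content)
print(text)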
