sgmllib3k

1.0.0 · abandoned · verified Thu Apr 09

sgmllib3k is a Python 3 port of the `sgmllib` module, which was deprecated in Python 2.6 and removed in Python 3.0. It provides a basic SGML/HTML parser for legacy applications. The current version is 1.0.0, released in 2011, and the project appears to be abandoned.

Warnings

Install

Imports

Quickstart

This quickstart shows how to build a basic parser by subclassing `SGMLParser` and overriding handler methods such as `handle_data` to process the parsed content. The `feed()` method passes the HTML string to the parser, and `close()` forces any buffered input to be processed.

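import sgmllib  # the sgmllib3k distribution installs its module under the name `sgmllib`

class MyParser(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.data = []  # text chunks collected between tags

    def handle_data(self, data):
        # Called for each run of text found between tags
        self.data.append(data)

    def unknown_starttag(self, tag, attrs):
        # Called for start tags without a start_<tag> or do_<tag> handler;
        # attrs is a list of (name, value) pairs
        pass

    def unknown_endtag(self, tag):
        # Called for end tags without an end_<tag> handler
        pass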

html_content = "<html><body><h1>Hello</h1><p>World</p></body></html>"
parser = MyParser()
parser.feed(html_content)
parser.close()

print("Extracted data:", parser.data)
# Expected output: Extracted data: ['Hello', 'World']
