sec-parser

0.58.1 verified Fri May 01 auth: no python

Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual structure of the document. Version 0.58.1, active development.

pip install sec-parser
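error ModuleNotFoundError: No module named 'sec_parser'
cause The package is not installed in the active environment, or the import name is misspelled. The PyPI name is sec-parser (hyphen); the import name is sec_parser (underscore).
fix Run pip install sec-parser in the active environment, then import the package as sec_parser.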
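error AttributeError: module 'sec_parser' has no attribute 'SECParser'
cause The installed sec_parser package does not expose a name called SECParser, so sec_parser.SECParser fails at attribute lookup.
fix Use a parser class that the installed version exports. The project README uses sec_parser.Edgar10QParser; dir(sec_parser) lists the available names.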
error TypeError: parse() got an unexpected keyword argument 'html_source'
cause The parameter was renamed from html_source to html in v0.40.
fix Call parser.parse(html=...) instead of parser.parse(html_source=...).
breaking As of v0.50, the parser output is a flat list of elements instead of a nested tree. Code that relies on parent-child traversal will break.
fix Iterate over the flat list instead of traversing recursively.
deprecated The parser's parse_html() method was removed in v0.40.
fix Replace calls to parse_html() with parse().
gotcha The input must be a complete SEC EDGAR HTML document. Non-standard or truncated HTML may produce an empty result.
fix Use HTML downloaded from SEC EDGAR rather than a scraped copy with missing elements.
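
The gotcha above can be guarded against before the parser is ever called. The sketch below is one possible pre-flight check using only the standard library: it collects the visible text in the HTML and flags documents that have none. The helper name looks_parseable and the min_chars threshold are hypothetical and not part of sec-parser, and passing this check does not guarantee that the parser will produce elements.

```python
from html.parser import HTMLParser


class _TextProbe(HTMLParser):
    """Collects visible text so we can tell whether a document has any content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside script/style.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def looks_parseable(html, min_chars=1):
    """Hypothetical helper: True if the HTML has at least min_chars of visible text."""
    probe = _TextProbe()
    probe.feed(html)
    probe.close()
    return sum(len(chunk) for chunk in probe.chunks) >= min_chars
```

A caller might skip or log any filing for which looks_parseable(html) is False instead of passing it to the parser and receiving an empty result.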

Parse SEC EDGAR HTML into a list of semantic elements, then iterate over them.

import sec_parser as sp
from sec_parser.semantic_elements import TextElement

html = "<html><body><p>Revenue: $100</p></body></html>"
# Edgar10QParser is the parser class used in the project README
parser = sp.Edgar10QParser()
elements = parser.parse(html)
# Print the text of every text element
for element in elements:
    if isinstance(element, TextElement):
        print(element.text)
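
For code written against a nested tree, the migration described in the breaking-change note amounts to replacing a recursive walk with a plain loop. The sketch below illustrates the difference with a stand-in Node class. Node, walk, and the sample data are hypothetical and do not come from sec-parser.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Stand-in for a parsed element; not a sec-parser class."""
    text: str
    children: List["Node"] = field(default_factory=list)


def walk(nodes: List[Node]) -> Iterator[Node]:
    """Old style: depth-first recursion over a nested tree."""
    for node in nodes:
        yield node
        yield from walk(node.children)


# Nested input, as older traversal code would have expected it.
nested = [
    Node("Item 1", [Node("Revenue: $100"), Node("Costs: $40")]),
    Node("Item 2"),
]

# Flat input, as the breaking-change note describes for newer versions.
flat = [Node("Item 1"), Node("Revenue: $100"), Node("Costs: $40"), Node("Item 2")]

nested_texts = [node.text for node in walk(nested)]
flat_texts = [node.text for node in flat]  # new style: a plain loop, no recursion
```

Both lists contain the same texts in the same document order, so logic that only needs the sequence of elements can drop the recursive helper entirely.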