sec-parser
0.58.1, verified Fri May 01, auth: no, python
Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual structure of the document. Version 0.58.1, active development.
pip install sec-parser
Common errors
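error ModuleNotFoundError: No module named 'sec_parser'
cause The package is not installed in the active Python environment, or the import name is misspelled.
fix Install with pip install sec-parser (hyphen in the package name), then import with from sec_parser import SECParser (underscore in the module name).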
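A quick way to tell a missing package from a misspelled import is to ask Python whether it can locate the module at all. The sketch below uses only the standard library; the helper name is_importable is made up for illustration and is not part of sec-parser.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


# The pip name uses a hyphen (sec-parser); the import name uses an underscore.
if not is_importable("sec_parser"):
    print("sec_parser not found; run: pip install sec-parser")
```

If the check fails inside a virtual environment, the package was most likely installed into a different interpreter than the one running the script.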
error AttributeError: module 'sec_parser' has no attribute 'SECParser'
cause SECParser was accessed as an attribute of the module (for example, sec_parser.SECParser after import sec_parser) instead of being imported by name.
fix Use the correct import: from sec_parser import SECParser
error TypeError: parse() got an unexpected keyword argument 'html_source'
cause The parameter was renamed from html_source to html in v0.40.
fix Call parser.parse(html=...) instead of parser.parse(html_source=...).
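Code that has to run against both older and newer releases can pick the keyword at call time. The following is a minimal sketch, assuming only that the parse callable accepts either html or html_source. The helper call_parse and the two stand-in functions are hypothetical and exist purely to show the technique, not to mirror sec-parser's real signatures.

```python
import inspect


def call_parse(parse_fn, html_text):
    """Call parse_fn using whichever keyword name its signature accepts."""
    params = inspect.signature(parse_fn).parameters
    if "html" in params:
        return parse_fn(html=html_text)
    if "html_source" in params:
        return parse_fn(html_source=html_text)
    # Neither name is present: fall back to a positional call.
    return parse_fn(html_text)


# Hypothetical stand-ins for the newer and older parse() signatures.
def parse_new(html):
    return ("html", html)


def parse_old(html_source):
    return ("html_source", html_source)
```

Pinning the sec-parser version in the project's requirements is usually simpler than a shim like this. The shim is mainly useful while migrating.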
Warnings
breaking In v0.50+, the output structure changed: a flat list replaced the nested tree. Code that relies on parent-child traversal will break.
fix Iterate over the flat list instead of traversing recursively.
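The migration described above amounts to replacing a recursive walk with a plain loop. The sketch below illustrates the difference with a hypothetical Node class. It is a stand-in for parsed elements, not a sec-parser type, and its text and children attributes are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a parsed element; not a sec-parser class.
    text: str
    children: list = field(default_factory=list)


def collect_recursive(nodes):
    """Older style: walk a nested tree depth-first, parents before children."""
    out = []
    for node in nodes:
        out.append(node.text)
        out.extend(collect_recursive(node.children))
    return out


def collect_flat(elements):
    """Newer style: the result is already a flat sequence, so one loop suffices."""
    return [element.text for element in elements]


nested = [Node("Item 1", [Node("Revenue"), Node("Costs")])]
flat = [Node("Item 1"), Node("Revenue"), Node("Costs")]
```

Both functions return the same texts in the same order for these inputs, which makes the flat loop a drop-in replacement wherever only document order matters.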
deprecated SECParser.parse_html() was removed in v0.40. Use SECParser.parse() instead.
fix Replace calls to parse_html() with parse().
gotcha The input HTML must be a valid EDGAR document. Non-standard HTML may produce an empty tree.
fix Use HTML obtained from SEC EDGAR, not a scraped copy with missing elements.
Imports
- SECParser
  wrong: from secparser import SECParser
  correct: from sec_parser import SECParser
- SemanticElement
  wrong: from sec_parser import SemanticElement
  correct: from sec_parser.semantic_elements import SemanticElement
Quickstart
from sec_parser import SECParser
from sec_parser.semantic_elements import TextElement

html = "<html><body><p>Revenue: $100</p></body></html>"

parser = SECParser()
# Pass the document with the html keyword (renamed from html_source in v0.40)
tree = parser.parse(html=html)

# Print the text of each top-level TextElement
for element in tree:
    if isinstance(element, TextElement):
        print(element.text)