mf2py: Microformats Parser
mf2py is a Python library for parsing Microformats data from HTML documents. It provides full support for microformats2, offers backwards-compatible support for microformats1, and includes experimental support for metaformats. The library is actively maintained, with version 2.0.1 being the latest release, and is part of the broader IndieWeb ecosystem.
Common errors
- TypeError: Parser.__init__() got an unexpected keyword argument 'img_with_alt'
  cause: The `img_with_alt` argument was removed in mf2py 2.0 but is still being passed.
  fix: Remove the `img_with_alt` parameter from `mf2py.parse()` or `mf2py.Parser()` calls. Image alt parsing is now enabled by default.
- python: command not found
  cause: The `python` command is not on your system's PATH; on many systems the interpreter is only available as `python3`.
  fix: Run `python3 -m pip install mf2py`, or make sure `python` points to your intended Python 3 interpreter.
- ModuleNotFoundError: No module named 'mf2py'
  cause: The `mf2py` library is not installed in the current Python environment.
  fix: Run `pip install mf2py` to install the library.
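The last two errors both come down to which interpreter is running and whether it can see the package. The following is a minimal sketch of a pre-flight check using only the standard library. The helper name `check_environment` is hypothetical and not part of mf2py; the 3.8 floor is the minimum that mf2py 2.0 requires.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # mf2py 2.0 dropped support for anything older


def check_environment(module_name="mf2py", version_info=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of mf2py.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    problems = []
    if major_minor < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.8" % major_minor)
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec(module_name) is None:
        problems.append(
            "%s is not installed; run: %s -m pip install %s"
            % (module_name, sys.executable, module_name)
        )
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter. Using `sys.executable` in the suggested command ties the install to the same interpreter that reported the problem, which sidesteps the `python` versus `python3` ambiguity.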
Warnings
- breaking mf2py 2.0 officially dropped support for Python versions lower than 3.8. Attempting to install or run on older Python versions will result in errors.
- breaking The `img_with_alt` keyword argument for `parse()` and `Parser()` was removed in mf2py 2.0. Image `alt` text support is now enabled by default. Passing this argument raises a `TypeError`.
- breaking The `dict_class` option for `Parser()` was removed and replaced with the standard `dict` in mf2py 2.0. Custom dictionary classes are no longer supported.
- gotcha When parsing HTML documents, especially from untrusted sources, be aware that passing an existing BeautifulSoup document to `mf2py` might modify that object. If you need the original unchanged, pass a copy or the raw HTML string instead.
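For code that builds its parser options dynamically, the two removed keyword arguments can be filtered out before the call. This is a minimal sketch under that assumption; `drop_removed_kwargs` and `REMOVED_IN_2_0` are hypothetical names, not part of mf2py.

```python
# Keyword arguments removed in mf2py 2.0, per the warnings above.
REMOVED_IN_2_0 = ("img_with_alt", "dict_class")


def drop_removed_kwargs(kwargs):
    """Return a copy of kwargs without options that mf2py 2.0 rejects.

    Hypothetical helper for illustration, not part of mf2py.
    """
    return {k: v for k, v in kwargs.items() if k not in REMOVED_IN_2_0}


legacy_options = {"doc": "<p class='h-card'>Ada</p>", "img_with_alt": True}
options = drop_removed_kwargs(legacy_options)
print(sorted(options))  # ['doc']
```

The cleaned dict can then be passed on as `mf2py.parse(**options)`. The helper returns a new dict, so the caller's original options are left untouched.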
Install
- pip install mf2py
Imports
- parse
from mf2py import parse
- Parser
from mf2py import Parser
Quickstart
import mf2py
html_doc = """
<div class="h-entry">
<h1 class="p-name">My Awesome Post</h1>
<time class="dt-published" datetime="2023-11-30T19:08:09">November 30, 2023</time>
<a class="p-author h-card" href="https://example.com/james">James</a>
<img class="u-photo" src="https://example.com/post-image.jpg" alt="Post illustration">
</div>
"""
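# Parse the HTML string; the result is a dict whose top-level keys
# include "items", "rels", and "rel-urls"
mf2_data = mf2py.parse(doc=html_doc)
print(mf2_data)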
# Example of parsing a URL (requires internet access)
# url_data = mf2py.parse(url="https://events.indieweb.org/")
# print(url_data["items"][0]["type"])
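Once parsed, the result follows the microformats2 JSON shape: each entry in `"items"` has a `"type"` list and a `"properties"` dict whose values are always lists. The sketch below shows one way to read such a structure. The `sample` dict is hand-written to mirror that shape rather than captured from a real run, and the helpers `items_of_type` and `first_value` are hypothetical, not part of mf2py.

```python
# Hand-written sample in the microformats2 JSON shape; not captured output.
sample = {
    "items": [
        {
            "type": ["h-entry"],
            "properties": {
                "name": ["My Awesome Post"],
                "published": ["2023-11-30T19:08:09"],
            },
        }
    ],
    "rels": {},
    "rel-urls": {},
}


def items_of_type(parsed, mf_type):
    """Return the top-level items whose "type" list contains mf_type."""
    return [
        item for item in parsed.get("items", [])
        if mf_type in item.get("type", [])
    ]


def first_value(item, prop, default=None):
    """Return the first value of a property, or default when it is absent."""
    values = item.get("properties", {}).get(prop, [])
    return values[0] if values else default


for entry in items_of_type(sample, "h-entry"):
    print(first_value(entry, "name"))         # My Awesome Post
    print(first_value(entry, "summary", ""))  # empty string: property absent
```

Because every property value is a list, reading through a helper with a default avoids an `IndexError` or `KeyError` when a page omits an optional property.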