pyRdfa3: RDFa Distiller/Parser
pyRdfa3 is a Python library that functions as an RDFa 1.1 distiller and parser. It extracts RDFa 1.1 (and, if configured accordingly, RDFa 1.0) from (X)HTML, SVG, and general XML documents, and returns either a serialized RDF graph or an RDFLib Graph object. The current version is 3.6.5. The original maintainer has archived the primary GitHub repository; version 3.6.5 is built and maintained under a new GitHub Pages project, so future releases may be community-driven or less frequent.
Common errors
- TypeError: 'module' object is not callable
  cause: The `pyRdfa` module was called directly instead of the `pyRdfa` class defined inside it.
  fix: Import and call the class. Correct: `from pyRdfa import pyRdfa; distiller = pyRdfa()`. Incorrect: `import pyRdfa; distiller = pyRdfa()`.
- ModuleNotFoundError: No module named 'pyRdfa'
  cause: The `pyRdfa3` package is not installed, or it was installed into a different Python environment from the one running the code.
  fix: Install the package with pip: `pip install pyRdfa3`. If it is already installed, check that pip and the interpreter belong to the same environment and verify PYTHONPATH.
- rdflib.exceptions.SerializerNotAvailable: No serializer for format 'json-ld' installed.
  cause: The installed RDFLib version does not provide the JSON-LD serializer, and neither `pyRdfaExtras` nor `rdflib-jsonld` is installed or detected.
  fix: Use a compatible `rdflib` version (>=7.0.0). If the error persists, install `pyRdfaExtras` if your `pyRdfa3` version requires it for JSON-LD, or run `pip install rdflib-jsonld` on an older RDFLib that does not bundle the serializer.
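The first error above can be reproduced without pyRdfa3 installed. The sketch below uses a stand-in module built with `types.ModuleType`; the names `fake_module` and `Distiller` are hypothetical and only mimic a package that exposes a class under the package's own name.

```python
import types


class Distiller:
    """Hypothetical placeholder for the real pyRdfa class."""


# Hypothetical stand-in for the pyRdfa package: a module object that
# exposes a class under the same name as the module itself.
fake_module = types.ModuleType("pyRdfa")
fake_module.pyRdfa = Distiller

# Calling the module object itself raises the TypeError from the list above.
try:
    fake_module()
except TypeError as exc:
    error_message = str(exc)

# Calling the class that lives inside the module works.
distiller = fake_module.pyRdfa()

print(error_message)
```

With the real library, `from pyRdfa import pyRdfa` binds the name to the class rather than the module, which is why the "Correct" form in the fix works.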
Warnings
- breaking The original maintainer has retired and archived the primary GitHub repository. While version 3.6.5 is now maintained by a new entity, this indicates a significant shift in project leadership and potentially irregular future updates or support.
- breaking `pyrdfa3` no longer supports Python 2.x. It explicitly requires Python 3.8 or higher.
- gotcha Some parsing behaviors, particularly around RDFa 1.0 vs. 1.1 specifics (e.g., `@property`, list handling, `@typeof`), have changed due to the library being a rewrite of a previous RDFa 1.0 distiller.
- gotcha The default RDF serialization formats rely on RDFLib's serializers. Older RDFLib releases might have issues with certain serialization formats, and formats like JSON-LD might require the `pyRdfaExtras` package or a compatible `rdflib_jsonld` package if not part of the core RDFLib distribution.
- gotcha The `CGI_RDFa.py` and `localRdfa.py` utility scripts included in the distribution have not been ported to Python 3.x and will not function correctly on modern Python environments.
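Two of the warnings above lend themselves to a start-up check: the Python 3.8 floor and the availability of an RDFLib serializer such as JSON-LD. This is a minimal sketch, assuming RDFLib's plugin registry (`rdflib.plugin.get` with the `Serializer` kind) is a suitable place to ask. The helper names `python_is_supported` and `has_serializer` are hypothetical and are not part of pyRdfa3 or RDFLib.

```python
import sys

# Minimum Python version, taken from the warning above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the 3.8 floor."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def has_serializer(fmt):
    """Return True if RDFLib can load a serializer plugin named `fmt`.

    Returns False when RDFLib itself is missing or the plugin lookup fails.
    """
    try:
        from rdflib import plugin
        from rdflib.serializer import Serializer
    except ImportError:
        return False
    try:
        plugin.get(fmt, Serializer)
    except Exception:
        # Covers an unknown plugin name as well as a plugin whose
        # backing module cannot be imported.
        return False
    return True


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("JSON-LD serializer available:", has_serializer("json-ld"))
```

Running such a check before distilling lets a script fall back to Turtle when JSON-LD is unavailable, instead of failing at serialization time.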
Install
- pip install pyRdfa3
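To confirm that the install landed in the interpreter you are actually running, a small standard-library check can help. The sketch below assumes the import name `pyRdfa` and the distribution name `pyRdfa3` used on this page; the function `describe_install` is a hypothetical helper, not part of the library.

```python
import importlib.util
from importlib import metadata


def describe_install(import_name="pyRdfa", dist_name="pyRdfa3"):
    """Report whether a module is importable and which version is installed.

    Neither lookup imports the package, so a broken install cannot raise here.
    """
    importable = importlib.util.find_spec(import_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": importable, "version": version}


if __name__ == "__main__":
    print(describe_install())
```

If `importable` is false right after a successful `pip install`, pip most likely targeted a different environment; running `python -m pip install pyRdfa3` with the same interpreter avoids that mismatch.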
Imports
- pyRdfa
from pyRdfa import pyRdfa
Quickstart
import os

from pyRdfa import pyRdfa

# Example HTML content with RDFa
html_content = '''
<div prefix="schema: http://schema.org/">
  <p typeof="schema:Person">
    <span property="schema:name">Jane Doe</span>
    <span property="schema:jobTitle">Professor</span>
    <a href="http://www.example.com/janedoe" property="schema:url">Homepage</a>
  </p>
</div>
'''

# Create a dummy file for demonstration
with open("example.html", "w") as f:
    f.write(html_content)

# Extract RDF as a serialized string (Turtle format by default)
turtle_output = pyRdfa().rdf_from_source('example.html')
print("--- Turtle Output ---")
print(turtle_output)

# Extract RDF as an RDFLib Graph object
graph = pyRdfa().graph_from_source('example.html')
print("\n--- RDFLib Graph (Triples) ---")
for s, p, o in graph:
    print(s, p, o)

# Clean up the dummy file (optional)
os.remove("example.html")