pyRdfa3: RDFa Distiller/Parser

3.6.5 · maintenance · verified Thu Apr 16

pyRdfa3 is a Python library that functions as an RDFa 1.1 distiller and parser. It can extract RDFa 1.1 (and, if properly configured, RDFa 1.0) from several document types, including (X)HTML, SVG, and general XML. The library returns either a serialized RDF graph or an RDFLib Graph object. The current version is 3.6.5. The original maintainer has archived the primary GitHub repository; version 3.6.5 is built and maintained under a new GitHub Pages project, which suggests community-driven maintenance or a less frequent release cadence going forward.

Quickstart

This quickstart demonstrates how to use `pyRdfa` to parse RDFa content from a source (a local HTML file in this case) and either obtain a serialized RDF string (defaulting to Turtle) or an RDFLib Graph object.

import os

from pyRdfa import pyRdfa

# Example HTML content with RDFa
html_content = '''
<div prefix="schema: http://schema.org/">
  <p typeof="schema:Person">
    <span property="schema:name">Jane Doe</span>
    <span property="schema:jobTitle">Professor</span>
    <a href="http://www.example.com/janedoe" property="schema:url">Homepage</a>
  </p>
</div>
'''

# Create a dummy file for demonstration
with open("example.html", "w", encoding="utf-8") as f:
    f.write(html_content)

# Extract RDF as a serialized string (Turtle format by default)
turtle_output = pyRdfa().rdf_from_source('example.html')
print("--- Turtle Output ---")
print(turtle_output)

# Extract RDF as an RDFLib Graph object
graph = pyRdfa().graph_from_source('example.html')
print("\n--- RDFLib Graph (Triples) ---")
for s, p, o in graph:
    print(s, p, o)

# Clean up the dummy file (optional)
os.remove("example.html")
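
The quickstart writes `example.html` into the working directory and removes it by hand. As a variation, the sketch below writes the markup to a temporary file and always deletes it afterwards, even if parsing raises. `distill_snippet` and `SNIPPET` are hypothetical names introduced here for illustration and are not part of pyRdfa3. The only library calls assumed are `rdf_from_source` and `graph_from_source`, as used in the quickstart.

```python
import os
import tempfile


def distill_snippet(html, distill, suffix=".html"):
    """Write `html` to a temporary file, pass its path to `distill`, then delete it.

    `distill` is any callable that accepts a file path, for example
    pyRdfa().rdf_from_source or pyRdfa().graph_from_source.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the file before distilling so the parser can reopen it by path.
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        return distill(path)
    finally:
        # Runs whether or not `distill` raised, so no stray file is left behind.
        os.remove(path)


# Illustrative markup; any RDFa-annotated snippet would do.
SNIPPET = '''
<div prefix="schema: http://schema.org/">
  <p typeof="schema:Person">
    <span property="schema:name">Jane Doe</span>
  </p>
</div>
'''

try:
    from pyRdfa import pyRdfa  # same import as in the quickstart
except ImportError:
    pyRdfa = None  # library not installed; skip the demonstration call

if pyRdfa is not None:
    print(distill_snippet(SNIPPET, pyRdfa().rdf_from_source))
```

Passing the distilling callable as an argument keeps the file handling independent of the output form: the same helper can return a serialized string or an RDFLib Graph, depending on which method is supplied.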
