PROV Python Library
The `prov` library is a Python implementation of the W3C Provenance Data Model (PROV). It supports creating and manipulating provenance documents and serializing or deserializing them in formats such as PROV-JSON, PROV-XML, and PROV-O (RDF). The library is actively maintained; the latest version is 2.1.1, and releases are made as needed to support newer Python versions and fix bugs.
Warnings
- breaking: Python 2.x support was removed in version 2.0.0. The library is now Python 3 only.
- breaking: Support for several End-of-Life Python versions has been dropped incrementally. Python 3.3 was removed in v1.5.3, Python 3.6 and 3.7 in v2.0.1, and Python 3.8 in v2.0.2. Current versions require Python >=3.9.
- breaking: The `rdflib` dependency, used for PROV-O (RDF) serialization, was restricted to versions less than 7 (`rdflib <7`) starting from `prov` version 2.0.1 due to compatibility issues.
- deprecated: The `pydotplus` dependency for graphical output was replaced by `pydot` in version 1.5.1.
- breaking: The underlying data model for PROV documents was rewritten in version 1.0.0, introducing incompatibilities with pre-1.0 versions. Methods like `add_record()` were renamed to `new_record()` in v1.0.1, and references to PROV elements became `QualifiedName` instances.
- gotcha: Naming a local Python file or module `prov.py` will cause an `ImportError`, because the local file shadows the `prov` library on the import path.
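The warnings above can be turned into a small preflight check. The following is a minimal sketch using only the standard library; the helper names `is_shadowed` and `preflight` are hypothetical, and the thresholds simply restate the constraints listed above (Python >=3.9, `rdflib <7`), which may not apply to every `prov` release.

```python
import sys
from importlib import metadata, util
from pathlib import Path

MIN_PYTHON = (3, 9)   # assumption: taken from the warning list above
MAX_RDFLIB_MAJOR = 7  # exclusive upper bound, also from the warning list


def is_shadowed(origin):
    """Return True if 'prov' resolves to a single file named prov.py
    instead of the installed package (prov/__init__.py)."""
    return origin is not None and Path(origin).name == "prov.py"


def preflight():
    """Collect human-readable problems; an empty list means no issue was found."""
    problems = []

    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)

    # find_spec locates the module without importing it.
    spec = util.find_spec("prov")
    if spec is None:
        problems.append("prov is not installed")
    elif is_shadowed(spec.origin):
        problems.append("a local prov.py shadows the library: %s" % spec.origin)

    # rdflib is only needed for PROV-O (RDF), so its absence is not an error.
    try:
        rdflib_major = int(metadata.version("rdflib").split(".")[0])
    except metadata.PackageNotFoundError:
        rdflib_major = None
    if rdflib_major is not None and rdflib_major >= MAX_RDFLIB_MAJOR:
        problems.append("rdflib %d.x found, but rdflib <7 is expected" % rdflib_major)

    return problems


for problem in preflight():
    print("WARNING:", problem)
```

Running this from the project directory before importing `prov` makes the name-collision case visible without triggering the `ImportError` itself.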
Install
pip install prov
Imports
- ProvDocument
from prov.model import ProvDocument
- Namespace
from prov.model import Namespace
- PROV
from prov.constants import PROV
Quickstart
import datetime
from prov.model import ProvDocument
from prov.constants import PROV  # the standard prov: namespace object
# Create a new provenance document
doc = ProvDocument()
# Declare namespaces
doc.add_namespace('ex', 'http://example.org/')
doc.set_default_namespace('http://example.com/prov-example/')
# Declare entities, activities, agents
e1 = doc.entity('ex:entity1', {'prov:label': 'Example Entity 1'})
# Positional arguments after the identifier are the start time and end time
a1 = doc.activity('ex:activity1', datetime.datetime.now(), datetime.datetime.now(), {'prov:label': 'Example Activity 1'})
# Mark the agent as a prov:Person through the prov:type attribute
ag1 = doc.agent('ex:agent1', {'prov:label': 'Example Agent 1', PROV['type']: PROV['Person']})
# Establish relationships
doc.wasGeneratedBy(e1, a1, datetime.datetime.now())
doc.wasAssociatedWith(a1, ag1)
# Print the PROV-N representation
print(doc.get_provn())
# To serialize to PROV-JSON (handled by the library's built-in JSON serializer), pass a destination path:
# doc.serialize('example.json', format='json')