Pipeline Status Reporter
pipestat is a Python library for reporting pipeline results. It provides a flexible way to manage and track the status and outputs of computational pipelines, and supports several backends: YAML files, SQLite databases, and PEPhub. The current version is 0.13.1, and the project ships several minor versions and patches each year.
Common errors
- pydantic.v1.ValidationError: ...
  cause: Using an older version of Pydantic (v1.x) while pipestat requires Pydantic v2+.
  fix: Upgrade Pydantic with `pip install --upgrade pydantic`, or ensure `pydantic>=2` is installed.
- FileNotFoundError: [Errno 2] No such file or directory: '/path/to/config.yaml'
  cause: The `config_file` or `schema_path` provided to `PipestatManager` does not point to an existing file.
  fix: Verify that the paths to your `pipestat_config.yaml` and `results_schema.yaml` are correct and readable by the script.
- sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: record
  cause: The SQLite database backend was specified, but the database file either does not exist or has not been initialized with the tables pipestat requires.
  fix: Ensure the `db_file` specified in your config is valid. `PipestatManager` should create the necessary tables upon initialization if the schema is provided and the database file is new. If the file already exists, check that it is not corrupted or missing tables.
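All three errors above come from something pipestat expects at startup being absent: a Pydantic major version, a file, or a table. The following is a minimal pre-flight sketch using only the standard library. The helper names (`missing_paths`, `is_pydantic_v2`, `sqlite_tables`) are hypothetical and are not part of pipestat.

```python
import os
import sqlite3
import tempfile


def missing_paths(paths):
    """Return the subset of paths that are not existing files."""
    return [p for p in paths if not os.path.isfile(p)]


def is_pydantic_v2(version_string):
    """Return True if a version string such as '2.7.1' has major version >= 2."""
    major = version_string.split(".")[0]
    return major.isdigit() and int(major) >= 2


def sqlite_tables(db_file):
    """List table names in an existing SQLite file (hypothetical helper)."""
    if not os.path.isfile(db_file):
        return []  # sqlite3.connect would otherwise create an empty database
    conn = sqlite3.connect(db_file)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Small demonstration on throwaway files
with tempfile.TemporaryDirectory() as demo_dir:
    present = os.path.join(demo_dir, "pipestat_config.yaml")
    absent = os.path.join(demo_dir, "results_schema.yaml")
    with open(present, "w") as handle:
        handle.write("schema_path: results_schema.yaml\n")
    demo_missing = missing_paths([present, absent])

    demo_db = os.path.join(demo_dir, "demo.sqlite")
    conn = sqlite3.connect(demo_db)
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()
    demo_tables = sqlite_tables(demo_db)

print(demo_missing)
print(demo_tables)
```

In a real environment, the installed Pydantic version string can be read with `importlib.metadata.version("pydantic")` and passed to `is_pydantic_v2`.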
Warnings
- breaking The schema structure for 'samples' results changed in v0.11.0. If your output schema previously defined 'samples' directly, it now needs to be an array type and nested under 'items'.
- gotcha pipestat requires Pydantic v2+. If you have Pydantic v1 installed in your environment, you may encounter `ValidationError` or import errors due to API changes between Pydantic major versions.
- gotcha Pipestat relies heavily on configuration files (e.g., `pipestat_config.yaml`) and result schemas (`results_schema.yaml`). Misconfiguration or incorrect paths can lead to runtime errors or unexpected behavior.
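The first warning above describes a nesting change. As a rough sketch of that shape, the structure can be written as a Python dict and inspected. This assumes the `samples` block sits under a top-level `properties` key and that each item is an object with its own `properties`; the result names are borrowed from the quickstart, `sample_result_names` is a hypothetical helper, and the exact layout should be checked against the pipestat schema documentation.

```python
import json

# Assumed post-v0.11.0 layout: 'samples' is an array whose per-sample
# results live under 'items'. Result names are illustrative only.
schema = {
    "properties": {
        "samples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "my_result": {"type": "string"},
                    "my_numeric_result": {"type": "number"},
                },
            },
        },
    },
}


def sample_result_names(schema_dict):
    """Return the result names declared under samples -> items -> properties."""
    samples = schema_dict["properties"]["samples"]
    if samples.get("type") != "array" or "items" not in samples:
        raise ValueError(
            "'samples' must be an array type with results nested under 'items'"
        )
    return sorted(samples["items"].get("properties", {}))


print(json.dumps(schema, indent=2))
print(sample_result_names(schema))
```

A schema that still defines results directly under `samples` makes `sample_result_names` raise `ValueError`, which is a quick way to spot files that predate the change.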
Install
- pip install pipestat
Imports
- PipestatManager
from pipestat import PipestatManager
Quickstart
import os
import tempfile
from pipestat import PipestatManager
# Create dummy config and schema files in a temporary directory
tmpdir = tempfile.TemporaryDirectory()
config_file_path = os.path.join(tmpdir.name, "pipestat_config.yaml")
schema_file_path = os.path.join(tmpdir.name, "results_schema.yaml")
db_file_path = os.path.join(tmpdir.name, "pipestat_test.sqlite")
config_content = f"""
database:
  db_file: {db_file_path}
pipeline_name: my_pipeline
schema_path: {schema_file_path}
"""
with open(config_file_path, "w") as f:
    f.write(config_content)
schema_content = """
properties:
  sample_name:
    type: string
  my_result:
    type: string
  my_numeric_result:
    type: number
required:
  - sample_name
  - my_result
"""
with open(schema_file_path, "w") as f:
    f.write(schema_content)
# Initialize PipestatManager
psm = PipestatManager(
    config_file=config_file_path,
    schema_path=schema_file_path,
)
# Report a result
record_identifier = "sample1"
result_name = "my_result"
result_value = "SUCCESS"
# report() takes a dict mapping result names to values
psm.report(record_identifier=record_identifier, values={result_name: result_value})
# Report another result
psm.report(record_identifier=record_identifier, values={"my_numeric_result": 123.45})
print(f"Reported '{result_name}' for '{record_identifier}' as '{result_value}'")
# Retrieve results
retrieved_result = psm.retrieve_one(record_identifier=record_identifier, result_identifier=result_name)
print(f"Retrieved '{result_name}' for '{record_identifier}': {retrieved_result}")
# Clean up
tmpdir.cleanup()
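The quickstart calls `tmpdir.cleanup()` by hand, so the directory is left behind if an earlier call raises. One alternative, sketched here with only the standard library, is to use `tempfile.TemporaryDirectory` as a context manager. The helper `write_quickstart_files` is hypothetical, and the YAML keys simply mirror the quickstart rather than a verified pipestat configuration.

```python
import os
import tempfile
import textwrap


def write_quickstart_files(directory):
    """Write the quickstart config and schema into directory; return their paths."""
    config_path = os.path.join(directory, "pipestat_config.yaml")
    schema_path = os.path.join(directory, "results_schema.yaml")
    db_path = os.path.join(directory, "pipestat_test.sqlite")

    # Keys copied from the quickstart; not checked against pipestat itself.
    config_text = textwrap.dedent(f"""\
        database:
          db_file: {db_path}
        pipeline_name: my_pipeline
        schema_path: {schema_path}
        """)
    schema_text = textwrap.dedent("""\
        properties:
          my_result:
            type: string
        """)

    with open(config_path, "w") as handle:
        handle.write(config_text)
    with open(schema_path, "w") as handle:
        handle.write(schema_text)
    return config_path, schema_path


with tempfile.TemporaryDirectory() as tmpdir:
    config_file_path, schema_file_path = write_quickstart_files(tmpdir)
    files_written = os.path.isfile(config_file_path) and os.path.isfile(schema_file_path)
    # A PipestatManager would be created and used here, inside the block.

# The directory is removed when the block exits, even if its body raised.
directory_removed = not os.path.exists(tmpdir)
print(files_written, directory_removed)
```

Keeping all pipestat calls inside the `with` block ties the lifetime of the SQLite file and YAML files to the block, which suits tests and throwaway demos but not results that need to persist.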