Pipeline Status Reporter

0.13.1 · active · verified Fri Apr 17

pipestat is a Python library for reporting pipeline results. It manages and tracks the status and outputs of computational pipelines and supports several backends, including YAML files, SQLite databases, and PEPhub. The current version is 0.13.1; the project releases several minor versions and patches each year.
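The report-then-retrieve pattern described above can be illustrated without any backend installed. The sketch below is a toy stand-in written only with the standard library; it is not pipestat's implementation or API. The `ToyReporter` class, the `TYPE_MAP` table, and the `demo` function are hypothetical names introduced for illustration, and a JSON file stands in for a real backend.

```python
import json
import os
import tempfile

# Hypothetical stand-in, NOT pipestat's API: maps schema type names to Python types.
TYPE_MAP = {"string": str, "number": (int, float), "integer": int}


class ToyReporter:
    """Stores results per record in a JSON file, checking each value against a schema."""

    def __init__(self, schema, results_path):
        self.schema = schema
        self.results_path = results_path

    def _load(self):
        # An absent file simply means nothing has been reported yet.
        if not os.path.exists(self.results_path):
            return {}
        with open(self.results_path) as f:
            return json.load(f)

    def report(self, record_identifier, values):
        # Validate every value before writing anything.
        for name, value in values.items():
            if name not in self.schema:
                raise KeyError(f"'{name}' is not defined in the schema")
            type_name = self.schema[name]["type"]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, TYPE_MAP[type_name]):
                raise TypeError(f"'{name}' must be of type {type_name}")
        data = self._load()
        data.setdefault(record_identifier, {}).update(values)
        with open(self.results_path, "w") as f:
            json.dump(data, f)

    def retrieve(self, record_identifier, result_name):
        return self._load()[record_identifier][result_name]


def demo():
    schema = {"my_result": {"type": "string"}, "my_numeric_result": {"type": "number"}}
    with tempfile.TemporaryDirectory() as tmpdir:
        reporter = ToyReporter(schema, os.path.join(tmpdir, "results.json"))
        reporter.report("sample1", {"my_result": "SUCCESS", "my_numeric_result": 123.45})
        return reporter.retrieve("sample1", "my_result")
```

Calling `demo()` reports two results for `sample1` and reads one back. Swapping the JSON file for another store would only change `_load` and the write step in `report`, which is roughly the role a backend plays in a results reporter.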

Quickstart

This quickstart demonstrates how to initialize `PipestatManager` with a configuration and schema, report a result for a record, and then retrieve it. It uses temporary files for the configuration and schema to keep it self-contained and runnable.

import os
import tempfile

from pipestat import PipestatManager

# Create dummy config and schema files in a temporary directory
tmpdir = tempfile.TemporaryDirectory()
config_file_path = os.path.join(tmpdir.name, "pipestat_config.yaml")
schema_file_path = os.path.join(tmpdir.name, "results_schema.yaml")
db_file_path = os.path.join(tmpdir.name, "pipestat_test.sqlite")

config_content = f"""
database:
  db_file: {db_file_path}
pipeline_name: my_pipeline
schema_path: {schema_file_path}
"""
with open(config_file_path, "w") as f:
    f.write(config_content)

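# Assumes the current pipestat schema layout: the schema names the pipeline
# and nests sample-level results under a "samples" object.
schema_content = """
type: object
properties:
  pipeline_name: my_pipeline
  samples:
    type: object
    properties:
      sample_name:
        type: string
      my_result:
        type: string
      my_numeric_result:
        type: number
    required:
      - sample_name
      - my_result
"""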
with open(schema_file_path, "w") as f:
    f.write(schema_content)

# Initialize PipestatManager
psm = PipestatManager(
    config_file=config_file_path,
    schema_path=schema_file_path
)

# Report a result; `values` maps result names defined in the schema to their values
record_identifier = "sample1"
result_name = "my_result"
result_value = "SUCCESS"
psm.report(record_identifier=record_identifier, values={result_name: result_value})

# Report another result for the same record
psm.report(record_identifier=record_identifier, values={"my_numeric_result": 123.45})

print(f"Reported '{result_name}' for '{record_identifier}' as '{result_value}'")

# Retrieve a single result for the record
retrieved_result = psm.retrieve_one(record_identifier=record_identifier, result_identifier=result_name)
print(f"Retrieved '{result_name}' for '{record_identifier}': {retrieved_result}")

# Clean up
tmpdir.cleanup()
