Pypiper
Pypiper is a lightweight Python toolkit for building robust, restartable command-line pipelines. It handles logging, error recovery, and status tracking, which makes complex data-processing workflows easier to write. The current version is 0.15.1, and the project is actively maintained.
Common errors
- ModuleNotFoundError: No module named 'pypiper'
cause: The pypiper library is not installed in the current Python environment.
fix: Run `pip install pypiper` to install the library.
- TypeError: PipelineManager.__init__() got an unexpected keyword argument 'pipestat_sample_name'
cause: You are using an older parameter name for `pipestat` integration with a newer version of Pypiper.
fix: Change `pipestat_sample_name` to `pipestat_record_identifier` in your `PipelineManager` constructor. Also check for `pipestat_project_name`, which was removed.
- SyntaxError: invalid syntax (from trying to run pypiper code on Python 2.7)
cause: Your Python environment is too old for the current version of Pypiper, which requires Python >=3.10.
fix: Upgrade your Python interpreter to version 3.10 or newer. For example, use `python3.10 your_script.py` or create a new virtual environment with a newer Python version.
- ERROR: Pipeline stage 'my_stage' failed!
cause: A command executed by `pm.run()` returned a non-zero exit code, indicating failure. Pypiper caught this and marked the stage as failed.
fix: Examine the pipeline log file (usually `[outdir]/[pipeline_name].log`) for the specific error messages from your shell command. Debug the command as you would outside Pypiper.
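Debugging a failed command outside Pypiper usually means rerunning it by hand and looking at its exit code and stderr. The sketch below does this with the standard library only. `run_and_report` is a hypothetical helper and not part of Pypiper, and the failing command is a stand-in for your own shell command.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a command (list of arguments) and return (exit_code, stderr_text)."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, completed.stderr.strip()


# Stand-in for a failing pipeline command: a child Python process that
# writes a message to stderr and exits with status 3.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('boom'); sys.exit(3)",
]
code, err = run_and_report(failing_cmd)
print(f"exit code: {code}, stderr: {err}")
```

A non-zero exit code here corresponds to the condition that makes Pypiper mark a stage as failed, so once the command returns 0 by hand it should also pass inside the pipeline.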
Warnings
- breaking Pypiper v0.14.0 dropped support for Python 2.7. Users on older Python versions will encounter `SyntaxError` or `ModuleNotFoundError`.
- breaking The `pipestat` integration parameters changed significantly in v0.14.0 and v0.14.1: the `pipestat_project_name` parameter was removed, `pipestat_sample_name` was renamed to `pipestat_record_identifier`, and the type of `message_raw` changed.
- gotcha The default value for `force_overwrite` in `PipelineManager` changed from `False` to `True` in v0.14.1. Existing pipelines that do not set `force_overwrite` explicitly might therefore rerun stages unexpectedly.
- gotcha Pypiper relies on `target` files for restartability. If a stage's `target` file is not correctly created or updated by the command, Pypiper may incorrectly assume the stage failed or needs to be rerun, or conversely, skip a stage that should run.
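The target-file idea behind restartability can be illustrated in plain Python. This is a minimal sketch of the concept, not Pypiper's actual implementation: `run_if_target_missing` is a hypothetical helper that skips a command when its target already exists and complains when the command finishes without producing it.

```python
import os
import subprocess
import sys
import tempfile


def run_if_target_missing(cmd, target):
    """Run cmd only when target does not exist yet; return True if it ran."""
    if os.path.exists(target):
        return False  # target already present: treat the stage as done
    subprocess.run(cmd, check=True)
    if not os.path.exists(target):
        raise RuntimeError(f"command finished but did not create {target}")
    return True


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "out.txt")
# Stand-in command: a child Python process that writes the target file.
cmd = [sys.executable, "-c", f"open({target!r}, 'w').write('done')"]

first = run_if_target_missing(cmd, target)   # runs and creates out.txt
second = run_if_target_missing(cmd, target)  # skipped because out.txt exists
print(first, second)
```

The second call is skipped purely because the file exists, which is why a command that writes its output somewhere other than the declared `target` leads to the confusing behavior described above.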
Install
-
pip install pypiper
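After installing, a quick preflight check can catch the two environment problems listed under Common errors: an interpreter older than Python 3.10 and a missing `pypiper` package. The helpers below are hypothetical and use only the standard library. They only report whether the module can be found and do not import it.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated under Common errors


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def is_installed(module_name):
    """Return True if the module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("Python OK:", python_ok())
print("pypiper installed:", is_installed("pypiper"))
```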
Imports
- PipelineManager
from pypiper import PipelineManager
- ngs_pipe
from pypiper import ngs_pipe
Quickstart
import pypiper
import os
# Define pipeline name and output directory
PIPELINE_NAME = "my_pypiper_example"
OUTDIR = "pypiper_output"
os.makedirs(OUTDIR, exist_ok=True)
# Initialize PipelineManager
pm = pypiper.PipelineManager(name=PIPELINE_NAME, outdir=OUTDIR)
print(f"\n--- Starting Pypiper Pipeline: {PIPELINE_NAME} ---")
# Stage 1: Create an initial file
input_file = os.path.join(OUTDIR, "raw_data.txt")
cmd1 = f"echo 'Line 1\nLine 2\nLine 3' > {input_file}"
pm.run(cmd1, target=input_file, stage_name="create_raw_data")
# Stage 2: Process the file (e.g., count lines)
output_file = os.path.join(OUTDIR, "processed_data.txt")
cmd2 = f"wc -l {input_file} > {output_file}"
pm.run(cmd2, target=output_file, stage_name="count_lines")
# Report the line count as a result (requires pipestat to be configured, otherwise it is only logged)
with open(output_file) as f:
    line_count = int(f.read().split()[0])  # wc -l prints "<count> <filename>"
pm.report_result("lines_counted", line_count)
# Stop the pipeline manager (flushes logs, finishes reporting)
pm.stop_pipeline()
print(f"--- Pipeline Finished! Check '{OUTDIR}' for results. ---")
print(f"Content of {output_file}:")
with open(output_file, 'r') as f:
    print(f.read().strip())
# Clean up (optional for quickstart demonstration)
# import shutil
# shutil.rmtree(OUTDIR)