Luigi Workflow Management

3.8.0 · active · verified Sat Apr 11

Luigi is a Python package for building complex pipelines of batch jobs. It handles dependency resolution, workflow management, and visualization of the task graph. It was developed at Spotify; the current version is 3.8.0, and minor releases typically occur every few months.

Warnings

Install

Imports

Quickstart

This quickstart defines a simple Luigi task `GenerateReport` that takes a `date` parameter, simulates generating a report, and writes it to a local file. It demonstrates defining a task, its `run` method, and its `output` target using `luigi.LocalTarget`. The example also shows how to trigger tasks programmatically using `luigi.build` with a local scheduler.

import luigi
import datetime
import os

class GenerateReport(luigi.Task):
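    # Note: the default is evaluated once, when the class is defined,
    # so a long-running process keeps the date it started with.
    date = luigi.DateParameter(default=datetime.date.today())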

    def run(self):
        # Simulate some data processing
        report_content = (
            f"Daily Report for {self.date.isoformat()}\n"
            "Generated by Luigi.\n"
        )

        # Write the report to a target file
        with self.output().open('w') as f:
            f.write(report_content)

    def output(self):
        # Define where the output of this task will be stored
        # Using an environment variable for flexibility or defaulting to current dir
        output_dir = os.environ.get('LUIGI_OUTPUT_DIR', '.')
        return luigi.LocalTarget(os.path.join(output_dir, f'report_{self.date.isoformat()}.txt'))

if __name__ == '__main__':
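    # This script calls luigi.build() below, so it does not parse Luigi
    # command-line arguments itself. To run the task from the command line
    # (most common), use Luigi's own entry point with the module name
    # (no .py extension; the module must be importable):
    # 1. Optionally start the central scheduler in a separate terminal: luigid --port 8082
    # 2. Run: python -m luigi --module your_script_name GenerateReport --date 2023-10-26
    #    (omit --date to use the default date)
    # 3. If you don't want to run luigid, add the --local-scheduler flag:
    #    python -m luigi --module your_script_name GenerateReport --local-scheduler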

    # For programmatic execution (e.g., in a wrapper script or test):
    # The 'local_scheduler=True' ensures it runs without an external luigid daemon.
    luigi.build([GenerateReport(date=datetime.date(2023, 10, 26))], local_scheduler=True)
