Kedro

1.3.1 · active · verified Mon Apr 13

Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It applies software engineering best practices to data and analytics pipelines. The current version is 1.3.1; patch and minor releases typically ship monthly, with major versions less often.

Warnings

As of Kedro 0.19, the CamelCase `*DataSet` class names were removed in favour of `*Dataset` (e.g. `MemoryDataSet` is now `MemoryDataset`); older snippets using the `DataSet` spelling fail on current versions.

Install

pip install kedro

Imports

from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import Pipeline, node
from kedro.runner import SequentialRunner

Quickstart

This example demonstrates how to define Kedro nodes and combine them into a pipeline, then executes it with a `SequentialRunner` and an in-memory `DataCatalog`. In a typical Kedro project, `kedro new` creates the project structure and `kedro run` orchestrates execution via `KedroSession`, loading configuration from the `conf/` directory.

from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import Pipeline, node
from kedro.runner import SequentialRunner

# 1. Define node functions (plain Python functions)
def greet(name: str) -> str:
    """A node that greets a given name."""
    return f"Hello, {name}!"

def capitalize(text: str) -> str:
    """A node that converts a string to upper case."""
    return text.upper()

# 2. Assemble nodes into a pipeline
def create_example_pipeline() -> Pipeline:
    return Pipeline([
        node(
            func=greet,
            inputs="input_name",  # Input dataset key
            outputs="greeting_message",  # Output dataset key
            name="greet_user_node"
        ),
        node(
            func=capitalize,
            inputs="greeting_message",
            outputs="final_output",  # Final output dataset key
            name="capitalize_message_node"
        )
    ])

# 3. Create a DataCatalog with input data
# In a real Kedro project, this is usually defined in conf/base/catalog.yml
catalog = DataCatalog({
    "input_name": MemoryDataset(data="World"),
    "final_output": MemoryDataset()  # Register an output dataset to store results
})

# 4. Instantiate the pipeline and a runner
my_pipeline = create_example_pipeline()
runner = SequentialRunner()

# 5. Run the pipeline
# In a real Kedro project, `kedro run` via `KedroSession` orchestrates this.
print("Running Kedro pipeline...")
runner.run(my_pipeline, catalog)

# 6. Retrieve results
# "final_output" was registered in the catalog, so load it from there
final_message = catalog.load("final_output")
print(f"Pipeline finished. Final message: {final_message}")
# Expected output: Pipeline finished. Final message: HELLO, WORLD!
