Kubeflow Pipelines SDK
The Kubeflow Pipelines SDK (kfp), currently at version 2.16.0, is a Python library for building and deploying portable, scalable machine learning workflows based on Docker containers within the Kubeflow project. It allows users to compose multi-step workflows (pipelines) as a graph of containerized tasks using Python code and/or YAML. Releases are frequent, often bundling the SDK with related components like `kfp-pipeline-spec`, `kfp-server-api`, and `kfp-kubernetes`.
Warnings
- breaking KFP SDK v2 is generally not backward compatible with user code written against the KFP SDK v1 main namespace. Key breaking changes include a new, more Pythonic SDK with decorators like `@dsl.pipeline` and `@dsl.component`, and compilation to a generic Intermediate Representation (IR) YAML instead of Argo Workflow YAML.
- breaking As of KFP 2.15.0, the default object store deployment for Kubeflow Pipelines has changed from MinIO to SeaweedFS. While MinIO is still supported, users upgrading from versions prior to 2.15.0 with existing or custom MinIO configurations for their backend may need to adjust their deployment manifests to maintain their desired object store configuration.
- breaking KFP 2.15.0 introduced a major upgrade to the underlying Gorm backend, necessitating an automated database index migration. This migration does not support rollback. It is strongly advised to back up production databases before initiating an upgrade from versions prior to 2.15.0.
- gotcha In KFP 2.15.0, a regression was identified for AWS S3 authentication using IAM Roles for Service Accounts (IRSA). Specifically, the environment variables `OBJECTSTORECONFIG_ACCESSKEY` and `OBJECTSTORECONFIG_SECRETACCESSKEY` (which could previously be empty or omitted when using IRSA) became implicitly required, leading to authentication failures.
Install
- pip install kfp
- pip install kfp[kubernetes]
Imports
- Client
from kfp import Client
- dsl
from kfp import dsl
- component
from kfp.dsl import component
- pipeline
from kfp.dsl import pipeline
Quickstart
import kfp
from kfp import dsl
import os

# Define a lightweight Python component
@dsl.component
def add(a: float, b: float) -> float:
    '''Calculates the sum of two arguments.'''
    return a + b

# Define a pipeline using the component
@dsl.pipeline(
    name='Addition pipeline',
    description='An example pipeline that performs addition calculations.'
)
def add_pipeline(
    a: float = 1.0,
    b: float = 7.0,
):
    first_add_task = add(a=a, b=4.0)
    second_add_task = add(a=first_add_task.output, b=b)
# --- Running the pipeline (requires a KFP backend) ---
# In a real environment, configure the KFP client to connect to your KFP instance.
# For local testing without a KFP backend, you can use `kfp.local.init`.

# Example of compiling a pipeline (no KFP backend needed for this step):
# from kfp import compiler
# compiler.Compiler().compile(pipeline_func=add_pipeline, package_path='add_pipeline.yaml')

# Example of running a pipeline against a KFP endpoint:
# client = kfp.Client(host=os.environ.get('KFP_HOST', 'http://localhost:8080'))
# run = client.create_run_from_pipeline_func(
#     add_pipeline,
#     arguments={'a': 7.0, 'b': 8.0}
# )
# print(f"Pipeline run initiated: run_id={run.run_id}")

print("Pipeline 'add_pipeline' defined successfully. To run, compile and submit to a KFP backend.")