Kedro-MLflow

2.0.2 verified Mon Apr 27 auth: no python

A Kedro plugin that integrates MLflow for experiment tracking, model registry, and pipeline logging. Version 2.0.2 requires Kedro >=1.0.0 and MLflow >=3.0.0; support for MLflow 2.x has been dropped. New versions are released every few months.

pip install kedro-mlflow
error ModuleNotFoundError: No module named 'kedro_mlflow'
cause kedro-mlflow is not installed in the current environment.
fix
pip install kedro-mlflow
error ImportError: cannot import name 'MlflowModelTrackingDataset' from 'kedro_mlflow.io'
cause The class is imported from kedro_mlflow.io, but it is exposed by kedro_mlflow.io.models.
fix
Use `from kedro_mlflow.io.models import MlflowModelTrackingDataset`.
error mlflow.exceptions.MlflowException: Unsupported model URI scheme
cause The `model_uri` passed in `load_args` is not in a format that MLflow 3.x accepts.
fix
Use `load_args={"model_uri": "models:/<model_name>/<version>"}` instead of `run_id`.
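The `models:/<model_name>/<version>` format named in the fix above can be assembled with a small helper. This is a minimal sketch: `build_model_uri` is a hypothetical helper written for illustration, not part of kedro-mlflow or MLflow, and the model name and version are placeholders.

```python
from typing import Union


def build_model_uri(model_name: str, version: Union[int, str]) -> str:
    """Return a registry URI of the form models:/<model_name>/<version>.

    Hypothetical helper: it only formats the string and rejects empty parts.
    It does not check that the model or version exists in the registry.
    """
    name = model_name.strip()
    ver = str(version).strip()
    if not name:
        raise ValueError("model_name must not be empty")
    if not ver:
        raise ValueError("version must not be empty")
    return f"models:/{name}/{ver}"


# The resulting dictionary is what the fix passes as load_args.
load_args = {"model_uri": build_model_uri("test_model", 1)}
```

Building the URI in one place keeps the `models:/` prefix and the separator consistent wherever the dataset is declared.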
breaking v2.0.0 dropped support for MLflow <3.0.0 and Kedro <1.0.0. If upgrading from v1.x, you must upgrade both Kedro and MLflow.
fix Update Kedro to >=1.0.0 and MLflow to >=3.0.0. See migration guide.
breaking v2.0.0 removed the `run_id` argument from `MlflowModelTrackingDataset`. Use `load_args={"model_uri": "models:/<model_name>/<version>"}` instead.
fix Replace `run_id` in dataset instantiation with `load_args` containing `model_uri`.
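Before upgrading, it can help to confirm that the installed Kedro and MLflow versions meet the minimums stated above. The following is a rough sketch using only the standard library; `parse_release`, `check_requirements`, and `installed_versions` are hypothetical helper names, and the parser looks only at the leading numeric `X.Y.Z` part, so a pre-release such as `3.0.0rc1` is treated as `3.0.0`.

```python
from importlib import metadata

# Minimum versions taken from the v2.0.0 breaking-change note.
MINIMUMS = {"kedro": (1, 0, 0), "mlflow": (3, 0, 0)}


def parse_release(version: str) -> tuple:
    """Parse the leading numeric X.Y.Z components, ignoring suffixes like 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(installed: dict) -> list:
    """Return one human-readable problem per package that is missing or too old."""
    problems = []
    for package, minimum in MINIMUMS.items():
        version = installed.get(package)
        required = ".".join(str(n) for n in minimum)
        if version is None:
            problems.append(f"{package} is not installed (need >={required})")
        elif parse_release(version) < minimum:
            problems.append(f"{package} {version} is older than the required {required}")
    return problems


def installed_versions() -> dict:
    """Look up installed versions; a missing package maps to None."""
    found = {}
    for package in MINIMUMS:
        try:
            found[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            found[package] = None
    return found


if __name__ == "__main__":
    for problem in check_requirements(installed_versions()):
        print(problem)
```

An empty result from `check_requirements` means both packages satisfy the minimums; anything else lists what to upgrade first.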
gotcha On Databricks, autologging is enabled by default and conflicts with kedro-mlflow. You must disable autologging in mlflow.yml.
fix Set `tracking.disable_tracking.disable_autologging: true` in mlflow.yml.
gotcha MLflow is not thread-safe, so tracking data can be lost when nodes run in parallel. The plugin reopens the run before each node, but custom logging done outside nodes may still be lost.
fix Ensure all MLflow logging happens inside Kedro node functions or callbacks that the plugin manages.
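For the Databricks autologging gotcha above, the dotted key `tracking.disable_tracking.disable_autologging` corresponds to the following nesting in mlflow.yml. This fragment shows only the key named in that fix; other entries under `tracking` are omitted and should be left as they are in your file.

```yaml
tracking:
  disable_tracking:
    disable_autologging: true
```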

Basic usage: configure the Kedro project, create a Kedro session, and save a model with MlflowModelTrackingDataset.

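from pathlib import Path

import mlflow
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro_mlflow.io.models import MlflowModelTrackingDataset
from sklearn.linear_model import LinearRegression

# Run this from the root directory of a Kedro project.
project_path = Path.cwd()
bootstrap_project(project_path)  # reads pyproject.toml and configures the project package

with KedroSession.create(project_path=project_path) as session:
    # Loading the context lets the kedro-mlflow hooks apply the MLflow configuration.
    session.load_context()

    # Train a small scikit-learn model so there is a real model to log.
    model = LinearRegression().fit([[1, 3], [2, 4]], [0.0, 1.0])

    # save_args are forwarded to the flavor's log_model function.
    model_dataset = MlflowModelTrackingDataset(
        flavor="mlflow.sklearn",
        save_args={"registered_model_name": "test_model"},
    )

    # Log inside an explicit run so the model is attached to a known run.
    with mlflow.start_run():
        model_dataset.save(model)
    print("Model saved to MLflow.")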