Kedro-MLflow
2.0.2 · verified Mon Apr 27 · auth: no · python
A Kedro plugin that integrates MLflow into Kedro projects for experiment tracking, model registry access, and pipeline logging. Version 2.0.2 supports Kedro >=1.0.0 and MLflow >=3.0.0 (support for MLflow 2.x was dropped). New versions are released roughly every few months.
pip install kedro-mlflow
Common errors
error ModuleNotFoundError: No module named 'kedro_mlflow'
cause kedro-mlflow is not installed in the current environment.
fix
pip install kedro-mlflow
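This error often appears when the package was installed for a different interpreter than the one running Kedro. The following is a minimal sketch, using only the standard library, that reports the active interpreter and whether a module can be found from it. The helper name `is_importable` is hypothetical and not part of kedro-mlflow.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the active interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The interpreter printed here is the one that needs kedro-mlflow installed.
    print(f"Interpreter: {sys.executable}")
    print(f"kedro_mlflow importable: {is_importable('kedro_mlflow')}")
```

If the check reports False, running `python -m pip install kedro-mlflow` with that same interpreter installs the package into the environment that is actually in use.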
error ImportError: cannot import name 'MlflowModelTrackingDataset' from 'kedro_mlflow.io'
cause The class is being imported from kedro_mlflow.io, but it lives in kedro_mlflow.io.models.
fix
Use
from kedro_mlflow.io.models import MlflowModelTrackingDataset
error mlflow.exceptions.MlflowException: Unsupported model URI scheme
cause The model URI passed in load_args uses a format that MLflow 3.x does not accept.
fix
Use
load_args={"model_uri": "models:/<model_name>/<version>"} instead of run_id.
Warnings
breaking v2.0.0 dropped support for MLflow <3.0.0 and Kedro <1.0.0. If upgrading from v1.x, you must upgrade both Kedro and MLflow.
fix Update Kedro to >=1.0.0 and MLflow to >=3.0.0. See migration guide.
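Before upgrading, it can help to confirm which versions are installed. The sketch below compares installed versions against the minimums stated in this warning, using only the standard library. The helpers `parse_release` and `meets_minimum` are hypothetical, and the comparison deliberately ignores pre-release tags, so treat it as a rough check rather than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Turn '3.1.0' or '3.1.0rc1' into a tuple of leading integers, e.g. (3, 1, 0)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric release segments only, padding the shorter one with zeros."""
    have, need = parse_release(installed), parse_release(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Minimum versions taken from the warning this sketch accompanies.
MINIMUMS = {"kedro": "1.0.0", "mlflow": "3.0.0"}

if __name__ == "__main__":
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed")
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
        print(f"{package} {installed}: {status}")
```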
breaking v2.0.0 removed the `run_id` argument from `MlflowModelTrackingDataset`. Use `load_args={"model_uri": "models:/<model_name>/<version>"}` instead.
fix Replace `run_id` in dataset instantiation with `load_args` containing `model_uri`.
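The `models:/<model_name>/<version>` format can be assembled programmatically. This is a minimal sketch with two hypothetical helpers, `build_model_uri` and `build_load_args`; neither is part of kedro-mlflow or MLflow, and the name check is only an assumption made here to keep the URI unambiguous.

```python
def build_model_uri(model_name: str, version: int) -> str:
    """Format a registry URI of the form models:/<model_name>/<version>."""
    if not model_name or "/" in model_name:
        # A slash in the name would make the URI segments ambiguous.
        raise ValueError("model_name must be non-empty and must not contain '/'")
    return f"models:/{model_name}/{version}"


def build_load_args(model_name: str, version: int) -> dict:
    """Wrap the URI in the dictionary shape expected under load_args."""
    return {"model_uri": build_model_uri(model_name, version)}


if __name__ == "__main__":
    print(build_load_args("test_model", 1))
```

The resulting dictionary is what would be passed as `load_args` where `run_id` was used before.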
gotcha On Databricks, autologging is enabled by default and conflicts with kedro-mlflow. You must disable autologging in mlflow.yml.
fix Set `tracking.disable_tracking.disable_autologging: true` in mlflow.yml.
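The dotted path in this fix denotes three nested keys. As an illustration only, the sketch below assumes mlflow.yml has already been parsed into a Python dictionary and walks that nesting. The helper `autologging_disabled` is hypothetical and not part of kedro-mlflow.

```python
def autologging_disabled(mlflow_config: dict) -> bool:
    """Follow tracking -> disable_tracking -> disable_autologging in a parsed config."""
    node = mlflow_config
    for key in ("tracking", "disable_tracking", "disable_autologging"):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is True


# Dictionary equivalent of the relevant part of mlflow.yml once the fix is applied.
example_config = {"tracking": {"disable_tracking": {"disable_autologging": True}}}

if __name__ == "__main__":
    print(autologging_disabled(example_config))
```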
gotcha MLflow's limited thread safety can cause tracking data to be lost when nodes run in parallel. The plugin reopens the run before each node, but logging done outside nodes may still be lost.
fix Ensure all MLflow logging happens inside Kedro node functions or callbacks that the plugin manages.
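To illustrate keeping logging inside a node, here is a sketch of a hypothetical node function, `evaluate_predictions`, that computes a metric and logs it before returning. The `log_metric` parameter is an assumption added here so the function can be exercised without MLflow; when it is omitted, the real `mlflow.log_metric` is used.

```python
from typing import Callable, Optional, Sequence


def evaluate_predictions(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    log_metric: Optional[Callable[[str, float], None]] = None,
) -> float:
    """Hypothetical Kedro node: compute accuracy and log it from inside the node."""
    if log_metric is None:
        # Imported lazily so the function can be unit-tested without MLflow installed.
        import mlflow

        log_metric = mlflow.log_metric

    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    accuracy = correct / len(y_true)
    # Logging happens here, inside the node body, rather than in surrounding script code.
    log_metric("accuracy", accuracy)
    return accuracy
```

Because the logging call sits in the node body, it runs while the run reopened by the plugin for that node is active.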
Imports
- mlflow.yml config
  kedro mlflow init
- MlflowModelTrackingDataset
  wrong
  from kedro_mlflow.io import MlflowModelTrackingDataset
  correct
  from kedro_mlflow.io.models import MlflowModelTrackingDataset
Quickstart
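from pathlib import Path

from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession
from kedro_mlflow.io.models import MlflowModelTrackingDataset
from sklearn.linear_model import LinearRegression

# Run from the project root; this assumes the Python package shares the folder's name.
configure_project(Path.cwd().name)

with KedroSession.create() as session:
    # Loading the context runs the project's hooks, including those of kedro-mlflow.
    context = session.load_context()

    # Example: log a model with MlflowModelTrackingDataset.
    # Check these arguments against the dataset signature of your installed version.
    data_set = MlflowModelTrackingDataset(
        filepath="model.pkl",
        flavor="mlflow.sklearn",
        model_name="test_model",
        save_args={"registered_model_name": "test_model"},
    )

    # The mlflow.sklearn flavor expects a scikit-learn estimator, not a DataFrame,
    # so fit a tiny model to simulate using the dataset.
    model = LinearRegression().fit([[1], [2]], [3, 4])
    data_set.save(model)
    print("Model saved to MLflow.")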