MLflow Skinny

3.10.1 · active · verified Sun Mar 29

MLflow Skinny is a lightweight Python package that provides core MLflow functionality for experiment tracking and model management. It omits heavier dependencies such as SQL storage, the MLflow UI and server, and heavyweight data science libraries, so it suits users who need only the tracking and logging capabilities. MLflow itself is an open-source platform for the machine learning lifecycle, covering experiment tracking, reproducible code packaging, and model deployment. The current version is 3.10.1, which requires Python >=3.10. The project is actively developed, with patch and minor releases often arriving on a monthly cadence.

Warnings

Install

Imports

Quickstart

This quickstart uses `mlflow-skinny` for experiment tracking with scikit-learn autologging. It trains a `RandomForestRegressor` and logs the model, its parameters, and its metrics automatically. To run the example, install `scikit-learn`, `numpy`, and `pandas` alongside `mlflow-skinny`. The example sets a file-based tracking URI explicitly to avoid MLflow 3.x's default SQLite backend, which `mlflow-skinny` does not support out of the box.

import mlflow
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Ensure required dependencies are installed for this example (sklearn, numpy, pandas)
# pip install mlflow-skinny scikit-learn numpy pandas

# Set the tracking URI explicitly. mlflow-skinny does not bundle `sqlalchemy`, so the
# 'sqlite:///mlflow.db' default in MLflow 3.x only works if `sqlalchemy` is installed separately.
# A local file-based store avoids that dependency.
mlflow.set_tracking_uri("file:///tmp/mlruns_quickstart")

# Enable MLflow's automatic experiment tracking for scikit-learn
# This will log parameters, metrics, and the model automatically
mlflow.sklearn.autolog()

# Load the training dataset
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Train a RandomForestRegressor model
# With autologging enabled, MLflow logs automatically when fit() is called
with mlflow.start_run():
    rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3, random_state=42)
    rf.fit(X_train, y_train)

    # You can also manually log additional metrics or parameters if needed
    # mlflow.log_metric("example_custom_metric", 0.95)

print(f"MLflow Run completed. View runs at: {mlflow.get_tracking_uri()}")
