MLflow Skinny
MLflow Skinny is a lightweight Python package that provides MLflow's core experiment tracking and model management functionality while omitting heavier dependencies such as SQL storage, the MLflow UI and server, and most data science libraries. It is a foundation for users who need only tracking and logging. MLflow itself is an open-source platform for the machine learning lifecycle, covering experiment tracking, reproducible code packaging, and model deployment. The current version is 3.10.1, which requires Python >=3.10. The project is actively developed, with patch and minor releases often arriving on a monthly cadence.
Warnings
- breaking MLflow 3.x changed the default tracking URI from file-based (./mlruns) to SQLite (sqlite:///mlflow.db). `mlflow-skinny` does not include `sqlalchemy`, `alembic`, or `sqlparse` by default, so code that relies on the default raises `UnsupportedModelRegistryStoreURIException` unless a tracking URI is set explicitly or these dependencies are installed manually.
- gotcha `mlflow-skinny` explicitly excludes the MLflow UI and server components. Running `mlflow ui` after installing only `mlflow-skinny` will result in an error like 'Unable to display MLflow UI - landing page (index.html) not found'.
- gotcha Many MLflow features, particularly model flavors (e.g., `mlflow.sklearn`, `mlflow.tensorflow`), model serving (`mlflow models serve`), or advanced artifact storage, require additional dependencies not bundled with `mlflow-skinny`. Trying to use these features without the necessary extra packages will lead to `ImportError` or `ModuleNotFoundError`.
- breaking MLflow 3.x introduced several breaking changes. For example, the `run_uuid` attribute on `RunInfo` objects was removed and replaced by `run_id`. Some model flavors (e.g., `fastai`, `mleap`, `diviner`, `promptflow`) were deprecated or removed. The `log_model` API can now be called directly without `mlflow.start_run()` context.
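The first warning above can be handled defensively by checking for the SQL dependencies before choosing a tracking URI. The following is a minimal sketch using only the standard library; the helper names `missing_sql_deps` and `choose_tracking_uri`, and the default paths, are hypothetical and not part of MLflow.

```python
from importlib.util import find_spec
from pathlib import Path

# Packages the SQLite-backed store needs, as listed in the warning above.
SQL_STORE_DEPS = ("sqlalchemy", "alembic", "sqlparse")


def missing_sql_deps():
    """Return the SQL store dependencies that cannot be imported."""
    return [name for name in SQL_STORE_DEPS if find_spec(name) is None]


def choose_tracking_uri(file_dir="./mlruns", sqlite_path="mlflow.db"):
    """Pick a tracking URI that works with the packages installed.

    Hypothetical helper: falls back to a local file store when any
    SQL dependency is missing, otherwise uses the SQLite URI form.
    """
    if missing_sql_deps():
        # Build an absolute file:// URI for the local directory.
        return Path(file_dir).resolve().as_uri()
    return f"sqlite:///{sqlite_path}"


uri = choose_tracking_uri()
print(f"Missing SQL dependencies: {missing_sql_deps()}")
print(f"Tracking URI to use: {uri}")
```

The resulting value would then be passed to `mlflow.set_tracking_uri(uri)` before any logging call, so the SQLite default is never reached on a skinny install.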
Install
- pip install mlflow-skinny
- pip install mlflow-skinny scikit-learn numpy pandas
- pip install mlflow-skinny sqlalchemy alembic sqlparse
Imports
- mlflow
import mlflow
- mlflow.sklearn
import mlflow.sklearn
- MlflowClient
from mlflow import MlflowClient
Quickstart
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
# Ensure required dependencies are installed for this example (sklearn, numpy, pandas)
# pip install mlflow-skinny scikit-learn numpy pandas
# Set a tracking URI explicitly. MLflow 3.x defaults to 'sqlite:///mlflow.db', which
# needs `sqlalchemy` (not bundled with mlflow-skinny), so this example uses a local
# file-based store instead.
mlflow.set_tracking_uri("file:///tmp/mlruns_quickstart")
# Enable MLflow's automatic experiment tracking for scikit-learn
# This will log parameters, metrics, and the model automatically
mlflow.sklearn.autolog()
# Load the training dataset
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
# Train a RandomForestRegressor model
# MLflow triggers logging automatically upon model fitting due to autologging
with mlflow.start_run():
    rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3, random_state=42)
    rf.fit(X_train, y_train)
    # You can also manually log additional metrics or parameters if needed
    # mlflow.log_metric("example_custom_metric", 0.95)
print(f"MLflow run completed. Runs are stored at: {mlflow.get_tracking_uri()}")