MLeap Python API

0.24.0 · active · verified Sat Apr 11

MLeap is a serialization format and a runtime for machine learning pipelines. It allows you to train models using Apache Spark, Scikit-learn, or XGBoost, and then serialize them into a portable format that can be served in real-time without Spark dependencies. The Python API, currently at version 0.24.0, provides tools for training, exporting, and running these pipelines. It supports Python 3.9+, Scala 2.13, Spark 4.0.1, and Java 17. Releases are semi-regular, often driven by upstream library updates.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to train a simple scikit-learn `LinearRegression` model, wrap it in an `MLeapPipeline`, export it to an MLeap bundle file, and then load and use it for predictions. Ensure a compatible JDK is installed and `JAVA_HOME` is configured for the runtime part of MLeap.

import pandas as pd
import numpy as np
import os
import shutil
from sklearn.linear_model import LinearRegression
from mleap.sklearn.pipeline import MLeapPipeline
from mleap.bundle import Bundle

# 1. Prepare sample data
data = {
    'feature1': np.random.rand(10),
    'feature2': np.random.rand(10)
}
df = pd.DataFrame(data)
target = np.random.rand(10)

# 2. Train a scikit-learn model
model_sklearn = LinearRegression()
model_sklearn.fit(df[['feature1', 'feature2']], target)

# 3. Wrap the scikit-learn model in an MLeapPipeline
mleap_pipeline = MLeapPipeline([
    ('lr', model_sklearn)
])

# 4. Define export path and clean up previous exports
bundle_path = "/tmp/my_linear_regression_mleap.zip"
model_name = "linear_regression_mleap_model"
if os.path.exists(bundle_path):
    os.remove(bundle_path)
if os.path.exists(f"/tmp/{model_name}"):
    shutil.rmtree(f"/tmp/{model_name}")

# 5. Export the MLeap pipeline to a bundle file
with Bundle().writer(mleap_pipeline, df[['feature1', 'feature2']], name=model_name) as writer:
    writer.serialize_to_zip(bundle_path)

print(f"MLeap model exported to: {bundle_path}")

# 6. Load the MLeap bundle back into memory
loaded_bundle = Bundle.load_model(bundle_path)

# 7. Make predictions with the loaded model
test_data = pd.DataFrame([[0.1, 0.9]], columns=['feature1', 'feature2'])
predictions = loaded_bundle.predict(test_data)
print(f"Predictions: {predictions}")

# 8. Clean up created files (optional)
if os.path.exists(bundle_path):
    os.remove(bundle_path)
if os.path.exists(f"/tmp/{model_name}"):
    shutil.rmtree(f"/tmp/{model_name}")

view raw JSON →