{"id":2998,"library":"mleap","title":"MLeap Python API","description":"MLeap is a serialization format and a runtime for machine learning pipelines. It allows you to train models using Apache Spark, Scikit-learn, or XGBoost, and then serialize them into a portable format that can be served in real-time without Spark dependencies. The Python API, currently at version 0.24.0, provides tools for training, exporting, and running these pipelines. It supports Python 3.9+, Scala 2.13, Spark 4.0.1, and Java 17. Releases are semi-regular, often driven by upstream library updates.","status":"active","version":"0.24.0","language":"en","source_language":"en","source_url":"https://github.com/combust/mleap/tree/master/python","tags":["machine-learning","model-serving","spark","serialization","onnx","sklearn"],"install":[{"cmd":"pip install mleap","lang":"bash","label":"Base installation"},{"cmd":"pip install mleap[spark]","lang":"bash","label":"With PySpark support"},{"cmd":"pip install mleap[onnx]","lang":"bash","label":"With ONNX runtime support"}],"dependencies":[{"reason":"Core numerical operations and data handling.","package":"numpy"},{"reason":"Integration with scikit-learn models for export/import.","package":"scikit-learn"},{"reason":"Scientific computing dependency for scikit-learn.","package":"scipy"},{"reason":"Commonly used for DataFrame input/output in examples and real-world usage.","package":"pandas","optional":true},{"reason":"Required for Spark integration features, available via `mleap[spark]` extra.","package":"pyspark","optional":true},{"reason":"Required for ONNX model support, available via `mleap[onnx]` extra.","package":"onnxruntime","optional":true},{"reason":"MLeap relies on a JVM for its runtime. Version 0.24.0 requires JDK 17. 
Must be installed separately and `JAVA_HOME` configured.","package":"Java Development Kit (JDK)","optional":false}],"imports":[{"note":"MLeapPipeline for scikit-learn integration is nested under `mleap.sklearn.pipeline`.","wrong":"from mleap.pipeline import MLeapPipeline","symbol":"MLeapPipeline","correct":"from mleap.sklearn.pipeline import MLeapPipeline"},{"symbol":"Bundle","correct":"from mleap.bundle import Bundle"},{"note":"Used for managing the MLeap runtime environment.","symbol":"MLeapContext","correct":"from mleap.runtime import MLeapContext"}],"quickstart":{"code":"import pandas as pd\nimport numpy as np\nimport os\nimport shutil\nfrom sklearn.linear_model import LinearRegression\nfrom mleap.sklearn.pipeline import MLeapPipeline\nfrom mleap.bundle import Bundle\n\n# 1. Prepare sample data\ndata = {\n    'feature1': np.random.rand(10),\n    'feature2': np.random.rand(10)\n}\ndf = pd.DataFrame(data)\ntarget = np.random.rand(10)\n\n# 2. Train a scikit-learn model\nmodel_sklearn = LinearRegression()\nmodel_sklearn.fit(df[['feature1', 'feature2']], target)\n\n# 3. Wrap the scikit-learn model in an MLeapPipeline\nmleap_pipeline = MLeapPipeline([\n    ('lr', model_sklearn)\n])\n\n# 4. Define export path and clean up previous exports\nbundle_path = \"/tmp/my_linear_regression_mleap.zip\"\nmodel_name = \"linear_regression_mleap_model\"\nif os.path.exists(bundle_path):\n    os.remove(bundle_path)\nif os.path.exists(f\"/tmp/{model_name}\"):\n    shutil.rmtree(f\"/tmp/{model_name}\")\n\n# 5. Export the MLeap pipeline to a bundle file\nwith Bundle().writer(mleap_pipeline, df[['feature1', 'feature2']], name=model_name) as writer:\n    writer.serialize_to_zip(bundle_path)\n\nprint(f\"MLeap model exported to: {bundle_path}\")\n\n# 6. Load the MLeap bundle back into memory\nloaded_bundle = Bundle.load_model(bundle_path)\n\n# 7. 
Make predictions with the loaded model\ntest_data = pd.DataFrame([[0.1, 0.9]], columns=['feature1', 'feature2'])\npredictions = loaded_bundle.predict(test_data)\nprint(f\"Predictions: {predictions}\")\n\n# 8. Clean up created files (optional)\nif os.path.exists(bundle_path):\n    os.remove(bundle_path)\nif os.path.exists(f\"/tmp/{model_name}\"):\n    shutil.rmtree(f\"/tmp/{model_name}\")","lang":"python","description":"This quickstart demonstrates how to train a simple scikit-learn `LinearRegression` model, wrap it in an `MLeapPipeline`, export it to an MLeap bundle file, and then load and use it for predictions. Ensure a compatible JDK is installed and `JAVA_HOME` is configured for the runtime part of MLeap."},"warnings":[{"fix":"Check MLeap's release notes for the required Java Development Kit (JDK), Apache Spark, and other library versions. Update your JVM and associated dependencies accordingly to match the MLeap version you are using.","message":"MLeap versions often align with major upgrades of underlying platforms (Java, Spark, XGBoost, TensorFlow). For example, v0.24.0 requires Java 17, Spark 4.0.1, and XGBoost 2.0.3, a significant change from prior versions (e.g., v0.22.0 supported Spark 3.3.0). This can cause compatibility issues if your runtime environment does not match the version MLeap was built against.","severity":"breaking","affected_versions":"0.24.0 onwards (from previous versions)"},{"fix":"Install the correct Java Development Kit (JDK) version (e.g., OpenJDK 17). Ensure the `JAVA_HOME` environment variable is set correctly and points to your JDK installation. For Spark integration, ensure `SPARK_HOME` and `PYSPARK_SUBMIT_ARGS` are also configured correctly.","message":"MLeap fundamentally relies on a Java Virtual Machine (JVM) for its runtime, and the Python API interacts with this JVM via Py4J. 
Common issues include not having a compatible JVM installed (e.g., Java 17 for v0.24.0) or an incorrect `JAVA_HOME` / classpath configuration, leading to `NoClassDefFoundError`, `JVM not found`, or `Py4JError` errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use the same major.minor version of the `mleap` Python package and the `mleap-runtime` Scala library. When upgrading MLeap, re-export all existing models with the new version to ensure compatibility.","message":"Models exported using one MLeap Python API version might not be compatible with an MLeap Scala/JVM runtime of a different version (and vice versa). Serialization formats can change between releases, so running a model against a mismatched version can fail at load or prediction time with `UnsupportedBundleFileVersionException` or a similar error.","severity":"gotcha","affected_versions":"All major/minor version upgrades"},{"fix":"Ensure the sample `pd.DataFrame` (or `pyspark.sql.DataFrame`) provided to `Bundle().writer` accurately reflects the exact structure (column names, data types, and order) of the data your model expects during inference. For pipelines, ensure all input features are represented in the sample data.","message":"When exporting models, MLeap requires a sample DataFrame (or similar structure) to accurately infer the input schema of the model. Providing incorrect or incomplete input data during the `Bundle().writer` step can produce models that fail to load or predict correctly at runtime because of schema mismatches.","severity":"gotcha","affected_versions":"All versions using `Bundle().writer`"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}