Scikit-Learn to PMML Converter
sklearn2pmml is a Python library designed for converting Scikit-Learn pipelines and estimators into the Predictive Model Markup Language (PMML) format. It acts as a thin Python wrapper around the JPMML-SkLearn Java library, enabling the export of trained machine learning models for deployment in environments that support PMML. The current version is 0.130.0, released on April 4, 2026, and the library is actively maintained.
Warnings
- breaking The `sklearn2pmml.make_pmml_pipeline()` utility function has been removed. Additionally, the `escape_func` parameter was moved from `make_pmml_pipeline()` to the `sklearn2pmml()` function.
- gotcha sklearn2pmml is a Python wrapper for a Java library and requires a Java Runtime Environment (JRE) version 11 or newer to be installed and accessible via the system's PATH environment variable. Without Java, conversion attempts will fail with a `RuntimeError`.
- gotcha The `sklearn2pmml` library only *exports* Scikit-Learn models to PMML. It cannot *import* PMML files back into Scikit-Learn objects, and it does not score PMML natively in Python. For Python-based PMML evaluation, consider using the `jpmml-evaluator-python` library.
- gotcha When training models within `PMMLPipeline`, using `pandas.DataFrame` or `pandas.Series` for `X` and `y` is recommended. This allows `sklearn2pmml` to correctly capture and embed meaningful feature and target names in the PMML file. If NumPy arrays are used, feature names will default to generic 'x1', 'x2', etc., and the target name to 'y'.
- gotcha Direct conversion of highly custom Python classes, especially for complex data preprocessing (e.g., advanced text feature extraction using third-party libraries), is generally not supported. PMML has a limited set of expressible transformations, and arbitrary Python code cannot be translated.
- gotcha Converting large or complex models can exhaust the memory available to the underlying Java process. This usually surfaces in Python as a `RuntimeError` whose message carries a Java stack trace (typically mentioning `java.lang.OutOfMemoryError`).
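The Java requirement noted in the warnings can be checked up front, before any conversion is attempted. The following is a minimal sketch using only the Python standard library; `parse_java_major` and `check_java` are hypothetical helper names and are not part of `sklearn2pmml`. It assumes the `java -version` banner follows the common `version "X.Y.Z"` pattern.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report themselves as "1.8", "1.7", and so on.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_java(minimum=11):
    """Return (ok, message) describing whether a suitable Java is on PATH."""
    java = shutil.which("java")
    if java is None:
        return False, "no 'java' executable found on PATH"
    # The version banner is usually written to stderr, so read both streams.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr + result.stdout)
    if major is None:
        return False, f"could not parse a version from {java}"
    if major < minimum:
        return False, f"found Java {major}, need {minimum} or newer"
    return True, f"found Java {major} at {java}"


if __name__ == "__main__":
    ok, message = check_java()
    print(("OK: " if ok else "Problem: ") + message)
```

Running a check like this at the start of an export script turns a late `RuntimeError` into an early, readable message.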
Install
pip install sklearn2pmml
Imports
- PMMLPipeline
from sklearn2pmml.pipeline import PMMLPipeline
- sklearn2pmml
from sklearn2pmml import sklearn2pmml
Quickstart
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml import sklearn2pmml
# Load a sample dataset
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
# Create a PMMLPipeline (extends sklearn.pipeline.Pipeline)
pmml_pipeline = PMMLPipeline([
    ('classifier', DecisionTreeClassifier())
])
# Fit the pipeline
pmml_pipeline.fit(X, y)
# Convert the fitted pipeline to PMML
pmml_filepath = 'DecisionTreeIris.pmml'
sklearn2pmml(pmml_pipeline, pmml_filepath)
print(f"PMML model successfully exported to {pmml_filepath}")
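After the export, it can be useful to confirm that the feature and target names from the `pandas` objects actually ended up in the PMML file. The sketch below lists the `DataField` entries of a PMML document using only the standard library. `local_name` and `list_data_fields` are hypothetical helper names, not part of `sklearn2pmml`, and the code matches elements by local tag name so that it does not depend on a particular PMML schema version.

```python
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def list_data_fields(pmml_path):
    """Return (name, optype, dataType) for every DataField in the document."""
    root = ET.parse(pmml_path).getroot()
    fields = []
    for element in root.iter():
        if local_name(element.tag) == "DataField":
            fields.append(
                (element.get("name"), element.get("optype"), element.get("dataType"))
            )
    return fields


if __name__ == "__main__":
    path = "DecisionTreeIris.pmml"  # file name used in the Quickstart
    if os.path.exists(path):
        for name, optype, data_type in list_data_fields(path):
            print(f"{name}: {optype}, {data_type}")
    else:
        print(f"{path} not found; run the Quickstart first")
```

If the listed names are generic ones such as 'x1' and 'y', the pipeline was most likely fitted on NumPy arrays rather than on a `pandas.DataFrame` and `pandas.Series`.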