{"id":5482,"library":"sklearn2pmml","title":"Scikit-Learn to PMML Converter","description":"sklearn2pmml is a Python library for converting Scikit-Learn pipelines and estimators into the Predictive Model Markup Language (PMML) format. It is a thin Python wrapper around the JPMML-SkLearn Java library and enables trained machine learning models to be exported for deployment in environments that support PMML. The current version is 0.130.0, released on April 4, 2026, and the library is actively maintained.","status":"active","version":"0.130.0","language":"en","source_language":"en","source_url":"https://github.com/jpmml/sklearn2pmml","tags":["scikit-learn","pmml","model export","machine learning","model deployment","interoperability"],"install":[{"cmd":"pip install sklearn2pmml","lang":"bash","label":"Install from PyPI"}],"dependencies":[{"reason":"sklearn2pmml wraps a Java library (JPMML-SkLearn), so a compatible Java Runtime Environment (JRE) must be installed and the `java` executable must be available on the system PATH.","package":"Java 11 or newer","optional":false},{"reason":"Core functionality: converting Scikit-Learn estimators and pipelines.","package":"scikit-learn","optional":false},{"reason":"Used for evaluating (scoring) PMML models in Python, such as the PMML files produced by sklearn2pmml.","package":"jpmml-evaluator","optional":true}],"imports":[{"symbol":"PMMLPipeline","correct":"from sklearn2pmml.pipeline import PMMLPipeline"},{"note":"The `make_pmml_pipeline` utility function has been removed. Build pipelines with `PMMLPipeline` and export them with `sklearn2pmml()` instead.","wrong":"from sklearn2pmml import make_pmml_pipeline","symbol":"sklearn2pmml","correct":"from sklearn2pmml import sklearn2pmml"}],"quickstart":{"code":"from sklearn.datasets import load_iris\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn2pmml.pipeline import PMMLPipeline\nfrom sklearn2pmml import sklearn2pmml\n\n# Load a sample dataset as a pandas DataFrame (X) and Series (y),\n# so that feature and target names are carried into the PMML file\niris = load_iris(as_frame=True)\nX, y = iris.data, iris.target\n\n# Create a PMMLPipeline (extends sklearn.pipeline.Pipeline)\npmml_pipeline = PMMLPipeline([\n    ('classifier', DecisionTreeClassifier())\n])\n\n# Fit the pipeline\npmml_pipeline.fit(X, y)\n\n# Convert the fitted pipeline to PMML (requires Java 11+ on the PATH)\npmml_filepath = 'DecisionTreeIris.pmml'\nsklearn2pmml(pmml_pipeline, pmml_filepath)\n\nprint(f\"PMML model successfully exported to {pmml_filepath}\")","lang":"python","description":"This quickstart creates a `PMMLPipeline` (a subclass of Scikit-Learn's `Pipeline`), fits it on the Iris dataset, and exports it to a PMML file with the `sklearn2pmml()` function. Compared with a standard Scikit-Learn pipeline, `PMMLPipeline` adds PMML-specific functionality such as capturing feature and target names from Pandas DataFrames and Series."},"warnings":[{"fix":"Use `sklearn2pmml.pipeline.PMMLPipeline` directly to build pipelines, and pass `escape_func` (if needed) to `sklearn2pmml()`.","message":"The `sklearn2pmml.make_pmml_pipeline()` utility function has been removed, and its `escape_func` parameter has moved to the `sklearn2pmml()` function.","severity":"breaking","affected_versions":"0.120.0 and newer"},{"fix":"Install Java 11 or newer and make sure the `java` executable is on the system PATH.","message":"sklearn2pmml is a Python wrapper around a Java library, so it requires a Java Runtime Environment (JRE) version 11 or newer, with the `java` executable accessible via the PATH environment variable. If Java cannot be found, conversion fails with a `RuntimeError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"To score PMML files in Python, use `jpmml-evaluator-python` (published on PyPI as `jpmml-evaluator`) or another PMML evaluation library.","message":"`sklearn2pmml` only *exports* Scikit-Learn models to PMML. It cannot *import* PMML files back into Scikit-Learn objects, and it does not score PMML files in Python. For Python-based PMML evaluation, consider the `jpmml-evaluator-python` library.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pass `X` and `y` as `pandas.DataFrame` and `pandas.Series` objects, respectively, to the `fit()` method of `PMMLPipeline`.","message":"When fitting a `PMMLPipeline`, pass `X` and `y` as `pandas.DataFrame` and `pandas.Series` objects. This lets `sklearn2pmml` capture meaningful feature and target names and embed them in the PMML file. With NumPy arrays, feature names default to generic `x1`, `x2`, etc., and the target name defaults to `y`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Refactor complex custom preprocessing into simpler, PMML-compatible transformers where possible (for example, those in `sklearn2pmml.preprocessing`), or perform it outside the PMML pipeline.","message":"Highly custom Python classes, especially complex data preprocessing (e.g., advanced text feature extraction with third-party libraries), generally cannot be converted. PMML can express only a limited set of transformations, so arbitrary Python code has no PMML translation.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Increase the Java Virtual Machine (JVM) heap size by passing `java_opts` to the `sklearn2pmml()` function, e.g., `sklearn2pmml(pipeline, 'model.pmml', java_opts=['-Xms4096m', '-Xmx4096m'])`, which sets both the initial and the maximum heap size to 4 GB.","message":"Converting large or complex models can exhaust the memory of the underlying Java process. In Python this typically surfaces as a `RuntimeError`, with a Java stack trace (such as `java.lang.OutOfMemoryError`) in the console output.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}