OpenTelemetry Scikit-learn Instrumentation
This library provides OpenTelemetry automatic instrumentation for the scikit-learn (sklearn) machine learning library. It enables the collection of telemetry data, such as traces and spans, for various scikit-learn operations like model training (`fit`) and prediction (`predict`). The project is actively maintained as part of the broader OpenTelemetry Python Contrib repository, with new versions released regularly as beta releases.
Common errors
- ModuleNotFoundError: No module named 'sklearn'
  cause: The `opentelemetry-instrumentation-sklearn` package requires `scikit-learn` to be installed, but it is missing from the environment.
  fix: Install scikit-learn: `pip install scikit-learn`
- Sklearn operations are not being traced/no spans are generated.
  cause: The `SklearnInstrumentor().instrument()` call was made after `sklearn` modules or objects were already imported and initialized, or OpenTelemetry SDK was not properly configured.
  fix: Ensure `SklearnInstrumentor().instrument()` is called as early as possible in your application's startup, ideally before any `import sklearn` statements. Also, verify that a `TracerProvider` and `SpanProcessor` are correctly configured and set as the global trace provider.
- ERROR: opentelemetry-instrumentation-sklearn 0.46b0 requires scikit-learn>=0.24.0,<1.4.0, but you have scikit-learn 1.4.1 which is incompatible.
  cause: A version conflict exists between the installed `scikit-learn` and the versions supported by `opentelemetry-instrumentation-sklearn`.
  fix: Adjust your `scikit-learn` version to be within the compatible range specified by `opentelemetry-instrumentation-sklearn` (e.g., `pip install 'scikit-learn<1.4.0'` or `pip install 'scikit-learn>=0.24.0,<1.4.0'` for this example).
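The version conflict in the last error can be reasoned about with a small range check. The sketch below is illustrative only: `release_tuple` and `in_supported_range` are hypothetical helper names, the bounds are copied from the example pip message, and the comparison looks only at the numeric release segment. For full PEP 440 handling, the `packaging` library's `SpecifierSet` is the more robust choice.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the numeric release segment without trailing zeros, e.g. '1.4.0' -> (1, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    # Drop trailing zeros so that '1.4' and '1.4.0' compare as equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_supported_range(version: str, minimum: str, upper_bound: str) -> bool:
    """True when minimum <= version < upper_bound, comparing release segments only."""
    return release_tuple(minimum) <= release_tuple(version) < release_tuple(upper_bound)


# Versions and bounds taken from the example error message above.
print(in_supported_range("1.4.1", "0.24.0", "1.4.0"))  # False: 1.4.1 is outside the range
print(in_supported_range("1.3.2", "0.24.0", "1.4.0"))  # True: 1.3.2 is inside the range
```

The supported range differs between releases of the instrumentation package, so the bounds should be read from the pip error you actually see rather than reused from this example.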
Warnings
- gotcha The OpenTelemetry instrumentation should be initialized before the `sklearn` library is imported to ensure proper monkey-patching and tracing of operations. Importing `sklearn` components before calling `SklearnInstrumentor().instrument()` may result in untraced operations.
- breaking A change in OpenTelemetry Python Contrib (around v0.53b0 / 1.32.0) altered how dependency checks are performed. Instrumentors now check for the instrumented library's presence and version *inside* the `instrument()` method. If the target library (scikit-learn in this case) is not installed, or its version is incompatible, `instrument()` may raise an `ImportError` or other exceptions.
- gotcha Running multiple OpenTelemetry SDK components (e.g., multiple exporters or processors) can lead to duplicate telemetry. This is especially problematic in environments like 'Always On' Azure Functions or applications using pre-fork servers where processes might persist or get duplicated.
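Because `instrument()` may raise when scikit-learn is missing or incompatible, it can help to wrap the call so that a tracing problem does not stop the application from starting. The following is a minimal sketch, not the library's recommended pattern: `instrument_sklearn_once` and the module-level `_done` flag are hypothetical names, and the broad `except Exception` is an assumption based on the warning that exceptions other than `ImportError` are possible.

```python
import logging
import threading

logger = logging.getLogger(__name__)
_lock = threading.Lock()
_done = False  # hypothetical per-process flag; not part of OpenTelemetry


def instrument_sklearn_once() -> bool:
    """Instrument scikit-learn at most once per process; return True on success."""
    global _done
    with _lock:
        if _done:
            return True
        try:
            # Imported lazily so a missing package is logged instead of crashing startup.
            from opentelemetry.instrumentation.sklearn import SklearnInstrumentor

            SklearnInstrumentor().instrument()
        except ImportError as exc:
            logger.warning("sklearn instrumentation unavailable: %s", exc)
            return False
        except Exception as exc:  # version conflicts may surface as other exception types
            logger.warning("sklearn instrumentation failed: %s", exc)
            return False
        _done = True
        return True


print("instrumented:", instrument_sklearn_once())
```

The flag only prevents repeated initialization inside a single process. With pre-fork servers each worker is a separate process, so the same guard pattern would have to run in every worker, and it can equally wrap the `TracerProvider` and exporter setup to avoid registering processors twice.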
Install
- pip install opentelemetry-instrumentation-sklearn
- pip install 'opentelemetry-distro[otlp]' opentelemetry-instrumentation-sklearn
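After installing, a quick check from Python can confirm which distributions are actually present in the active environment. This sketch uses only the standard library; `installed_versions` and `REQUIRED` are hypothetical names, and the distribution names are the ones from the install commands plus `scikit-learn` itself.

```python
from importlib import metadata

# Distribution names taken from the install commands above, plus scikit-learn.
REQUIRED = ("opentelemetry-instrumentation-sklearn", "scikit-learn")


def installed_versions(names=REQUIRED) -> dict:
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```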
Imports
- SklearnInstrumentor
from opentelemetry.instrumentation.sklearn import SklearnInstrumentor
Quickstart
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.instrumentation.sklearn import SklearnInstrumentor
# Configure OpenTelemetry Tracer
resource = Resource.create({"service.name": "sklearn-app"})
provider = TracerProvider(resource=resource)
processor = SimpleSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
# Initialize Sklearn Instrumentation
# Call this BEFORE importing sklearn components, as the Warnings section advises
SklearnInstrumentor().instrument()
# Import sklearn components only after instrumentation is active
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Scikit-learn operations will now be traced
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200)
print("\n--- Training Model ---")
model.fit(X_train, y_train)
print("Model training complete.")
print("\n--- Making Predictions ---")
predictions = model.predict(X_test)
print("Predictions made.")
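If the quickstart prints no spans, one thing worth checking is whether an SDK `TracerProvider` is actually registered as the global provider. The helper below is a hedged diagnostic sketch: `describe_tracing_setup` is a hypothetical name, and it only inspects the provider type. It does not verify that a span processor or exporter is attached.

```python
def describe_tracing_setup() -> str:
    """Report whether an SDK TracerProvider is registered as the global provider."""
    try:
        from opentelemetry import trace
        from opentelemetry.sdk.trace import TracerProvider
    except ImportError:
        return "opentelemetry API/SDK is not importable"
    provider = trace.get_tracer_provider()
    if isinstance(provider, TracerProvider):
        return "SDK TracerProvider is active"
    # Without trace.set_tracer_provider(...), the API returns a proxy or no-op provider.
    return f"no SDK provider configured (found {type(provider).__name__})"


print(describe_tracing_setup())
```

Calling it right after `trace.set_tracer_provider(provider)` in the quickstart should report an active SDK provider. Any other result suggests the SDK configuration step did not run in this process.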