OpenTelemetry Scikit-learn Instrumentation

0.46b0 · active · verified Thu Apr 16

This library provides OpenTelemetry automatic instrumentation for the scikit-learn (sklearn) machine learning library. It records trace spans for scikit-learn operations such as model training (`fit`) and prediction (`predict`). The project is actively maintained as part of the OpenTelemetry Python Contrib repository, which regularly publishes new versions as beta releases (for example, 0.46b0).
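
The following is a rough, stdlib-only sketch of the general technique behind automatic method instrumentation. The `fit` and `predict` methods of an estimator class are wrapped so that each call records a span-like entry. This is not this library's code. `instrument_class`, `ToyEstimator`, `RECORDED_SPANS` and the `Class.method` span naming are hypothetical, and a real instrumentor would start OpenTelemetry spans instead of appending to a list. Because the wrapper is installed on the class, an instance created before patching still picks it up.

```python
import functools
import time

# Hypothetical in-process "span" log standing in for a real tracer.
RECORDED_SPANS = []


def instrument_class(cls, method_names=("fit", "predict")):
    """Wrap the named methods on `cls` so each call records a span-like entry.

    Illustrative sketch only: a real instrumentor would start an
    OpenTelemetry span here instead of appending to a list.
    """
    for name in method_names:
        original = getattr(cls, name, None)
        if original is None:
            continue  # class does not define or inherit this method

        def make_wrapper(func, span_name):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    # Record the call even if the wrapped method raises.
                    RECORDED_SPANS.append(
                        {"name": span_name, "duration_s": time.perf_counter() - start}
                    )
            return wrapper

        # Patch on the class, so every instance (including existing ones)
        # resolves the method to the wrapper.
        setattr(cls, name, make_wrapper(original, f"{cls.__name__}.{name}"))


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


model = ToyEstimator()          # instance created before patching
instrument_class(ToyEstimator)  # patch the class afterwards
model.fit([[1], [2], [3]], [1.0, 2.0, 3.0])
print(model.predict([[4], [5]]))
print([span["name"] for span in RECORDED_SPANS])
```

In this sketch, patching the class rather than individual instances is what lets a single call cover every model built from that class.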

Install

pip install opentelemetry-instrumentation-sklearn opentelemetry-sdk scikit-learn

Imports

from opentelemetry.instrumentation.sklearn import SklearnInstrumentor

Quickstart

This quickstart shows how to trace scikit-learn operations. It configures a `TracerProvider` that prints finished spans to the console through a `ConsoleSpanExporter`, enables `SklearnInstrumentor`, and then runs typical scikit-learn `fit` and `predict` calls. The console output should include a span for each instrumented call.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.instrumentation.sklearn import SklearnInstrumentor

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Configure the OpenTelemetry tracer provider
resource = Resource.create({"service.name": "sklearn-app"})
provider = TracerProvider(resource=resource)
processor = SimpleSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Enable scikit-learn instrumentation
# Call this before the fit/predict calls you want traced
SklearnInstrumentor().instrument()

# Estimator methods such as fit() and predict() now produce spans
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)

print("\n--- Training Model ---")
model.fit(X_train, y_train)
print("Model training complete.")

print("\n--- Making Predictions ---")
predictions = model.predict(X_test)
print("Predictions made.")
