Snorkel

0.10.0 verified Sat May 09 auth: no python

A system for quickly generating training data with weak supervision. Labeling functions emit noisy labels, and a probabilistic label model combines them into training labels. Current version 0.10.0, supports Python >=3.11. Releases are infrequent; maintenance updates only.

pip install snorkel
error ValueError: Label model must be fit before calling predict.
cause Calling predict() without first calling fit() on LabelModel.
fix
Call label_model.fit(...) before predict().
error ModuleNotFoundError: No module named 'snorkel.labeling.model'
cause Installed snorkel version <0.9.4 where the model submodule did not exist.
fix
Upgrade snorkel: pip install --upgrade snorkel. If you must stay on an older version, use from snorkel.labeling import LabelModel instead.
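When code has to run against both import layouts, one option is to try the newer path first and fall back to the older one. The sketch below is a minimal illustration, not part of Snorkel: import_first and LABEL_MODEL_PATHS are hypothetical names, and the two candidate paths are the ones named in the entry above.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that resolves from (module, attribute) pairs.

    Hypothetical helper: tries each pair in order and skips any that
    cannot be imported or that lack the attribute.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"none of {candidates} could be imported")


# Candidate locations for LabelModel, newer layout first.
LABEL_MODEL_PATHS = [
    ("snorkel.labeling.model", "LabelModel"),
    ("snorkel.labeling", "LabelModel"),
]

# Usage (requires snorkel to be installed):
# LabelModel = import_first(LABEL_MODEL_PATHS)
```

Pinning a single snorkel version in your requirements is usually simpler; the fallback is only worth it for code that must support both layouts.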
breaking Snorkel 0.10.0 dropped support for Python <3.11. If your environment uses Python 3.10 or earlier, you must downgrade to snorkel <=0.9.9 or upgrade Python.
fix Use Python >=3.11, or install snorkel==0.9.9 if locked to older Python.
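The version rule above can be turned into a small pre-install check. This is a sketch under the constraints stated in this entry (0.10.0 needs Python >=3.11, 0.9.9 is the fallback); snorkel_requirement is a hypothetical helper, not a Snorkel or pip API.

```python
import sys


def snorkel_requirement(version_info=sys.version_info):
    """Pick a pip requirement string for the given interpreter version."""
    # Compare only (major, minor); 0.10.0 requires Python 3.11 or newer.
    if tuple(version_info[:2]) >= (3, 11):
        return "snorkel>=0.10.0"
    # Older interpreters stay on the last release that supports them.
    return "snorkel==0.9.9"


print(f"pip install '{snorkel_requirement()}'")
```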
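gotcha LabelModel.fit() does not require n_epochs; when omitted it falls back to the library default (100, where each epoch is a single optimization step), which is often too few for the model to converge. Many users forget to set n_epochs.
fix Specify n_epochs explicitly (e.g., 500) when calling fit().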
deprecated The LabelModel class name is unchanged, but its import path moved to snorkel.labeling.model. The old path snorkel.labeling.LabelModel belongs to older releases and is discouraged.
fix Use from snorkel.labeling.model import LabelModel.

Basic Snorkel flow: define labeling functions, generate label matrix, train LabelModel.

from snorkel.labeling import labeling_function
from snorkel.labeling.model import LabelModel
import numpy as np

@labeling_function()
def lf_keyword(x):
    # Vote 1 if 'keyword' appears in the text, otherwise vote 0
    return 1 if 'keyword' in x else 0

# Example data
data = ['text with keyword', 'no match']
# Simulated label matrix: one row per data point, one column per LF.
# LabelModel expects at least 3 LFs; -1 means the LF abstained.
L = np.array([[1, 1, -1], [0, -1, 0]])
# Train LabelModel
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train=L, n_epochs=500, log_freq=100, seed=123)
# Predict one label per row of L
predictions = label_model.predict(L=L)
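The flow above simulates the label matrix by hand. To make its shape concrete, the sketch below builds one by applying plain Python functions to each text and then takes a per-row majority vote as a simple baseline. This is not Snorkel's applier or LabelModel: the three rules, build_label_matrix, and majority_vote are hypothetical stand-ins written in NumPy only. In Snorkel itself, an applier such as PandasLFApplier produces the matrix from decorated labeling functions.

```python
import numpy as np

ABSTAIN = -1  # Snorkel's convention for "this LF casts no vote"


# Plain-Python stand-ins for labeling functions (hypothetical rules).
def lf_keyword(text):
    return 1 if "keyword" in text else ABSTAIN


def lf_no_match(text):
    return 0 if "no match" in text else ABSTAIN


def lf_short(text):
    return 0 if len(text.split()) <= 2 else ABSTAIN


def build_label_matrix(texts, lfs):
    """Return an (n_points, n_lfs) integer matrix of LF votes."""
    return np.array([[lf(t) for lf in lfs] for t in texts], dtype=int)


def majority_vote(L, cardinality=2):
    """Per-row majority label; ABSTAIN when no LF voted or on a tie."""
    preds = np.full(L.shape[0], ABSTAIN, dtype=int)
    for i, row in enumerate(L):
        votes = row[row != ABSTAIN]
        if votes.size == 0:
            continue
        counts = np.bincount(votes, minlength=cardinality)
        winners = np.flatnonzero(counts == counts.max())
        if winners.size == 1:
            preds[i] = winners[0]
    return preds


data = ["text with keyword", "no match"]
L = build_label_matrix(data, [lf_keyword, lf_no_match, lf_short])
preds = majority_vote(L)
print(L)
print(preds)
```

A matrix built this way has the same layout that LabelModel.fit(L_train=...) expects: one row per data point, one column per labeling function, and -1 for abstentions.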