python-crfsuite

0.9.12 · active · verified Thu Apr 09

python-crfsuite is a Python binding for CRFsuite, a fast implementation of Conditional Random Fields (CRFs) for labeling sequential data. It's widely used in Natural Language Processing (NLP) for tasks like Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and other sequence labeling problems. The current version is 0.9.12, and releases primarily focus on Python version compatibility and stability.

Warnings

Install

Imports

Quickstart

This quickstart trains a Conditional Random Field (CRF) model with `pycrfsuite.Trainer`, then loads the saved model with `pycrfsuite.Tagger` to predict labels for a new sequence. Each input sequence is a list of items, each item is a list of feature strings, and the matching label sequence holds exactly one label per item.

import pycrfsuite
import os

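# Sample data: each sequence in X_train is a list of items, and each item is a
# list of feature strings; y_train holds one label per item.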
X_train = [
    [['walk', 'big'], ['dog']],
    [['eat', 'apple'], ['red', 'apple']],
    [['run', 'fast'], ['cat']]
]
y_train = [
    ['VERB', 'NOUN'],
    ['VERB', 'NOUN'],
    ['VERB', 'NOUN']
]

# 1. Train a CRF model
trainer = pycrfsuite.Trainer(verbose=False)
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)

trainer.set_params({
    'c1': 1.0,   # coefficient for L1 penalty
    'c2': 1e-3,  # coefficient for L2 penalty
    'max_iterations': 50,  # cap the number of training iterations
    'feature.possible_transitions': True
})

model_filename = 'model.crfsuite'
trainer.train(model_filename)

print(f"Model trained and saved to '{model_filename}'")

# 2. Use the trained model for tagging
tagger = pycrfsuite.Tagger()
tagger.open(model_filename)

X_test = [
    [['see', 'small'], ['dog']]
]

predicted_tags = [tagger.tag(xseq) for xseq in X_test]
print(f"Test sequence: {X_test}")
print(f"Predicted tags: {predicted_tags}")

# Release the model held by the tagger, then clean up the model file
tagger.close()
os.remove(model_filename)
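
The toy data above hard-codes its feature lists. In practice they are usually derived from tokens. The following is a minimal sketch of one way to do that, not part of python-crfsuite itself: the helper names `token_features` and `sentence_features` and the specific feature strings are assumptions chosen for illustration.

```python
def token_features(tokens, i):
    """Build a list of feature strings for the token at position i (hypothetical helper)."""
    word = tokens[i]
    features = [
        "bias",
        "word.lower=" + word.lower(),
        "word.istitle=%s" % word.istitle(),
        "suffix2=" + word[-2:].lower(),
    ]
    # Context features: previous token, or a begin-of-sequence marker.
    if i == 0:
        features.append("BOS")
    else:
        features.append("prev.lower=" + tokens[i - 1].lower())
    # Context features: next token, or an end-of-sequence marker.
    if i == len(tokens) - 1:
        features.append("EOS")
    else:
        features.append("next.lower=" + tokens[i + 1].lower())
    return features


def sentence_features(tokens):
    """Return one feature list per token (hypothetical helper)."""
    return [token_features(tokens, i) for i in range(len(tokens))]


xseq = sentence_features(["Dogs", "run", "fast"])
for item in xseq:
    print(item)
```

The resulting `xseq` has the same list-of-lists shape as the entries of `X_train`, so it could be passed to `trainer.append(xseq, yseq)` or `tagger.tag(xseq)` along with a label list of the same length.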
