transformers-cfg


An extension of the Hugging Face Transformers library for context-free-grammar constrained decoding using EBNF grammars. The current version is 0.2.7, released in 2025; the project is under active development with frequent releases.

pip install transformers-cfg
error cannot import name 'GrammarConstraint' from 'transformers_cfg'
cause GrammarConstraint is not exported at package level.
fix
Use: from transformers_cfg.grammar_utils import GrammarConstraint
error AttributeError: module 'transformers_cfg' has no attribute 'grammar_utils'
cause Versions of transformers-cfg before 0.2.0 did not have the grammar_utils module.
fix
Upgrade to latest version: pip install --upgrade transformers-cfg
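Before debugging imports, it helps to confirm which version is actually installed. A stdlib-only check (assuming the PyPI distribution name transformers-cfg):

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version string for pkg, or None if not installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# Prints e.g. "0.2.7" when the package is present, None otherwise.
print(installed_version("transformers-cfg"))
```

If this prints a version below 0.2.0, the grammar_utils errors above are expected.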
error Expected all tensors to be on the same device, but found at least two devices
cause The tokenized inputs and the model are on different devices (e.g., input tensors on CPU, model on GPU). GrammarConstraint builds masks from the tokenizer's output, and those tensors must live on the model's device.
fix
Move the tokenized inputs to the model's device: inputs = tokenizer(prompt, return_tensors="pt").to(model.device), or pass device_map when loading the model.
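The tokenizer object itself never moves to a device; what must match the model's device are the tensors it returns. A minimal sketch of the fix, with placeholder tensors standing in for tokenizer output and "cpu" standing in for model.device:

```python
import torch

# Placeholder for what tokenizer(..., return_tensors="pt") returns (on CPU).
inputs = {
    "input_ids": torch.tensor([[1, 2, 3]]),
    "attention_mask": torch.ones(1, 3, dtype=torch.long),
}
device = torch.device("cpu")  # in practice: model.device

# Move every tensor in the batch to the model's device before generate().
inputs = {k: v.to(device) for k, v in inputs.items()}
assert all(v.device == device for v in inputs.values())
```

With a real Hugging Face tokenizer, the returned BatchEncoding supports .to(), so a single tokenizer(prompt, return_tensors="pt").to(model.device) does the same thing.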
breaking In v0.2.0, the API was restructured: GrammarConstraint moved from top-level to grammar_utils module and IncrementalGrammarConstraint was added. Older imports will break.
fix Update imports to from transformers_cfg.grammar_utils import GrammarConstraint
gotcha GrammarConstraint requires the tokenizer object at initialization; if you pass a tokenizer that is not from the same model, parsing may silently fail or produce incorrect masks.
fix Always use the same tokenizer that corresponds to the model you are generating with.
gotcha The package reserves special tokens like <|endoftext|> for internal use. If your grammar expects those tokens, use escape sequences or avoid them.
fix Design grammars without referencing internal special tokens.
deprecated The old grammar file format using .gbnf is deprecated; use .ebnf instead. The CLI still supports .gbnf but may be removed in a future version.
fix Convert .gbnf files to .ebnf or use the string-based API.

Basic constrained generation with a JSON grammar.

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import GrammarConstraint
from transformers_cfg.generation.logits_process import GrammarLogitsProcessor

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Load a grammar from a file (an EBNF string can also be passed directly)
import os

grammar_path = os.path.join(os.path.dirname(__file__), "grammars", "json.ebnf")
with open(grammar_path, "r") as f:
    grammar_str = f.read()

grammar = GrammarConstraint(grammar_str, tokenizer=tokenizer)
logits_processor = GrammarLogitsProcessor(grammar)

inputs = tokenizer(["Here is a JSON: "], return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=100,
    logits_processor=[logits_processor],
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
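For quick experiments, the grammar can also be defined inline instead of read from disk, which sidesteps the __file__ path handling above. A minimal grammar in the llama.cpp-style EBNF dialect (sketch; rule syntax assumed from the project's bundled example grammars):

```ebnf
root ::= "true" | "false"
```

Passing this string as grammar_str constrains generation to exactly one of the two boolean literals.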