Low-level Guidance (llguidance) Python Bindings
llguidance is a high-performance Rust library with Python bindings for constrained decoding (structured outputs) in Large Language Models (LLMs). It enables enforcing arbitrary context-free grammars (including JSON schemas and regular expressions) on LLM outputs with minimal overhead, typically around 50μs of CPU time per token. It serves as the fast grammar engine backend for the `guidance` Python library. The current Python binding version is 1.7.0, and releases generally align with updates to the core Rust library.
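The core mechanism behind constrained decoding can be illustrated without llguidance at all: at each step the grammar engine computes which token IDs are allowed next, and the sampler masks every other logit to negative infinity before picking a token. A minimal pure-Python sketch, using toy logits and a hypothetical `apply_token_mask` helper (not an llguidance API):

```python
import math

def apply_token_mask(logits: list[float], allowed: set[int]) -> list[float]:
    """Set logits of tokens the grammar disallows to -inf so they can
    never be sampled -- the essence of constrained decoding."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

logits = [1.2, 3.4, 0.5, 2.8]   # toy scores for a 4-token vocabulary
allowed = {0, 2}                # token IDs the grammar permits next
masked = apply_token_mask(logits, allowed)
best = max(range(len(masked)), key=masked.__getitem__)
print(best)  # greedy decoding now picks token 0, the best *allowed* token
```

In practice the mask covers the full vocabulary (often 100k+ tokens) and is applied as a bitmask on the accelerator, but the logic is the same.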
Warnings
- gotcha Direct usage of `llguidance` Python bindings is low-level and requires manual integration with an LLM's tokenizer and inference loop. Unlike the `guidance` library, which orchestrates this, `llguidance` provides the core grammar engine and expects token IDs and masks.
- deprecated The internal (JSON-based) grammar format used by `llguidance` is slowly being deprecated in favor of a Lark-like format. While the internal format is still supported, new grammars should prefer the Lark-like syntax.
- gotcha Performance of `compute_mask()` varies with tokenizer size and grammar complexity. While optimized, it can take over 1ms in some cases. Run mask computation in a background thread so it overlaps the model's forward pass instead of blocking the main inference loop, especially when the model runs on a GPU.
- breaking Updates to `llguidance` (e.g., 1.6.1) that add features or performance improvements can indirectly affect users of `guidance` and other libraries that build on it. While `llguidance` itself aims for stability, changes to its underlying logic may subtly alter behavior for downstream consumers.
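The background-thread pattern from the performance warning can be sketched with the standard library alone. Here `compute_mask_slow` and `forward_pass` are stand-ins for `Constraint.compute_mask()` and the model's forward pass, not real llguidance or model APIs; the point is that the mask is requested before the forward pass starts and joined only when sampling needs it:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compute_mask_slow(step: int) -> set[int]:
    """Stand-in for Constraint.compute_mask(); pretend it costs ~1ms of CPU."""
    time.sleep(0.001)
    return {step, step + 1}  # dummy set of allowed token IDs

def forward_pass(step: int) -> list[float]:
    """Stand-in for the model's forward pass on the accelerator."""
    time.sleep(0.001)
    return [0.1 * step, 0.2, 0.3]  # dummy logits

with ThreadPoolExecutor(max_workers=1) as pool:
    for step in range(3):
        # Kick off mask computation, run the forward pass concurrently,
        # then join the mask right before sampling.
        mask_future = pool.submit(compute_mask_slow, step)
        logits = forward_pass(step)
        mask = mask_future.result()
        print(step, sorted(mask))
```

With real workloads the two ~1ms costs overlap instead of adding up, which is exactly what the warning recommends.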
Install
-
pip install llguidance
Imports
- Constraint
from llguidance import Constraint
- TokenParser
from llguidance import TokenParser
- ParserFactory
from llguidance import ParserFactory
Quickstart
import json

from llguidance import Constraint, ParserFactory, TokenParser
from tokenizers import Tokenizer

# --- 1. Load a tokenizer ---
# In a real application, use the tokenizer of the model you are decoding with
# (e.g. transformers.AutoTokenizer.from_pretrained(...)). For this example we
# build a minimal character-level tokenizer; it cannot drive a real LLM and
# only serves to illustrate the moving parts.
tokenizer_json = {
    "version": "1.0",
    "truncation": None,
    "added_tokens": [],
    "normalizer": {"type": "Lowercase"},
    "pre_tokenizer": {"type": "Whitespace"},
    "post_processor": None,
    "decoder": {"type": "WordPiece"},
    "model": {
        "type": "WordPiece",
        "vocab": {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")} | {"[UNK]": 26},
        "unk_token": "[UNK]",
    },
}
dummy_tokenizer = Tokenizer.from_str(json.dumps(tokenizer_json))
vocab_map = {i: t for t, i in dummy_tokenizer.get_vocab().items()}  # id -> token

# --- 2. Define a grammar (a JSON schema for a simple object) ---
json_grammar = '''
{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150}
    },
    "required": ["name", "age"]
}
'''

# --- 3. Wire the grammar into a decoding loop (conceptual) ---
# llguidance is a grammar engine, not an inference engine: a runnable example
# needs a live LLM plus an llguidance binding for that model's tokenizer, and
# that binding step is the least documented part of the Python API. The
# commented loop below shows how ParserFactory, TokenParser and Constraint fit
# together once such a tokenizer object exists; adapt it to the API of the
# version you have installed.
#
# factory = ParserFactory(llguidance_tokenizer)  # built from the LLM's tokenizer
# parser = factory.create_parser(json_grammar)   # yields a TokenParser
# constraint = Constraint(parser)
# output_tokens = []
# while True:
#     mask = constraint.compute_mask()  # token IDs the grammar allows next
#     if not mask:
#         break
#     # Feed the mask to the LLM's logits processor and sample a token:
#     next_token_id = sample_from_llm(mask)  # your inference code
#     result = constraint.commit_token(next_token_id)
#     output_tokens.extend(result.added_tokens)
#     if result.is_stop:
#         break

print("Direct llguidance usage is low-level and usually abstracted away.")
print("For a practical, runnable example see the `guidance` library, which")
print("builds on llguidance: https://github.com/guidance-ai/guidance")