{"id":5652,"library":"llguidance","title":"Low-level Guidance (llguidance) Python Bindings","description":"llguidance is a high-performance Rust library with Python bindings for constrained decoding (structured outputs) in Large Language Models (LLMs). It enables enforcing arbitrary context-free grammars (including JSON schemas and regular expressions) on LLM outputs with minimal overhead, typically around 50μs of CPU time per token. It serves as the fast grammar engine backend for the `guidance` Python library. The current Python binding version is 1.7.0, and releases generally align with updates to the core Rust library.","status":"active","version":"1.7.0","language":"en","source_language":"en","source_url":"https://github.com/guidance-ai/llguidance","tags":["LLM","constrained decoding","structured output","grammar","JSON schema","Rust","Python bindings"],"install":[{"cmd":"pip install llguidance","lang":"bash","label":"Install llguidance"}],"dependencies":[{"reason":"Required for tokenizing text and providing token IDs to llguidance's core components for mask computation.","package":"tokenizers","optional":false},{"reason":"Commonly used for loading pre-trained tokenizers compatible with LLMs, which are then adapted for llguidance. Not a direct dependency but often used in practice.","package":"transformers","optional":true}],"imports":[{"note":"Main entry point for driving constrained generation sessions.","symbol":"Constraint","correct":"from llguidance import Constraint"},{"note":"Manages a single constrained-generation session, built by a ParserFactory.","symbol":"TokenParser","correct":"from llguidance import TokenParser"},{"note":"Compiles grammars and holds shared tokenizer state.","symbol":"ParserFactory","correct":"from llguidance import ParserFactory"}],"quickstart":{"code":"import os\nfrom llguidance import ParserFactory, TokenParser, Constraint\nfrom tokenizers import Tokenizer\n\n# --- 1. 
Load a tokenizer ---\nimport json\n\n# In a real scenario you would load your model's tokenizer, e.g. via\n# transformers.AutoTokenizer.from_pretrained(...). For this demonstration we\n# build a minimal WordPiece tokenizer over lowercase ASCII letters; a real\n# integration requires the actual LLM tokenizer.\ntokenizer_json = {\n    \"version\": \"1.0\",\n    \"truncation\": None,\n    \"added_tokens\": [],\n    \"normalizer\": {\"type\": \"Lowercase\"},\n    \"pre_tokenizer\": {\"type\": \"Whitespace\"},\n    \"post_processor\": None,\n    \"decoder\": {\"type\": \"WordPiece\"},\n    \"model\": {\n        \"type\": \"WordPiece\",\n        \"vocab\": {c: i for i, c in enumerate(\"abcdefghijklmnopqrstuvwxyz\")} | {\"[UNK]\": 26},\n        \"unk_token\": \"[UNK]\",\n    },\n}\n# json.dumps is required here: str() on a Python dict is not valid JSON.\ndummy_tokenizer = Tokenizer.from_str(json.dumps(tokenizer_json))\n\n# --- 2. 
Define a grammar (JSON schema for a simple object) ---\njson_schema = '''\n{\n    \"type\": \"object\",\n    \"properties\": {\n        \"name\": {\"type\": \"string\"},\n        \"age\": {\"type\": \"integer\", \"minimum\": 0, \"maximum\": 150}\n    },\n    \"required\": [\"name\", \"age\"]\n}\n'''\n\n# --- 3. 
Drive the constraint loop (conceptual) ---\n# NOTE: binding an external tokenizer is not documented in a simple form for\n# the Python bindings; the names below mirror the Rust API (ParserFactory,\n# TokenParser, Constraint) and are illustrative rather than verified.\n#\n# parser_factory = ParserFactory(llg_tokenizer)  # built from your tokenizer\n# token_parser = parser_factory.create_parser(json_schema)\n# constraint = Constraint(token_parser)\n#\n# output_tokens = []\n# while True:\n#     mask = constraint.compute_mask()       # bitmask of allowed next tokens\n#     # Feed the mask to your LLM's logits processor and sample a token.\n#     next_token_id = sample_from_llm(mask)  # placeholder for real sampling\n#     result = constraint.commit_token(next_token_id)\n#     output_tokens.extend(result.added_tokens)\n#     if result.is_stop:\n#         break\n\nprint(\"Direct llguidance usage is low-level; most applications should use\")\nprint(\"the higher-level `guidance` library: https://github.com/guidance-ai/guidance\")","lang":"python","description":"This quickstart outlines the conceptual flow for constrained generation with `llguidance`'s Python bindings. Because `llguidance` is a low-level engine that primarily serves as the backend for the `guidance` library, a standalone runnable example requires a fully integrated LLM and its tokenizer. The code shows the theoretical steps: building a placeholder tokenizer, defining a grammar (JSON schema), and the token-by-token loop of `compute_mask` and `commit_token`. 
A complete runnable example would involve a real LLM and a more complex tokenizer integration, which is typically handled by higher-level libraries like `guidance`."},"warnings":[{"fix":"For most use cases, consider using the `guidance` Python library (pip install guidance), which provides a higher-level API that leverages `llguidance` for structured output. If direct integration is necessary, thoroughly understand your LLM's tokenizer and implement the token-by-token mask application correctly.","message":"Direct usage of `llguidance` Python bindings is low-level and requires manual integration with an LLM's tokenizer and inference loop. Unlike the `guidance` library, which orchestrates this, `llguidance` provides the core grammar engine and expects token IDs and masks.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"When defining new grammars, use the Lark-like syntax supported by `llguidance` for future compatibility. Consult the `llguidance` GitHub repository for the latest grammar syntax documentation.","message":"The internal (JSON-based) grammar format used by `llguidance` is slowly being deprecated in favor of a Lark-like format. While the internal format is still supported, new grammars should prefer the Lark-like syntax.","severity":"deprecated","affected_versions":">=1.0.0"},{"fix":"Design your LLM integration to offload `llguidance.Constraint.compute_mask()` calls to a separate thread or asynchronous task to minimize latency and maximize throughput, especially when the LLM's forward pass is running on a GPU.","message":"Performance of `compute_mask()` can vary, especially with large tokenizers or complex grammars. While optimized, it may take over 1ms in some cases. 
It's recommended to run mask computation in a background thread to avoid blocking the main inference loop, particularly when operating on GPUs.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"Regularly check release notes for `llguidance` and any consuming libraries (like `guidance`) to understand potential impacts. Test your applications with new versions in a controlled environment.","message":"Updates to `llguidance` (e.g., 1.6.1) that introduce new features or performance improvements can indirectly affect users of `guidance` or other integrated libraries. While `llguidance` itself aims for stability, changes to its underlying logic may subtly alter behavior for consumers.","severity":"breaking","affected_versions":">=0.2.0 (for `guidance` users)"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}
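The quickstart in the record above describes a `compute_mask` / `commit_token` loop but cannot run without a live LLM. As a sanity check of the loop shape only, here is a self-contained toy in pure Python. `ToyConstraint` is a hypothetical stand-in, not the llguidance API: it enforces the pattern "one lowercase letter followed by one or more digits" over a character-level vocabulary, and a trivial stand-in "LLM" always samples the largest allowed token id.

```python
import string

# Character-level vocabulary: ids 0-25 are 'a'-'z', ids 26-35 are '0'-'9'.
VOCAB = {i: ch for i, ch in enumerate(string.ascii_lowercase + string.digits)}
EOS = len(VOCAB)  # dedicated end-of-sequence token id (36)

class ToyConstraint:
    """Hypothetical stand-in for llguidance's Constraint: allows one
    lowercase letter followed by one or more digits, then EOS."""

    def __init__(self):
        self.text = ""

    def compute_mask(self):
        # Return the set of token ids allowed next.
        # (llguidance returns a packed bitmask; a set keeps the toy simple.)
        if not self.text:  # first token must be a letter
            return {i for i, ch in VOCAB.items() if ch.isalpha()}
        allowed = {i for i, ch in VOCAB.items() if ch.isdigit()}
        if len(self.text) >= 2:  # at least one digit already consumed
            allowed.add(EOS)     # stopping is now legal
        return allowed

    def commit_token(self, token_id):
        # Returns True when generation should stop (llguidance reports this
        # via a richer commit result object).
        if token_id == EOS:
            return True
        self.text += VOCAB[token_id]
        return False

# Drive the loop with a trivial "LLM" that picks the largest legal token id.
c = ToyConstraint()
while True:
    mask = c.compute_mask()
    next_id = max(mask)
    if c.commit_token(next_id):
        break

print(c.text)  # prints "z9": 'z' (max letter), '9' (max digit), then EOS
```

Swapping `max(mask)` for a real sampler fed by mask-filtered logits gives the production loop: the constraint only ever vetoes tokens, so any token the LLM samples from the masked distribution is guaranteed to keep the output inside the grammar.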