Outlines-core
Outlines-core is a Python library, implemented in Rust, that provides high-performance structured text generation. It offers the core functionality for building regular expressions from JSON schemas and for constructing finite-state automata that efficiently guide large language model (LLM) token generation. The current version is 0.2.14; releases track the broader `outlines` project.
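The idea of schema-to-regex conversion can be checked with Python's `re` module. The pattern below is a hand-simplified stand-in for what `build_regex_from_schema` emits for a two-field object; it is illustrative only, not the library's actual output:

```python
import json
import re

# Hand-written, simplified pattern resembling a schema-derived regex for an
# object with a string "name" and an integer "age" (illustration only; the
# real output of build_regex_from_schema is considerably more elaborate).
pattern = r'\{"name":\s*"[^"]*",\s*"age":\s*-?\d+\}'

candidate = '{"name": "Ada", "age": 36}'
match = re.fullmatch(pattern, candidate)  # non-None: candidate fits the schema
parsed = json.loads(candidate)            # and is also valid JSON
```

Any generation that stays inside such a regex is guaranteed to parse against the schema, which is what the automaton-based guidance exploits.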
Common errors
- error: can't find Rust compiler
  cause: `outlines-core` is implemented in Rust, so a Rust compiler (and Cargo) must be installed to build its native components when no pre-compiled wheel is available for your Python version and platform.
  fix: Install Rust and Cargo. The recommended way is `rustup`: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` (then follow the on-screen instructions), or install via your system's package manager.
- ModuleNotFoundError: No module named 'outlines.fsm'
  cause: This typically indicates a version mismatch or breaking API change in the `outlines` library: modules such as `outlines.fsm`, `outlines.caching`, and `outlines._version` were moved, renamed, or removed during the refactoring that introduced `outlines-core`.
  fix: Upgrade with `pip install --upgrade outlines`. If you use `outlines-core` directly, make sure your code targets its current public API, or a compatible version of `outlines` that integrates it. If another library pins a strict dependency, you may need a specific older version, e.g. `pip install outlines==0.0.44`.
- AttributeError: partially initialized module 'json' has no attribute 'loads'
  cause: A local file or module named `json.py` (or similar) shadows Python's standard `json` library, causing an import conflict when `outlines` or its dependencies import the built-in module.
  fix: Rename any custom files or modules that collide with standard-library names, e.g. rename `json.py` to `my_json_utils.py`.
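The shadowing error above can be diagnosed before renaming anything: a minimal sketch that checks where `json` actually resolves from, assuming a standard CPython layout where the stdlib lives alongside `os.py`:

```python
import json
import os

# The stdlib json package lives inside the interpreter's installation
# directory; a shadowing json.py in your project would resolve elsewhere.
stdlib_dir = os.path.dirname(os.__file__)
shadowed = not json.__file__.startswith(stdlib_dir)
print("json resolves to:", json.__file__)
print("shadowed by a local file:", shadowed)
```

The same check works for any other standard-library name you suspect of being shadowed.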
Warnings
- breaking: Version 0.2.0 introduced significant interface changes, especially to import paths and the APIs of `Guide`, `Index`, and `Vocabulary`. Code written for `outlines-core` versions prior to 0.2 will likely break.
- gotcha: `outlines-core` needs a Rust compiler at install time when no pre-built wheel exists for your Python version and OS/architecture. This often causes installation failures in environments like Google Colab or on certain ARM-based systems.
- gotcha: When building a `Vocabulary` manually from tokenizer data, convert tokens to their string representations first; otherwise special tokens may not be recognized by the underlying deterministic finite automaton (DFA).
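Given the 0.2.0 breaking change above, checking which versions are actually installed is a useful first debugging step. A sketch using the standard `importlib.metadata` module (no assumptions about the library's own API):

```python
from importlib.metadata import PackageNotFoundError, version

# Report which versions (if any) of outlines and outlines-core are present,
# so breaking-change boundaries like 0.2.0 are easy to spot.
versions = {}
for pkg in ("outlines", "outlines-core"):
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = None

for pkg, ver in versions.items():
    print(pkg, ver if ver is not None else "not installed")
```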
Install
- pip install outlines-core
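Because a source build needs Cargo (see the gotcha above), it can be worth checking for a Rust toolchain before installing; a sketch for POSIX shells:

```shell
# If no pre-built wheel matches your platform, pip builds from source and
# needs cargo; check for it up front.
if command -v cargo >/dev/null 2>&1; then
  CARGO_STATUS="present ($(cargo --version))"
else
  CARGO_STATUS="absent (install via rustup if a wheel is unavailable)"
fi
echo "Rust toolchain: $CARGO_STATUS"
```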
Imports
- build_regex_from_schema
from outlines_core.json_schema import build_regex_from_schema
- Guide
from outlines_core.guide import Guide
from outlines_core.fsm.guide import Guide  # pre-0.2 path
- Index
from outlines_core.guide import Index
from outlines_core.fsm.guide import Index  # pre-0.2 path
- Vocabulary
from outlines_core.guide import Vocabulary
from outlines_core.fsm.guide import Vocabulary  # pre-0.2 path
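Since the import path moved in version 0.2, code that must run against either version can fall back at import time. A sketch that also degrades gracefully to `None` when `outlines-core` is not installed at all:

```python
# Try the current (0.2+) location first, then the pre-0.2 location; leave
# the names as None when outlines-core is not installed at all.
try:
    from outlines_core.guide import Guide, Index, Vocabulary  # 0.2+
except ImportError:
    try:
        from outlines_core.fsm.guide import Guide, Index, Vocabulary  # pre-0.2
    except ImportError:
        Guide = Index = Vocabulary = None

print("outlines-core available:", Guide is not None)
```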
Quickstart
import json
from outlines_core.json_schema import build_regex_from_schema
from outlines_core.guide import Guide, Index, Vocabulary
# Define a JSON schema
schema = {
    "title": "Foo",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}
# Generate a regular expression from the schema
regex = build_regex_from_schema(json.dumps(schema))
# Create Vocabulary from a pretrained model (e.g., GPT-2 for demonstration)
vocabulary = Vocabulary.from_pretrained("openai-community/gpt2")
# Create an Index from the regex and vocabulary
index = Index(regex, vocabulary)
# Create a Guide instance
guide = Guide(index)
# Example interaction with the guide (simplified for quickstart)
# In a real scenario, you'd integrate this with an LLM's token generation
current_state = guide.get_state()
allowed_tokens = guide.get_tokens()
print(f"Initial allowed token IDs: {allowed_tokens}")
# Simulate advancing the guide with a token (e.g., the first allowed token)
if allowed_tokens:
    first_token_id = allowed_tokens[0]
    # In a real LLM integration, you'd feed this token to the model
    # and then get the next state based on the model's output.
    next_allowed_tokens = guide.advance(first_token_id)
    print(f"Next allowed token IDs after advancing with {first_token_id}: {next_allowed_tokens}")
print(f"Is guide finished? {guide.is_finished()}")
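The get-tokens/advance/is-finished loop above can be illustrated without the library. The toy class below is a hypothetical character-level analogue of `Guide` for the regex `-?\d+` (an integer); the real library operates on tokenizer token IDs via a compiled automaton, not single characters:

```python
# Toy character-level analogue of Guide for the regex -?\d+: get_tokens()
# lists allowed next characters, advance() consumes one, and is_finished()
# reports whether the text so far is a complete match.
class ToyGuide:
    def __init__(self):
        self.state = "start"  # start -> sign -> digits

    def get_tokens(self):
        # A leading '-' is only allowed before any digit has been emitted.
        return list("-0123456789") if self.state == "start" else list("0123456789")

    def advance(self, ch):
        if ch not in self.get_tokens():
            raise ValueError(f"{ch!r} not allowed in state {self.state}")
        self.state = "sign" if (self.state == "start" and ch == "-") else "digits"
        return self.get_tokens()

    def is_finished(self):
        return self.state == "digits"  # at least one digit seen

guide = ToyGuide()
text = ""
for ch in "-42":
    guide.advance(ch)
    text += ch
print(text, guide.is_finished())
```

In a real decoding loop, the allowed-token set masks the LLM's logits at every step, so the model can only ever emit a continuation that keeps the output inside the regex.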