Outlines-core
Outlines-core is a Python library, implemented in Rust, that provides high-performance structured text generation primitives. It offers core functionality for building regular expressions from JSON schemas and for constructing finite-state automata that efficiently guide large language model (LLM) token generation. The current version is 0.2.14, and the release cadence appears active, tracking the broader 'outlines' project.
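To make the schema-to-regex idea concrete, here is a minimal, self-contained sketch using only Python's `re` module. The pattern below is hand-simplified for illustration; the regex actually produced by `build_regex_from_schema` is far more thorough (string escapes, optional whitespace handling, number grammar, etc.):

```python
import re

# Hand-simplified, illustrative regex for an object shaped like
# {"name": <string>, "age": <integer>} -- NOT the library's real output.
pattern = r'\{"name":\s*"[^"]*",\s*"age":\s*-?\d+\}'

# A string matching the schema's shape is accepted...
print(re.fullmatch(pattern, '{"name": "Ada", "age": 36}') is not None)    # True
# ...while one where "age" is a string, not an integer, is rejected.
print(re.fullmatch(pattern, '{"name": "Ada", "age": "36"}') is not None)  # False
```

Constraining generation to such a pattern is what guarantees the model's output parses as schema-conformant JSON.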
Warnings
- breaking Version 0.2.0 introduced significant interface changes, especially regarding import paths and the API for `Guide`, `Index`, and `Vocabulary`. Code written for `outlines-core` versions prior to 0.2 will likely break.
- gotcha `outlines-core` requires a Rust compiler to be available on your system during installation if pre-built wheels are not available for your specific Python version and operating system/architecture. This often leads to installation failures in environments like Google Colab or certain ARM-based systems.
- gotcha When creating a `Vocabulary` manually from tokenizer data, it's important to convert tokens to their string representations. This is necessary to correctly handle special tokens that might not otherwise be recognized by the underlying deterministic finite automaton (DFA).
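A minimal sketch of that conversion step, using toy data (the `raw_vocab` dict and the byte-level tokens are stand-ins for what a real tokenizer such as GPT-2 exposes; the `Vocabulary` constructor call in the final comment is an assumption and its signature may differ per version):

```python
# Toy byte-level vocabulary mapping token bytes -> token id.
raw_vocab = {b"Hello": 15496, "\u0120world".encode(): 995, b"<|endoftext|>": 50256}

eos_token = "<|endoftext|>"
token_to_ids = {}
for token, token_id in raw_vocab.items():
    # Convert each token to its string representation so the automaton
    # sees consistent string keys, including for special tokens.
    token_str = token.decode("utf-8")
    if token_str == eos_token:
        continue  # the EOS token is typically supplied separately, not as a vocab entry
    token_to_ids.setdefault(token_str, []).append(token_id)

print(sorted(token_to_ids))  # EOS excluded; remaining tokens keyed by string
# token_to_ids would then back a manual Vocabulary, e.g.:
# Vocabulary(eos_token_id, token_to_ids)  # hypothetical call; check your version's API
```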
Install
- pip install outlines-core
Imports
- build_regex_from_schema
from outlines_core.json_schema import build_regex_from_schema
- Guide
from outlines_core.guide import Guide
- Index
from outlines_core.guide import Index
- Vocabulary
from outlines_core.guide import Vocabulary
Quickstart
import json
from outlines_core.json_schema import build_regex_from_schema
from outlines_core.guide import Guide, Index, Vocabulary
# Define a JSON schema
schema = {
    "title": "Foo",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}
# Generate a regular expression from the schema
regex = build_regex_from_schema(json.dumps(schema))
# Create Vocabulary from a pretrained model (e.g., GPT-2 for demonstration)
vocabulary = Vocabulary.from_pretrained("openai-community/gpt2")
# Create an Index from the regex and vocabulary
index = Index(regex, vocabulary)
# Create a Guide instance
guide = Guide(index)
# Example interaction with the guide (simplified for quickstart)
# In a real scenario, you'd integrate this with an LLM's token generation
current_state = guide.get_state()
allowed_tokens = guide.get_tokens()
print(f"Initial allowed token IDs: {allowed_tokens}")
# Simulate advancing the guide with a token (e.g., the first allowed token)
if allowed_tokens:
    first_token_id = allowed_tokens[0]
    # In a real LLM integration, you'd feed this token to the model
    # and then get the next state based on the model's output.
    next_allowed_tokens = guide.advance(first_token_id)
    print(f"Next allowed token IDs after advancing with {first_token_id}: {next_allowed_tokens}")
print(f"Is guide finished? {guide.is_finished()}")
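In a real decoding loop, the allowed-token list from the guide is typically turned into a logits mask before sampling. A framework-free sketch of that step (plain lists instead of tensors; all names here are illustrative, not part of the outlines-core API):

```python
NEG_INF = float("-inf")

def mask_logits(logits, allowed_token_ids):
    """Keep logits for allowed tokens; set all others to -inf so they can never be sampled."""
    allowed = set(allowed_token_ids)
    return [x if i in allowed else NEG_INF for i, x in enumerate(logits)]

# Toy example: 5-token vocabulary, guide permits only tokens 1 and 3.
logits = [0.2, 1.5, -0.3, 0.9, 0.0]
masked = mask_logits(logits, [1, 3])
print(masked)  # [-inf, 1.5, -inf, 0.9, -inf]
```

After masking, an argmax or softmax-based sampler can only ever pick a token the guide allows, which is how the automaton steers generation without modifying the model itself.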