Outlines-core

0.2.14 · active · verified Thu Apr 09

Outlines-core is a Python library, implemented in Rust, that provides the high-performance core of structured text generation. It builds regular expressions from JSON schemas and constructs finite-state automata that efficiently guide large language model (LLM) token generation. The current version is 0.2.14, and the project is actively maintained alongside the broader 'outlines' project.
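
To make the automaton idea concrete, here is a toy sketch in pure Python. It is not how outlines-core is implemented: the hand-written automaton for `[0-9]+`, the five-token vocabulary, and the names `walk` and `build_index` are all assumptions made for illustration. It shows the general technique of precomputing, for each automaton state, which vocabulary tokens can be consumed and which state each one leads to.

```python
from typing import Dict, Optional, Tuple

# Hypothetical character-level automaton for the regex [0-9]+
# (state 0 = start, state 1 = accepting). Not part of outlines-core.
DIGITS = "0123456789"
STATES = [0, 1]
TRANSITIONS: Dict[Tuple[int, str], int] = {}
for ch in DIGITS:
    TRANSITIONS[(0, ch)] = 1
    TRANSITIONS[(1, ch)] = 1


def walk(state: int, text: str) -> Optional[int]:
    """Feed each character of `text` to the automaton.

    Returns the end state, or None if some character has no transition.
    """
    for ch in text:
        key = (state, ch)
        if key not in TRANSITIONS:
            return None
        state = TRANSITIONS[key]
    return state


def build_index(vocabulary: Dict[str, int]) -> Dict[int, Dict[int, int]]:
    """Map each state to {token_id: next_state} for every consumable token."""
    index: Dict[int, Dict[int, int]] = {state: {} for state in STATES}
    for state in STATES:
        for token, token_id in vocabulary.items():
            end_state = walk(state, token)
            if end_state is not None:
                index[state][token_id] = end_state
    return index


# Made-up vocabulary: token string -> token ID.
toy_vocabulary = {"1": 0, "23": 1, "a": 2, "4b": 3, "007": 4}
toy_index = build_index(toy_vocabulary)
print(sorted(toy_index[0]))  # token IDs allowed at the start state: [0, 1, 4]
```

Because the table is built once up front, finding the allowed tokens during generation is a dictionary lookup on the current state rather than a regex match against every token.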

Quickstart

This quickstart demonstrates how to use `outlines-core` to build a regular expression from a JSON schema, load a vocabulary from a pretrained model, construct an index, and initialize a `Guide` for structured generation. The `Guide` tracks the current state and reports which token IDs are allowed next, so it can be used to constrain the token generation of an LLM.

import json
from outlines_core.json_schema import build_regex_from_schema
from outlines_core import Guide, Index, Vocabulary

# Define a JSON schema
schema = {
    "title": "Foo",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}

# Generate a regular expression from the schema
regex = build_regex_from_schema(json.dumps(schema))

# Create Vocabulary from a pretrained model (e.g., GPT-2 for demonstration)
vocabulary = Vocabulary.from_pretrained("openai-community/gpt2")

# Create an Index from the regex and vocabulary
index = Index(regex, vocabulary)

# Create a Guide instance
guide = Guide(index)

# Example interaction with the guide (simplified for quickstart)
# In a real scenario, you'd integrate this with an LLM's token generation
current_state = guide.get_state()
allowed_tokens = guide.get_tokens()
print(f"Initial allowed token IDs: {allowed_tokens}")

# Simulate advancing the guide with a token (e.g., the first allowed token)
if allowed_tokens:
    first_token_id = allowed_tokens[0]
    # In a real LLM integration, you'd feed this token to the model
    # and then get the next state based on the model's output.
    next_allowed_tokens = guide.advance(first_token_id)
    print(f"Next allowed token IDs after advancing with {first_token_id}: {next_allowed_tokens}")

print(f"Is guide finished? {guide.is_finished()}")
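
The quickstart stops after a single `advance` call. As a rough sketch of how a guide could drive a whole generation, the loop below masks disallowed logits and picks the best remaining token at each step. `ToyGuide`, `fake_logits`, `mask_logits`, and `constrained_greedy_decode` are hypothetical names introduced here so the example runs without a model or a download. `ToyGuide` only mimics the `get_tokens`, `advance`, and `is_finished` calls used above, and the loop is one possible integration, not the library's own.

```python
import math
from typing import Callable, List, Sequence


class ToyGuide:
    """Hypothetical stand-in for a guide: one list of allowed token IDs per state."""

    def __init__(self, allowed_per_state: Sequence[List[int]]) -> None:
        self._allowed = list(allowed_per_state)
        self._position = 0

    def get_tokens(self) -> List[int]:
        return self._allowed[self._position]

    def advance(self, token_id: int) -> List[int]:
        if token_id not in self.get_tokens():
            raise ValueError(f"token {token_id} is not allowed in this state")
        self._position += 1
        return self.get_tokens()

    def is_finished(self) -> bool:
        # The last state is treated as final (only an end-of-sequence token remains).
        return self._position == len(self._allowed) - 1


def mask_logits(logits: List[float], allowed: List[int]) -> List[float]:
    """Keep the logits of allowed token IDs and set all others to -inf."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else -math.inf for i, x in enumerate(logits)]


def constrained_greedy_decode(
    guide: ToyGuide,
    next_logits: Callable[[List[int]], List[float]],
    max_tokens: int = 32,
) -> List[int]:
    """Greedy decoding restricted to the tokens the guide allows at each step."""
    generated: List[int] = []
    while not guide.is_finished() and len(generated) < max_tokens:
        masked = mask_logits(next_logits(generated), guide.get_tokens())
        token_id = max(range(len(masked)), key=masked.__getitem__)
        guide.advance(token_id)
        generated.append(token_id)
    return generated


def fake_logits(generated: List[int]) -> List[float]:
    """Made-up model over a 4-token vocabulary that always prefers token 3."""
    return [0.1, 0.5, 0.2, 2.0]


toy_guide = ToyGuide([[0, 1], [2], [1, 2], [3]])
decoded = constrained_greedy_decode(toy_guide, fake_logits)
print(decoded)  # [1, 2, 1]
```

The unconstrained model would emit token 3 at every step. With masking, it can only choose among the guide's allowed IDs, which is why the output follows the guide's structure instead.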
