LangExtract Library

1.2.1 · active · verified Thu Apr 16

LangExtract is a Python library for extracting structured data from unstructured text using large language models (LLMs). It chunks long inputs automatically, supports multiple languages, and integrates with several LLM providers (e.g., OpenAI, Vertex AI, Ollama). The current version is 1.2.1; releases are frequent and often add new providers, features, and bug fixes.

Common errors

Warnings

Install

Imports

Quickstart

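This quickstart describes the extraction task with a prompt and one worked example, then uses the `extract` function to pull structured data from a given text. It follows the few-shot call pattern documented in the LangExtract README (`prompt_description`, `examples`, `model_id`); confirm the argument names against your installed version. Ensure your LLM provider's API key is available (e.g., `OPENAI_API_KEY`). OpenAI models also need the optional extra (`pip install "langextract[openai]"`).

import os
import langextract as lx

# Describe what to extract
prompt = "Extract each person with their age and occupation, using exact text from the input."

# One illustrative worked example shows the model the expected output shape
examples = [
    lx.data.ExampleData(
        text="Jane Roe, 41, works as a nurse.",
        extractions=[
            lx.data.Extraction(
                extraction_class="person",
                extraction_text="Jane Roe",
                attributes={"age": "41", "occupation": "nurse"},
            )
        ],
    )
]

text = "John Doe is a software engineer aged 30."

# Extract data; the key is read from the environment, never hard-coded
result = lx.extract(
    text_or_documents=text,
    prompt_description=prompt,
    examples=examples,
    model_id="gpt-4o",  # Or any other supported model
    api_key=os.environ["OPENAI_API_KEY"],  # Raises KeyError if the variable is unset
    fence_output=True,  # Settings the README uses for OpenAI models
    use_schema_constraints=False,
)

# Each extraction carries its class, the matched source text, and attributes
for extraction in result.extractions:
    print(extraction.extraction_class, extraction.extraction_text, extraction.attributes)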
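
A missing or placeholder API key usually surfaces only as an authentication error on the first request, which can be hard to tell apart from a library problem. As a minimal sketch, the check below fails fast with a readable message before any extraction call is made. The helper name `require_api_key` and its `env` parameter are hypothetical and not part of LangExtract; it uses only the standard library.

```python
import os


def require_api_key(var_name: str, env=None) -> str:
    """Return the key stored in `var_name`, or raise with a clear message.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This is an illustrative helper, not a LangExtract function.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the quickstart."
        )
    return key
```

Calling `require_api_key("OPENAI_API_KEY")` at the top of a script and passing the returned value to the extraction call keeps the key out of source code and turns a missing variable into an immediate, explicit error.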
