LangExtract Library
LangExtract is a Python library for robustly extracting structured data from unstructured text using large language models (LLMs). It handles chunking automatically, supports multiple languages, and integrates with several LLM providers (e.g., OpenAI, Vertex AI, Ollama). The current version is 1.2.1; releases are frequent and often introduce new providers, features, and bug fixes.
Common errors
- `unexpected keyword argument 'reasoning'`
  cause: Incorrect parameter passing for `reasoning_effort` with specific OpenAI models (o1, o3, o4-mini, gpt-5) in `langextract` versions prior to 1.2.1.
  fix: Upgrade `langextract` to version 1.2.1 or newer (`pip install --upgrade langextract`).
- `InferenceConfigError: Could not resolve model provider for 'ollama'`
  cause: Built-in model providers were not correctly loaded when specified by name in `ModelConfig` in `langextract` versions prior to 1.2.0.
  fix: Upgrade `langextract` to version 1.2.0 or newer (`pip install --upgrade langextract`).
- `ValueError: Required parameter: project`
  cause: Missing or incorrect `project` parameter handling for Vertex AI Gemini Batch API calls in `langextract` versions prior to 1.1.1.
  fix: Upgrade `langextract` to version 1.1.1 or newer (`pip install --upgrade langextract`).
- `TypeError: OllamaLanguageModel.__init__() got an unexpected keyword argument 'model'`
  cause: The `OllamaLanguageModel` parameter for specifying the model ID changed from `model` to `model_id` in `langextract` v1.0.4.
  fix: Update your `ModelConfig` for Ollama to use `model_id` instead of `model`, e.g. `ModelConfig(provider='ollama', model_id='llama2')`.
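All four fixes above gate on a minimum `langextract` version. As a quick illustrative check (plain Python, not part of `langextract`; in practice you would obtain the installed version via `importlib.metadata.version("langextract")`), you can compare the installed version against the release that carries the fix:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Split a dotted version string like '1.2.1' into comparable ints."""
    return tuple(int(part) for part in v.split("."))

def predates_fix(installed: str, fixed_in: str) -> bool:
    """True if `installed` is older than the release carrying the fix."""
    return parse_version(installed) < parse_version(fixed_in)

# The 'reasoning' TypeError, for example, was fixed in 1.2.1:
print(predates_fix("1.2.0", "1.2.1"))  # True  -> upgrade needed
print(predates_fix("1.2.1", "1.2.1"))  # False -> already fixed
```

Note this simple tuple comparison handles plain `X.Y.Z` versions only; pre-release suffixes would need `packaging.version`.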
Warnings
- breaking Prior to v1.0.4, `OllamaLanguageModel` used the `model` parameter. This was changed to `model_id` in v1.0.4 for consistency.
- gotcha Using `reasoning_effort` with certain OpenAI models (e.g., o1, o3, o4-mini, gpt-5) in versions <1.2.1 would cause an `unexpected keyword argument 'reasoning'` error due to incorrect parameter passing.
- gotcha When specifying an LLM provider by name (e.g., `ModelConfig(provider='ollama')`), versions <1.2.0 could fail to load built-in providers correctly, leading to an `InferenceConfigError`.
- gotcha For users of Gemini Batch API with Vertex AI, versions <1.1.1 might have encountered a 'Required parameter: project' error due to incorrect parameter handling.
- gotcha The `debug` parameter in `extract()` defaulted to `True` in versions prior to v1.0.9, resulting in verbose output. It now defaults to `False`.
Install
- `pip install langextract`
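Because several of the errors above are fixed only in recent releases, it may be safer to require a minimum version at install time (standard pip version-specifier syntax; the threshold 1.2.1 is the latest fix version listed above):

```shell
pip install "langextract>=1.2.1"
```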
Imports
- extract
from langextract import extract
- ModelConfig
from langextract import ModelConfig
- ProviderConfig
from langextract import ProviderConfig
- langextract_schema
from langextract.dataclasses import langextract_schema
- LangExtractSchema
from langextract.dataclasses import LangExtractSchema
Quickstart
import os
from langextract import extract, dataclasses

# Define a simple schema for extraction
@dataclasses.langextract_schema
class Person(dataclasses.LangExtractSchema):
    name: str
    age: int
    occupation: str

text = "John Doe is a software engineer aged 30."

# Read the OpenAI key from the environment, falling back to a dummy value for the quickstart
os.environ['OPENAI_API_KEY'] = os.environ.get('OPENAI_API_KEY', 'sk-dummy-key')

# Extract data using the schema
result = extract(
    text=text,
    schema=Person,
    model_name="gpt-3.5-turbo"  # or any other supported model
)

print(result.extracted_data)
# Expected: Person(name='John Doe', age=30, occupation='software engineer')
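For reference, the record printed above is shape-equivalent to a plain standard-library dataclass. This sketch (independent of `langextract`, using only `dataclasses`) shows the structure the extraction is expected to fill:

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    occupation: str

# The shape the quickstart expects the model to produce:
expected = Person(name="John Doe", age=30, occupation="software engineer")
print(expected)
```

Field names become the extraction keys, and the type annotations (`str`, `int`) constrain what the model should return for each field.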