OpenAI Guardrails
OpenAI Guardrails is a Python framework for building safe, reliable AI systems by adding configurable safety and compliance guardrails to LLM applications. It provides a drop-in wrapper for OpenAI's Python client that automatically validates and moderates inputs and outputs using built-in guardrails for content safety, data protection (e.g., PII detection), and content quality (e.g., hallucination detection). The library is actively maintained by OpenAI, with frequent releases; the version covered here is 0.2.1.
Warnings
- breaking In `v0.2.0`, the library changed to make the OpenAI response object directly accessible. This could affect how you access attributes (e.g., `response.output_text` or `response.choices[0].message.content`) if your code previously relied on wrapped access patterns.
- breaking In `v0.1.6`, the `Presidio anonymizer` dependency was removed due to conflicts. If your application relied on `openai-guardrails` for PII detection and masking via Presidio in versions prior to `v0.1.6`, this functionality might have changed or been removed, requiring alternative solutions or explicit dependency management.
- gotcha The core functionality of `openai-guardrails` relies on a `guardrails_config.json` file, which defines the specific guardrails (e.g., moderation, PII detection, jailbreak detection) and their configurations. This file is loaded at client initialization but is external to the Python code examples, requiring manual creation or use of the Guardrails Wizard.
- gotcha While the `openai-guardrails` library itself is open-source and free, many of its built-in guardrails (e.g., Hallucination Detection, Custom Prompt Check, Jailbreak) utilize OpenAI's own models and APIs. Consequently, these model-based checks will incur standard OpenAI API usage costs.
- gotcha When integrating with OpenAI Agents SDK, agent-level guardrails have specific execution boundaries. Input guardrails run only for the *first* agent in a multi-agent chain, and output guardrails run only for the agent that produces the *final* output. This implies that intermediate agent interactions or specific tool calls might require tool-level guardrails for comprehensive coverage.
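The config-file gotcha above can be made concrete with a short, standard-library-only sketch that generates a `guardrails_config.json` matching the minimal schema shown in the Quickstart below. The `"Moderation"` guardrail name and empty `config` dict are illustrative assumptions; the full set of guardrail names and options comes from the Guardrails Wizard or the library documentation.

```python
import json
from pathlib import Path

# Minimal pipeline config: one input-stage guardrail, no pre-flight or output stages.
# The "Moderation" name and empty config dict are illustrative placeholders.
config = {
    "version": "1",
    "input": {
        "version": "1",
        "guardrails": [
            {"name": "Moderation", "config": {}},
        ],
    },
}

# Write the config where GuardrailsOpenAI(config=Path(...)) expects to find it.
Path("guardrails_config.json").write_text(json.dumps(config, indent=2))

# Verify the file round-trips cleanly before handing it to the client.
loaded = json.loads(Path("guardrails_config.json").read_text())
print(loaded["input"]["guardrails"][0]["name"])  # → Moderation
```

Because the file is loaded once at client initialization, a quick round-trip check like this catches malformed JSON before it surfaces as a confusing startup error.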
Install
pip install openai-guardrails
Imports
- GuardrailsOpenAI
from guardrails import GuardrailsOpenAI
- GuardrailsAsyncOpenAI
from guardrails import GuardrailsAsyncOpenAI
- GuardrailTripwireTriggered
from guardrails import GuardrailTripwireTriggered
Quickstart
import os
from pathlib import Path
from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered

# Ensure your OpenAI API key is set as an environment variable (OPENAI_API_KEY)
# or passed directly to the client. Model-based guardrails require a valid key.
# To run this example, create a simple 'guardrails_config.json' file in the same directory:
# {"version": "1", "input": {"version": "1", "guardrails": [{"name": "Moderation", "config": {}}]}}

def main():
    # The GuardrailsOpenAI client is a drop-in replacement for the standard OpenAI client.
    # It requires a config file (e.g., guardrails_config.json) defining the guardrails to apply.
    guardrails_client = GuardrailsOpenAI(config=Path("guardrails_config.json"))

    try:
        # Use the Guardrails client just like a regular OpenAI client.
        response = guardrails_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": "Hello, how are you?"}
            ],
        )
        print("LLM Output:", response.choices[0].message.content)

        # Guardrail results are attached to the response when available.
        if hasattr(response, "guardrail_results"):
            print("Guardrail Results:", response.guardrail_results)

        # Example of triggering a moderation guardrail (if configured to block).
        print("\nTesting with potentially problematic input...")
        problematic_response = guardrails_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": "I want to harm someone."}
            ],
        )
        print("LLM Output (problematic):", problematic_response.choices[0].message.content)
    except GuardrailTripwireTriggered as e:
        print(f"\nGuardrail triggered: {e.guardrail_result.info}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
    # Set a dummy API key if not already set, for local testing without network calls
    # (if the guardrails config allows). Model-based guardrails need a valid key.
    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = os.environ.get("TEST_OPENAI_API_KEY", "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx")
    main()