OpenAI Guardrails

0.2.1 · active · verified Wed Apr 15

OpenAI Guardrails is a Python framework for building safer, more reliable LLM applications by adding configurable safety and compliance checks. It provides a drop-in wrapper for OpenAI's Python client that automatically validates and moderates inputs and outputs using built-in guardrails for content safety, data protection (e.g., PII detection), and content quality (e.g., hallucination detection). The library is actively maintained by OpenAI, with frequent releases, and is currently at version 0.2.1.
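Guardrails are declared per pipeline stage in a JSON config. The sketch below extends the minimal config from the quickstart comment with an output stage; the `"Moderation"` name comes from the quickstart, while the output-stage guardrail name is illustrative — consult the library's guardrail catalog for the exact names it ships:

```json
{
  "version": "1",
  "input": {
    "version": "1",
    "guardrails": [{"name": "Moderation", "config": {}}]
  },
  "output": {
    "version": "1",
    "guardrails": [{"name": "SomeOutputGuardrail", "config": {}}]
  }
}
```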

Warnings

Install
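The PyPI package name below is assumed from the `openai-guardrails` project name referenced in the quickstart:

```shell
pip install openai-guardrails
```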

Imports

Quickstart

This quickstart demonstrates how to integrate `openai-guardrails` by replacing the standard OpenAI client with a `GuardrailsOpenAI` instance. It highlights the use of a `guardrails_config.json` file to define guardrail logic and shows how to handle `GuardrailTripwireTriggered` exceptions when a violation occurs. A basic `guardrails_config.json` is provided as a comment for immediate testing.

import os
from pathlib import Path
from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered

# Ensure your OpenAI API key is set as an environment variable (OPENAI_API_KEY)
# or passed directly to the client.
# For model-based guardrails, an API key is required.
# To run this example, create a simple 'guardrails_config.json' file in the same directory:
# {"version": "1", "input": {"version": "1", "guardrails": [{"name": "Moderation", "config": {}}]}}

def main():
    # Initialize OpenAI client (standard or Guardrails client)
    # The GuardrailsOpenAI client acts as a drop-in replacement
    # It requires a config file (e.g., guardrails_config.json) that defines the guardrails to apply.
    guardrails_client = GuardrailsOpenAI(config=Path("guardrails_config.json"))

    try:
        # Use the Guardrails client just like a regular OpenAI client
        response = guardrails_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": "Hello, how are you?"}
            ]
        )
        print("LLM Output:", response.choices[0].message.content)
        # You can also access guardrail results if available
        if hasattr(response, 'guardrail_results'):
            print("Guardrail Results:", response.guardrail_results)

        # Example of triggering a moderation guardrail (if configured to block)
        print("\nTesting with potentially problematic input...")
        problematic_response = guardrails_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": "I want to harm someone."}
            ]
        )
        print("LLM Output (problematic):", problematic_response.choices[0].message.content)

    except GuardrailTripwireTriggered as e:
        print(f"\nGuardrail triggered: {e.guardrail_result.info}")
        print(f"Violation details: {e.guardrail_result.details}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
    # Set a placeholder API key if none is configured, so the example can run locally
    # (if the guardrails config allows it). Model-based guardrails require a valid key.
    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = os.environ.get('TEST_OPENAI_API_KEY', 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx')
    main()
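The tripwire control flow above can be illustrated with a small pure-Python stand-in. None of the names below belong to the real library; the toy only mirrors the wrap-call-raise pattern in which a guarded client runs checks before forwarding a request and raises when one trips:

```python
class TripwireTriggered(Exception):
    """Toy stand-in for GuardrailTripwireTriggered; carries check details in .info."""
    def __init__(self, info):
        super().__init__(str(info))
        self.info = info

class ToyGuardedClient:
    """Wraps a callable 'model' and runs a simple input check before forwarding."""
    def __init__(self, model, blocked_words):
        self.model = model
        self.blocked_words = blocked_words

    def create(self, content):
        # Input-stage check: raise instead of forwarding if a flagged word appears.
        for word in self.blocked_words:
            if word in content.lower():
                raise TripwireTriggered({"guardrail": "toy_moderation", "matched": word})
        return self.model(content)

client = ToyGuardedClient(model=lambda text: f"echo: {text}", blocked_words=["harm"])
print(client.create("Hello, how are you?"))  # passes the check, reaches the model
try:
    client.create("I want to harm someone.")
except TripwireTriggered as e:
    print("Guardrail triggered:", e.info)
```

As in the quickstart, callers treat the tripped check as an exception path rather than inspecting every response by hand.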
