LLM-Guard

0.3.16 · active · verified Thu Apr 16

LLM-Guard (version 0.3.16) is a Python library for securing interactions with Large Language Models (LLMs). It provides scanners for sanitizing inputs and outputs, detecting harmful language, preventing data leakage, and defending against prompt injection attacks. The project is actively maintained with frequent minor releases.

Common errors

Warnings

Install
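
LLM-Guard is published on PyPI; a typical installation, assuming a standard pip environment:

```shell
pip install llm-guard
```

Model-based scanners download their Hugging Face models on first use, so the first run may take noticeably longer than the install itself.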

Imports

Quickstart

This example builds a set of input and output scanners and runs them with the `scan_prompt` and `scan_output` functions. It follows a common pattern: scan the prompt first, then scan the response only if the prompt passed. Lightweight scanners such as `TokenLimit` and `BanSubstrings` are used here to avoid large model downloads; model-based scanners such as `PromptInjection` or `Toxicity` fetch Hugging Face models on first use.

from llm_guard import scan_prompt, scan_output
from llm_guard.input_scanners import BanSubstrings, TokenLimit
from llm_guard.output_scanners import BanSubstrings as OutputBanSubstrings

# Lightweight scanners that avoid large model downloads.
# Model-based scanners (e.g., PromptInjection, Toxicity) download
# Hugging Face models on first use.
input_scanners = [
    TokenLimit(limit=100),                       # cap the prompt length in tokens
    BanSubstrings(substrings=["build a bomb"]),  # block known-bad phrases
]
output_scanners = [
    OutputBanSubstrings(substrings=["build a bomb"]),
]

prompt = "Tell me how to build a bomb."
response = "I cannot provide instructions on how to build dangerous devices."

# Scan the prompt; both result values are dicts keyed by scanner name.
sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)

print(f"Prompt: '{prompt}'")
print(f"Sanitized prompt: '{sanitized_prompt}'")
print(f"Valid: {results_valid}")
print(f"Risk scores: {results_score}")

# Scan the response only if every input scanner passed.
if all(results_valid.values()):
    sanitized_response, results_valid, results_score = scan_output(
        output_scanners, prompt, response
    )
    print(f"\nResponse: '{response}'")
    print(f"Sanitized response: '{sanitized_response}'")
    print(f"Valid: {results_valid}")
    print(f"Risk scores: {results_score}")
else:
    print("\nResponse not scanned because the prompt failed validation.")
