{"id":3719,"library":"openevals","title":"OpenEvals","description":"OpenEvals is an open-source Python library providing ready-made evaluators for Large Language Model (LLM) applications. It offers a structured approach to LLM evaluation, similar to traditional software testing, with built-in functionality such as LLM-as-judge evaluators and prebuilt prompts for common evaluation scenarios like correctness, conciseness, and hallucination detection. Developed by LangChain, it aims to streamline bringing LLM applications to production by making evaluation more accessible and transparent. The current version is 0.2.0 and development is ongoing.","status":"active","version":"0.2.0","language":"en","source_language":"en","source_url":"https://github.com/langchain-ai/openevals","tags":["LLM","evaluation","AI","LangChain","testing","prompt-engineering"],"install":[{"cmd":"pip install openevals","lang":"bash","label":"Install via pip"}],"dependencies":[{"reason":"Required for LLM-as-judge evaluators that use OpenAI models.","package":"openai","optional":false}],"imports":[{"symbol":"create_llm_as_judge","correct":"from openevals.llm import create_llm_as_judge"},{"symbol":"CORRECTNESS_PROMPT","correct":"from openevals.prompts import CORRECTNESS_PROMPT"},{"symbol":"CONCISENESS_PROMPT","correct":"from openevals.prompts import CONCISENESS_PROMPT"},{"symbol":"HALLUCINATION_PROMPT","correct":"from openevals.prompts import HALLUCINATION_PROMPT"}],"quickstart":{"code":"import os\nfrom openevals.llm import create_llm_as_judge\nfrom openevals.prompts import CORRECTNESS_PROMPT\n\n# Ensure your OpenAI API key is set as an environment variable,\n# e.g. os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n# We use .get() so the script warns instead of crashing immediately,\n# but a valid key is required for the evaluator call to succeed.\nif not os.environ.get(\"OPENAI_API_KEY\"):\n    print(\"WARNING: OPENAI_API_KEY not set. Quickstart will fail without it.\")\n\n# Create a correctness evaluator using an LLM-as-judge\ncorrectness_evaluator = create_llm_as_judge(\n    prompt=CORRECTNESS_PROMPT,\n    model=\"openai:o3-mini\",  # LangChain-style \"provider:model\" string (OpenAI's o3-mini model)\n)\n\n# Define inputs, outputs, and reference outputs for evaluation\ninputs = \"How much has the price of doodads changed in the past year?\"\noutputs = \"Doodads have increased in price by 10% in the past year.\"\nreference_outputs = \"The price of doodads has decreased by 50% in the past year.\"\n\n# Run the evaluator\neval_result = correctness_evaluator(\n    inputs=inputs,\n    outputs=outputs,\n    reference_outputs=reference_outputs,\n)\n\nprint(eval_result)\n# Expected output (the comment wording varies between LLM runs, but the structure is consistent):\n# { 'key': 'score', 'score': False, 'comment': 'The provided answer stated that doodads increased in price by 10%, which conflicts with the reference output...' }","lang":"python","description":"This quickstart demonstrates how to set up and run a basic LLM-as-judge correctness evaluation using a prebuilt prompt and an OpenAI model. Ensure your `OPENAI_API_KEY` environment variable is set for the example to run successfully. The evaluator returns a dictionary containing a score and a comment based on the LLM's judgment."},"warnings":[{"fix":"Refer to the `openevals` documentation or LangChain's model integration guides for the correct model string format for your chosen LLM provider.","message":"The `model` parameter in `create_llm_as_judge` expects specific string formats (e.g., `\"openai:o3-mini\"`). This implies integration with LangChain's model abstraction and may differ from direct LLM client instantiation.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be explicit about the expected return structure when using a custom `output_schema`, and update any downstream code that processes evaluation results accordingly.","message":"Providing a custom `output_schema` to `create_llm_as_judge` alters the evaluator's return value. By default it returns a simple dictionary with a boolean `score` and a `comment`; a custom schema overrides this structure.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set the necessary API key as an environment variable (e.g., `export OPENAI_API_KEY=\"your_key_here\"` in your shell) before running evaluators that rely on external LLMs.","message":"Many of the core evaluators, especially LLM-as-judge evaluators, require an API key for an external LLM provider (e.g., OpenAI, Anthropic). This key must be configured in your environment, typically via an environment variable such as `OPENAI_API_KEY`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}