{"id":7979,"library":"bfcl-eval","title":"Berkeley Function Calling Leaderboard Evaluation","description":"bfcl-eval is the Python library for the Berkeley Function Calling Leaderboard (BFCL), a benchmark to evaluate Large Language Models (LLMs) on their ability to perform function calling. It provides the evaluation pipeline and datasets, including support for multi-step and multi-turn function calls as of its V3 release. The library is actively maintained with frequent updates, with its current PyPI version being 2026.3.23.","status":"active","version":"2026.3.23","language":"en","source_language":"en","source_url":"https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard","tags":["LLM evaluation","function calling","AI leaderboard","benchmark","tool use"],"install":[{"cmd":"pip install bfcl-eval","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Required for evaluating models like GPT-4o, which are frequently used in the benchmark.","package":"openai","optional":false}],"imports":[{"note":"The primary evaluation logic resides within the `eval_pipeline` submodule.","wrong":"from bfcl_eval import eval_handler","symbol":"eval_handler","correct":"from bfcl_eval.eval_pipeline import eval_handler"},{"note":"Metrics and result processing classes are located in the `eval_pipeline` submodule.","wrong":"from bfcl_eval import EvalMetrics","symbol":"EvalMetrics","correct":"from bfcl_eval.eval_pipeline import EvalMetrics"}],"quickstart":{"code":"import argparse\nimport os\nfrom bfcl_eval.eval_pipeline import eval_handler\n\n# Set your OpenAI API key as an environment variable\n# For testing, you might use a placeholder, but for actual runs, it's required.\nos.environ['OPENAI_API_KEY'] = os.environ.get('OPENAI_API_KEY', 'YOUR_OPENAI_API_KEY_HERE')\n\n# Create a Namespace object to simulate command-line arguments\n# These are common arguments required by the `run_eval` method.\nargs = argparse.Namespace(\n    
dataset_name='multi_step_9-8-0', # Example dataset, check docs for available ones\n    model_name='gpt-4o',          # Model to evaluate, e.g., 'gpt-4o', 'gemini-1.5-pro'\n    num_gpus=0,                   # Set to 0 for CPU execution\n    batch_size=1,\n    num_eval_prompts=1,           # Number of prompts to evaluate (for quick test)\n    output_dir='./bfcl_results',  # Directory to save results\n    api_key=os.environ['OPENAI_API_KEY'], # Passed via args or env var\n    temp=0.7,\n    top_p=1.0,\n    max_tokens=2000,\n    system_prompt_path=None,\n    eval_mode='full',\n    eval_version='v3',            # Refers to the benchmark version (V1, V2, V3)\n    enable_tool_code_execution=False, # Set to True to enable code execution (requires sandboxing)\n    enable_parallel=False,\n    num_threads=1,\n    live_data=False\n)\n\nprint(f\"Starting BFCL evaluation for dataset '{args.dataset_name}' with model '{args.model_name}'...\")\n\ntry:\n    # Run the evaluation pipeline\n    results = eval_handler.run_eval(args)\n    print(\"\\nEvaluation Complete!\")\n    print(\"Results:\")\n    print(results)\nexcept Exception as e:\n    print(f\"\\nAn error occurred during evaluation: {e}\")\n    # Warn if the key is missing, empty, or still the placeholder set above\n    if os.environ.get('OPENAI_API_KEY') in (None, '', 'YOUR_OPENAI_API_KEY_HERE'):\n        print(\"Please ensure your OPENAI_API_KEY environment variable is set correctly.\")\n    print(\"Check the dataset name, model name, and API key configurations.\")","lang":"python","description":"This quickstart demonstrates how to programmatically run an evaluation using `bfcl-eval`. It simulates the command-line arguments needed by `eval_handler.run_eval` to specify the dataset, model, and other evaluation parameters. The exact argument names accepted by `run_eval` can change between releases, so verify them against the documentation for your installed version. Note that for commercial models like 'gpt-4o', an API key (e.g., `OPENAI_API_KEY`) must be set as an environment variable."},"warnings":[{"fix":"Ensure the necessary API key is set in your environment variables or passed to the evaluation handler. 
Refer to the documentation for the specific model you intend to evaluate.","message":"Many models (e.g., GPT-4o, Gemini) require an API key to be set either as an environment variable (e.g., `OPENAI_API_KEY`) or passed directly via `args.api_key`. Forgetting this is a common source of errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always consult the latest documentation in the `gorilla` GitHub repository's `berkeley-function-call-leaderboard` README for the most up-to-date list of supported `dataset_name` values and `eval_version` flags for your installed `bfcl-eval` version.","message":"The Berkeley Function Calling Leaderboard has evolved through multiple versions (V1, V2, V3), which often involve changes to dataset names, formats, and evaluation methodologies. Using an older `dataset_name` with a newer evaluation pipeline, or vice versa, can lead to `ArgumentError` or incorrect results.","severity":"breaking","affected_versions":"v1.0 - Current (PyPI 2026.3.23)"},{"fix":"When using `pip install bfcl-eval`, ensure imports follow the package structure (e.g., `from bfcl_eval.eval_pipeline import ...`). If you intend to use the repository's scripts directly, follow its specific setup instructions.","message":"The `bfcl-eval` package is a component of the larger 'Gorilla' project. Users sometimes confuse installing the `bfcl-eval` PyPI package with directly cloning and running scripts from the `gorilla` GitHub repository's `berkeley-function-call-leaderboard` subdirectory. This can lead to `ModuleNotFoundError` if imports are based on the repository structure instead of the installed package structure.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Set the `OPENAI_API_KEY` environment variable with a valid key. 
Verify your OpenAI account has permissions for the model you are trying to use.","cause":"The OpenAI API key is missing, invalid, or your account does not have access to the specified model (e.g., insufficient tier, region restrictions).","error":"openai.BadRequestError: The model `gpt-4o` does not exist or you do not have access to it."},{"fix":"Install the package using `pip install bfcl-eval`. If using a virtual environment, ensure it is activated.","cause":"The `bfcl-eval` package is not installed in the current Python environment, or the environment is not active.","error":"ModuleNotFoundError: No module named 'bfcl_eval'"},{"fix":"Consult the `bfcl-eval` documentation or the `gorilla` GitHub repository's `README` for a list of currently supported `dataset_name` values for your installed `bfcl-eval` version and the `eval_version` you are targeting.","cause":"The specified dataset name is either incorrect, deprecated, or not available in the installed version of `bfcl-eval` or for the chosen `eval_version`.","error":"argparse.ArgumentError: argument --dataset_name: invalid choice: 'old_dataset_name' (choose from 'multi_step_9-8-0', 'multi_turn_base_34', ...)"}]}