{"id":4103,"library":"math-verify","title":"HuggingFace Math-Verify","description":"Math-Verify is a robust Python library from HuggingFace, currently at version 0.9.0, designed for evaluating Large Language Model outputs in mathematical tasks. It provides sophisticated capabilities for parsing and verifying mathematical expressions, including LaTeX and plain numerical formats. The library supports complex features like set theory, equation/inequality comparison, and advanced normalization, aiming to offer higher accuracy in assessing LLM performance on math problems by moving beyond strict format requirements and inflexible comparison logic. It maintains an active development and release cadence.","status":"active","version":"0.9.0","language":"en","source_language":"en","source_url":"https://github.com/huggingface/math-verify","tags":["math","verification","evaluation","huggingface","llm","latex","sympy"],"install":[{"cmd":"pip install math-verify","lang":"bash","label":"Basic Install"},{"cmd":"pip install math-verify[antlr4_13_2]","lang":"bash","label":"Install with Specific Antlr4 Runtime"}],"dependencies":[{"reason":"Requires Python 3.10 or newer.","package":"python","optional":false},{"reason":"Core dependency for parsing; multiple versions (4.13.2, 4.11.0, 4.9.3) are supported via extras, and specifying one is recommended.","package":"antlr4-python3-runtime","optional":false},{"reason":"Used for LaTeX parsing and conversion to SymPy expressions.","package":"latex2sympy2_extended","optional":false},{"reason":"Underlying symbolic mathematics library for expression comparison.","package":"sympy","optional":false}],"imports":[{"symbol":"parse","correct":"from math_verify import parse"},{"symbol":"verify","correct":"from math_verify import verify"},{"symbol":"LatexExtractionConfig","correct":"from math_verify import LatexExtractionConfig"},{"symbol":"ExprExtractionConfig","correct":"from math_verify import 
ExprExtractionConfig"},{"symbol":"StringExtractionConfig","correct":"from math_verify import StringExtractionConfig"}],"quickstart":{"code":"from math_verify import parse, verify, LatexExtractionConfig, ExprExtractionConfig\n\n# Define extraction configurations\nextraction_configs = [LatexExtractionConfig(), ExprExtractionConfig()]\n\n# Parse the gold standard answer (e.g., from a dataset)\ngold_answer_text = \"${1,3} \\\\cup {2,4}$\"\ngold_parsed = parse(gold_answer_text, extraction_config=extraction_configs)\n\n# Parse the LLM-generated answer\nllm_answer_text = \"${1,2,3,4}$\"\nllm_parsed = parse(llm_answer_text, extraction_config=extraction_configs)\n\n# Verify if the LLM's answer is mathematically equivalent to the gold standard\nis_correct = verify(gold_parsed, llm_parsed)\n\nprint(f\"Gold: {gold_answer_text} -> {gold_parsed}\")\nprint(f\"LLM: {llm_answer_text} -> {llm_parsed}\")\nprint(f\"Are answers equivalent? {is_correct}\")\n\n# Another example with an inequality and asymmetric comparison behavior\ngold_ineq = parse(\"1 < x < 2\")\nllm_interval = parse(\"(1,2)\")\nprint(f\"\\nGold (inequality): {gold_ineq}\")\nprint(f\"LLM (interval): {llm_interval}\")\nprint(f\"Are they equivalent (default)? {verify(gold_ineq, llm_interval)}\")\n\n# Reversed roles (gold is an interval, prediction is an inequality);\n# allow_set_relation_comp=True relaxes the default asymmetry\ngold_interval = parse(\"(1,2)\")\nllm_ineq = parse(\"1 < x < 2\")\nprint(f\"\\nGold (interval): {gold_interval}\")\nprint(f\"LLM (inequality): {llm_ineq}\")\nprint(f\"Are they equivalent (default)? {verify(gold_interval, llm_ineq)}\")\nprint(f\"Are they equivalent (allow_set_relation_comp=True)? {verify(gold_interval, llm_ineq, allow_set_relation_comp=True)}\")","lang":"python","description":"This quickstart demonstrates how to use `parse` to extract mathematical expressions from strings (both LaTeX and plain expressions) and `verify` to check for mathematical equivalence. 
It highlights the use of `ExtractionConfig` classes and illustrates the default asymmetric behavior for comparing intervals and inequalities."},"warnings":[{"fix":"Review any code that explicitly creates or manipulates `sympy.FiniteSet` objects intended for use with `math-verify`. Consider adapting to `latex2sympy2_extended.sets.FiniteSet` or ensuring compatibility through conversion if necessary.","message":"As of version 0.5.0, `math-verify` replaced the direct use of `sympy.FiniteSet` with `FiniteSet` from `latex2sympy2_extended.sets`. If your code directly interacted with `sympy.FiniteSet` objects in conjunction with `math-verify`'s internal set handling, this change might break compatibility or lead to unexpected behavior.","severity":"breaking","affected_versions":">=0.5.0"},{"fix":"Remove the `equations` parameter from your `NormalizationConfig` instances. The parser will automatically manage equation handling.","message":"The `equations` parameter in `NormalizationConfig` was deprecated in version 0.6.0. Its functionality is now handled internally by the parser.","severity":"deprecated","affected_versions":">=0.6.0"},{"fix":"If symmetric comparison is desired, pass `allow_set_relation_comp=True` to the `verify` function. For example: `verify(gold, answer, allow_set_relation_comp=True)`.","message":"The `verify` function has an intentional asymmetric behavior when comparing interval-like expressions (e.g., `(1,2)`) and inequality-like expressions (e.g., `1 < x < 2`). By default, `verify` might return `True` for `1 < x < 2` (gold) vs. `(1,2)` (prediction), but `False` for `(1,2)` (gold) vs. `1 < x < 2` (prediction). This design prevents models from simply returning the input inequality without solving it.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Monitor parsing times for very long or complex inputs. 
Adjust the `parsing_timeout` argument of `parse`, or consider pre-processing excessively long inputs if timeouts become frequent.","message":"As of version 0.6.2, the parsing timeout mechanism was changed from being per-extraction to global. This means that a single long input with multiple embedded expressions might exhaust the global timeout, even if individual extractions would have completed within their own (now defunct) per-extraction limits.","severity":"gotcha","affected_versions":">=0.6.2"},{"fix":"To ensure internal errors are propagated as exceptions, set `raise_on_error=True` when calling `parse` or `verify`. Alternatively, configure your Python logging to display messages at `DEBUG` level for the `math_verify` module.","message":"In version 0.8.0, the default logging verbosity was reduced, and internal errors are now logged at the debug level by default. This means you might not see parsing or verification errors in standard log outputs unless `raise_on_error` is set to `True` or logging is configured to show debug messages.","severity":"gotcha","affected_versions":">=0.8.0"},{"fix":"Always use the `parse` function to process both gold and prediction answers before passing their outputs to `verify`. Avoid manually creating mixed lists of SymPy objects and strings for direct verification.","message":"The `verify` function's behavior for lists containing a mix of SymPy expressions and strings is optimized for inputs originating from the `parse` function. Directly constructing lists (e.g., `[sympy.Number(0), '0']`) and passing them to `verify` might lead to unexpected `False` results, especially when a SymPy expression on one side should logically match a string on the other.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}