Mathematics Dataset

Version: 1.0.1 | Verified: Fri May 01 | Auth: no | Language: Python

A synthetic dataset of school-level mathematics questions from DeepMind, covering arithmetic, algebra, calculus, and more. The current version, 1.0.1, was released in 2019; the package is stable and no longer under active development.

pip install mathematics-dataset
error ModuleNotFoundError: No module named 'mathematics_dataset'
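cause Package not installed, or installed into a different Python environment from the one running your code.
fix
Run: python -m pip install mathematics-dataset (invoking pip through the interpreter installs the package into that interpreter's environment)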
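To tell "not installed" apart from "installed somewhere else", you can ask the running interpreter whether it can see the module. A minimal sketch using only the standard library; `module_available` is a hypothetical helper name, not part of mathematics_dataset:

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the running interpreter can import top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("mathematics_dataset"):
        print("mathematics_dataset is importable from", sys.executable)
    else:
        # Install into *this* interpreter's environment rather than
        # whichever `pip` happens to come first on PATH.
        print(f"Not found. Run: {sys.executable} -m pip install mathematics-dataset")
```

Printing `sys.executable` shows exactly which interpreter was checked, which is usually the clue when pip and python point at different environments.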
error ValueError: Unrecognised split: train_easy (valid: train-easy, train-medium, train-hard, test-easy, test-medium, test-hard, interpolate)
cause Typo in the split name or a missing hyphen (e.g. train_easy instead of train-easy).
fix
Use one of the exact split names: 'train-easy', 'train-medium', 'train-hard', 'test-easy', 'test-medium', 'test-hard', 'interpolate'
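If split names come from user input or config files, normalising them before they reach the loader catches the underscore and capitalisation slips described above. A sketch assuming the split list on this page is complete; `normalize_split` is a hypothetical helper, not a library function:

```python
# Split names as listed on this page.
VALID_SPLITS = (
    "train-easy", "train-medium", "train-hard",
    "test-easy", "test-medium", "test-hard",
    "interpolate",
)


def normalize_split(name: str) -> str:
    """Map a user-supplied split name onto one of VALID_SPLITS.

    Tolerates underscores, spaces, stray whitespace, and capital letters
    ("Train_Easy" -> "train-easy"); raises ValueError for anything else.
    """
    candidate = name.strip().lower().replace("_", "-").replace(" ", "-")
    if candidate not in VALID_SPLITS:
        raise ValueError(
            f"Unrecognised split: {name!r} (valid: {', '.join(VALID_SPLITS)})"
        )
    return candidate
```

The helper deliberately does not guess at missing hyphens such as "traineasy"; those still raise, so genuine typos stay visible.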
gotcha The questions are synthetically generated; answers may contain errors or be nonsensical in edge cases.
fix Always validate a sample of answers before using the dataset for training or evaluation.
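One way to validate a sample is to draw a reproducible random subset and run it through checks you trust. The sketch below uses reservoir sampling (Algorithm R) so it needs only one pass and O(k) memory over an arbitrarily large iterable of (question, answer) pairs. The addition checker and its "What is A + B?" wording are hypothetical illustrations, not a guaranteed question template from the dataset:

```python
import random
import re
from typing import Callable, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]


def reservoir_sample(items: Iterable[Pair], k: int, rng: random.Random) -> List[Pair]:
    """Uniformly sample k items in one pass (Algorithm R), using O(k) memory."""
    sample: List[Pair] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds: j is in [0, i]
            if j < k:
                sample[j] = item
    return sample


def spot_check(
    pairs: Iterable[Pair],
    check: Callable[[str, str], Optional[bool]],
    sample_size: int = 100,
    seed: int = 0,
) -> List[Pair]:
    """Return the sampled pairs whose answers `check` marks as wrong.

    `check(question, answer)` returns True (correct), False (wrong), or
    None (question type not handled); None results are not reported.
    """
    rng = random.Random(seed)
    sample = reservoir_sample(pairs, sample_size, rng)
    return [(q, a) for q, a in sample if check(q, a) is False]


# Hypothetical checker for integer-addition questions.
_ADDITION = re.compile(r"What is (-?\d+) \+ (-?\d+)\?")


def check_addition(question: str, answer: str) -> Optional[bool]:
    match = _ADDITION.fullmatch(question.strip())
    if match is None:
        return None  # not an addition question; leave it for manual review
    try:
        return int(answer) == int(match.group(1)) + int(match.group(2))
    except ValueError:
        return False  # a non-integer answer to an integer sum is wrong
```

`spot_check(dataset, check_addition)` accepts any iterable of pairs, including the loader object used in the example on this page, assuming it yields (question, answer) tuples as shown there. Fixing the seed keeps the audited sample identical across runs.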
gotcha Loading a split for the first time downloads and caches a ~1.4 GB tar file. Ensure sufficient disk space and a stable internet connection.
fix Pre-download with: python -c 'from mathematics_dataset import MathematicsDataset; MathematicsDataset("train-easy")'
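Before pre-downloading, a quick free-space check avoids a half-finished download. A sketch using `shutil.disk_usage`; the cache location is not documented here, so checking the home directory is an assumption, and the 2x margin is an arbitrary allowance for temporary files:

```python
import shutil
from pathlib import Path

# Archive size as stated on this page (~1.4 GB).
ARCHIVE_BYTES = int(1.4e9)


def has_room_for_download(directory, needed_bytes=ARCHIVE_BYTES, margin=2.0):
    """Return True if `directory` has at least needed_bytes * margin bytes free.

    The margin is an assumed allowance for temporary files created while
    downloading and unpacking; adjust it to taste.
    """
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * margin


if __name__ == "__main__":
    # Assumption: the cache lives on the same filesystem as the home directory.
    target = Path.home()
    if has_room_for_download(target):
        print(f"Enough free space under {target}; safe to pre-download.")
    else:
        print(f"Less than {2 * ARCHIVE_BYTES / 1e9:.1f} GB free under {target}.")
```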
deprecated The library retains Python 2-era code and has not been updated for Python 3.8+; some imports may break on newer Python versions.
fix Use a virtual environment with Python 3.7 or 3.6; check issue #12 on GitHub.
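A guard at the top of a training script makes the version requirement explicit instead of surfacing later as an obscure import error. A sketch assuming the 3.6/3.7 recommendation above; `python_is_recommended` is a hypothetical helper:

```python
import sys
import warnings

# Interpreter versions this page recommends for the library.
RECOMMENDED = {(3, 6), (3, 7)}


def python_is_recommended(version_info=sys.version_info) -> bool:
    """Return True on Python 3.6 or 3.7; otherwise emit a warning and return False."""
    major_minor = (version_info[0], version_info[1])
    if major_minor in RECOMMENDED:
        return True
    warnings.warn(
        f"Python {major_minor[0]}.{major_minor[1]} is outside the recommended "
        "3.6/3.7 range; mathematics_dataset imports may fail.",
        RuntimeWarning,
        stacklevel=2,
    )
    return False


if __name__ == "__main__":
    python_is_recommended()
```

It warns rather than exits, since the library may still work on newer versions for the parts you use.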

Loads the training set of easy questions and prints the first 5 question-answer pairs.

from mathematics_dataset import MathematicsDataset

# Load the easy training split.
dataset = MathematicsDataset('train-easy', verbose=False)

# Iterate over (question, answer) pairs and stop after the first five.
for i, (question, answer) in enumerate(dataset):
    if i >= 5:
        break
    print(f'Q: {question}\nA: {answer}\n')
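The same "first five pairs" idea can be written with `itertools.islice`, which stops after n items instead of tracking an index by hand. A sketch that works on any iterable of (question, answer) tuples; `take_pairs` and `format_pair` are hypothetical helpers:

```python
from itertools import islice
from typing import Iterable, List, Tuple


def take_pairs(dataset: Iterable[Tuple[str, str]], n: int = 5) -> List[Tuple[str, str]]:
    """Return the first `n` (question, answer) pairs, stopping after n items."""
    return list(islice(dataset, n))


def format_pair(question: str, answer: str) -> str:
    """Render one pair in the Q/A layout used by the example above."""
    return f"Q: {question}\nA: {answer}\n"
```

With the loader from the example, `for q, a in take_pairs(dataset): print(format_pair(q, a))` should print the same output, assuming iteration yields pairs as shown there.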