FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI

Introducing FrontierMath, a challenging benchmark of hundreds of expert-level mathematics problems designed to evaluate the advanced reasoning capabilities of AI systems. The benchmark spans numerous branches of mathematics, and its problems demand hours of sustained reasoning. They are built to test genuine understanding, so that a correct answer cannot be reached by guessing.

Published on Nov 08, 2024
Problems take days for experts
Less than 2% solved by AI
Contributions from Fields medalists
Continuous updates planned
