Introducing FrontierMath, a challenging benchmark of hundreds of expert-level mathematics problems designed to evaluate the advanced reasoning capabilities of AI systems. The benchmark spans many branches of mathematics, and its problems demand hours of sustained reasoning, so solving them tests genuine understanding rather than the ability to guess.