🧀 BigCheese.ai

OpenAI o1 Results on ARC-AGI-Pub

OpenAI's new o1 models demonstrate incremental progress towards AGI, but new ideas are still needed. Over the past 24 hours, o1-preview and o1-mini, both of which feature improved chain-of-thought (CoT) reasoning, were tested on the ARC-AGI benchmark. The results are promising, but they do not signal the arrival of AGI. The models exhibit a log-linear relationship between accuracy and test-time compute: accuracy rises roughly linearly with the logarithm of compute, so each further gain costs multiplicatively more compute. This motivates the pursuit of more efficient refinement and search methods. ARC Prize calls on the community to contribute innovative approaches.

  • o1-preview scored 21.2% accuracy on ARC-AGI.
  • o1-preview is on par with Claude 3.5 Sonnet in ARC-AGI accuracy.
  • o1-mini scored 12.8% accuracy on ARC-AGI.
  • With CoT reasoning, accuracy shows a log-linear relationship with test-time compute.
  • New ideas beyond fitting curves to data distributions are needed for AGI.
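
To make the log-linear claim concrete, here is a minimal sketch of what such a relationship looks like numerically. It assumes the form accuracy = a + b * log10(compute). The function names (fit_log_linear, compute_for_target) and the data points are hypothetical, invented purely for illustration, and are not measurements from the o1 tests.

```python
import math


def fit_log_linear(compute, accuracy):
    """Ordinary least-squares fit of accuracy = a + b * log10(compute)."""
    xs = [math.log10(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx              # accuracy points gained per 10x compute
    a = mean_y - b * mean_x    # accuracy at compute = 1 (arbitrary unit)
    return a, b


def compute_for_target(a, b, target_accuracy):
    """Invert the fit: compute needed to reach a target accuracy."""
    return 10 ** ((target_accuracy - a) / b)


# Synthetic, made-up points (NOT real benchmark results).
# Compute is in arbitrary units; accuracy is in percent.
compute = [1, 10, 100, 1000]
accuracy = [10.0, 15.0, 20.0, 25.0]

a, b = fit_log_linear(compute, accuracy)
needed = compute_for_target(a, b, 30.0)
print(f"intercept={a:.2f}, slope per 10x compute={b:.2f}")
print(f"compute needed for 30% accuracy: {needed:.0f} units")
```

The synthetic points were chosen to lie exactly on a line with intercept 10 and slope 5, so the fit recovers those values. Under that assumed curve, every fixed gain in accuracy requires ten times more compute than the last, which is why more efficient refinement and search methods matter.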