🧀 BigCheese.ai

Claude 3 beats GPT-4 on Aider's code editing benchmark

The newly released Claude 3 Opus from Anthropic outperforms OpenAI's models on Aider's code editing benchmark, successfully completing 68.4% of the Python coding tasks within two tries. Its lead over GPT-4 Turbo is marginal, however, and it is slower and more expensive. Claude 3 Sonnet is similar in capability to GPT-3.5 Turbo.

  • Claude 3 Opus has the highest benchmark score to date.
  • Claude 3 Opus uses the diff editing format for efficient code changes.
  • Claude 3 Sonnet performs comparably to GPT-3.5 Turbo.
  • Claude 3 has a larger context window than GPT-4.
  • Claude 3 blocked some of the coding tasks instead of completing them.
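
To make the "within two tries" figure concrete, here is a minimal sketch of how such a cumulative pass rate could be computed. The function name `pass_rate_within` and the sample outcomes are hypothetical, chosen only for illustration. They are not Aider's code or benchmark data.

```python
from typing import Optional, Sequence


def pass_rate_within(solved_on_try: Sequence[Optional[int]], max_tries: int) -> float:
    """Return the percentage of tasks solved on or before `max_tries`.

    Each entry is the 1-based attempt on which a task first passed,
    or None if the task never passed.
    """
    if not solved_on_try:
        raise ValueError("no task results given")
    solved = sum(1 for t in solved_on_try if t is not None and t <= max_tries)
    return 100.0 * solved / len(solved_on_try)


# Made-up outcomes for five tasks (illustrative only, not benchmark data).
results = [1, 2, None, 1, 2]
print(pass_rate_within(results, 1))  # 40.0
print(pass_rate_within(results, 2))  # 80.0
```

A score such as 68.4% within two tries counts every task that passed on either the first or the second attempt, so it can never be lower than the single-try score for the same run.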