🧀 BigCheese.ai

Kagi LLM Benchmarking Project

Kagi has introduced the LLM Benchmarking Project to evaluate major large language models on their reasoning, coding, and instruction-following abilities. The benchmarks use novel, frequently changing challenges to prevent overfitting to the test set. The results show that models differ in accuracy, cost, latency, and token speed. OpenAI's gpt-4o leads the rankings with 52% accuracy. The rankings track how LLM performance is evolving for applications such as the reasoning and instruction-following features of Kagi Search.
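The summary names four metrics (accuracy, cost, latency, and token speed) but does not describe how Kagi computes them. As a rough illustration only, the sketch below aggregates per-question records into those four numbers. The `Result` record, its fields, and the `summarize` function are hypothetical names, and measuring token speed as total output tokens divided by total latency is an assumption, not Kagi's documented method.

```python
from dataclasses import dataclass
import statistics


@dataclass
class Result:
    """Hypothetical record of one model answer to one benchmark question."""
    correct: bool        # whether the answer was graded correct
    cost_usd: float      # API cost of this call
    latency_s: float     # wall-clock time for the full response
    output_tokens: int   # tokens generated in the response


def summarize(results: list[Result]) -> dict[str, float]:
    """Aggregate per-question results into benchmark-style metrics (assumed definitions)."""
    if not results:
        raise ValueError("no results to summarize")
    total_latency = sum(r.latency_s for r in results)
    if total_latency <= 0:
        raise ValueError("total latency must be positive")
    return {
        # Share of questions answered correctly, as a percentage.
        "accuracy_pct": 100.0 * sum(r.correct for r in results) / len(results),
        # Total spend across the whole test run.
        "total_cost_usd": sum(r.cost_usd for r in results),
        # Median is less sensitive to a few slow outliers than the mean.
        "median_latency_s": statistics.median(r.latency_s for r in results),
        # Rough throughput: includes time-to-first-token, so it understates decode speed.
        "tokens_per_s": sum(r.output_tokens for r in results) / total_latency,
    }


if __name__ == "__main__":
    demo = [Result(True, 0.01, 2.0, 100), Result(False, 0.02, 3.0, 200)]
    print(summarize(demo))
```

Under these assumed definitions, a model that answers 52 of 100 questions correctly would report `accuracy_pct` of 52.0, which is how a headline figure like gpt-4o's 52% could be read.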

  • Updated July 29, 2024.
  • OpenAI's gpt-4o has 52% accuracy.
  • Tests assess reasoning, coding, and instruction following.
  • Benchmarks are designed to be challenging.
  • Kagi provides model access via subscription.