FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision

FlashAttention-3 introduces new optimizations for NVIDIA Hopper GPUs, exploiting hardware asynchrony and low-precision arithmetic to run 1.5-2.0x faster than its predecessor with FP16 and reach nearly 1.2 PFLOPS with FP8, all while reducing quantization error. These improvements boost efficiency and enable longer-context AI models.

  • FlashAttention-3 is 1.5-2x faster than FlashAttention-2.
  • Achieves up to 740 TFLOPS with FP16 on an H100 GPU.
  • Reaches close to 1.2 PFLOPS with FP8.
  • Optimized for NVIDIA's Hopper GPU architecture.
  • Reduces FP8 quantization error by 2.6x compared with baseline FP8 attention.
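
For context, the sketch below shows how a fused attention kernel is typically invoked through the flash-attn Python package. It uses `flash_attn_func`, the interface exposed for FlashAttention-2; assuming the FlashAttention-3 Hopper kernels are reachable through a similar call is an illustration, not something this summary confirms.

```python
# Minimal sketch of calling a fused attention kernel via the flash-attn package.
# flash_attn_func is the FlashAttention-2 interface; using it as a stand-in for
# the FlashAttention-3 entry point is an assumption, not confirmed above.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64

# FlashAttention expects (batch, seqlen, nheads, headdim) tensors in FP16/BF16 on GPU.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# A single fused kernel computes softmax(QK^T / sqrt(d)) V without materializing
# the full attention matrix; causal=True applies an autoregressive mask.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 4096, 16, 64])
```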