FlashAttention-3 introduces new optimizations for NVIDIA Hopper GPUs, reaching 1.5-2.0x the speed of FlashAttention-2 with FP16 and nearly 1.2 PFLOPS with FP8, while also reducing FP8 quantization error. These gains improve efficiency and help enable longer-context AI models.
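
To make the "drop-in" nature of these kernels concrete, here is a minimal sketch of calling FlashAttention from Python. It uses `flash_attn_func` from the released `flash-attn` package; the FA3 beta for Hopper exposes a similar call in a separate `flash_attn_interface` module, so exact module names may differ by release. The tensor shapes are illustrative assumptions, not values from the source.

```python
# Minimal sketch: a fused FlashAttention call as a drop-in attention op.
# Assumes the flash-attn package is installed and a CUDA GPU is available;
# on Hopper with the FA3 beta, the analogous function lives in the
# flash_attn_interface module instead.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64  # illustrative sizes
q = torch.randn(batch, seqlen, nheads, headdim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# The fused kernel never materializes the (seqlen x seqlen) score matrix,
# which is what makes long sequence lengths practical.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
```

Because the interface matches standard attention (query, key, value in, context out), the speedups described above come from the kernel implementation alone, with no change to model code.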