🧀 BigCheese.ai

DeepSeek Open Sources DeepGEMM: Clean and efficient FP8 GEMM kernels

DeepGEMM is a library for efficient FP8 General Matrix Multiplications (GEMMs) with fine-grained scaling on the NVIDIA Hopper architecture. Its design is deliberately compact: the core kernel is around 300 lines of CUDA code. It borrows some concepts from CUTLASS and CuTe but avoids heavy reliance on their templates or algebras, and it is reported to match or outperform expert-tuned libraries across various matrix shapes.
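To make "fine-grained scaling" concrete, here is a minimal pure-Python sketch of the idea, not DeepGEMM's actual API or kernel. It assumes the E4M3 variant of FP8 (largest finite value 448), uses a simplified rounding routine with no NaN handling, and picks a block size of 4 purely for illustration. The names `round_to_e4m3`, `quantize_blocks`, and `scaled_dot` are hypothetical. Each block of values gets its own scale factor, so a block of tiny values is not flushed to zero just because a large value lives elsewhere in the same row.

```python
import math

FP8_E4M3_MAX = 448.0       # largest finite E4M3 value (assumed FP8 variant)
MIN_NORMAL = 2.0 ** -6     # smallest normal E4M3 magnitude
SUBNORMAL_STEP = 2.0 ** -9 # grid spacing below MIN_NORMAL


def round_to_e4m3(x: float) -> float:
    """Simplified FP8 E4M3 rounding: 4 significant bits, clamped range."""
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    if abs(x) < MIN_NORMAL:
        # Subnormal range: fixed spacing, so very small values round to 0.
        return round(x / SUBNORMAL_STEP) * SUBNORMAL_STEP
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent
    mantissa = round(mantissa * 16) / 16 # 1 implicit + 3 stored bits
    return math.ldexp(mantissa, exponent)


def quantize_blocks(row, block):
    """Quantize a row block by block; return (values, one scale per block)."""
    values, scales = [], []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        values.extend(round_to_e4m3(v / scale) for v in chunk)
    return values, scales


def scaled_dot(a, b, block):
    """Dot product: accumulate per block, then apply both block scales."""
    a_q, a_s = quantize_blocks(a, block)
    b_q, b_s = quantize_blocks(b, block)
    total = 0.0
    for i, (sa, sb) in enumerate(zip(a_s, b_s)):
        lo = i * block
        partial = sum(x * y for x, y in zip(a_q[lo:lo + block], b_q[lo:lo + block]))
        total += partial * sa * sb
    return total


# Illustrative data: each row mixes very small and very large magnitudes.
a = [1e-4, 2e-4, 3e-4, 4e-4, 100.0, 200.0, 300.0, 400.0]
b = [1000.0, 1000.0, 1000.0, 1000.0, 0.001, 0.001, 0.001, 0.001]

exact = sum(x * y for x, y in zip(a, b))  # about 2.0
per_block = scaled_dot(a, b, block=4)     # one scale per 4 values
per_tensor = scaled_dot(a, b, block=8)    # one scale for the whole row
```

With a single scale per row, the small entries of both vectors underflow to zero and the product collapses to 0.0, while the per-block version stays close to the exact result. A real kernel would apply the same idea to tiles of a matrix on the GPU rather than to Python lists.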

  • Written in CUDA, targeting NVIDIA Hopper GPUs
  • JIT compilation at runtime
  • Supports MoE grouped GEMMs
  • MIT licensed project
  • Performance advantage over CUTLASS
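
For readers unfamiliar with grouped GEMMs in a Mixture-of-Experts (MoE) setting, the sketch below shows the general idea in plain Python. It is an illustration under stated assumptions, not DeepGEMM's interface: the function names `matmul` and `grouped_gemm` are hypothetical, tokens routed to the same expert are assumed to be stored contiguously, and every expert's weight matrix is assumed to have the same shape.

```python
def matmul(a, b):
    """Plain (M x K) @ (K x N) product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def grouped_gemm(tokens, group_sizes, expert_weights):
    """Multiply each contiguous group of rows by its own expert's weights."""
    assert sum(group_sizes) == len(tokens)
    assert len(group_sizes) == len(expert_weights)
    out, start = [], 0
    for size, weight in zip(group_sizes, expert_weights):
        out.extend(matmul(tokens[start:start + size], weight))
        start += size
    return out


# Three tokens: the first two go to expert 0, the last one to expert 1.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
group_sizes = [2, 1]
expert_weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: doubles its input
]
result = grouped_gemm(tokens, group_sizes, expert_weights)
```

The point of a grouped kernel is to run all of these per-expert products in one launch instead of one small GEMM per expert; the loop here only shows which rows meet which weights.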