DeepSeek Open Sources DeepGEMM: Clean and efficient FP8 GEMM kernels



DeepGEMM is a library for clean and efficient FP8 General Matrix Multiplications (GEMMs) with fine-grained scaling, as proposed in DeepSeek-V3, targeting the NVIDIA Hopper architecture. Its design is deliberately simple: the core kernel is a single function of roughly 300 lines of CUDA code. It borrows some concepts from CUTLASS and CuTe but avoids heavy reliance on their templates or algebras, and it matches or exceeds expert-tuned libraries across a range of matrix shapes.
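To make "fine-grained scaling" concrete, here is a minimal pure-Python reference sketch. It assumes the DeepSeek-V3 granularity: the left-hand matrix A (M x K) gets one scale per row per 128-element block along K, and the right-hand matrix B (stored N x K, so the product is A times B transposed) gets one scale per 128 x 128 block. FP8 E4M3 is simulated in software by rounding to 3 mantissa bits and saturating at 448. All function names here are hypothetical and are not DeepGEMM's API; this is a slow correctness reference, not a description of how the Hopper kernel is structured.

```python
import math
import random

FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude
BLOCK = 128           # assumed scaling granularity along K (and along N for B)

def round_to_e4m3(x):
    """Round x to the nearest E4M3 value (ties to even), saturating at +-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(mag)        # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # unbiased exponent; -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)       # 3 mantissa bits => spacing of 2**(exp - 3)
    q = min(round(mag / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, x)

def quantize_lhs(a):
    """One scale per (row, 128-column block); returns (fp8 values, scales[M][ceil(K/128)])."""
    q, scales = [], []
    for row in a:
        q_row, s_row = [], []
        for kb in range(0, len(row), BLOCK):
            chunk = row[kb:kb + BLOCK]
            amax = max(abs(v) for v in chunk)
            s = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # map the block's amax onto 448
            q_row.extend(round_to_e4m3(v / s) for v in chunk)
            s_row.append(s)
        q.append(q_row)
        scales.append(s_row)
    return q, scales

def quantize_rhs(b):
    """One scale per 128 x 128 block of B (N x K); returns (fp8 values, scales)."""
    n, k = len(b), len(b[0])
    q = [[0.0] * k for _ in range(n)]
    scales = []
    for nb in range(0, n, BLOCK):
        rows = range(nb, min(nb + BLOCK, n))
        s_row = []
        for kb in range(0, k, BLOCK):
            cols = range(kb, min(kb + BLOCK, k))
            amax = max(abs(b[i][j]) for i in rows for j in cols)
            s = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            for i in rows:
                for j in cols:
                    q[i][j] = round_to_e4m3(b[i][j] / s)
            s_row.append(s)
        scales.append(s_row)
    return q, scales

def fp8_gemm_nt(a, b):
    """C = A @ B^T with both operands quantized using fine-grained block scales."""
    qa, sa = quantize_lhs(a)
    qb, sb = quantize_rhs(b)
    m, n, k = len(a), len(b), len(a[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kb_idx, kb in enumerate(range(0, k, BLOCK)):
                # Dot product of one K block, computed on the scaled FP8 values.
                partial = sum(qa[i][t] * qb[j][t] for t in range(kb, min(kb + BLOCK, k)))
                # Dequantize with both block scales, then accumulate in higher precision.
                acc += partial * sa[i][kb_idx] * sb[j // BLOCK][kb_idx]
            c[i][j] = acc
    return c

if __name__ == "__main__":
    random.seed(0)
    A = [[random.uniform(-1, 1) for _ in range(256)] for _ in range(4)]
    B = [[random.uniform(-1, 1) for _ in range(256)] for _ in range(3)]
    ref = [[sum(x * y for x, y in zip(ra, rb)) for rb in B] for ra in A]
    C = fp8_gemm_nt(A, B)
    print(max(abs(C[i][j] - ref[i][j]) for i in range(4) for j in range(3)))
```

Because every scale is chosen per small block rather than per whole tensor, a single outlier only compresses the dynamic range of its own 128-element block, which is the motivation for fine-grained scaling with a format as narrow as FP8.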

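Written in CUDA for NVIDIA Hopper tensor cores
Kernels compiled at runtime by a lightweight JIT module, so nothing is compiled at install time
Supports both normal GEMMs and MoE grouped GEMMs
MIT-licensed project
Matches or exceeds a tuned CUTLASS-based implementation on many matrix shapes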
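A MoE grouped GEMM multiplies each token's activations by the weights of the expert that token was routed to. The sketch below illustrates one way to lay this out, assuming a "contiguous" grouping along M: tokens for each expert are concatenated, each expert's segment is padded to a multiple of a hypothetical alignment `align`, and a per-row index records the expert id (with -1 marking padding, also an assumption of this sketch). It uses plain floating point and omits FP8 scaling so that only the grouping logic is visible; the function names are hypothetical, not DeepGEMM's API.

```python
from typing import List

Matrix = List[List[float]]

def build_contiguous_layout(tokens_per_expert: List[Matrix], k: int, align: int):
    """Concatenate each expert's token rows along M, padding each segment to a
    multiple of `align`. Returns (lhs, m_indices); m_indices[r] is the expert id
    of row r, or -1 for a padding row."""
    lhs: Matrix = []
    m_indices: List[int] = []
    for expert_id, rows in enumerate(tokens_per_expert):
        lhs.extend(list(r) for r in rows)
        m_indices.extend([expert_id] * len(rows))
        pad = (-len(rows)) % align            # rows needed to reach the next multiple
        lhs.extend([0.0] * k for _ in range(pad))
        m_indices.extend([-1] * pad)
    return lhs, m_indices

def grouped_gemm_nt_contiguous(lhs: Matrix, weights: List[Matrix], m_indices: List[int]) -> Matrix:
    """out[r] = lhs[r] @ weights[m_indices[r]]^T, where each expert weight is N x K.
    Padding rows (index -1) are skipped and stay zero."""
    n = len(weights[0])
    out = [[0.0] * n for _ in lhs]
    for r, (row, e) in enumerate(zip(lhs, m_indices)):
        if e < 0:
            continue
        out[r] = [sum(x * y for x, y in zip(row, w_row)) for w_row in weights[e]]
    return out
```

Padding each expert's segment to a fixed alignment means a block of rows never straddles two experts, so a kernel can pick one expert's weights per block instead of per row; the per-row index is then only needed to identify the expert and to skip padding.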


