DeepGEMM is a library for efficient FP8 general matrix multiplications (GEMMs) with fine-grained scaling on the NVIDIA Hopper architecture. Its design is deliberately simple: the core kernel is around 300 lines of CUDA code. It outperforms expert-tuned libraries across various matrix shapes without relying heavily on templates or algebras.
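
To make "fine-grained scaling" concrete, here is a minimal CPU sketch in plain Python. It is not DeepGEMM's API or kernel. The function names, the 1-D blocking along the reduction dimension, and the block size are all assumptions chosen for illustration. The idea it shows is that each small block of values gets its own scale factor, so the block fits the FP8 E4M3 range (largest finite value 448), and the scales are multiplied back in when partial sums are accumulated. Actual rounding to FP8 is omitted here.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def quantize_blocks(values, block):
    """Split `values` into blocks of length `block`; scale each block into the FP8 range.

    Returns (scaled_blocks, scales). Hypothetical helper, for illustration only.
    """
    scaled_blocks, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        # One scale per block: the block's largest magnitude maps to FP8_E4M3_MAX.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        # A real kernel would round these scaled values to FP8 here.
        scaled_blocks.append([v / scale for v in chunk])
        scales.append(scale)
    return scaled_blocks, scales


def scaled_dot(a, b, block):
    """Dot product that accumulates per-block partial sums, rescaled by both scales."""
    qa, sa = quantize_blocks(a, block)
    qb, sb = quantize_blocks(b, block)
    total = 0.0
    for chunk_a, chunk_b, scale_a, scale_b in zip(qa, qb, sa, sb):
        partial = sum(x * y for x, y in zip(chunk_a, chunk_b))
        total += partial * scale_a * scale_b  # undo the scaling in higher precision
    return total


def scaled_gemm(A, B, block):
    """Compute A @ B for lists of lists, using block-scaled dot products."""
    columns = [list(col) for col in zip(*B)]
    return [[scaled_dot(row, col, block) for col in columns] for row in A]


if __name__ == "__main__":
    A = [[1.0, 2.0, 300.0, 400.0]]
    B = [[1.0], [2.0], [3.0], [4.0]]
    print(scaled_gemm(A, B, block=2))
```

With `block=2`, the small entries `1.0, 2.0` and the large entries `300.0, 400.0` get separate scales, so the small values are not crushed by a single scale chosen for the largest element of the whole row. Because rounding is omitted, the result matches the ordinary product (about 2505) up to floating-point error.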