Gate News, April 23: DeepSeek has open-sourced TileKernels, a GPU kernel library written in TileLang for large language model training and inference, under the MIT license. TileLang is a domain-specific language developed by the tile-ai team for expressing high-performance GPU kernels in Python. DeepSeek stated that most kernels in the library already approach the hardware limits on compute intensity and memory bandwidth, and that some are already deployed in its internal training and inference workloads.
The library comprises six categories of kernels: (1) MoE (mixture of experts) gating and routing, including top-k expert selection, token-to-expert mapping, and fused expand/shrink with weight normalization; (2) quantization to the FP8, FP4, and E5M6 formats at per-token, per-block, and per-channel granularity, including fused SwiGLU+quantization operations; (3) batch transpose; (4) Engram gating, with fused RMSNorm forward and backward passes and weight gradient reduction; (5) Manifold HyperConnection, with Sinkhorn normalization and mix splitting and applying; and (6) high-level autograd interfaces that wrap the low-level kernels into trainable layers.
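To make the first category concrete, the following is a minimal reference sketch of top-k expert selection with weight normalization and token-to-expert mapping. It is plain Python written for illustration, not TileLang and not the TileKernels API; the function names `topk_route` and `tokens_per_expert` are hypothetical, and the sketch assumes each token's gate scores are positive (for example, softmax outputs).

```python
# Reference sketch only: plain Python, not TileLang and not the TileKernels API.
# Function names are hypothetical and chosen for illustration.

def topk_route(scores, k):
    """Pick the k highest-scoring experts per token and renormalize their weights.

    scores: list of per-token lists of positive gate scores.
    Returns (indices, weights), each a list of per-token lists of length k.
    """
    indices, weights = [], []
    for row in scores:
        # Sort expert ids by descending score; ties keep the lower expert id first.
        top = sorted(range(len(row)), key=lambda e: (-row[e], e))[:k]
        total = sum(row[e] for e in top)
        indices.append(top)
        # Weight normalization: the selected weights sum to 1 for each token.
        weights.append([row[e] / total for e in top])
    return indices, weights


def tokens_per_expert(indices, num_experts):
    """Invert the routing: for each expert, list the tokens assigned to it."""
    mapping = [[] for _ in range(num_experts)]
    for token, experts in enumerate(indices):
        for e in experts:
            mapping[e].append(token)
    return mapping


scores = [[0.1, 0.6, 0.3], [0.5, 0.25, 0.25]]
idx, w = topk_route(scores, k=2)
by_expert = tokens_per_expert(idx, num_experts=3)
```

A production GPU kernel would typically fuse these steps and process all tokens in parallel; the loops here only spell out what is being computed.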
Engram and Manifold HyperConnection are in-house components of DeepSeek's model architecture, and this release discloses their implementation details publicly for the first time. The library requires an NVIDIA GPU of the SM90 or SM100 architecture (H100/H200 or SM100-class Blackwell GPUs), CUDA Toolkit 13.1 or higher, and PyTorch 2.10 or higher.
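The stated requirements can be checked with a short script before installation. The sketch below is an illustration under assumptions, not part of TileKernels: the names `check_environment`, `major_minor`, and the constants are hypothetical, and the GPU check matches only the two architectures named above.

```python
import re

# Minimums as stated in the article; all names here are hypothetical.
SUPPORTED_SM = {(9, 0), (10, 0)}  # SM90 (H100/H200) and SM100 (Blackwell)
MIN_CUDA = (13, 1)
MIN_TORCH = (2, 10)


def major_minor(version):
    """Extract (major, minor) from strings such as '13.1' or '2.10.0+cu131'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(compute_capability, cuda_version, torch_version):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(compute_capability) not in SUPPORTED_SM:
        problems.append(
            f"GPU compute capability {tuple(compute_capability)} is not SM90 or SM100"
        )
    # Compare as integer tuples: as strings, '2.10' would sort before '2.9'.
    if major_minor(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 13.1")
    if major_minor(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.10")
    return problems


ok = check_environment((9, 0), "13.1", "2.10.0+cu131")
bad = check_environment((8, 0), "12.8", "2.9.1")
```

On a machine with PyTorch installed, the three arguments could be taken from `torch.cuda.get_device_capability()`, `torch.version.cuda`, and `torch.__version__`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the locally installed toolkit.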