DeepSeek Open-Sources TileKernels, GPU Kernel Library for Large Model Training and Inference

Gate News message, April 23: DeepSeek has open-sourced TileKernels, a GPU kernel library written in TileLang for large language model training and inference, under the MIT license. TileLang is a domain-specific language developed by the tile-ai team for writing high-performance GPU kernels in Python. According to DeepSeek, most kernels in the library come close to hardware limits on compute density and memory bandwidth, and some are already deployed in its internal training and inference workloads.

The library comprises six categories of kernels: MoE (mixture of experts) gating and routing, including top-k expert selection, token-to-expert mapping, and fused expand/shrink with weight normalization; quantization in FP8, FP4, and E5M6 formats at per-token, per-block, and per-channel granularity, including fused SwiGLU+quantization operations; batch transpose; Engram gating with fused RMSNorm forward/backward propagation and weight gradient reduction; Manifold HyperConnection with Sinkhorn normalization and mixed split/apply; and high-level autograd interfaces that wrap low-level kernels into trainable layers.
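For readers unfamiliar with some of these operations, the following plain-Python sketch shows the math behind three of them: top-k MoE gating with renormalized weights, per-token quantization, and Sinkhorn normalization. It is not TileKernels code and says nothing about how the library implements them; the function names (`topk_gating`, `quantize_per_token`, `dequantize`, `sinkhorn`) are hypothetical, the renormalization step is one common gating variant rather than DeepSeek's confirmed choice, and symmetric integer rounding stands in for the real FP8/FP4/E5M6 encodings, so only the granularity of the scale factors mirrors the text.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def topk_gating(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """MoE routing for one token: softmax over expert logits, keep the k
    largest, then renormalize the kept weights so they sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def quantize_per_token(rows: Matrix, qmax: int = 127) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-token quantization: one scale per row (token), chosen so
    the row's largest magnitude maps to qmax. Integer rounding is a stand-in
    for a real low-precision floating-point encoding."""
    q_rows, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row) or 1.0  # guard against all-zero rows
        scale = amax / qmax
        q_rows.append([max(-qmax, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Undo quantize_per_token by multiplying each row by its own scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def sinkhorn(matrix: Matrix, iters: int = 50) -> Matrix:
    """Sinkhorn normalization: alternately rescale the rows and columns of a
    positive matrix so it approaches a doubly stochastic matrix."""
    m = [row[:] for row in matrix]
    for _ in range(iters):
        row_sums = [sum(row) for row in m]
        m = [[v / s for v in row] for row, s in zip(m, row_sums)]  # rows sum to 1
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]  # columns sum to 1
    return m
```

The steps are kept separate here for readability. A fused GPU kernel would typically compute them in one pass so that intermediates such as the full softmax probabilities or unscaled activations do not make an extra round trip through memory.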

Engram and Manifold HyperConnection are components of DeepSeek's own model architecture, and this release is the first public disclosure of their implementation details. The library requires NVIDIA GPUs with SM90 or SM100 architecture (H100/H200 or Blackwell series), CUDA Toolkit 13.1 or higher, and PyTorch 2.10 or higher.
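The stated requirements can be turned into a quick pre-flight check. The sketch below is hypothetical and not part of TileKernels: `check_requirements`, `parse_version`, and the constants are illustrative names, and it assumes SM90 and SM100 correspond to CUDA compute capabilities 9.0 and 10.0. It takes plain values so it runs without a GPU.

```python
from typing import List, Optional, Tuple

# Assumed mapping: SM90 = compute capability 9.0 (H100/H200),
# SM100 = compute capability 10.0 (Blackwell data-center GPUs).
SUPPORTED_CAPABILITIES = {(9, 0), (10, 0)}
MIN_CUDA = (13, 1)
MIN_TORCH = (2, 10)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn strings like '13.1' or '2.10.0+cu130' into a tuple of ints,
    dropping local suffixes such as '+cu130' and pre-release tags."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(capability: Tuple[int, int],
                       cuda_version: Optional[str],
                       torch_version: str) -> List[str]:
    """Return human-readable problems; an empty list means the environment
    matches the stated requirements."""
    problems = []
    if tuple(capability) not in SUPPORTED_CAPABILITIES:
        problems.append(f"compute capability {capability} is not SM90 or SM100")
    if cuda_version is None or parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version or 'not found'} does not meet 13.1+")
    if parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} does not meet 2.10+")
    return problems
```

On a machine with PyTorch installed, the inputs could come from `torch.cuda.get_device_capability()`, `torch.version.cuda`, and `torch.__version__`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which is only a proxy for the locally installed toolkit.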
