Zhipu AI Launches GLM-5.1 High-Speed API at 400 Tokens per Second

Zhipu AI has launched the GLM-5.1 High-Speed API for select enterprise customers, with output speeds of up to 400 tokens per second. According to Zhipu AI, the service runs on an inference engine developed jointly with TileRT, and the company will continue to add features, including FP8 inference and ultra-long context support.
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.