Coinbase CEO Brian Armstrong said on June 26 that the company has made Zhipu AI's latest GLM 5.2 and Beijing Moonshot AI's Kimi 2.7 the default large language models for its internal engineers. Coinbase's AI spending has nearly halved even as token usage continues to grow exponentially.
Armstrong explained that GLM 5.2 and Kimi 2.7 are primarily deployed in routine task scenarios, such as standard code assistance and general engineering workflows; for tasks requiring complex planning, engineers can still opt for frontier models. In code review, Coinbase adopts a multi-model parallel strategy, allowing different models to cross-validate outputs to maintain quality standards.
Armstrong attributed the near-halving of Coinbase's AI spending to the following three-layer infrastructure restructuring:
Smart Routing: The system preprocesses prompts, combining cache hit rates and model pricing to automatically distribute tasks to the most suitable and cost-effective model.
Aggressive Caching: All requests are required to be cache-aware; LibreChat's cache hit rate jumped from 5% to 60%.
Context Pruning: Engineers are advised to start new sessions when switching tasks and narrow file scope to reduce wasted tokens.
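As a rough illustration of how routing and caching could interact, the sketch below picks the cheapest eligible model after accounting for cached tokens. It is not Coinbase's implementation: the model names, prices, cached-token discount, and the `route` and `effective_cost` helpers are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                # hypothetical label, not a real endpoint
    price_per_mtok: float    # assumed USD per million input tokens
    cached_discount: float   # assumed fraction saved on cached tokens (0..1)
    handles_complex: bool    # treated as a frontier model in this sketch


def effective_cost(model: Model, tokens: int, cache_hit_rate: float) -> float:
    """Estimated prompt cost once cached tokens are billed at a discount."""
    cached = tokens * cache_hit_rate
    fresh = tokens - cached
    billable = fresh + cached * (1.0 - model.cached_discount)
    return billable * model.price_per_mtok / 1_000_000


def route(models, tokens, cache_hit_rate, needs_complex_planning):
    """Return the cheapest model that is allowed to handle the task."""
    candidates = [
        m for m in models
        if m.handles_complex or not needs_complex_planning
    ]
    if not candidates:
        raise ValueError("no model can handle this task")
    return min(candidates, key=lambda m: effective_cost(m, tokens, cache_hit_rate))


# Hypothetical catalogue: one low-cost open model, one frontier model.
MODELS = [
    Model("open-model", price_per_mtok=1.0, cached_discount=0.5, handles_complex=False),
    Model("frontier-model", price_per_mtok=10.0, cached_discount=0.5, handles_complex=True),
]

routine = route(MODELS, 1_000_000, 0.60, needs_complex_planning=False)
planning = route(MODELS, 1_000_000, 0.60, needs_complex_planning=True)
print(routine.name, planning.name)
```

Under these assumed prices, routine work goes to the low-cost model and planning-heavy work to the frontier model. The same prompt also costs less at a 60% cache hit rate than at 5%, which is the effect the caching layer is meant to capture.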
Armstrong emphasized that the goal of this cost optimization is not to suppress usage but to expand AI adoption. He said the aim is to let engineers use as many tokens and models as they need without a cost ceiling, while tying usage to business impact. Armstrong believes any enterprise can adopt this approach; these are his own public statements.
GLM 5.2 is the latest model released by Chinese AI company Zhipu AI; Kimi 2.7 is a large language model from Beijing Moonshot AI. Both models are released as open source. Armstrong explained that Coinbase deploys them in routine engineering tasks, while complex tasks still use frontier models.
According to Armstrong, the core of the cost reduction is the three-layer infrastructure restructuring: smart routing (automatically assigning tasks to the most cost-effective model), aggressive caching (LibreChat's cache hit rate rising from 5% to 60%), and context pruning (reducing wasted tokens). On top of this, handing some routine tasks from U.S. frontier models to lower-cost Chinese open-source models further reduces overall spending.
Armstrong's public statement on June 26, 2026 did not mention any data security reviews or compliance arrangements related to the adoption of GLM 5.2 and Kimi 2.7. Coinbase is a U.S.-regulated crypto asset exchange, and the statement did not disclose the specifics of the relevant compliance framework.