Google DeepMind Executive: Every AI Product Company Should Build Custom Benchmarks

Gate News message, April 27 — Logan Kilpatrick, senior product manager at Google DeepMind and product lead for Google AI Studio, stated on X that every company building AI-based products should establish its own custom benchmarks to measure AI model performance. He described this as a method to make model improvements “disproportionately benefit your company” and urged founders and business leaders to “start tomorrow.”

Most companies currently rely on public leaderboards to select AI models, but leaderboards measure general capabilities that often do not match specific business scenarios. Kilpatrick cited the example of a contract review company whose main concern is clause extraction accuracy, a capability that public benchmarks do not cover, so public leaderboards cannot show how a model performs on that task. Custom benchmarks offer two key advantages. First, they let a company evaluate each model update against its own business tasks and select the model that performs best in its actual use case, rather than the highest-ranked model overall. Second, the company can share these test sets with model providers, encouraging continued optimization in the areas that matter to its business.
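As a rough illustration of what such a custom benchmark might look like for the contract review example, the Python sketch below scores candidate models on a small clause extraction test set and ranks them by accuracy. It is not taken from Kilpatrick's post: the test cases, the exact-match scoring rule, and the two stand-in "models" (plain functions, not real model APIs) are all hypothetical assumptions chosen to keep the example self-contained.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a test case pairs a contract snippet with the clause
# a reviewer expects back; a "model" is any callable from text to text.
TestCase = Tuple[str, str]
Model = Callable[[str], str]

# Hypothetical in-house test set for termination-clause extraction.
CASES: List[TestCase] = [
    ("This Agreement begins on January 1. Either party may terminate with 30 days notice.",
     "Either party may terminate with 30 days notice"),
    ("Either party may terminate upon breach. Fees are due monthly.",
     "Either party may terminate upon breach"),
    ("Fees are due monthly. The supplier may terminate at any time.",
     "The supplier may terminate at any time"),
]


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(text.lower().split()).rstrip(".")


def split_sentences(doc: str) -> List[str]:
    """Naive sentence splitter, adequate only for this toy data."""
    return [s.strip() for s in doc.split(".") if s.strip()]


# Stand-in models (assumptions): in practice these would wrap calls to
# whichever model providers a company is comparing.
def keyword_model(doc: str) -> str:
    """Return the first sentence that mentions 'terminate'."""
    for sentence in split_sentences(doc):
        if "terminate" in sentence.lower():
            return sentence
    return ""


def first_sentence_model(doc: str) -> str:
    """Return the first sentence regardless of content."""
    sentences = split_sentences(doc)
    return sentences[0] if sentences else ""


def accuracy(model: Model, cases: List[TestCase]) -> float:
    """Fraction of cases where the model output matches the expected clause."""
    if not cases:
        raise ValueError("benchmark needs at least one test case")
    hits = sum(normalize(model(doc)) == normalize(expected)
               for doc, expected in cases)
    return hits / len(cases)


def rank_models(models: Dict[str, Model],
                cases: List[TestCase]) -> List[Tuple[str, float]]:
    """Score every model on the same cases, best first."""
    scores = [(name, accuracy(model, cases)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "keyword_model": keyword_model,
        "first_sentence_model": first_sentence_model,
    }
    for name, score in rank_models(candidates, CASES):
        print(f"{name}: {score:.2f}")
```

Rerunning the same script whenever a provider ships a model update would give the per-task comparison the article describes. A real benchmark would likely need a larger test set and a more forgiving scoring rule than exact match.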

Kilpatrick noted that companies like Zapier and Sierra are already implementing this approach, stating that “there is a lot of alpha that can be created here.”

