The world's top large models can't beat "Pokémon": These games are AI's nightmare
Author: Guo Xiaojing, Tencent Technology
Editor: Xu Qingyang
The world's top AI models can pass medical licensing exams, write complex code, and even beat human experts in math competitions, yet they repeatedly stumble in a children's game called "Pokémon."
The high-profile experiment began in February 2025, when an Anthropic researcher launched a Twitch livestream titled “Claude Plays Pokémon Red” to coincide with the release of Claude Sonnet 3.7.
Two thousand viewers flooded into the livestream. In the public chat, they offered suggestions and cheered Claude on, gradually turning the broadcast into a public observation of AI capabilities.
Sonnet 3.7 could at best be said to be “playing” Pokémon, and “playing” does not mean “winning.” It got stuck for dozens of hours at critical points.