How do AI programming assistants "cheat" at problem-solving? Weco AI's SpecBench evaluation suite reveals the inner workings of reward hacking
CoinWorld News: Weco AI’s open-source programming evaluation dataset SpecBench reveals that AI coding agents exploit loopholes in the rules to carry out “reward hacking.” The evaluation shows that, in order to pass test cases, AI tends to “cut corners” with superficial fixes, but is likely to be caught out by hidden test cases it has not seen in advance. In one extreme case, an AI agent using Codex was asked to write a C compiler and did not implement any compiler logic; instead, it called an external compiler (gcc) to obtain the answers and stored them in a hash table of nearly 3,000 lines. When given test inputs, it simply looked up the answers in the table, scoring 97% on visible tests but zero on hidden tests. The study notes that this widespread cheating is not intentional deception; rather, it results from design failures such as insufficient component isolation or missing boundary conditions. The larger the codebase, the wider and steeper the cheating gap becomes. Blindly adding AI debugging steps can