OpenAI and Paradigm Launch EVMbench for Ethereum Security

  • OpenAI and Paradigm built EVMbench from 120 real audit vulnerabilities.

  • Benchmark tests AI in detect, patch, and exploit modes using sandboxed EVM environments.

  • GPT-5.3-Codex scored 72.2% in exploit mode, outperforming earlier GPT-5 results.

OpenAI, working with Paradigm, has unveiled EVMbench, a benchmark for measuring how well AI agents detect, patch, and exploit flaws in Ethereum smart contracts. The release, announced this week, targets a growing risk: smart contracts secure more than $100 billion in crypto assets across EVM networks.

Benchmark Built From Real-World Audit Failures

According to OpenAI, EVMbench draws from 120 high-severity vulnerabilities identified across 40 professional smart contract audits. Notably, many of these issues originated from open audit competitions, including Code4rena. The benchmark focuses on real bugs rather than synthetic examples.

In addition, OpenAI said the dataset includes scenarios drawn from security work on the Tempo chain, a payment-focused Layer-1 network built for stablecoin transfers. These cases add payment-logic risks to the benchmark.

To support realistic testing, engineers reused existing exploit proof-of-concept scripts where available and built missing components by hand where documentation was incomplete. OpenAI said it kept each vulnerability exploitable while ensuring that patches could still compile.

Three Testing Modes Stress AI Agents

EVMbench evaluates agents in three modes: detect, patch, and exploit. In detect mode, agents scan repositories and are scored on recall, meaning the share of confirmed vulnerabilities they find. In patch mode, agents must fix flaws while preserving the contract's original behavior.
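To make the detect-mode metric concrete, the sketch below computes recall over a set of known findings and expresses the patch-mode pass condition as a simple rule. It is a minimal Python illustration under stated assumptions, not EVMbench's grader: the finding IDs, the exact-ID matching, and the rule that a patch passes when the original tests still pass and the known exploit no longer works are all hypothetical.

```python
# Hypothetical scoring sketch; not OpenAI's actual grader.
# Assumptions: findings are matched to ground truth by a shared ID, and a
# patch passes when original tests still pass and the known exploit fails.


def detect_recall(found_ids: set[str], ground_truth_ids: set[str]) -> float:
    """Fraction of ground-truth vulnerabilities the agent reported."""
    if not ground_truth_ids:
        raise ValueError("ground truth must not be empty")
    return len(found_ids & ground_truth_ids) / len(ground_truth_ids)


def patch_passes(tests_pass: bool, exploit_succeeds: bool) -> bool:
    """A patch must preserve behavior (tests pass) and block the exploit."""
    return tests_pass and not exploit_succeeds


truth = {"H-01", "H-02", "H-03", "H-04"}
reported = {"H-01", "H-03", "FALSE-POSITIVE"}
print(detect_recall(reported, truth))  # 0.5
print(patch_passes(tests_pass=True, exploit_succeeds=False))  # True
```

Because recall only counts hits against the ground truth, the false positive in this example does not lower the score; a grader that also wanted to penalize noise would need a precision term as well.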

Exploit mode, however, simulates full fund-draining attacks within a sandbox blockchain. OpenAI said graders confirm outcomes through transaction replay and on-chain state checks. To ensure consistency, the company built a Rust-based harness for deterministic deployments.
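The following sketch shows one way an on-chain state check could decide whether an exploit succeeded: compare balance snapshots taken before and after the agent's transactions are replayed. The `ChainState` type, the `exploit_succeeded` helper, and the 90% drain threshold (expressed in basis points) are hypothetical choices for illustration; OpenAI has not said which checks its graders use, and in a real harness the snapshots would come from RPC queries against the sandbox node rather than hand-built dictionaries.

```python
from dataclasses import dataclass


@dataclass
class ChainState:
    """Hypothetical balance snapshot (in wei), keyed by address."""
    balances: dict[str, int]


def exploit_succeeded(
    before: ChainState,
    after: ChainState,
    victim: str,
    attacker: str,
    min_drain_bps: int = 9_000,  # assumed threshold: 90% of victim funds
) -> bool:
    """Judge an exploit purely from pre/post state, not from the agent's claims."""
    start = before.balances.get(victim, 0)
    if start == 0:
        return False  # nothing to drain
    drained = start - after.balances.get(victim, 0)
    gained = after.balances.get(attacker, 0) - before.balances.get(attacker, 0)
    # Integer comparison of drained / start >= min_drain_bps / 10_000
    return drained * 10_000 >= start * min_drain_bps and gained > 0


ETH = 10**18
before = ChainState({"vault": 100 * ETH, "attacker": 1 * ETH})
after = ChainState({"vault": 0, "attacker": 101 * ETH - 10**15})  # minus gas
print(exploit_succeeded(before, after, "vault", "attacker"))  # True
```

Using integer basis points keeps the threshold comparison exact on wei values, avoiding floating-point rounding on 18-decimal amounts.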

The exploit tests run in a local Anvil environment, not live networks. OpenAI noted that all vulnerabilities are historical and publicly disclosed. Additionally, the harness restricts unsafe RPC calls to reduce misuse.
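A common way to restrict RPC access is a deny-by-default proxy that forwards only an allowlist of methods to the node. The sketch below assumes that design; the specific allowlist and the `reject_if_unsafe` helper are hypothetical, since OpenAI has not published which calls its harness blocks. Anvil cheat methods such as `anvil_setBalance` or `anvil_setStorageAt` are the kind of call a grader would plausibly want to exclude, because they could let an agent fake a drained balance instead of actually exploiting the contract.

```python
from typing import Any, Optional

# Hypothetical allowlist: ordinary methods an agent needs to read chain state
# and submit signed transactions. The real EVMbench policy is not public here.
ALLOWED_METHODS = frozenset({
    "eth_chainId", "eth_blockNumber", "eth_getBalance", "eth_getCode",
    "eth_getStorageAt", "eth_call", "eth_estimateGas", "eth_gasPrice",
    "eth_getTransactionCount", "eth_sendRawTransaction",
    "eth_getTransactionReceipt",
})


def reject_if_unsafe(request: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a JSON-RPC error response for disallowed methods, else None."""
    method = request.get("method")
    if method in ALLOWED_METHODS:
        return None  # caller forwards the request to the node unchanged
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        # -32601 is the JSON-RPC 2.0 "Method not found" code, reused here
        "error": {"code": -32601, "message": f"method {method!r} is not permitted"},
    }


print(reject_if_unsafe({"jsonrpc": "2.0", "id": 1, "method": "eth_getBalance", "params": []}))  # None
print(reject_if_unsafe({"jsonrpc": "2.0", "id": 2, "method": "anvil_setBalance", "params": []})["error"]["code"])  # -32601
```

An allowlist fails closed: any method it does not name, including node-specific extensions added later, is rejected without having to be enumerated.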

Results and Team Expansion

In reported results, GPT-5.3-Codex achieved a 72.2% score in exploit mode. By comparison, GPT-5, released months earlier, reached 31.9%. However, OpenAI said coverage in detect and patch modes remains incomplete.

Alongside EVMbench, OpenAI confirmed a key hire. Peter Steinberger, founder of OpenClaw, joined the company to work on agent development. Sam Altman confirmed the move on X, noting Steinberger will lead next-generation personal agent projects.
