The world's first AI-created AI! GPT-5.3 participates in developing itself, turning science fiction into reality

MarketWhisper

OpenAI releases GPT-5.3-Codex, the first "participatory creation" model, one that helped debug its own training code, manage its deployment, and diagnose its test results. Karpathy calls the update "the closest scenario to AI taking off."

The Technological Singularity Breakthrough: AI Begins Creating AI

OpenAI’s official account announces: GPT-5.3-Codex is officially online, marking it as “the first model to participate in creating itself.” What does this mean? It means that during development, this AI helped debug its training code, managed its deployment process, and diagnosed its test results. In plain language: AI is starting to create AI.

Andrej Karpathy, a former OpenAI researcher and former director of AI at Tesla, tweeted immediately after: "This is the closest thing I've seen to an AI takeoff scene from science fiction." Such an evaluation carries significant weight: Karpathy has been personally involved in several key stages of AI development, and his judgment rests on deep technical understanding.

AI iterating on itself is not just marketing hype. According to internal disclosures from OpenAI, GPT-5.3-Codex did the following during development: analyzed training logs to identify failed tests, suggested repair plans for training scripts and configuration files, generated deployment recipes, and summarized abnormal evaluations for human review. What does this imply? AI is no longer just a tool; it is becoming a member of the development team, and one that can improve itself.

This capability of self-participation in development breaks through traditional AI positioning. Previously, AI models were entirely designed, trained, and deployed by humans; AI was a passive product. Now, GPT-5.3 plays an active role in its own creation, although still under human supervision. This role shift is profoundly significant. It hints at a future where most AI models could be designed and optimized by AI itself, with humans only providing direction and final review.

Four Major Behaviors of GPT-5.3 Self-Participation in Development

Analyzing Training Logs: Automatically tagging failed tests and identifying anomalies during training

Suggesting Repair Plans: Offering improvements for training scripts and configuration files

Generating Deployment Recipes: Automating deployment processes to reduce manual operations

Summarizing Evaluation Anomalies: Organizing complex evaluation results into human-understandable reports
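The four behaviors above amount to an automated triage loop over training artifacts. A minimal sketch of the first behavior, analyzing logs for failed tests and anomalies (the log format, function name, and spike heuristic here are hypothetical illustrations, not OpenAI's actual tooling):

```python
import re

def triage_training_log(log_text: str) -> dict:
    """Tag failed tests and flag loss anomalies in a raw training log.

    Assumes a hypothetical log format with lines like
    "TEST <name> FAIL" and "loss=<float>".
    """
    failed = re.findall(r"TEST (\w+) FAIL", log_text)
    losses = [float(m) for m in re.findall(r"loss=([\d.]+)", log_text)]
    # Flag a loss spike: any step where loss jumps >50% over the previous step.
    spikes = [
        i for i in range(1, len(losses))
        if losses[i] > 1.5 * losses[i - 1]
    ]
    return {"failed_tests": failed, "loss_spikes_at": spikes}

log = "loss=2.0\nTEST test_tokenizer FAIL\nloss=1.8\nloss=3.1\nloss=1.5"
report = triage_training_log(log)
print(report)  # {'failed_tests': ['test_tokenizer'], 'loss_spikes_at': [2]}
```

The output of such a triage pass is exactly what the article describes being handed back for human review: a short, structured summary instead of raw logs.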

MIT recently published the SEAL paper (arXiv:2506.10943), describing an AI architecture capable of continuous learning after deployment, evolving without retraining. Notably, some SEAL researchers have since joined OpenAI. This marks a shift from AI as a "static tool" to a "dynamic system": learning no longer stops at deployment, and the boundary between inference and training is dissolving. GPT-5.3 may be the first commercial application of this new architecture.

77.3% in Benchmark Tests: Pulling Ahead of Claude

On February 5, just 20 minutes apart, OpenAI and Anthropic both announced new-generation models: first Anthropic released Claude Opus 4.6, then OpenAI launched GPT-5.3-Codex. An intense showdown. If OpenAI intends GPT-5.3-Codex to outperform a rival's brand-new model, there has to be real substance behind it. The data doesn't lie: upon release, GPT-5.3-Codex set new records on multiple industry benchmarks.

Terminal-Bench 2.0 tests an AI's ability to operate in real terminal environments: writing code, training models, configuring servers. GPT-5.3-Codex scored 77.3%, up from GPT-5.2-Codex's 64.0%, while Claude Opus 4.6 reportedly scored 65.4%. A 13.3-point improvement between generations is a huge leap in AI, and 77.3% versus 65.4% demonstrates GPT-5.3's significant advantage in practical engineering tasks.

SWE-Bench Pro is a benchmark specifically testing real software engineering skills across Python, JavaScript, Go, and Ruby. GPT-5.3-Codex achieved 56.8%, surpassing the previous GPT-5.2-Codex’s 56.4%, maintaining industry leadership. More importantly, OpenAI revealed that GPT-5.3-Codex used the fewest output tokens among all models to reach this score, indicating it is not only accurate but also efficient.

OSWorld-Verified tests an AI's ability to complete productivity tasks in a visual desktop environment: editing spreadsheets, creating presentations, handling documents, and so on. GPT-5.3-Codex scored 64.7%, nearly double the previous generation's score, against a human average of 72%. That puts it close to human performance on computer-operation tasks and, for the first time, in a position to truly handle office work rather than merely assist with it.

Claude Counterattack: 1 Million Tokens and Agent Teams

More notably, Claude Opus 4.6 supports a 1-million-token context window (in beta), capable of processing an entire codebase or hundreds of pages of documents at once. It also introduces the Agent Teams feature, in which multiple AI agents collaborate simultaneously on programming, testing, and documentation: an "AI team" mode that is transforming coding from an individual skill into collaborative work.

With OpenAI and Anthropic releasing flagship models on the same day, minutes apart, the competition is no longer just about technical prowess but about the future direction of AI: OpenAI's "self-evolution" route, or Anthropic's "multi-agent collaboration" route? OpenAI's strategy is to make a single AI increasingly powerful, even capable of improving itself. Anthropic's approach is to have multiple AIs collaborate, dividing tasks and working together to complete complex objectives.

The 1 million token context window is a technological breakthrough. It corresponds to roughly 750,000 English words or 3 million Chinese characters—enough to contain an entire medium-sized software project or a thick technical manual. This capacity allows Claude to “see” the entire project overview rather than just fragments. For large-scale architecture analysis and restructuring, this global view is crucial.
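The word-count figure above follows the common rule of thumb that one token corresponds to roughly 0.75 English words; exact counts depend on the tokenizer, so this is an estimate, not a specification:

```python
# Rule of thumb: 1 token ≈ 0.75 English words, i.e. ≈ 1.33 tokens per word.
# Actual counts vary by tokenizer and text; this is only an estimate.
TOKENS_PER_WORD = 1 / 0.75

def estimated_tokens(word_count: int) -> int:
    """Estimate how many tokens a given English word count occupies."""
    return round(word_count * TOKENS_PER_WORD)

# 750,000 words therefore roughly fills a 1,000,000-token window:
print(estimated_tokens(750_000))  # 1000000
```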

Agent Teams introduce the concept of collaboration into AI. One agent writes code, another tests, a third writes documentation, and they communicate and coordinate. This mode mimics human software teams and may be more suitable for certain scenarios than a single super AI. However, multi-agent collaboration also introduces new complexities: how to coordinate, avoid conflicts, and ensure consistency.
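The division of labor described above can be sketched as a pipeline of role-based agents, each transforming the previous agent's output. This is an illustration of the pattern only, not Anthropic's actual Agent Teams API; every name and placeholder transformation below is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Artifact:
    kind: str   # "code", "test_report", or "docs"
    body: str

@dataclass
class Agent:
    role: str
    outbox: list = field(default_factory=list)

    def work(self, incoming: Optional[Artifact]) -> Artifact:
        # Each role transforms the previous agent's output. In a real
        # system these branches would be model calls; here they are stubs.
        if self.role == "coder":
            result = Artifact("code", "def add(a, b): return a + b")
        elif self.role == "tester":
            result = Artifact("test_report", f"tested {incoming.body!r}: pass")
        else:  # documenter
            result = Artifact("docs", f"summary of: {incoming.body}")
        self.outbox.append(result)
        return result

# Handoff chain mimicking a human team: coder -> tester -> documenter.
team = [Agent("coder"), Agent("tester"), Agent("documenter")]
artifact = None
for agent in team:
    artifact = agent.work(artifact)
print(artifact.kind)  # docs
```

Even in this toy version, the coordination problems the article mentions are visible: the pipeline fixes an ordering, and each agent must agree with its neighbors on the artifact format.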

Both routes have advantages and disadvantages. OpenAI's self-evolution approach is more aggressive: if successful, it could trigger exponential capability growth, but it also carries the risk of losing control. Anthropic's multi-agent route is more conservative: distributing capabilities reduces single-point risk, though coordination costs may limit efficiency. As AI systems begin to evolve after deployment, the governance question will shift from "how smart is it" to "how do we manage a constantly changing system." That two top AI companies released breakthrough models within 20 minutes of each other leaves humanity with a shrinking window in which to think and prepare.
