
OpenAI releases GPT-5.3-Codex, the first "participatory creation" model, which helped debug its own training code, managed its deployment, and diagnosed its test results. Karpathy calls the update "the closest thing I've seen to an AI takeoff."
OpenAI’s official account announces: GPT-5.3-Codex is officially online, marking it as “the first model to participate in creating itself.” What does this mean? It means that during development, this AI helped debug its training code, managed its deployment process, and diagnosed its test results. In plain language: AI is starting to create AI.
Former OpenAI researcher and former Tesla AI Director Andrej Karpathy tweeted shortly after: "This is the closest thing I've seen to an AI takeoff scene from science fiction." Such an evaluation from a top AI researcher carries significant weight, because Karpathy has personally lived through several key stages of AI development, and his judgment rests on deep technical understanding.
AI iterating on itself: this is not just marketing hype. According to internal disclosures from OpenAI, GPT-5.3-Codex did the following during its own development: analyzed training logs to identify failed tests, suggested repair plans for training scripts and configuration files, generated deployment recipes, and summarized evaluation anomalies for human review. What does this imply? AI is no longer just a tool; it is beginning to become a member of the development team, and one that can improve itself.
This capability of self-participation in development breaks through traditional AI positioning. Previously, AI models were entirely designed, trained, and deployed by humans; AI was a passive product. Now, GPT-5.3 plays an active role in its own creation, although still under human supervision. This role shift is profoundly significant. It hints at a future where most AI models could be designed and optimized by AI itself, with humans only providing direction and final review.
Analyzing Training Logs: Automatically tagging failed tests and identifying anomalies during training
Suggesting Repair Plans: Offering improvements for training scripts and configuration files
Generating Deployment Recipes: Automating deployment processes to reduce manual operations
Summarizing Evaluation Anomalies: Organizing complex evaluation results into human-understandable reports
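OpenAI has not published how this log-triage workflow is implemented, but the first and last steps above can be pictured as a simple parse-tag-summarize loop. The sketch below is purely illustrative: the log format, the `FAILED … - reason` pattern, and both function names are assumptions, not anything disclosed by OpenAI.

```python
import re

# Hypothetical log format: "FAILED <test-id> - <reason>". Anything
# not matching this pattern (e.g. PASSED lines) is ignored.
FAIL_PATTERN = re.compile(r"FAILED\s+(?P<test>\S+)\s+-\s+(?P<reason>.+)")

def triage_logs(log_lines):
    """Tag failed tests in raw log lines, grouped by failure reason."""
    failures = {}
    for line in log_lines:
        match = FAIL_PATTERN.search(line)
        if match:
            failures.setdefault(match.group("reason"), []).append(match.group("test"))
    return failures

def summarize(failures):
    """Turn the grouped failures into a human-readable report, most common first."""
    ranked = sorted(failures.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [f"{reason}: {len(tests)} test(s) ({', '.join(tests)})"
            for reason, tests in ranked]

logs = [
    "PASSED tests/test_tokenizer.py::test_basic",
    "FAILED tests/test_optim.py::test_lr_schedule - AssertionError",
    "FAILED tests/test_optim.py::test_warmup - AssertionError",
    "FAILED tests/test_ckpt.py::test_resume - TimeoutError",
]
for line in summarize(triage_logs(logs)):
    print(line)
```

The point of the sketch is the division of labor: a mechanical tagging pass produces structured data, and a summarizing pass condenses it for human review, which is exactly the hand-off the list above describes.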
MIT recently published the SEAL paper (arXiv:2506.10943), describing an AI architecture capable of continuous learning after deployment, evolving without retraining. Notably, some SEAL researchers have since joined OpenAI. This means AI is shifting from a "static tool" to a "dynamic system": learning no longer stops at deployment, and the boundary between inference and training is dissolving. GPT-5.3 may be the first commercial application of this new architecture.
On February 5, just 20 minutes apart, OpenAI and Anthropic both announced new-generation models: first Anthropic released Claude Opus 4.6, then OpenAI launched GPT-5.3-Codex. With OpenAI positioning GPT-5.3-Codex directly against its competitor's new model, it needs real capability to back that up, and the numbers deliver: on release, GPT-5.3-Codex set new records on multiple industry benchmarks.
Terminal-Bench 2.0 tests AI's operational ability in real terminal environments: writing code, training models, configuring servers. GPT-5.3-Codex scored 77.3%, up from GPT-5.2-Codex's 64.0%, while Claude Opus 4.6 reportedly scored 65.4%. A 13-point improvement between generations is a huge leap, and the 77.3% vs. 65.4% gap shows GPT-5.3's clear advantage on practical engineering tasks.
SWE-Bench Pro is a benchmark specifically testing real software engineering skills across Python, JavaScript, Go, and Ruby. GPT-5.3-Codex achieved 56.8%, surpassing the previous GPT-5.2-Codex’s 56.4%, maintaining industry leadership. More importantly, OpenAI revealed that GPT-5.3-Codex used the fewest output tokens among all models to reach this score, indicating it is not only accurate but also efficient.
OSWorld-Verified tests AI's ability to complete productivity tasks in a visual desktop environment: editing spreadsheets, creating presentations, handling documents, and so on. GPT-5.3-Codex scored 64.7%, nearly double the previous generation's score, against a human average of 72%. Approaching human-level performance on computer-operation tasks means AI can, for the first time, genuinely handle office work rather than merely assist with it.
More notably, Claude Opus 4.6 supports a 1 million token context window (in beta), capable of processing an entire codebase or hundreds of pages of documents at once. It also introduced the Agent Teams feature, in which multiple AI agents collaborate simultaneously on programming, testing, and documentation, an "AI team" mode that is turning coding from an individual skill into collaborative work.
With OpenAI and Anthropic releasing flagship models within minutes of each other, the competition is no longer just about technical prowess but about the future direction of AI: OpenAI's "self-evolution" route, or Anthropic's "multi-agent collaboration" route? OpenAI's strategy is to make a single AI increasingly powerful, even capable of improving itself. Anthropic's approach is to have multiple AIs collaborate, dividing tasks and working together to complete complex objectives.
The 1 million token context window is a technological breakthrough. It corresponds to roughly 750,000 English words or 3 million Chinese characters, enough to hold an entire medium-sized software project or a thick technical manual. This capacity lets Claude "see" the whole project at once rather than just fragments, and for large-scale architecture analysis and refactoring, that global view is crucial.
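The word-count figure above is back-of-envelope arithmetic. The words-per-token ratio below is a common rule of thumb for English text, not an exact constant; real tokenizers vary by vocabulary and content.

```python
# Rough arithmetic behind the context-window claim above.
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75  # rule of thumb for English; varies by tokenizer

english_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
print(f"~{english_words:,} English words")  # → ~750,000 English words
```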
Agent Teams introduce the concept of collaboration into AI. One agent writes code, another tests, a third writes documentation, and they communicate and coordinate. This mode mimics human software teams and may be more suitable for certain scenarios than a single super AI. However, multi-agent collaboration also introduces new complexities: how to coordinate, avoid conflicts, and ensure consistency.
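Anthropic has not published the internals of Agent Teams, but the coder/tester/writer hand-off described above can be sketched as agents coordinating through a shared message log. Everything here, the `Team` class, the roles, and the message protocol, is a hypothetical illustration of the pattern, not Anthropic's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    content: str

@dataclass
class Team:
    # Shared log: the coordination channel every agent reads and writes.
    log: list = field(default_factory=list)

    def post(self, sender, content):
        self.log.append(Message(sender, content))

    def run(self, task):
        # Coder drafts a solution for the task.
        self.post("coder", f"draft implementation for: {task}")
        # Tester reviews the most recent draft and reports results.
        draft = self.log[-1].content
        self.post("tester", f"tests passing for '{draft}'")
        # Writer documents only what the tester has confirmed,
        # illustrating how ordering keeps the agents consistent.
        confirmed = self.log[-1].content
        self.post("writer", f"docs written based on: {confirmed}")
        return [f"{m.sender}: {m.content}" for m in self.log]

for entry in Team().run("parse config file"):
    print(entry)
```

Even this toy version surfaces the coordination problems the paragraph mentions: each agent must know whose output to trust and when to act, and a fixed hand-off order is the simplest way to avoid conflicts.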
Both routes have their advantages and disadvantages. OpenAI’s self-evolution approach is more aggressive—if successful, it could trigger exponential capability growth but also risks losing control. Anthropic’s multi-agent route is more conservative—distributing capabilities to reduce single-point risks, though coordination costs may limit efficiency. As AI begins evolving in the wild, governance issues will shift from “how smart is it” to “how do we manage a constantly changing system.” The fact that two top AI companies released breakthrough models within 20 minutes leaves humanity with a shrinking window to think and prepare.