Thinking Machines, the AI startup founded by former OpenAI executives Mira Murati and John Schulman and valued at more than $100 million, on Tuesday rolled out a preview of its first full-duplex AI model, one that can “speak and listen at the same time” with latency as low as 0.4 seconds, challenging today’s paradigm for real-time human-AI interaction.
(NVIDIA invests in Thinking Machines Lab to deploy Vera Rubin, boosting the performance of frontier models)
Thinking Machines’ new model: breaking the old turn-taking pattern
Today’s mainstream AI models all work the same way: “user input, model waits, then responds.” Mira Murati, OpenAI’s former CTO, and John Schulman, an OpenAI co-founder, argue that this back-and-forth is more like messaging than true conversation. On May 11, Thinking Machines Lab, which the two co-founded, officially released a new research preview called “Interaction Models,” aiming to fundamentally change this status quo.
People talk, listen, watch, think, and collaborate at the same time, in real time. We’ve designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action. pic.twitter.com/uxl1InS6Ay
— Thinking Machines (@thinkymachines) May 11, 2026
Thinking Machines says today’s AI models perceive reality in a single thread: if the user hasn’t finished speaking, the model can only wait; if the model hasn’t finished generating, perception freezes. This design becomes a bottleneck for human-AI collaboration, preventing AI teamwork from feeling as natural and fluent as talking with a real person.
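To make that contrast concrete, the status quo the company describes can be reduced to a strictly sequential loop. The sketch below is illustrative only; the object and method names are hypothetical and do not come from any vendor’s API.

```python
# Minimal sketch of the conventional half-duplex, turn-based pattern:
# the system cannot listen while it is generating, and cannot respond
# until the user's turn has fully ended. All names are hypothetical.

def turn_based_session(model, microphone, speaker):
    while True:
        # 1. Block until the user stops speaking; perception then goes idle.
        user_turn = microphone.record_until_silence()

        # 2. Block again while the model generates; audio arriving now is ignored.
        reply = model.generate(user_turn)

        # 3. Play the reply; only afterwards does the loop return to listening.
        speaker.play(reply)
```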
The two believe the solution isn’t to patch the old architecture with external components, but to train a model from scratch that natively supports real-time interaction.
Full-duplex architecture: an AI system that can multitask
The model released by Thinking Machines is named TML-Interaction-Small, a mixture-of-experts (MoE) model with 276 billion parameters, of which 12 billion are active at run time. The system processes incoming input and generated output in an interleaved manner in 200-millisecond units, with no artificial turn boundaries, enabling genuine “full-duplex” interaction: like a phone call rather than text messaging.
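The announcement does not publish code, but the interleaved pattern it describes can be sketched roughly as follows. Everything here is an assumption for illustration: `model.step`, the frame objects, and the async input source are hypothetical names, not Thinking Machines’ interface.

```python
FRAME_MS = 200  # the preview describes interleaving in 200-millisecond units

async def full_duplex_session(model, input_frames, speaker):
    """Hypothetical full-duplex loop: in each ~200 ms slice the model both
    consumes the newest input frame and may emit an output frame, with no
    notion of whose 'turn' it is."""
    async for frame in input_frames:      # audio/video keeps streaming in
        out = await model.step(frame)     # one interleaved perceive-and-generate step
        if out is not None:               # the model may choose to stay silent
            await speaker.play(out)
```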
The system uses a dual-model design: the “interaction model” handles real-time dialogue, turn-taking, and responses, while the “background model” performs complex reasoning, web search, and tool calling asynchronously, then seamlessly folds the results into the ongoing conversation. This allows the AI to quietly complete assigned search or chart-generation tasks while it is speaking or listening.
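As a rough illustration of that division of labor, the asyncio sketch below shows an interaction loop that hands slow work to a background worker and folds results back in as they complete. All names (`interaction_model.step`, `background_model.run`, `integrate`, `delegated_task`) are assumptions made for the example, not the product’s actual API.

```python
import asyncio

async def dual_model_step(interaction_model, background_model, frame, pending):
    """One hypothetical 200 ms step of the two-model design: the interaction
    model keeps the live exchange going while slow jobs run asynchronously."""
    step = await interaction_model.step(frame)

    # Work that needs deep reasoning, web search, or a tool call is delegated
    # without pausing the conversation.
    if step.delegated_task is not None:
        pending.append(asyncio.create_task(background_model.run(step.delegated_task)))

    # Finished background results are merged back into the live dialogue state
    # before the next step.
    for task in [t for t in pending if t.done()]:
        interaction_model.integrate(task.result())
        pending.remove(task)

    return step.output_frame
```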
Benchmark tests: comprehensively outperforming OpenAI and Google
The announcement states that on FD-bench, a standard benchmark for measuring AI interaction quality, TML-Interaction-Small’s turn-taking latency is 0.40 seconds, close to the response speed of natural human conversation and well ahead of Google’s Gemini-3.1-flash-live at 0.57 seconds and OpenAI’s GPT-realtime-2.0 at 1.18 seconds.
(Table caption: dark shading marks the best performer in each row; light shading marks the best among real-time models.)
In a proprietary test suite the team designed specifically for these new interaction capabilities, TML-Interaction-Small scores 64.7% accuracy on the “TimeSpeak” task versus 4.3% for GPT-realtime-2.0; on the “CueSpeak” (voice-trigger) task it reaches 81.7% versus 2.9%; and on the “RepCount-A” (visual-counting) task it scores 35.4%, while GPT-realtime-2.0 is nearly zero at 1.3%.
Thinking Machines points out that no current commercial model can meaningfully complete these tasks, including OpenAI and Google’s higher-end “thinking” models.
Enterprise application potential: from customer service to security monitoring
Beyond a more natural everyday conversation experience, the technology’s potential value in enterprise scenarios also deserves attention.
For example, in manufacturing or laboratories, an AI that can instantly monitor video could proactively issue an alert the moment it detects a safety violation, without waiting for staff to ask. In voice customer service, existing systems generally have 1 to 2 seconds of processing delay, which is often users’ most direct pain point; the 0.4-second response speed could fundamentally solve this problem.
At present, TML-Interaction-Small and its accompanying background model are available only to a small number of partner organizations as a research preview, with a public release expected later this year. Thinking Machines also announced that it will launch a research grant program to encourage the academic community to develop new frameworks for evaluating interaction quality.
From talent churn to steady growth: the next step for Thinking Machines Lab
Thinking Machines Lab, founded in 2025, drew outside attention earlier this year when several founding members left for Meta; it has since hired Soumith Chintala, co-creator of PyTorch and formerly a senior engineer at Meta, as CTO, and its headcount has grown to about 130.
(Who is Andrew Tulloch, the man who turned down Meta’s 15-year, $1.5 billion offer yet was lost anyway, and what went wrong?)
In March this year, the company also announced a partnership with Nvidia to deploy at least 1 gigawatt of new-generation Vera Rubin systems, and it expanded its partnership with Google Cloud to advance frontier model training and reinforcement learning research.
This article, “Thinking Machines, a $100 million startup, publishes an instant-interaction AI model focused on ‘speaking while listening while doing’,” first appeared on Lian News ABMedia.