Thinking Machines, the AI startup founded by former OpenAI executives Mira Murati and John Schulman and valued at more than $100 million, on Tuesday rolled out a preview of its first full-duplex AI model, one that can “speak and listen at the same time” with latency as low as 0.4 seconds, challenging today’s paradigm for real-time human-AI interaction.
(NVIDIA invests in Thinking Machines Lab to deploy Vera Rubin, boosting the performance of frontier models)
Thinking Machines’ new model: breaking the old turn-taking pattern
Today’s mainstream AI models all work the same way: “user input, model waits, then responds.” Mira Murati, OpenAI’s former CTO, and John Schulman, an OpenAI co-founder, argue that this back-and-forth is more like messaging than true conversation. On May 11, Thinking Machines Lab, which the two co-founded, officially released a new research preview called “Interaction Models,” aiming to fundamentally change this status quo.
People talk, listen, watch, think, and collaborate at the same time, in real time. We’ve designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action. pic.twitter.com/uxl1InS6Ay
— Thinking Machines (@thinkymachines) May 11, 2026
Thinking Machines says today’s AI models perceive reality in a single thread: if the user hasn’t finished speaking, the model can only wait; if the model hasn’t finished generating, perception freezes. This design becomes a bottleneck for human-AI collaboration, preventing AI teamwork from feeling as natural and fluent as talking with a real person.
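To make that contrast concrete, the status quo the company describes can be reduced to a strictly sequential loop. The sketch below is illustrative only; the object and method names are hypothetical and do not come from any vendor’s API.

```python
# Minimal sketch of the conventional half-duplex, turn-based pattern:
# the system cannot listen while it is generating, and cannot respond
# until the user's turn has fully ended. All names are hypothetical.

def turn_based_session(model, microphone, speaker):
    while True:
        # 1. Block until the user stops speaking; perception then goes idle.
        user_turn = microphone.record_until_silence()

        # 2. Block again while the model generates; audio arriving now is ignored.
        reply = model.generate(user_turn)

        # 3. Play the reply; only afterwards does the loop return to listening.
        speaker.play(reply)
```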
The two believe the solution isn’t to patch the old architecture with external components, but to train a model from scratch that natively supports real-time interaction.
Full-duplex architecture: an AI system that can multitask
The model released by Thinking Machines is named TML-Interaction-Small, a mixture-of-experts (MoE) model with 276 billion parameters, of which 12 billion are active at run time. The system processes incoming input and generated output in an interleaved manner in 200-millisecond units, with no artificial turn boundaries, enabling genuine “full-duplex” interaction: like a phone call rather than text messaging.
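The announcement does not publish code, but the interleaved pattern it describes can be sketched roughly as follows. Everything here is an assumption for illustration: `model.step`, the frame objects, and the async input source are hypothetical names, not Thinking Machines’ interface.

```python
FRAME_MS = 200  # the preview describes interleaving in 200-millisecond units

async def full_duplex_session(model, input_frames, speaker):
    """Hypothetical full-duplex loop: in each ~200 ms slice the model both
    consumes the newest input frame and may emit an output frame, with no
    notion of whose 'turn' it is."""
    async for frame in input_frames:      # audio/video keeps streaming in
        out = await model.step(frame)     # one interleaved perceive-and-generate step
        if out is not None:               # the model may choose to stay silent
            await speaker.play(out)
```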
The system uses a dual-model design: the “interaction model” handles real-time dialogue, turn-taking, and responses, while the “background model” performs complex reasoning, web search, and tool calling asynchronously, then seamlessly folds the results into the ongoing conversation. This allows the AI to quietly complete assigned search or chart-generation tasks while it is speaking or listening.
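As a rough illustration of that division of labor, the asyncio sketch below shows an interaction loop that hands slow work to a background worker and folds results back in as they complete. All names (`interaction_model.step`, `background_model.run`, `integrate`, `delegated_task`) are assumptions made for the example, not the product’s actual API.

```python
import asyncio

async def dual_model_step(interaction_model, background_model, frame, pending):
    """One hypothetical 200 ms step of the two-model design: the interaction
    model keeps the live exchange going while slow jobs run asynchronously."""
    step = await interaction_model.step(frame)

    # Work that needs deep reasoning, web search, or a tool call is delegated
    # without pausing the conversation.
    if step.delegated_task is not None:
        pending.append(asyncio.create_task(background_model.run(step.delegated_task)))

    # Finished background results are merged back into the live dialogue state
    # before the next step.
    for task in [t for t in pending if t.done()]:
        interaction_model.integrate(task.result())
        pending.remove(task)

    return step.output_frame
```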
Benchmark tests: comprehensively outperforming OpenAI and Google
The announcement states that on FD-bench, a standard benchmark for measuring AI interaction quality, TML-Interaction-Small’s turn-taking latency is 0.40 seconds, close to the response speed of natural human conversation and well ahead of Google’s Gemini-3.1-flash-live at 0.57 seconds and OpenAI’s GPT-realtime-2.0 at 1.18 seconds.
(Table caption: dark shading marks the best performer in each row; light shading marks the best among real-time models.)
In a proprietary test suite the team designed specifically for these new interaction capabilities, TML-Interaction-Small scores 64.7% accuracy on the “TimeSpeak” task versus 4.3% for GPT-realtime-2.0; on the “CueSpeak” (voice-trigger) task it reaches 81.7% versus 2.9%; and on the “RepCount-A” (visual-counting) task it scores 35.4%, while GPT-realtime-2.0 is nearly zero at 1.3%.
Thinking Machines points out that no current commercial model can meaningfully complete these tasks, including OpenAI and Google’s higher-end “thinking” models.
Enterprise application potential: from customer service to security monitoring
Beyond a more natural everyday conversation experience, the technology’s potential value in enterprise scenarios also deserves attention.
For example, in manufacturing or laboratories, an AI that can instantly monitor video could proactively issue an alert the moment it detects a safety violation, without waiting for staff to ask. In voice customer service, existing systems generally have 1 to 2 seconds of processing delay, which is often users’ most direct pain point; the 0.4-second response speed could fundamentally solve this problem.
At present, TML-Interaction-Small and its accompanying background model are available only to a small number of partner organizations as a research preview, with a public release expected later this year. Thinking Machines also announced that it will launch a research grant program to encourage the academic community to develop new frameworks for evaluating interaction quality.
From talent churn to steady growth: the next step for Thinking Machines Lab
Thinking Machines Lab, founded in 2025, drew outside attention earlier this year when several founding members left for Meta; it has since hired Soumith Chintala, co-creator of PyTorch and formerly a senior engineer at Meta, as CTO, and its headcount has grown to about 130.
(Who is Andrew Tulloch, the man who turned down Meta’s 15-year, $1.5 billion offer yet was lost anyway, and what went wrong?)
In March this year, the company also announced a partnership with Nvidia to deploy at least 1 gigawatt of new-generation Vera Rubin systems, and it expanded its partnership with Google Cloud to advance frontier model training and reinforcement learning research.
This article, “Thinking Machines, a $100 million startup, publishes an instant-interaction AI model focused on ‘speaking while listening while doing’,” first appeared on Lian News ABMedia.