- Thinking Machines Lab, founded by ex-OpenAI CTO Mira Murati, released a research preview of its first AI model.
- The model processes audio, video, and text in 200-millisecond chunks rather than waiting for full turn boundaries.
- The lab claims it beats OpenAI’s GPT-Realtime-2 and Google’s Gemini Live on interaction quality and latency benchmarks.
- The release lands as several key Thinking Machines employees have recently left, per The Decoder.
What Happened
Thinking Machines Lab, founded by former OpenAI chief technology officer Mira Murati, has released a research preview of its first AI model, The Decoder reported on Tuesday. The lab calls the architecture an “Interaction Model” and claims it outperforms OpenAI’s GPT-Realtime-2 and Google’s Gemini Live on interaction quality and latency. The release pairs a fast interaction model with a separate, slower background reasoning model.
Why It Matters
The release is Thinking Machines Lab’s first public product since its founding in early 2025, and it doubles as a direct technical critique of how the leading voice-AI systems are architected. Current state-of-the-art real-time voice systems, including GPT-Realtime and Gemini Live, route audio through a “harness” of separate components (a voice-activity detector, a turn-end classifier, a response generator) before the underlying model sees a finished utterance. Thinking Machines argues the harness itself is the bottleneck. The critique echoes Richard Sutton’s “Bitter Lesson” essay, which Thinking Machines explicitly references: hand-crafted scaffolding tends to be outperformed over time by methods that scale with compute.
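To make that critique concrete, here is a minimal sketch of the harness pattern as described above: nothing reaches the response generator until a turn-end classifier declares the utterance finished. Every class name, method name, and threshold is an assumption for illustration, not any vendor’s actual pipeline.

```python
# Sketch of the "harness" pattern: audio passes through separate
# components, and the model only sees a *finished* utterance.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class HarnessPipeline:
    buffer: list = field(default_factory=list)

    def is_speech(self, chunk: bytes) -> bool:
        # Voice-activity detector: does this chunk contain speech?
        return len(chunk) > 0  # placeholder heuristic

    def turn_ended(self) -> bool:
        # Turn-end classifier: has the user finished speaking?
        return len(self.buffer) >= 5  # placeholder: ~5 chunks = one utterance

    def generate(self, utterance: bytes) -> str:
        # Response generator: stands in for the underlying model.
        return f"response to {len(utterance)} bytes of audio"

    def on_audio(self, chunk: bytes) -> str | None:
        if self.is_speech(chunk):
            self.buffer.append(chunk)
        if self.turn_ended():
            utterance = b"".join(self.buffer)
            self.buffer.clear()
            return self.generate(utterance)
        return None  # the model sees nothing until the turn boundary fires

pipeline = HarnessPipeline()
for _ in range(6):
    reply = pipeline.on_audio(b"\x00" * 320)
    if reply:
        print(reply)  # fires only once the turn-end classifier triggers
```

The structural point the article attributes to Thinking Machines is visible in `on_audio`: the model is gated behind the classifiers, so every misjudged turn boundary adds latency or cuts the user off.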
Technical Details
Thinking Machines’ architecture replaces the harness with what the lab calls time-aligned micro-turns: the model continuously consumes 200 milliseconds of input and generates 200 milliseconds of output. Audio, video, and text are processed in parallel within each 200-ms slice rather than handed off as pre-segmented utterances. The approach is conceptually similar to full-duplex models like Moshi or Nemotron VoiceChat, but those systems are smaller in scale and optimized for latency rather than reasoning. Thinking Machines pairs the fast interaction model with a separate background reasoning model, allowing the interaction layer to maintain continuous perception while heavier reasoning runs in parallel. The lab cites capabilities that harness-based systems struggle with: proactively interrupting (“interrupt me if I say something wrong”), reacting to visual cues (“tell me when I’ve written a bug”), and speaking simultaneously with the user, which is useful for applications like live translation.
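Below is a minimal sketch of what such a micro-turn loop could look like under the article’s description: a fast loop that ticks every 200 ms, consumes one slice of each modality, and may emit output on any tick, while a slower reasoner shares state in the background. All function and variable names here are assumptions for illustration; Thinking Machines has not published an API.

```python
# Minimal micro-turn sketch: a 200-ms perception/emission loop plus a
# slower background reasoner, per the architecture described above.
from __future__ import annotations
import asyncio

SLICE_MS = 200  # each micro-turn covers 200 ms of input and output

async def background_reasoner(state: dict) -> None:
    # Slower reasoning model: updates shared state on its own cadence,
    # never blocking the perception loop.
    while True:
        await asyncio.sleep(1.0)  # stand-in for heavier reasoning work
        state["plan"] = f"refined plan as of tick {state['tick']}"

def perceive_and_respond(audio: bytes, video, text: str, state: dict) -> str | None:
    # Stand-in for the fast interaction model: on each tick it may speak,
    # stay silent, or proactively interrupt based on the reasoner's output.
    return state.get("plan")

async def interaction_loop(state: dict, ticks: int = 10) -> None:
    for _ in range(ticks):
        # One 200-ms slice of every modality arrives together; there is
        # no turn boundary to wait for, and a slice may simply be empty.
        audio, video, text = b"\x00" * 3200, None, ""
        out = perceive_and_respond(audio, video, text, state)
        if out:
            print(f"[{state['tick'] * SLICE_MS:>5} ms] emit: {out}")
        state["tick"] += 1
        await asyncio.sleep(SLICE_MS / 1000)

async def main() -> None:
    state = {"tick": 0}
    reasoner = asyncio.create_task(background_reasoner(state))
    await interaction_loop(state)
    reasoner.cancel()

asyncio.run(main())
```

The design point, under these assumptions, is that the interaction loop never blocks on the reasoner: each tick it only reads whatever the reasoner has most recently written, which is what allows continuous perception alongside heavier reasoning.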
Who’s Affected
OpenAI and Google are the named comparison targets. Voice-AI startups including Vapi, ElevenLabs Conversational, and Retell, all competing on similar latency-and-quality axes, face a technically aggressive new entrant led by one of the most experienced AI executives. Enterprise customers building voice agents on top of GPT-Realtime or Gemini Live now have a third candidate architecture to evaluate. The release also lands at a difficult moment internally: several Thinking Machines employees have recently departed, per The Decoder, the most public sign of friction since Murati’s high-profile fundraising round in 2025.
What’s Next
Thinking Machines has labeled the release a research preview rather than a generally available product, suggesting commercial pricing and SLA-grade availability are not yet committed. The lab has not disclosed API pricing, latency SLAs, or specific enterprise availability dates. The benchmark claims will face scrutiny from the broader research community in the coming weeks; independent evaluations on shared interaction-quality benchmarks will determine whether they hold up across the wider set of voice and multimodal use cases.