- Thinking Machines Lab, founded by ex-OpenAI CTO Mira Murati, released a research preview of its first AI model.
- The model processes audio, video, and text in 200-millisecond chunks rather than waiting for full turn boundaries.
- The lab claims it beats OpenAI’s GPT-Realtime-2 and Google’s Gemini Live on interaction quality and latency benchmarks.
- The release lands as several key Thinking Machines employees have recently left, per The Decoder.
What Happened
Thinking Machines Lab, founded by former OpenAI chief technology officer Mira Murati, has released a research preview of its first AI model, The Decoder reported on Tuesday. The lab calls the architecture an “Interaction Model” and claims it outperforms OpenAI’s GPT-Realtime-2 and Google’s Gemini Live on interaction quality and latency. The release pairs a fast interaction model with a separate, slower background reasoning model.
Why It Matters
The release is Thinking Machines Lab’s first public product since its founding in early 2025, and it doubles as a direct technical critique of how the leading voice-AI systems are architected. Current state-of-the-art real-time voice systems, including GPT-Realtime and Gemini Live, route audio through a “harness” of separate components (a voice-activity detector, a turn-end classifier, a response generator) before the underlying model sees a finished utterance. Thinking Machines argues the harness itself is the bottleneck. The critique echoes Richard Sutton’s “Bitter Lesson” essay, which Thinking Machines explicitly references: hand-crafted scaffolding tends to be outperformed over time by methods that scale with compute.
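To make that critique concrete, here is a minimal sketch of the harness pattern as described above: nothing reaches the response generator until a turn-end classifier declares the utterance finished. Every class name, method name, and threshold is an assumption for illustration, not any vendor’s actual pipeline.

```python
# Sketch of the "harness" pattern: audio passes through separate
# components, and the model only sees a *finished* utterance.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class HarnessPipeline:
    buffer: list = field(default_factory=list)

    def is_speech(self, chunk: bytes) -> bool:
        # Voice-activity detector: does this chunk contain speech?
        return len(chunk) > 0  # placeholder heuristic

    def turn_ended(self) -> bool:
        # Turn-end classifier: has the user finished speaking?
        return len(self.buffer) >= 5  # placeholder: ~5 chunks = one utterance

    def generate(self, utterance: bytes) -> str:
        # Response generator: stands in for the underlying model.
        return f"response to {len(utterance)} bytes of audio"

    def on_audio(self, chunk: bytes) -> str | None:
        if self.is_speech(chunk):
            self.buffer.append(chunk)
        if self.turn_ended():
            utterance = b"".join(self.buffer)
            self.buffer.clear()
            return self.generate(utterance)
        return None  # the model sees nothing until the turn boundary fires

pipeline = HarnessPipeline()
for _ in range(6):
    reply = pipeline.on_audio(b"\x00" * 320)
    if reply:
        print(reply)  # fires only once the turn-end classifier triggers
```

The structural point the article attributes to Thinking Machines is visible in `on_audio`: the model is gated behind the classifiers, so every misjudged turn boundary adds latency or cuts the user off.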
Technical Details
Thinking Machines’ architecture replaces the harness with what the lab calls time-aligned micro-turns: the model continuously consumes 200 milliseconds of input and generates 200 milliseconds of output. Audio, video, and text are processed in parallel within each 200-ms slice rather than handed off as pre-segmented utterances. The approach is conceptually similar to full-duplex models like Moshi or Nemotron VoiceChat, but those systems are smaller in scale and optimized for latency rather than reasoning. Thinking Machines pairs the fast interaction model with a separate background reasoning model, allowing the interaction layer to maintain continuous perception while heavier reasoning runs in parallel. The lab cites capabilities that harness-based systems struggle with: proactively interrupting (“interrupt me if I say something wrong”), reacting to visual cues (“tell me when I’ve written a bug”), and speaking simultaneously with the user, which is useful for applications like live translation.
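Below is a minimal sketch of what such a micro-turn loop could look like under the article’s description: a fast loop that ticks every 200 ms, consumes one slice of each modality, and may emit output on any tick, while a slower reasoner shares state in the background. All function and variable names here are assumptions for illustration; Thinking Machines has not published an API.

```python
# Minimal micro-turn sketch: a 200-ms perception/emission loop plus a
# slower background reasoner, per the architecture described above.
from __future__ import annotations
import asyncio

SLICE_MS = 200  # each micro-turn covers 200 ms of input and output

async def background_reasoner(state: dict) -> None:
    # Slower reasoning model: updates shared state on its own cadence,
    # never blocking the perception loop.
    while True:
        await asyncio.sleep(1.0)  # stand-in for heavier reasoning work
        state["plan"] = f"refined plan as of tick {state['tick']}"

def perceive_and_respond(audio: bytes, video, text: str, state: dict) -> str | None:
    # Stand-in for the fast interaction model: on each tick it may speak,
    # stay silent, or proactively interrupt based on the reasoner's output.
    return state.get("plan")

async def interaction_loop(state: dict, ticks: int = 10) -> None:
    for _ in range(ticks):
        # One 200-ms slice of every modality arrives together; there is
        # no turn boundary to wait for, and a slice may simply be empty.
        audio, video, text = b"\x00" * 3200, None, ""
        out = perceive_and_respond(audio, video, text, state)
        if out:
            print(f"[{state['tick'] * SLICE_MS:>5} ms] emit: {out}")
        state["tick"] += 1
        await asyncio.sleep(SLICE_MS / 1000)

async def main() -> None:
    state = {"tick": 0}
    reasoner = asyncio.create_task(background_reasoner(state))
    await interaction_loop(state)
    reasoner.cancel()

asyncio.run(main())
```

The design point, under these assumptions, is that the interaction loop never blocks on the reasoner: each tick it only reads whatever the reasoner has most recently written, which is what allows continuous perception alongside heavier reasoning.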
Who’s Affected
OpenAI and Google are the named comparison targets. Voice-AI startups including Vapi, ElevenLabs Conversational, and Retell, all competing on similar latency-and-quality axes, face a technically aggressive new entrant led by one of the most experienced AI executives. Enterprise customers building voice agents on top of GPT-Realtime or Gemini Live now have a third candidate architecture to evaluate. The release also lands at a difficult moment internally: several Thinking Machines employees have recently departed, per The Decoder, the most public sign of friction since Murati’s high-profile fundraising round in 2025.
What’s Next
Thinking Machines has labeled the release a research preview rather than a generally available product, suggesting commercial pricing and SLA-grade availability are not yet committed. The lab has not disclosed API pricing, latency SLAs, or specific enterprise availability dates. The benchmark claims will face scrutiny from the broader research community in the coming weeks; independent evaluations on shared interaction-quality benchmarks will determine whether they hold up across the wider set of voice and multimodal use cases.