NVIDIA has released gpt-oss-puzzle-88B, an 88-billion-parameter language model derived from OpenAI’s gpt-oss-120b using Puzzle, a post-training neural architecture search framework. The model is designed to improve inference efficiency for reasoning-heavy workloads while maintaining accuracy across different reasoning budgets.
The model is a deployment-optimized derivative of the larger base model, with the Puzzle framework used to reduce computational requirements. According to the model documentation, gpt-oss-puzzle-88B is “specifically optimized for long-context and short-context” applications, though the original source text appears truncated in the available materials.
On the tokenization side, the model uses “<|return|>” as its end-of-sequence token and “<|endoftext|>” as its padding token. The model’s chat template accepts additional parameters: “builtin_tools” (which can contain “browser” and/or “python”), “model_identity” for describing the model, and “reasoning_effort”, which defaults to “medium.”
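The parameters above can be gathered into a prompt-building call. The sketch below assembles the documented values; the surrounding `apply_chat_template` usage is shown only as the typical Hugging Face pattern, and the `model_identity` string is a placeholder of ours, not from the model card.

```python
# Special tokens documented for the model.
special_tokens = {
    "eos_token": "<|return|>",     # end-of-sequence token
    "pad_token": "<|endoftext|>",  # padding token
}

# Extra chat-template parameters named in the model documentation.
# The "model_identity" string here is a hypothetical example.
chat_template_kwargs = {
    "builtin_tools": ["browser", "python"],            # optional built-in tools
    "model_identity": "You are a helpful assistant.",  # placeholder identity
    "reasoning_effort": "medium",                      # documented default
}

messages = [{"role": "user", "content": "Summarize the Puzzle framework."}]

# With a Hugging Face tokenizer, these would typically be passed as:
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True,
#     **chat_template_kwargs,
# )
```

Passing `reasoning_effort` per request lets a deployment trade latency for reasoning depth without changing the model itself.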
The chat template also renders tool definitions using TypeScript-style parameter specifications, suggesting the model is designed for applications requiring structured interaction with external tools and services. The template supports rendering complex parameter types, including arrays, objects, and union types.
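To make that rendering concrete, here is a minimal sketch of how JSON-schema tool parameters can be mapped to TypeScript-style type strings. This is an illustration of the general technique, not the model’s actual template logic, and the function name `to_ts_type` is ours.

```python
from typing import Any


def to_ts_type(schema: dict[str, Any]) -> str:
    """Render a JSON-schema fragment as a TypeScript-style type string.

    A simplified sketch of the kind of rendering a chat template might do
    for tool parameters; the real template's rules may differ.
    """
    if "anyOf" in schema:  # union types -> "A | B"
        return " | ".join(to_ts_type(s) for s in schema["anyOf"])
    t = schema.get("type")
    if t == "array":       # arrays -> "T[]"
        return to_ts_type(schema.get("items", {})) + "[]"
    if t == "object":      # objects -> "{key: T, optional?: U}"
        props = schema.get("properties", {})
        required = set(schema.get("required", []))
        fields = ", ".join(
            f"{name}{'' if name in required else '?'}: {to_ts_type(sub)}"
            for name, sub in props.items()
        )
        return "{" + fields + "}"
    return {
        "string": "string", "number": "number", "integer": "number",
        "boolean": "boolean", "null": "null",
    }.get(t, "any")


# Example: a hypothetical search tool's parameter schema.
search_params = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "top_k": {"anyOf": [{"type": "integer"}, {"type": "null"}]},
    },
    "required": ["query"],
}
# to_ts_type(search_params) → "{query: string, top_k?: number | null}"
```

Rendering schemas into compact type declarations like this keeps tool descriptions short in the prompt while still conveying arrays, nested objects, and unions.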
This release follows NVIDIA’s broader strategy of optimizing large language models for specific deployment scenarios. The Puzzle framework appears to use neural architecture search techniques to create smaller, more efficient versions of existing models while preserving their core capabilities for reasoning tasks.
