ANALYSIS

Arcee AI Releases Trinity-Large-Thinking: 398B Open-Source Reasoning Model Under Apache 2.0

Nikhil B · Apr 4, 2026 · 2 min read
Engine Score 7/10 — Important
  • Trinity-Large-Thinking is a 398-billion-parameter sparse Mixture-of-Experts model with only 13B active parameters per token, released under the Apache 2.0 license.
  • Arcee invested $20 million — nearly half its total funding — in a 33-day pretraining run on 2,048 NVIDIA B300 Blackwell GPUs.
  • The model scores 94.7% on tau-2-Bench and ranks #2 on PinchBench behind only Claude Opus 4.6, at 96% lower cost ($0.90 vs ~$22.50 per million output tokens).
  • Trinity-Large-Preview, its predecessor, became the most-used open model on OpenRouter in the US, serving over 80.6 billion tokens on peak days.

What Happened

Arcee AI released Trinity-Large-Thinking on April 1, 2026: a 398-billion-parameter open-source reasoning model built for complex, long-horizon agents and multi-turn tool calling. The model uses a sparse Mixture-of-Experts architecture with approximately 13 billion active parameters per token, meaning only about 3.3% of its total parameter count is active at any given time. It is released under the Apache 2.0 license with full weights available on Hugging Face.
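As a sanity check, the sparsity ratio follows directly from the two sizes reported above (398B total, roughly 13B active per token):

```python
# Active-parameter fraction implied by the reported model sizes.
total_params = 398e9   # total parameters (398B)
active_params = 13e9   # approximate active parameters per token (13B)

active_fraction = active_params / total_params
print(f"Active fraction: {active_fraction:.1%}")  # → Active fraction: 3.3%
```

In a sparse MoE, the remaining experts sit idle for any given token, which is why a 398B model can run with roughly the inference cost of a 13B dense model.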

Why It Matters

US-origin open-source frontier models remain rare; most competitive open-weight models at this scale come from Meta (Llama), Alibaba (Qwen), or DeepSeek. Arcee, a small startup, committed $20 million, roughly half its total funding, to a 33-day pretraining run on 2,048 NVIDIA B300 Blackwell GPUs, followed by post-training across 1,152 H100s. The model was trained end-to-end in the United States, a differentiator for enterprises with data sovereignty requirements.

Technical Details

Trinity-Large-Thinking generates explicit reasoning traces in <think>...</think> blocks and supports a 262K-token context window with up to 80K output tokens. On tau-2-Bench, an agentic benchmark, it scores 94.7%. On PinchBench it ranks second overall at 91.9%, behind only Claude Opus 4.6. The model improves on its predecessor, Trinity-Large-Preview, in multi-turn tool use, context coherence, and instruction following.
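Clients consuming raw completions usually want the reasoning trace separated from the final answer. A minimal sketch, assuming the output interleaves <think>...</think> blocks with answer text as described above (API responses may instead expose reasoning in a dedicated field):

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning traces from the final answer.

    Assumes the inline trace format described above. Returns
    (joined reasoning text, answer text with traces removed).
    """
    # Collect the contents of every <think> block (non-greedy, across newlines).
    traces = re.findall(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    # Strip the blocks themselves to leave only the user-facing answer.
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in traces), answer

reasoning, answer = split_reasoning(
    "<think>The user wants a sum: 2 + 2 = 4.</think>The answer is 4."
)
print(reasoning)  # The user wants a sum: 2 + 2 = 4.
print(answer)     # The answer is 4.
```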

At $0.90 per million output tokens via the Arcee API, it is approximately 96% cheaper than Opus 4.6 for inference. The model is also available through OpenRouter and directly via Hugging Face. Trinity-Large-Preview, the prior version, became the most-used open model on OpenRouter in the US, serving over 80.6 billion tokens on peak days.
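The quoted 96% figure can be verified from the two per-million-token prices cited above:

```python
# Checking the quoted cost difference per million output tokens.
trinity_price = 0.90   # USD per 1M output tokens via the Arcee API, as quoted
opus_price = 22.50     # USD per 1M output tokens for Opus 4.6, as quoted

savings = 1 - trinity_price / opus_price
print(f"Savings vs Opus 4.6: {savings:.0%}")  # → Savings vs Opus 4.6: 96%
```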

Who’s Affected

Enterprise teams building multi-agent workflows and tool-calling pipelines are the primary audience. The Apache 2.0 license allows unrestricted commercial use, fine-tuning, and redistribution. Developers currently paying for proprietary reasoning models like Opus 4.6 or GPT-5.2 for agentic tasks can evaluate Trinity-Large-Thinking as a significantly cheaper alternative with competitive benchmark performance.

What’s Next

Arcee plans to release smaller variants optimized for edge deployment. The model’s strong agentic benchmark scores position it for integration into popular agent frameworks. Whether its real-world performance matches the benchmark numbers — particularly in complex, multi-step tool-calling scenarios — will determine its adoption trajectory against established proprietary alternatives.


Nikhil B

Founder of MegaOne AI. Covers AI industry developments, tool launches, funding rounds, and regulation changes. Every story is sourced from primary documents, fact-checked, and rated using the six-factor Engine Score methodology.
