ANALYSIS

Arcee AI Releases Trinity-Large-Thinking: 398B Open-Source Reasoning Model Under Apache 2.0

Nikhil B · Apr 4, 2026 · 2 min read
Engine Score 7/10 — Important
  • Trinity-Large-Thinking is a 398-billion-parameter sparse Mixture-of-Experts model with only 13B active parameters per token, released under the Apache 2.0 license.
  • Arcee invested $20 million — nearly half its total funding — in a 33-day pretraining run on 2,048 NVIDIA B300 Blackwell GPUs.
  • The model scores 94.7% on tau-2-Bench and ranks #2 on PinchBench behind only Claude Opus 4.6, at 96% lower cost ($0.90 vs ~$22.50 per million output tokens).
  • Trinity-Large-Preview, its predecessor, became the most-used open model on OpenRouter in the US, serving over 80.6 billion tokens on peak days.

What Happened

Arcee AI released Trinity-Large-Thinking on April 1, 2026: a 398-billion-parameter open-source reasoning model built for complex, long-horizon agents and multi-turn tool calling. The model uses a sparse Mixture-of-Experts architecture with approximately 13 billion active parameters per token, meaning only about 3.3% of its total parameter count is active at any given time. It is released under the Apache 2.0 license with full weights available on Hugging Face.
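As a sanity check, the sparsity ratio follows directly from the two sizes reported above (398B total, roughly 13B active per token):

```python
# Active-parameter fraction implied by the reported model sizes.
total_params = 398e9   # total parameters (398B)
active_params = 13e9   # approximate active parameters per token (13B)

active_fraction = active_params / total_params
print(f"Active fraction: {active_fraction:.1%}")  # → Active fraction: 3.3%
```

In a sparse MoE, the remaining experts sit idle for any given token, which is why a 398B model can run with roughly the inference cost of a 13B dense model.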

Why It Matters

US-origin open-source frontier models remain rare; most competitive open-weight models at this scale come from Meta (Llama), Alibaba (Qwen), or DeepSeek. Arcee, a small startup, committed $20 million, roughly half its total funding, to a 33-day pretraining run on 2,048 NVIDIA B300 Blackwell GPUs, followed by post-training across 1,152 H100s. The model was trained end-to-end in the United States, a differentiator for enterprises with data sovereignty requirements.

Technical Details

Trinity-Large-Thinking generates explicit reasoning traces in <think>...</think> blocks and supports a 262K-token context window with up to 80K output tokens. On tau-2-Bench, an agentic benchmark, it scores 94.7%. On PinchBench it ranks second overall at 91.9%, behind only Claude Opus 4.6. The model improves on its predecessor, Trinity-Large-Preview, in multi-turn tool use, context coherence, and instruction following.
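Clients consuming raw completions usually want the reasoning trace separated from the final answer. A minimal sketch, assuming the output interleaves <think>...</think> blocks with answer text as described above (API responses may instead expose reasoning in a dedicated field):

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning traces from the final answer.

    Assumes the inline trace format described above. Returns
    (joined reasoning text, answer text with traces removed).
    """
    # Collect the contents of every <think> block (non-greedy, across newlines).
    traces = re.findall(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    # Strip the blocks themselves to leave only the user-facing answer.
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in traces), answer

reasoning, answer = split_reasoning(
    "<think>The user wants a sum: 2 + 2 = 4.</think>The answer is 4."
)
print(reasoning)  # The user wants a sum: 2 + 2 = 4.
print(answer)     # The answer is 4.
```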

At $0.90 per million output tokens via the Arcee API, it is approximately 96% cheaper than Opus 4.6 for inference. The model is also available through OpenRouter and directly via Hugging Face. Trinity-Large-Preview, the prior version, became the most-used open model on OpenRouter in the US, serving over 80.6 billion tokens on peak days.
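The quoted 96% figure can be verified from the two per-million-token prices cited above:

```python
# Checking the quoted cost difference per million output tokens.
trinity_price = 0.90   # USD per 1M output tokens via the Arcee API, as quoted
opus_price = 22.50     # USD per 1M output tokens for Opus 4.6, as quoted

savings = 1 - trinity_price / opus_price
print(f"Savings vs Opus 4.6: {savings:.0%}")  # → Savings vs Opus 4.6: 96%
```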

Who’s Affected

Enterprise teams building multi-agent workflows and tool-calling pipelines are the primary audience. The Apache 2.0 license allows unrestricted commercial use, fine-tuning, and redistribution. Developers currently paying for proprietary reasoning models like Opus 4.6 or GPT-5.2 for agentic tasks can evaluate Trinity-Large-Thinking as a significantly cheaper alternative with competitive benchmark performance.

What’s Next

Arcee plans to release smaller variants optimized for edge deployment. The model’s strong agentic benchmark scores position it for integration into popular agent frameworks. Whether its real-world performance matches the benchmark numbers — particularly in complex, multi-step tool-calling scenarios — will determine its adoption trajectory against established proprietary alternatives.


Nikhil B

Founder of MegaOne AI. Covers AI industry developments, tool launches, funding rounds, and regulation changes. Every story is sourced from primary documents, fact-checked, and rated using the six-factor Engine Score methodology.
