LAUNCHES

GigaChat 3.1 Ultra: 702B Open-Weight Model, MIT License

Ryan Matsuda · Mar 25, 2026 · Updated Apr 19, 2026 · 3 min read
Engine Score 7/10 — Important

The release of new open-weight GigaChat models offers high actionability and significant impact for developers seeking accessible AI tools. The Reddit source provides early information, but its Tier 2 reliability and the lack of immediate multi-source verification slightly temper the overall score.

  • Sber (Sberbank), Russia’s largest bank, released GigaChat 3.1 Ultra — a 702-billion-parameter Mixture-of-Experts model with 36 billion active parameters — under an MIT open-source license on Hugging Face.
  • The model outperforms DeepSeek-V3-0324 and non-reasoning Qwen3-235B on mathematics and general reasoning benchmarks, scoring an overall mean of 0.6764 versus DeepSeek’s 0.6482.
  • GigaChat 3.1 Ultra was trained on approximately 5.5 trillion synthetic tokens across 10 languages, using native FP8 precision for its DPO (Direct Preference Optimization) stage rather than post-training quantization.
  • A smaller companion model, GigaChat 3.1 Lightning (10 billion parameters), performs at GPT-4o level according to Sber’s benchmarks.

What Happened

Sber, Russia’s largest financial institution and a major technology conglomerate, released GigaChat 3.1 Ultra on Hugging Face in late March 2026. The model is a 702-billion-parameter Mixture-of-Experts (MoE) architecture with 36 billion active parameters at inference time, released under an MIT license alongside a smaller companion model, GigaChat 3.1 Lightning, with 10 billion parameters.

The release makes GigaChat 3.1 Ultra one of the largest open-weight models available, matching DeepSeek-V3 in total parameter count while using a similar MoE approach to keep inference costs manageable. Sber claims the Ultra model generates text twice as fast as its previous flagship thanks to the MoE architecture.
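The efficiency claim follows from simple arithmetic. A back-of-the-envelope sketch, using only the parameter counts reported above (the rough rule of ~2 FLOPs per active parameter per generated token is a common estimate, not a figure from Sber):

```python
# Back-of-the-envelope MoE inference cost for GigaChat 3.1 Ultra
# (702B total parameters, 36B active per token). Per-token forward
# compute scales roughly with ACTIVE parameters, so the model pays
# dense-36B compute while storing 702B weights.
TOTAL_PARAMS = 702e9
ACTIVE_PARAMS = 36e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token = 2 * ACTIVE_PARAMS  # rough forward-pass estimate

print(f"Active fraction per token: {active_fraction:.1%}")    # ~5.1%
print(f"Approx. forward FLOPs per token: {flops_per_token:.1e}")
```

Only about 5% of the weights participate in any single token's forward pass, which is the whole argument for MoE at this scale.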

Why It Matters

GigaChat 3.1 Ultra represents the most capable AI model to emerge from Russia’s domestic technology sector. While American and Chinese labs have dominated the frontier model landscape, Sber’s release demonstrates that Russian institutions can produce competitive open-weight models at the 700B+ parameter scale.

The MIT license is significant. Unlike some open-weight releases that restrict commercial use, the MIT license allows anyone to use, modify, and deploy GigaChat 3.1 for any purpose, including commercial applications. This positions the model as a direct competitor to DeepSeek-V3 and Qwen3 in the open-weight space.

Technical Details

The architecture combines three key innovations. Multi-head Latent Attention (MLA) compresses the key-value cache into a latent representation, reducing memory usage during inference — particularly beneficial for long-context workloads. Multi-Token Prediction (MTP) predicts multiple tokens per forward pass, enabling speculative and parallel decoding in production. The MoE layer routes inputs to specialized expert networks, activating only 36 billion of the 702 billion total parameters per token.
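The routing step can be sketched in a few lines. This is a toy illustration of top-k expert gating, not Sber's actual router; the expert count and k below are invented for the example, and real MoE layers apply this per token inside every MoE block:

```python
import math
import random

def top_k_gate(gate_logits, k=2):
    """Toy MoE router: softmax over expert logits, keep the top-k
    experts, and renormalize so the selected weights sum to 1."""
    # Numerically stabilized softmax over all expert logits.
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]  # 8 hypothetical experts
weights = top_k_gate(logits, k=2)
print(weights)  # two experts selected; their weights sum to 1
```

Each token's output is then the weighted sum of the selected experts' outputs, which is how only 36B of 702B parameters activate per token.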

Training used approximately 5.5 trillion synthetic tokens spanning 10 languages, with data sources including books, academic material, code datasets, and mathematics problems. Sber generated millions of synthetic math and olympiad-style programming tasks to strengthen reasoning capabilities. The 3.1 release expanded coverage into hard domains including finance, physics, engineering, biology, chemistry, and medicine.

The post-training pipeline is notable for running DPO (Direct Preference Optimization) in native FP8 precision rather than quantizing a higher-precision model after training. According to Sber, this approach “recovered quality while using substantially less memory,” and MTP heads were trained during the DPO stage for consistency.
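The memory argument for FP8 is straightforward: one byte per weight instead of two. A rough weights-only estimate (optimizer state, gradients, and activations would add substantially more during DPO training):

```python
# Approximate weight storage for a 702B-parameter model at two
# precisions: FP8 at 1 byte/param vs. BF16 at 2 bytes/param.
# Weights only — training also holds optimizer state and activations.
PARAMS = 702e9
GB = 1e9  # decimal gigabytes, for a round-number estimate

fp8_gb = PARAMS * 1 / GB
bf16_gb = PARAMS * 2 / GB

print(f"FP8 weights:  ~{fp8_gb:.0f} GB")   # ~702 GB
print(f"BF16 weights: ~{bf16_gb:.0f} GB")  # ~1404 GB
```

Halving the weight footprint is what makes running the preference-optimization stage natively in FP8, rather than quantizing afterward, attractive on constrained hardware.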

Who’s Affected

Researchers and companies looking for open-weight alternatives to proprietary models gain another option. GigaChat 3.1 Ultra scores an overall mean of 0.6764 across benchmarks, compared to 0.6482 for DeepSeek-V3, with particular strength in mathematics (T-Math: 0.2961 versus DeepSeek’s 0.1450). On Arena Hard evaluations, Ultra scored 90.2 versus DeepSeek-V3’s 80.1.

The Lightning companion model targets resource-constrained deployments, performing at what Sber claims is GPT-4o level with only 1.8 billion active parameters — making it viable for on-premise and edge deployments. Russian-language users benefit most: GigaChat 3.1 Ultra leads decisively on Arena Hard RU (82.1 versus DeepSeek’s 70.7), though it trails slightly on MMLU RU (0.8267 versus 0.8392).

What’s Next

Independent benchmark verification will be critical. Sber’s published scores show competitive or superior performance to DeepSeek-V3 and Qwen3-235B, but these are self-reported results. The model is available in FP8, BF16, and GGUF formats with support for vLLM, SGLang, LMDeploy, and TensorRT-LLM inference engines, so independent testing should follow quickly.
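With checkpoints published in multiple formats, testers can typically launch the model with a one-line command on the supported engines. A hedged sketch using vLLM's `serve` CLI — the repository id below is a placeholder, since the source doesn't give the exact Hugging Face name, and the parallelism values mirror the 2-node × 16-GPU reference configuration rather than a verified recipe:

```shell
# Placeholder repo id — substitute the actual GigaChat 3.1 Ultra
# checkpoint name from Hugging Face before running.
vllm serve <org>/<gigachat-3.1-ultra-checkpoint> \
  --tensor-parallel-size 16 \
  --pipeline-parallel-size 2
```

SGLang, LMDeploy, and TensorRT-LLM offer analogous serving entry points, so benchmark replication should not be blocked on tooling.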

Deployment requires multi-node GPU clusters — the reference configuration uses 2 nodes with 16 GPUs each — which limits accessibility to organizations with significant compute resources. The geopolitical context also matters: Western sanctions on Russia have restricted access to advanced Nvidia GPUs, making Sber’s ability to train a 702B-parameter model a notable technical achievement under constrained hardware conditions. Whether international researchers and companies will adopt a Russian-developed model given current geopolitical tensions remains an open question.
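To see why multi-node clusters are required, a rough shard-size estimate — assuming the "2 nodes with 16 GPUs each" reference configuration means 32 GPUs total (my reading, not confirmed by the source) and FP8 weights at one byte per parameter:

```python
# Weights-only memory per GPU when sharding 702B FP8 parameters
# across an assumed 2-node x 16-GPU cluster. KV cache, activations,
# and framework overhead come on top of this figure.
PARAMS = 702e9
BYTES_PER_PARAM = 1          # FP8
GPUS = 2 * 16                # assumed reading of the reference config

per_gpu_gb = PARAMS * BYTES_PER_PARAM / GPUS / 1e9
print(f"~{per_gpu_gb:.0f} GB of weights per GPU")  # ~22 GB
```

Roughly 22 GB of weights per accelerator before any runtime overhead explains why single-node or consumer-GPU deployment of the Ultra model is out of reach.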
