BENCHMARKS

Poolside Releases Laguna M.1 and XS.2: Open-Weight Coding Models Hitting 72.5% on SWE-bench Verified

James Whitfield · Apr 30, 2026 · 3 min read
Engine Score 8/10 — Important


  • Poolside AI released Laguna M.1 (225B/23B-active) and Laguna XS.2 (33B/3B-active) on April 28, 2026 — both Mixture-of-Experts agentic coding models trained from scratch by the company.
  • Laguna M.1 reaches 72.5% on SWE-bench Verified; Laguna XS.2 reaches 68.2% on SWE-bench Verified and is Poolside’s first open-weight release.
  • Laguna XS.2 is small enough to run on a Mac with 36 GB of RAM via Ollama, targeting on-device agentic coding workflows.
  • Poolside is also releasing “pool,” a terminal-based coding agent that acts as both an Agent Client Protocol (ACP) client and server; it is the same harness the company uses internally for reinforcement-learning training.

What Happened

Poolside AI released the first two models in its Laguna family on April 28, 2026: Laguna M.1, a 225-billion-parameter Mixture-of-Experts model, and Laguna XS.2, a 33-billion-parameter open-weight version. The company is also shipping “pool,” a terminal-based coding agent that acts as both an Agent Client Protocol (ACP) client and server, as a research preview. The release was reported by MarkTechPost editor Asif Razzaq.
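
ACP is a JSON-RPC protocol for wiring coding agents to editors and other tools, so “client and server” means pool can both drive ACP-speaking agents and be driven by ACP-speaking editors. The sketch below shows the general shape of such a message over stdio; the method name and params are illustrative assumptions, not pool’s documented API.

```python
# Illustrative only: ACP messages are JSON-RPC 2.0 exchanged over stdio.
# The method name and params below are assumptions for illustration,
# not pool's actual API surface.
import json
import sys

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "session/prompt",  # assumed method name for sending a user turn
    "params": {"prompt": "Fix the failing test in tests/test_auth.py"},
}
# Agents speaking ACP-style protocols typically read line-delimited JSON on stdin.
sys.stdout.write(json.dumps(request) + "\n")
sys.stdout.flush()
```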

Why It Matters

SWE-bench Verified is the most cited benchmark for agentic coding ability on real-world GitHub issues. Anthropic’s Claude Sonnet 4.5 and OpenAI’s frontier coding models have set the high bar in 2025–2026, but their weights are closed. An open-weight model that scores 68.2% on Verified is competitive with closed-source frontier models from a year ago and gives developers a self-hosted option. Poolside has so far been quieter than rivals such as Anysphere, the maker of Cursor, and Laguna positions it directly against the agentic-coding cohort.

Technical Details

Both Laguna models use a Mixture-of-Experts architecture: each token is routed through a small subset of “experts” rather than the full network, so inference cost scales with activated parameters rather than total parameters. Laguna M.1 has 225 billion total parameters with 23 billion activated per token; it was trained on 30 trillion tokens across 6,144 NVIDIA Hopper GPUs, a pre-training run the company says completed at the end of 2025. Laguna XS.2 has 33 billion total parameters with 3 billion activated per token and is the company’s first open-weight release.
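
The routing step is easier to see in code. Below is a minimal sketch of top-k MoE routing in PyTorch; the expert count, hidden sizes, and k=2 are chosen for illustration and are not Laguna’s actual configuration. The router scores every expert per token, but only the top-k experts run, which is why inference cost tracks activated rather than total parameters.

```python
# Minimal top-k Mixture-of-Experts routing sketch (illustrative sizes,
# not Laguna's configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(weights, dim=-1)        # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([4, 512]); only 2 of 8 experts ran per token
```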

On the published benchmarks, Laguna M.1 scored 72.5% on SWE-bench Verified, 67.3% on SWE-bench Multilingual, 46.9% on SWE-bench Pro, and 40.7% on Terminal-Bench 2.0 (a separate terminal-use benchmark, not part of the SWE-bench family). Laguna XS.2 scored 68.2% on Verified, 62.4% on Multilingual, 44.5% on Pro, and 30.1% on Terminal-Bench 2.0. Poolside states that XS.2 is compact enough to run on a 36 GB Apple Silicon Mac via Ollama.
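
For readers weighing the on-device path, the snippet below shows what that would look like through Ollama’s Python client once the weights are in the registry. The model tag is a placeholder assumption; Poolside’s actual registry name was not given in the announcement.

```python
# Hypothetical local run of Laguna XS.2 via the Ollama Python client.
# The model tag "laguna-xs" is a placeholder, not Poolside's published tag.
from ollama import chat

response = chat(
    model="laguna-xs",  # assumption: substitute the tag Poolside publishes
    messages=[{
        "role": "user",
        "content": "Write a Python function that reverses a linked list.",
    }],
)
print(response["message"]["content"])
```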

Who’s Affected

Developers building agentic coding workflows on private codebases are the primary audience: an open-weight model strong enough to drive a SWE-bench-style agent removes a hard dependency on closed APIs from Anthropic, OpenAI, and others. Anysphere (maker of Cursor), Continue.dev, Aider, and the broader open-source agentic-coding stack now have a credible self-hostable backbone. Poolside itself is building toward a vertical agentic-coding product, and Laguna is the foundation it intends to ship on.

What’s Next

Poolside has flagged that more Laguna variants will follow — XS.2 is described as a second-generation MoE built on lessons from training M.1. The “pool” agent and ACP client-server are in research preview, and Poolside has said the same harness will eventually be released as the company scales agent reinforcement-learning training. Expect downstream evaluations from independent labs measuring real-world reliability against closed models, plus Ollama and llama.cpp builds in the days after launch.
