BENCHMARKS

Liquid AI Runs 24-Billion-Parameter Model at 50 Tokens Per Second in a Web Browser

megaone_admin · Mar 26, 2026 · 2 min read
Engine Score 7/10 — Important

This story highlights a significant advance in running large language models efficiently in a web browser via WebGPU, with implications for on-device AI development and privacy. However, the Reddit source limits reliability and independent verification.


Liquid AI has demonstrated its LFM2-24B-A2B model running at approximately 50 tokens per second inside a web browser using WebGPU on an Apple M4 Max chip. The model has 24 billion total parameters but only 2.3 billion are active per token, thanks to a Mixture-of-Experts architecture that routes each input to a small subset of the model’s capacity. A smaller variant, LFM2-8B-A1B, achieved over 100 tokens per second on the same hardware.
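The Mixture-of-Experts routing described above can be sketched in a few lines. This is a toy illustration of the general technique, not LFM2's actual architecture: the expert count, top-k value, and dimensions here are assumptions chosen for readability.

```python
# Minimal sketch of Mixture-of-Experts token routing (illustrative only;
# expert counts, top-k, and dimensions are assumptions, not LFM2 internals).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # assumed pool of expert feed-forward blocks
TOP_K = 2         # assumed number of experts activated per token
DIM = 16          # toy hidden dimension

# Router: a learned linear layer scoring each expert for a given token.
router_weights = rng.standard_normal((DIM, NUM_EXPERTS))
# Each "expert" here is just a toy linear transform.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_forward(token: np.ndarray) -> np.ndarray:
    scores = token @ router_weights                  # one score per expert
    top = np.argsort(scores)[-TOP_K:]                # keep only the k best experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over chosen
    # Only TOP_K of NUM_EXPERTS expert blocks run for this token, which is
    # why the active parameter count is a small fraction of the total.
    return sum(g * (token @ experts[i]) for g, i in zip(gate, top))

out = moe_forward(rng.standard_normal(DIM))
active_fraction = TOP_K / NUM_EXPERTS  # 0.25 in this toy setup
```

In a real MoE model each expert is a full feed-forward block, but the principle is the same: compute cost per token tracks the active parameters (2.3B for LFM2-24B-A2B), not the total (24B).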

The model fits within 32 gigabytes of RAM, making it compatible with current-generation consumer laptops and desktops. Performance varies widely across dedicated hardware: 112 tokens per second on AMD CPUs, 293 tokens per second on NVIDIA H100 GPUs, and 35.4 tokens per second on Qualcomm's Snapdragon 8 Elite mobile processor. Liquid AI released an early checkpoint of LFM2-24B-A2B with open weights on Hugging Face on February 24, 2026.
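A back-of-the-envelope calculation shows why the 32 GB figure implies compressed weights. The bit-widths below are common quantization levels used as assumptions, not figures published by Liquid AI:

```python
# Rough weight-memory footprint for a 24B-parameter model at common precisions.
# Bit-widths are illustrative assumptions; actual deployment details may differ.
PARAMS = 24e9  # total parameters

def weights_gb(bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weights_gb(16)  # 48 GB -> would not fit in 32 GB of RAM
int8_gb = weights_gb(8)   # 24 GB -> fits, with little headroom
int4_gb = weights_gb(4)   # 12 GB -> fits, leaving room for KV cache and the OS
```

Weights are only part of the budget; the KV cache and runtime overhead grow with context length, so the usable quantization level depends on the workload.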

Running a capable AI model in a browser tab without any server connection has immediate practical implications. Sensitive data never leaves the device, eliminating privacy concerns that prevent many organizations — particularly in healthcare, legal, and financial sectors — from using cloud AI services. There is no network latency, enabling near-instant responses for real-time applications. And the system works offline, making AI features available without internet connectivity.

Liquid AI, an MIT spin-off founded in 2023, has raised $297 million and achieved a $2 billion valuation after a $250 million Series A led by AMD in December 2024. The company’s approach challenges the assumption that useful AI requires cloud infrastructure. If a 24-billion-parameter model can run at conversational speed in a browser, the threshold for which tasks require cloud AI shifts significantly.

The silicon ecosystem is already adapting. Intel is optimizing for Liquid AI models through OpenVINO. AMD is integrating support through its Ryzen AI platform. Qualcomm is targeting AI PCs and high-end mobile devices. Inference partners including Ollama, LM Studio, and Nexa AI are building deployment tools across mobile, desktop, and terminal environments.

The browser as an AI runtime is a particularly interesting development because it eliminates the installation barrier entirely. A web developer can build an AI application that runs entirely client-side, with no backend infrastructure, no API costs, and no data transmission. The model downloads once and runs locally. For the subset of AI use cases where privacy, latency, and cost matter more than maximum capability, this architecture may prove transformative.



MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.
