BENCHMARKS

Liquid AI Runs 24-Billion-Parameter Model at 50 Tokens Per Second in a Web Browser

megaone_admin · Mar 26, 2026 · 2 min read
Engine Score 7/10 — Important

This story highlights a significant advance in running large language models efficiently in a web browser via WebGPU, with implications for on-device AI development and privacy. However, the Reddit source limits reliability and independent verification.


Liquid AI has demonstrated its LFM2-24B-A2B model running at approximately 50 tokens per second inside a web browser using WebGPU on an Apple M4 Max chip. The model has 24 billion total parameters but only 2.3 billion are active per token, thanks to a Mixture-of-Experts architecture that routes each input to a small subset of the model’s capacity. A smaller variant, LFM2-8B-A1B, achieved over 100 tokens per second on the same hardware.
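The Mixture-of-Experts routing described above can be sketched in a few lines. This is a toy illustration of the general technique, not LFM2's actual architecture: the expert count, top-k value, and dimensions here are assumptions chosen for readability.

```python
# Minimal sketch of Mixture-of-Experts token routing (illustrative only;
# expert counts, top-k, and dimensions are assumptions, not LFM2 internals).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # assumed pool of expert feed-forward blocks
TOP_K = 2         # assumed number of experts activated per token
DIM = 16          # toy hidden dimension

# Router: a learned linear layer scoring each expert for a given token.
router_weights = rng.standard_normal((DIM, NUM_EXPERTS))
# Each "expert" here is just a toy linear transform.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_forward(token: np.ndarray) -> np.ndarray:
    scores = token @ router_weights                  # one score per expert
    top = np.argsort(scores)[-TOP_K:]                # keep only the k best experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over chosen
    # Only TOP_K of NUM_EXPERTS expert blocks run for this token, which is
    # why the active parameter count is a small fraction of the total.
    return sum(g * (token @ experts[i]) for g, i in zip(gate, top))

out = moe_forward(rng.standard_normal(DIM))
active_fraction = TOP_K / NUM_EXPERTS  # 0.25 in this toy setup
```

In a real MoE model each expert is a full feed-forward block, but the principle is the same: compute cost per token tracks the active parameters (2.3B for LFM2-24B-A2B), not the total (24B).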

The model fits within 32 gigabytes of RAM, making it compatible with current-generation consumer laptops and desktops. Performance varies widely across dedicated hardware: 112 tokens per second on AMD CPUs, 293 tokens per second on NVIDIA H100 GPUs, and 35.4 tokens per second on Qualcomm's Snapdragon 8 Elite mobile processor. Liquid AI released an early checkpoint of LFM2-24B-A2B with open weights on Hugging Face on February 24, 2026.
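A back-of-the-envelope calculation shows why the 32 GB figure implies compressed weights. The bit-widths below are common quantization levels used as assumptions, not figures published by Liquid AI:

```python
# Rough weight-memory footprint for a 24B-parameter model at common precisions.
# Bit-widths are illustrative assumptions; actual deployment details may differ.
PARAMS = 24e9  # total parameters

def weights_gb(bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weights_gb(16)  # 48 GB -> would not fit in 32 GB of RAM
int8_gb = weights_gb(8)   # 24 GB -> fits, with little headroom
int4_gb = weights_gb(4)   # 12 GB -> fits, leaving room for KV cache and the OS
```

Weights are only part of the budget; the KV cache and runtime overhead grow with context length, so the usable quantization level depends on the workload.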

Running a capable AI model in a browser tab without any server connection has immediate practical implications. Sensitive data never leaves the device, eliminating privacy concerns that prevent many organizations — particularly in healthcare, legal, and financial sectors — from using cloud AI services. There is no network latency, enabling near-instant responses for real-time applications. And the system works offline, making AI features available without internet connectivity.

Liquid AI, an MIT spin-off founded in 2023, has raised $297 million and achieved a $2 billion valuation after a $250 million Series A led by AMD in December 2024. The company’s approach challenges the assumption that useful AI requires cloud infrastructure. If a 24-billion-parameter model can run at conversational speed in a browser, the threshold for which tasks require cloud AI shifts significantly.

The silicon ecosystem is already adapting. Intel is optimizing for Liquid AI models through OpenVINO. AMD is integrating support through its Ryzen AI platform. Qualcomm is targeting AI PCs and high-end mobile devices. Inference partners including Ollama, LM Studio, and Nexa AI are building deployment tools across mobile, desktop, and terminal environments.

The browser as an AI runtime is a particularly interesting development because it eliminates the installation barrier entirely. A web developer can build an AI application that runs entirely client-side, with no backend infrastructure, no API costs, and no data transmission. The model downloads once and runs locally. For the subset of AI use cases where privacy, latency, and cost matter more than maximum capability, this architecture may prove transformative.



MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.
