KittenML’s KittenTTS v0.8 Delivers Three ONNX Models for CPU Inference

Ryan Matsuda · Mar 20, 2026 · Updated Apr 7, 2026 · 3 min read
Engine Score 8/10 — Important

This story covers new, highly compact TTS models that developers can try immediately. It links directly to the project's GitHub repository, a primary source for anyone looking to implement it.

KittenML has released Kitten TTS v0.8, an open-source text-to-speech library built on ONNX that runs on standard CPUs without requiring a GPU. The release introduces three model variants spanning 15 million to 80 million parameters, with on-disk sizes between 25MB and 80MB. The repository has reached 13.4k GitHub stars and 739 forks since its creation.

  • Three model tiers: kitten-tts-nano (15M params, 25MB int8-quantized), kitten-tts-micro (40M params, 41MB), kitten-tts-mini (80M params, 80MB)
  • All models output 24 kHz audio and ship with eight built-in named voices
  • ONNX-based inference supports CPU deployment; optional GPU acceleration is also available
  • Released as a developer preview under Apache 2.0; APIs are subject to change between versions

What Happened

KittenML published Kitten TTS v0.8 to GitHub, presenting three ONNX-backed TTS models designed to deliver, in the words of the project README, “high-quality voice synthesis on CPU without requiring a GPU.” The release is distributed under the Apache 2.0 license and is installable via pip from GitHub releases, with model weights hosted on Hugging Face Hub. Individual author names were not listed in the repository at the time of publication.

Why It Matters

GPU-dependent TTS systems remain the dominant approach for high-quality synthesis, but they impose hardware requirements that exclude edge devices, offline environments, and low-cost compute instances. CPU-compatible models under 100MB lower these barriers for embedded hardware, mobile applications, and locally run assistants where GPU access is unavailable or impractical.

ONNX Runtime, the inference framework underpinning KittenTTS, is a cross-platform, open-source engine maintained by Microsoft and broadly supported across operating systems and hardware architectures. This foundation lowers distribution friction across heterogeneous environments. Prior lightweight TTS projects such as Piper TTS have demonstrated sustained demand for this class of model, but few have publicly offered multiple model-size tiers, plus a quantized build, under a unified Python library interface.

Technical Details

The three model variants differ in parameter count, disk footprint, and the implicit tradeoff between inference speed and output quality. The largest, kitten-tts-mini, contains 80M parameters at 80MB on disk. The middle tier, kitten-tts-micro, has 40M parameters at 41MB. The smallest, kitten-tts-nano, has 15M parameters and is available in a standard 56MB build or an int8-quantized version at 25MB; the quantized build is the one that approaches the repository’s stated goal of a “state-of-the-art TTS model under 25MB.” All three models produce audio at 24 kHz.
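As a rough illustration of that tradeoff, the sketch below encodes the sizes reported above and picks the variant with the most parameters that fits a given disk budget. The tier labels are the names used in this article, not verified Hugging Face model IDs, and treating parameter count as a proxy for quality is an assumption rather than a published benchmark result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTier:
    name: str             # label from this article, not a verified Hub ID
    params_millions: int  # parameter count in millions
    size_mb: int          # on-disk size in megabytes


# Figures as reported in this article for Kitten TTS v0.8.
TIERS = [
    ModelTier("kitten-tts-nano (int8)", 15, 25),
    ModelTier("kitten-tts-nano", 15, 56),
    ModelTier("kitten-tts-micro", 40, 41),
    ModelTier("kitten-tts-mini", 80, 80),
]


def pick_tier(budget_mb: int) -> Optional[ModelTier]:
    """Return the fitting tier with the most parameters, or None if none fit."""
    fitting = [t for t in TIERS if t.size_mb <= budget_mb]
    if not fitting:
        return None
    # Assumption: more parameters means better output quality.
    return max(fitting, key=lambda t: t.params_millions)


if __name__ == "__main__":
    for budget in (20, 30, 50, 100):
        tier = pick_tier(budget)
        print(budget, "MB ->", tier.name if tier else "no model fits")
```

One detail the table makes visible: the unquantized nano build (56MB) is larger on disk than micro (41MB), so under this heuristic it is never selected.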

The library ships with streaming inference support, demonstrated via an included example_streaming.py script, and optional CUDA acceleration for GPU environments via a separate requirements_gpu.txt. This makes CPU-only mode a deployment choice rather than a hard constraint. Built-in text preprocessing normalizes numbers, currencies, and measurement units before synthesis. KittenML notes in the repository that “some users have reported issues with the kitten-tts-nano-0.8-int8 model” and directs affected users to the GitHub issue tracker.
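In ONNX Runtime terms, the CPU-versus-GPU choice comes down to which execution providers a session is asked to use. The sketch below is a generic ONNX Runtime pattern, not KittenTTS code: `choose_providers` is a hypothetical helper, and how (or whether) the KittenTTS library exposes provider selection is not stated in the source. The provider names and `onnxruntime.get_available_providers()` are part of ONNX Runtime itself.

```python
from typing import List, Sequence

# Execution provider names defined by ONNX Runtime.
CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def choose_providers(available: Sequence[str], prefer_gpu: bool = True) -> List[str]:
    """Hypothetical helper: list CUDA first when present, always ending with CPU."""
    chosen: List[str] = []
    if prefer_gpu and CUDA in available:
        chosen.append(CUDA)
    chosen.append(CPU)  # the CPU provider serves as the fallback
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional; only needed for the live query
        available = ort.get_available_providers()
    except ImportError:
        available = [CPU]
    print(choose_providers(available))
```

A list like this is what would normally be passed as the `providers` argument when constructing an `onnxruntime.InferenceSession` directly.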

Eight built-in voices are included — Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo — with speech speed adjustable through the API.
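To make the voice list and the 24 kHz output rate concrete, here is a minimal standard-library sketch that validates a voice name and writes mono 16-bit audio at 24 kHz. It does not call KittenTTS: `check_voice` and `write_wav` are hypothetical helpers, the sine tone stands in for model output, and the assumption that the model returns float samples in the range [-1, 1] is not confirmed by the source.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # output rate reported for all three models
VOICES = ("Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo")


def check_voice(name: str) -> str:
    """Return the name unchanged, or raise if it is not one of the eight voices."""
    if name not in VOICES:
        raise ValueError(f"unknown voice {name!r}; expected one of {VOICES}")
    return name


def write_wav(path: str, samples, sample_rate: int = SAMPLE_RATE) -> float:
    """Write float samples as 16-bit mono PCM and return the duration in seconds."""
    # Clip to [-1, 1] so the conversion to 16-bit integers cannot overflow.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    frames = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return len(clipped) / sample_rate


# Placeholder for model output (assumption): one second of a 440 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

voice = check_voice("Jasper")
path = os.path.join(tempfile.gettempdir(), "kitten_demo.wav")
duration = write_wav(path, tone)
```

At 24 kHz and 16 bits per sample, one second of mono audio is 48,000 bytes of sample data, which is a useful figure when budgeting storage or streaming buffers on small devices.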

Who’s Affected

Developers building applications that require offline or resource-constrained voice synthesis, including accessibility tools, embedded assistants, and local-first productivity software, are the direct audience for this release. The Apache 2.0 license permits commercial use, subject to its attribution and notice conditions. KittenML separately offers paid tiers covering integration assistance, custom voice development, and enterprise licensing; specific pricing is not disclosed in the repository.

The 73 open issues and 27 active pull requests signal an actively maintained but early-stage project. Developers building production systems should weigh the developer preview status before committing to the current API surface.

What’s Next

KittenML has not published a formal roadmap or public benchmarks comparing KittenTTS against other lightweight TTS systems of comparable size. The developer preview designation means the current API should not be treated as stable for production deployments. A stable release with reproducible evaluation data would help developers assess how these models compare to alternatives such as Piper TTS, Kokoro, and Coqui XTTS-v2.
