A bioengineering research team published findings in April 2026 demonstrating a wearable, non-surgical device that reads neck muscle contractions through near-infrared light and uses an artificial intelligence model to reconstruct the intended words as audible, synthesized speech. The system targets the estimated 10 million people worldwide living with voice loss from laryngectomy, amyotrophic lateral sclerosis, or vocal cord damage, and it requires no surgery, no implanted electrodes, and no sound production from the user.
The engineering core is deceptively simple: adapt an existing optical sensor technology for a new biological signal — skeletal muscle movement rather than blood flow — and train an AI to translate what it detects into phonemes.
How Near-Infrared Light Detects Silent Speech in the Neck
Standard photoplethysmography (PPG) sensors work by projecting near-infrared light into tissue and measuring how much returns; it is the same principle behind every pulse oximeter on every hospital finger clip. Oxygenated and deoxygenated blood absorb different wavelengths, and the sensor reads the difference. The same physics applies when muscle tissue mechanically compresses or stretches: the light-scattering geometry changes measurably, even without any variation in blood flow.
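As an idealized illustration (a simplified Beer-Lambert attenuation model, not the team's published signal model), detected intensity falls off exponentially with the effective optical path length, so a contraction that shortens or lengthens that path shifts the reading even when blood absorption stays constant:

```python
import numpy as np

# Simplified Beer-Lambert sketch: detected intensity vs. effective optical
# path length. Values are illustrative, not calibrated tissue constants.
I0 = 1.0          # emitted near-infrared intensity (arbitrary units)
mu_eff = 1.2      # assumed effective attenuation coefficient (1/cm)

def detected_intensity(path_cm: float) -> float:
    """Light returning to the photodiode after traversing `path_cm` of tissue."""
    return I0 * np.exp(-mu_eff * path_cm)

rest = detected_intensity(1.00)        # relaxed muscle geometry
contracted = detected_intensity(0.97)  # contraction shortens the light path ~3%
print(f"relative intensity change: {(contracted - rest) / rest:+.2%}")
```

Even this toy model shows a few-percent intensity swing from a small geometric change, which is the signal the device exploits.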
The device positions an array of miniaturized PPG sensors along the anterior neck, targeting the sternocleidomastoid, omohyoid, and thyrohyoid muscle groups, which are recruited during speech articulation. When a person silently mouths a word, these muscles contract in phoneme-specific patterns even though no air passes the vocal cords. The sensors capture these micro-deformations at millisecond resolution across multiple anatomical sites simultaneously, producing a spatiotemporal signal map that is distinct for each phoneme.
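A minimal sketch of what such a signal map might look like in code, assuming a hypothetical 8-channel array sampled at 1 kHz (channel count, rate, and window length are illustrative, not the team's published specification):

```python
import numpy as np

FS = 1000       # assumed sampling rate in Hz (illustrative)
N_CHANNELS = 8  # assumed number of PPG sites along the anterior neck
WINDOW_MS = 40  # one analysis frame, roughly phoneme-scale timing

def frame_signal(raw: np.ndarray) -> np.ndarray:
    """Slice a (channels, samples) recording into (frames, channels, window)
    chunks: each frame is one spatiotemporal snapshot across all sites."""
    win = FS * WINDOW_MS // 1000
    n_frames = raw.shape[1] // win
    return raw[:, : n_frames * win].reshape(N_CHANNELS, n_frames, win).transpose(1, 0, 2)

# Ten seconds of synthetic multi-site optical data standing in for real sensors.
recording = np.random.randn(N_CHANNELS, 10 * FS)
frames = frame_signal(recording)
print(frames.shape)  # (250, 8, 40): 250 frames x 8 sites x 40 samples
```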
The practical advantage over electromyography (EMG), the main alternative approach for silent speech capture, is significant. EMG requires skin-contact electrodes and conductive gel, picks up ambient electrical noise, and degrades when electrodes shift position. The optical approach tolerates thin fabric between sensor and skin, makes no electrical contact with the body, and produces signals that stay more stable through head movement and perspiration. For a wearable designed for all-day use outside a clinical setting, that stability difference is not marginal.
The AI That Maps Muscle Contractions to Phonemes
Raw optical data from a neck sensor array is person-specific and noise-contaminated. Two people mouthing identical words produce different signal signatures based on individual neck geometry, muscle mass, and skin opacity. The AI component resolves this through personalized calibration training.
The team trained a transformer-based sequence model on paired recordings: participants mouthed words silently while the sensor array collected optical data, with ground-truth phoneme labels derived from simultaneous video lip-tracking and a throat-contact accelerometer. The model learns the spatiotemporal pattern in optical signals that corresponds to each phoneme for a specific user, then feeds those predictions into a language-model decoder that weights them by English phoneme co-occurrence probabilities, filtering out physically implausible phoneme sequences in real time before speech synthesis.
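A toy version of that two-stage idea, assuming a tiny phoneme inventory and a bigram co-occurrence prior (the real system's transformer and vocabulary are not public), might combine per-frame model scores with language-model weights during a greedy decode:

```python
import numpy as np

PHONEMES = ["m", "a", "t", "s"]  # toy inventory; the real set is far larger
V = len(PHONEMES)

# Assumed bigram prior P(next | prev): rows = previous phoneme, cols = next.
# In the real system this would be estimated from English phoneme statistics.
bigram = np.full((V, V), 0.25)
bigram[0, 0] = 0.01              # e.g. a repeated "mm" is implausible...
bigram[0] /= bigram[0].sum()     # ...so renormalize that row

def decode(frame_logprobs: np.ndarray) -> list[str]:
    """Greedy decode: at each frame, combine the sequence model's log-scores
    with the bigram prior to suppress implausible transitions."""
    seq, prev = [], None
    for scores in frame_logprobs:        # scores: (V,) log-probs per frame
        if prev is not None:
            scores = scores + np.log(bigram[prev])
        prev = int(np.argmax(scores))
        seq.append(PHONEMES[prev])
    return seq

# Fake per-frame log-probabilities standing in for transformer outputs.
logits = np.log(np.random.dirichlet(np.ones(V), size=6))
print(decode(logits))
```

A production decoder would use beam search over a much larger phoneme set, but the principle is the same: the prior vetoes sequences the sensing model alone might hallucinate.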
Voice output runs through a personalization layer: users who recorded a voice sample before losing speech receive output in their own cloned voice. This connects directly to the broader AI voice generation market, where platforms are competing aggressively on voice cloning fidelity and output naturalness, and that competition improves the medical application without requiring additional research investment in the voice synthesis component.
Three Populations, One Device
The device targets three distinct clinical populations with different pathology but the same functional gap: loss of the ability to produce intelligible audible speech.
Laryngectomy patients are the largest group. Approximately 150,000 laryngectomies are performed globally each year, most often to treat laryngeal cancer. Current alternatives include the electrolarynx, a handheld vibrating device pressed against the throat that produces recognizable but robotic-sounding speech, and tracheoesophageal puncture, a surgically created connection between the trachea and esophagus that diverts lung air to drive speech through a voice prosthesis. Both require significant adaptation. Neither produces speech that passes as natural in social settings.
ALS patients face a progressive timeline. The ALS Association estimates that roughly 30,000 Americans live with ALS at any given time, with speech deterioration typically beginning within one to two years of the onset of motor neuron loss in the bulbar region. Existing augmentative and alternative communication (AAC) devices depend on gaze tracking or residual finger movement, which works early in disease progression but grows increasingly unreliable as paralysis advances. The optical neck sensor, fitted before significant muscle atrophy occurs, could extend natural-feeling communication by months or years beyond what current tools support.
Vocal cord damage patients, whether from surgery, accident, or neurological paralysis, represent a third cohort that existing solutions serve poorly. Unlike ALS patients, this population typically retains intact neck musculature and general motor control, which may make optical sensing more accurate for this group and lower the practical barrier to adoption.
Against Brain-Computer Interfaces: The Invasiveness Tradeoff
Brain-computer interface approaches to speech restoration have produced the field’s most headline-generating results. A 2023 paper published in Nature by University of California, San Francisco researchers demonstrated a cortical implant decoding intended speech at approximately 78 words per minute from a paralyzed patient — approaching conversational speed. Neuralink’s speech-restoration program targets comparable performance with next-generation electrode arrays.
As AI becomes increasingly embedded in human communication and cognition, BCI speech restoration sits at one extreme of the capability-invasiveness spectrum: maximum decoding performance, maximum procedural burden. Cortical implants require open-brain surgery under general anesthesia, carry infection risk during and after implantation, and involve electrode arrays that degrade over years as scar tissue encapsulates them. Implant cost runs $100,000–$200,000, and surgical candidacy requirements exclude many patients with advanced systemic disease.
The optical neck device occupies the opposite extreme: no surgery, no medical facility required beyond an initial fitting session, and component costs in the sub-$500 range at current prototype scale. The capability ceiling is lower — BCIs can potentially decode fully intended speech without any physical movement, while the optical system requires silent mouthing and therefore intact neck muscle function. For the large majority of voice-loss patients who retain neck muscle control, that tradeoff is entirely acceptable.
The question that actually determines clinical adoption isn’t whether one approach outperforms another in a laboratory. It’s whether the device outperforms the patient’s current alternative. Against an electrolarynx or a gaze-tracker AAC board, the neck sensor clears that bar in naturalness and speed for most candidates.
Accuracy Rates, Real-World Limits, and the Commercial Timeline
In controlled laboratory testing, the system demonstrated phoneme-level accuracy exceeding 85% for trained users within a constrained vocabulary. Word-level accuracy across a 200-word test set reached approximately 75% in early trials — comparable to first-generation EMG-based silent speech interfaces — with primary degradation noted for bilabial phonemes that rely more on lip position than throat muscle contraction.
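For context on how such numbers are typically computed (a standard edit-distance metric, not the team's published evaluation code), phoneme-level accuracy is usually reported as one minus the phoneme error rate:

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions
    needed to turn the hypothesis sequence into the reference."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1]

ref = ["h", "e", "l", "ow"]       # reference phoneme sequence
hyp = ["h", "e", "l", "l", "ow"]  # decoder output with one insertion
per = edit_distance(ref, hyp) / len(ref)
print(f"phoneme error rate: {per:.0%}, accuracy: {1 - per:.0%}")
```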
Real-world deployment introduces conditions that laboratory environments suppress: head turning, swallowing, coughing, and external vibration all produce artifacts in optical muscle signals. The research team identified sensor placement consistency as a critical variable — small shifts in device position relative to target muscle groups produced measurable accuracy drops. This is an engineering problem, not a physics one.
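One common way to handle burst artifacts of that kind (a generic energy-gating heuristic, not a method the team has described) is to suppress decoding during frames whose signal energy departs sharply from a running baseline:

```python
import numpy as np

def artifact_mask(frames: np.ndarray, k: float = 4.0) -> np.ndarray:
    """Flag frames whose total energy sits more than k MADs above the median;
    swallows, coughs, and head turns tend to produce exactly such bursts."""
    energy = (frames ** 2).sum(axis=(1, 2))   # one scalar per frame
    med = np.median(energy)
    mad = np.median(np.abs(energy - med)) + 1e-9
    return energy > med + k * mad             # True = likely artifact

frames = np.random.randn(250, 8, 40)          # (frames, channels, window)
frames[42] *= 10                              # inject a cough-like burst
print(np.flatnonzero(artifact_mask(frames)))  # -> [42]
```

Gating like this trades a brief decoding pause for a large reduction in garbled output, which matters more for conversational trust than raw uptime.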
The projected timeline to commercial availability is 24 to 36 months, contingent on U.S. FDA 510(k) clearance as a Class II medical device. That is a materially less burdensome regulatory pathway than the premarket approval required for cortical implants, with strong category precedent from existing AAC devices and prosthetics. European CE marking could come earlier if clinical trials run on parallel tracks.
Two engineering milestones stand between the current prototype and a commercial product: miniaturizing the sensor array from a research-grade module into a form factor that integrates into a medical collar or shirt, and shortening the user calibration session from the current two to three hours to something patients with limited stamina can complete. Neither requires new science.
The Real Bottleneck Is the Sensor, Not the AI
The AI voice synthesis layer will improve faster than the sensing hardware. It is driven by competitive commercial pressure from a market that has nothing to do with medicine, and the trajectory is consistent: quality improving, cost falling, latency shrinking. MegaOne AI tracks 139+ AI tools across 17 categories, and voice generation is among the fastest-moving segments in the stack.
What this device validates is a non-invasive, low-cost hardware pathway for a problem that has previously required either invasive surgery or stigmatizing equipment. As AI systems push into new domains of autonomous sensing and discovery, the same architecture underpinning this approach — learned signal mapping from noisy physical inputs — applies across a growing range of biosensing problems beyond speech.
For the 10 million people currently living without a working voice, the remaining engineering problems are the kind that get solved through iteration and investment. The physics is validated. The AI architecture is established. A device that restores naturalistic synthesized speech from silent mouthing, without surgery, at sub-$500 component cost, changes what voice restoration access looks like — not incrementally, but structurally.