Developer Builds Sub-Second Video Search Using Gemini’s Native Video Embedding

megaone_admin · Mar 24, 2026 · 2 min read
Engine Score 8/10 — Important

This story highlights Gemini's new native video embedding capability, enabling a novel sub-second video search tool. Its high actionability for developers and significant technical novelty make it an important update for those working with multimodal AI.

A developer has created SentrySearch, an open-source tool that enables semantic search across dashcam footage using the native video processing of Google’s Gemini Embedding 2 model. The system can locate specific scenes in hours of footage in under a second by embedding video content directly, without transcription or frame captioning.

Developer ssrajadh built the tool to demonstrate Gemini Embedding 2’s ability to process raw video pixels directly into the same 768-dimensional vector space used for text queries. “A text query like ‘red truck at a stop sign’ is directly comparable to a 30-second video clip at the vector level,” the developer explains in the project documentation.

SentrySearch works by splitting videos into overlapping chunks, embedding each chunk as raw video using Gemini’s model, and storing the resulting vectors in a local ChromaDB database. When users search with natural language queries like “red truck running a stop sign,” the system embeds the text query and matches it against stored video embeddings. The tool automatically trims and saves the best matching clip from the original footage.
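The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not SentrySearch's actual code: the plain list stands in for the ChromaDB collection, the three-element vectors are hand-written stand-ins for what the embedding model would return, and the names ChunkRecord, cosine, and search are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ChunkRecord:
    """One stored video chunk: where it came from and its embedding."""
    source: str              # path of the original video file
    start: float             # chunk start time in seconds
    end: float               # chunk end time in seconds
    vector: Sequence[float]  # embedding (toy values here, not model output)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: List[ChunkRecord], query_vector: Sequence[float],
           top_k: int = 1) -> List[Tuple[float, ChunkRecord]]:
    """Return the top_k (score, record) pairs, best match first."""
    scored = [(cosine(rec.vector, query_vector), rec) for rec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: three overlapping chunks of one file, with made-up vectors.
index = [
    ChunkRecord("drive.mp4", 0.0, 30.0, [0.9, 0.1, 0.0]),
    ChunkRecord("drive.mp4", 25.0, 55.0, [0.1, 0.9, 0.2]),
    ChunkRecord("drive.mp4", 50.0, 80.0, [0.0, 0.2, 0.9]),
]

# In the real tool this vector would come from embedding the text query.
query_vector = [0.2, 1.0, 0.1]

best_score, best = search(index, query_vector)[0]
print(f"{best.source} {best.start}-{best.end}s score={best_score:.3f}")
```

The start and end times stored with the winning record are what a tool of this kind would need to trim the matching clip out of the original file.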

The system includes preprocessing options to optimize performance: configurable chunk duration (default 30 seconds), overlap (default 5 seconds), and video downscaling to 480p at 5 fps. Users can disable preprocessing to send raw video chunks directly to the embedding model, though this may slow processing and increase API costs.
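To make the defaults concrete, here is a hedged sketch of how 30-second chunks with a 5-second overlap could be laid out, and how an ffmpeg command for one downscaled chunk might be assembled. The function names and the choice to drop audio are assumptions for illustration; the project's own implementation may differ.

```python
from typing import List, Tuple


def chunk_windows(total: float, chunk: float = 30.0,
                  overlap: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, total] seconds into windows that overlap by `overlap`."""
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap  # each window starts this far after the last
    windows = []
    start = 0.0
    while start < total:
        end = min(start + chunk, total)
        windows.append((start, end))
        if end >= total:  # last window reached the end of the video
            break
        start += step
    return windows


def ffmpeg_args(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build an ffmpeg argument list that cuts one chunk at 480p, 5 fps."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start),             # seek to the chunk start
        "-t", str(end - start),        # keep only the chunk duration
        "-i", src,
        "-vf", "scale=-2:480,fps=5",   # 480 px tall, 5 frames per second
        "-an",                         # assumption: audio is not needed
        dst,
    ]


windows = chunk_windows(70.0)
print(windows)
print(" ".join(ffmpeg_args("drive.mp4", "chunk_001.mp4", *windows[1])))
```

With the defaults, each window starts 25 seconds after the previous one, so an event that falls on a chunk boundary still appears whole in at least one chunk.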

The project has gained attention on Hacker News and has collected 233 GitHub stars since its release. The tool requires a Gemini API key and ffmpeg for video processing, and the developer provides a command-line interface for indexing footage directories and running searches. Search results include similarity scores and automatic clip extraction, making the tool practical for reviewing specific incidents in large video archives.

MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Our editorial team reviews them with rigorous oversight to deliver accurate, scored coverage of the AI industry. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.