TOOL UPDATES

SentrySearch: Open-Source Tool for Sub-Second Semantic Video Search via Gemini Embedding 2

Ryan Matsuda · Mar 24, 2026 · Updated Apr 7, 2026 · 4 min read
Engine Score 8/10 — Important

This story highlights Gemini's new native video embedding capability, enabling a novel sub-second video search tool. Its high actionability for developers and significant technical novelty make it an important update for those working with multimodal AI.

SentrySearch, an open-source project published by GitHub user ssrajadh (full name not available at time of publication), enables natural-language search across MP4 video archives using Google’s Gemini Embedding 2 API or the locally runnable Qwen3-VL model. The tool converts video segments and text queries into the same vector space, enabling retrieval without transcription or frame-level captioning. As of early April 2026, the repository has 2,400 stars and 196 forks on GitHub.

  • SentrySearch supports two embedding backends: Google’s cloud-based Gemini Embedding 2 API and the locally runnable Qwen3-VL model, offering both cloud and offline operation.
  • Videos are split into overlapping chunks — default 30-second segments with 5-second overlap — and each chunk is embedded as raw video into a 768-dimensional vector stored in a local ChromaDB instance.
  • Text queries are mapped into the same 768-dimensional space and matched by cosine similarity, bypassing transcription or captioning entirely.
  • An optional preprocessing step downscales footage to 480p at 5 fps before embedding to reduce API token costs; disabling it sends full-resolution chunks to the model at higher cost.
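The chunking scheme in the bullets above reduces to simple arithmetic: with 30-second chunks and 5-second overlap, consecutive chunks start 25 seconds apart. A minimal sketch (the function name is ours, not the project’s):

```python
def chunk_bounds(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Yield (start, length) pairs for overlapping chunks covering a video.

    Defaults mirror SentrySearch's documented 30-second segments with
    5-second overlap, so chunk starts advance by 25 seconds each step.
    """
    step = chunk_s - overlap_s  # 25 s between chunk starts by default
    start = 0.0
    while start < duration_s:
        yield start, min(chunk_s, duration_s - start)
        start += step

# A 90-second file produces chunks at 0, 25, 50, and 75 seconds,
# with the final chunk clipped to the end of the file.
print(list(chunk_bounds(90)))
```

Each (start, length) pair then drives one ffmpeg extraction and one embedding call.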

What Happened

GitHub user ssrajadh published SentrySearch, a Python CLI tool that indexes video archives into a local vector database and retrieves matching clips in response to plain-language queries. The project’s README describes it as “semantic search over video footage: type what you’re looking for, get a trimmed clip back.” The repository includes 96 commits and a demo — titled “OpenClaw Skill demo” — showing the tool locating a specific motion sequence from recorded footage using a text description.

The project supports two embedding backends: Google’s Gemini Embedding 2 API for cloud-based indexing and Qwen3-VL for fully local operation. The dual-backend design distinguishes it from tools that depend exclusively on a single commercial API.

Why It Matters

Conventional video search systems typically depend on speech-to-text transcription, frame-level image captioning, or manual annotation — each requiring substantial preprocessing before any retrieval is possible. Gemini Embedding 2, introduced by Google in early 2025, was among the first commercially available APIs to embed raw video segments and text strings into a shared vector space. SentrySearch operationalizes that capability as a standalone, installable tool.

The availability of a local Qwen3-VL path is significant for deployments with data-residency or privacy requirements. Footage does not need to leave the operator’s infrastructure when using the local model, which removes a meaningful barrier for security-camera and enterprise use cases.

Technical Details

SentrySearch processes video by splitting MP4 files into overlapping chunks via ffmpeg. The default configuration produces 30-second segments with 5-second overlaps between consecutive chunks. Each segment is passed to the selected embedding model — Gemini Embedding 2 via the Google API, or Qwen3-VL running locally — and the resulting 768-dimensional floating-point vector is written to a local ChromaDB database.
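The article names ffmpeg but not the exact invocation, so the following is one plausible way to cut a single chunk: seeking with `-ss` before the input and bounding the duration with `-t`, both standard ffmpeg flags. The helper name and argument layout are our assumptions, not the project’s code:

```python
def ffmpeg_chunk_cmd(src: str, dst: str, start_s: float, length_s: float) -> list[str]:
    """Build an ffmpeg argv that extracts one chunk from an MP4.

    -ss before -i performs a fast seek to the chunk start; -t limits the
    duration. Stream copy (-c copy) avoids re-encoding, at the cost of
    chunk boundaries snapping to the nearest keyframe.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),   # seek to chunk start
        "-i", src,             # input archive file
        "-t", str(length_s),   # chunk duration (default 30 s)
        "-c", "copy",          # no re-encode for speed
        dst,
    ]

cmd = ffmpeg_chunk_cmd("archive.mp4", "chunk_000.mp4", 25.0, 30.0)
print(" ".join(cmd))
```

Running one such command per (start, length) pair yields the overlapping segments that are then embedded.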

At query time, the input text string is embedded into the same 768-dimensional space and matched against stored video vectors using cosine similarity. The developer’s documentation puts it plainly: “A text query like ‘red truck at a stop sign’ is directly comparable to a 30-second video clip at the vector level.” The highest-scoring segment is extracted from the original file using ffmpeg and saved as a standalone clip.
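The matching step itself is ordinary cosine similarity between one query vector and the stored chunk vectors. A self-contained sketch, with toy 3-dimensional vectors standing in for the 768-dimensional embeddings (in the real tool, ChromaDB performs this nearest-neighbor search):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product normalized by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, chunk_vecs):
    """Return (chunk_id, score) for the stored vector closest to the query."""
    scored = [(cid, cosine(query_vec, v)) for cid, v in chunk_vecs.items()]
    return max(scored, key=lambda t: t[1])

chunks = {
    "clip_000": [0.9, 0.1, 0.0],  # toy embedding of one 30 s chunk
    "clip_001": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]           # toy embedding of the text query
print(best_match(query, chunks))  # clip_000 scores highest here
```

The winning chunk ID maps back to a (start, length) span in the source file, which ffmpeg then cuts into the returned clip.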

An optional preprocessing pass downscales video to 480p at 5 fps before embedding. The documentation notes that disabling this step — sending full-resolution chunks to the API — increases both processing time and Gemini API token costs. Search output includes per-result cosine similarity scores alongside the extracted clip file path.
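The 480p/5 fps pass maps naturally onto ffmpeg’s standard `scale` and `fps` filters. The project’s actual flags are not published, so this is an assumption-laden sketch; the pixel-frame arithmetic at the end is rough intuition only, since Gemini’s actual token accounting for video may not scale linearly with resolution and framerate:

```python
def downscale_cmd(src: str, dst: str, height: int = 480, fps: int = 5) -> list[str]:
    """Build an ffmpeg argv for the optional preprocessing pass.

    scale=-2:<height> keeps the aspect ratio (width rounded to an even
    value, as most codecs require); fps=<fps> drops frames before the
    chunk is sent to the embedding model.
    """
    return ["ffmpeg", "-y", "-i", src, "-vf", f"scale=-2:{height},fps={fps}", dst]

print(" ".join(downscale_cmd("chunk_000.mp4", "chunk_000_small.mp4")))

# Rough intuition for the saving: pixel-frames per second of footage.
full = 1920 * 1080 * 30   # 1080p at 30 fps
small = 854 * 480 * 5     # ~480p at 5 fps
print(f"~{full / small:.0f}x fewer pixel-frames after preprocessing")
```

This is why skipping the pass inflates both processing time and API spend: the model sees roughly an order of magnitude more visual data per chunk.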

Who’s Affected

The tool is directly applicable to developers building video retrieval pipelines, security camera operators managing multi-hour footage archives, and dashcam users who need to locate specific incidents without manual scrubbing. The Gemini API path requires a Google Cloud account and incurs per-request charges tied to video token volume; the Qwen3-VL path eliminates per-query API cost but requires sufficient local compute to run the model.

Developers can invoke the CLI to batch-index entire footage directories and return scored results with extracted clips, suitable for downstream review or automated alerting systems. Installation is managed via uv, and the project includes a .env.example for API key configuration.

What’s Next

The repository has three open pull requests and one open issue as of early April 2026, indicating active community engagement. No formal development roadmap has been published. The project documentation does not include benchmark comparisons against transcript-based or captioning-based retrieval methods, so relative accuracy across approaches has not been formally established in the published materials.

Cost at scale remains an open practical question: Gemini Embedding 2 charges per token of video content, and high-resolution or high-framerate archives processed without the 480p preprocessing step could generate significant API spend. Operators evaluating the tool at scale will likely need to compare the Qwen3-VL local path against the API path for total cost of ownership.
