TOOL UPDATES

SentrySearch: Open-Source Tool for Sub-Second Semantic Video Search via Gemini Embedding 2

Ryan Matsuda · Mar 24, 2026 · Updated Apr 7, 2026 · 4 min read
Engine Score 8/10 — Important

This story highlights Gemini's new native video embedding capability, enabling a novel sub-second video search tool. Its high actionability for developers and significant technical novelty make it an important update for those working with multimodal AI.

SentrySearch, an open-source project published by GitHub user ssrajadh (full name not available at time of publication), enables natural-language search across MP4 video archives using Google’s Gemini Embedding 2 API or the locally runnable Qwen3-VL model. The tool converts video segments and text queries into the same vector space, enabling retrieval without transcription or frame-level captioning. As of early April 2026, the repository has 2,400 stars and 196 forks on GitHub.

  • SentrySearch supports two embedding backends: Google’s cloud-based Gemini Embedding 2 API and the locally runnable Qwen3-VL model, offering both cloud and offline operation.
  • Videos are split into overlapping chunks — default 30-second segments with 5-second overlap — and each chunk is embedded as raw video into a 768-dimensional vector stored in a local ChromaDB instance.
  • Text queries are mapped into the same 768-dimensional space and matched by cosine similarity, bypassing transcription or captioning entirely.
  • An optional preprocessing step downscales footage to 480p at 5 fps before embedding to reduce API token costs; disabling it sends full-resolution chunks to the model at higher cost.
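The chunking scheme in the bullets above reduces to simple arithmetic: with 30-second chunks and 5-second overlap, consecutive chunks start 25 seconds apart. A minimal sketch (the function name is ours, not the project’s):

```python
def chunk_bounds(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Yield (start, length) pairs for overlapping chunks covering a video.

    Defaults mirror SentrySearch's documented 30-second segments with
    5-second overlap, so chunk starts advance by 25 seconds each step.
    """
    step = chunk_s - overlap_s  # 25 s between chunk starts by default
    start = 0.0
    while start < duration_s:
        yield start, min(chunk_s, duration_s - start)
        start += step

# A 90-second file produces chunks at 0, 25, 50, and 75 seconds,
# with the final chunk clipped to the end of the file.
print(list(chunk_bounds(90)))
```

Each (start, length) pair then drives one ffmpeg extraction and one embedding call.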

What Happened

GitHub user ssrajadh published SentrySearch, a Python CLI tool that indexes video archives into a local vector database and retrieves matching clips in response to plain-language queries. The project’s README describes it as “semantic search over video footage: type what you’re looking for, get a trimmed clip back.” The repository includes 96 commits and a demo — titled “OpenClaw Skill demo” — showing the tool locating a specific motion sequence from recorded footage using a text description.

The project supports two embedding backends: Google’s Gemini Embedding 2 API for cloud-based indexing and Qwen3-VL for fully local operation. The dual-backend design distinguishes it from tools that depend exclusively on a single commercial API.

Why It Matters

Conventional video search systems typically depend on speech-to-text transcription, frame-level image captioning, or manual annotation — each requiring substantial preprocessing before any retrieval is possible. Gemini Embedding 2, introduced by Google in early 2025, was among the first commercially available APIs to embed raw video segments and text strings into a shared vector space. SentrySearch operationalizes that capability as a standalone, installable tool.

The availability of a local Qwen3-VL path is significant for deployments with data-residency or privacy requirements. Footage does not need to leave the operator’s infrastructure when using the local model, which removes a meaningful barrier for security-camera and enterprise use cases.

Technical Details

SentrySearch processes video by splitting MP4 files into overlapping chunks via ffmpeg. The default configuration produces 30-second segments with 5-second overlaps between consecutive chunks. Each segment is passed to the selected embedding model — Gemini Embedding 2 via the Google API, or Qwen3-VL running locally — and the resulting 768-dimensional floating-point vector is written to a local ChromaDB database.
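The article names ffmpeg but not the exact invocation, so the following is one plausible way to cut a single chunk: seeking with `-ss` before the input and bounding the duration with `-t`, both standard ffmpeg flags. The helper name and argument layout are our assumptions, not the project’s code:

```python
def ffmpeg_chunk_cmd(src: str, dst: str, start_s: float, length_s: float) -> list[str]:
    """Build an ffmpeg argv that extracts one chunk from an MP4.

    -ss before -i performs a fast seek to the chunk start; -t limits the
    duration. Stream copy (-c copy) avoids re-encoding, at the cost of
    chunk boundaries snapping to the nearest keyframe.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),   # seek to chunk start
        "-i", src,             # input archive file
        "-t", str(length_s),   # chunk duration (default 30 s)
        "-c", "copy",          # no re-encode for speed
        dst,
    ]

cmd = ffmpeg_chunk_cmd("archive.mp4", "chunk_000.mp4", 25.0, 30.0)
print(" ".join(cmd))
```

Running one such command per (start, length) pair yields the overlapping segments that are then embedded.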

At query time, the input text string is embedded into the same 768-dimensional space and matched against stored video vectors using cosine similarity. The developer’s documentation puts it plainly: “A text query like ‘red truck at a stop sign’ is directly comparable to a 30-second video clip at the vector level.” The highest-scoring segment is extracted from the original file using ffmpeg and saved as a standalone clip.
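The matching step itself is ordinary cosine similarity between one query vector and the stored chunk vectors. A self-contained sketch, with toy 3-dimensional vectors standing in for the 768-dimensional embeddings (in the real tool, ChromaDB performs this nearest-neighbor search):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product normalized by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, chunk_vecs):
    """Return (chunk_id, score) for the stored vector closest to the query."""
    scored = [(cid, cosine(query_vec, v)) for cid, v in chunk_vecs.items()]
    return max(scored, key=lambda t: t[1])

chunks = {
    "clip_000": [0.9, 0.1, 0.0],  # toy embedding of one 30 s chunk
    "clip_001": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]           # toy embedding of the text query
print(best_match(query, chunks))  # clip_000 scores highest here
```

The winning chunk ID maps back to a (start, length) span in the source file, which ffmpeg then cuts into the returned clip.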

An optional preprocessing pass downscales video to 480p at 5 fps before embedding. The documentation notes that disabling this step — sending full-resolution chunks to the API — increases both processing time and Gemini API token costs. Search output includes per-result cosine similarity scores alongside the extracted clip file path.
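The 480p/5 fps pass maps naturally onto ffmpeg’s standard `scale` and `fps` filters. The project’s actual flags are not published, so this is an assumption-laden sketch; the pixel-frame arithmetic at the end is rough intuition only, since Gemini’s actual token accounting for video may not scale linearly with resolution and framerate:

```python
def downscale_cmd(src: str, dst: str, height: int = 480, fps: int = 5) -> list[str]:
    """Build an ffmpeg argv for the optional preprocessing pass.

    scale=-2:<height> keeps the aspect ratio (width rounded to an even
    value, as most codecs require); fps=<fps> drops frames before the
    chunk is sent to the embedding model.
    """
    return ["ffmpeg", "-y", "-i", src, "-vf", f"scale=-2:{height},fps={fps}", dst]

print(" ".join(downscale_cmd("chunk_000.mp4", "chunk_000_small.mp4")))

# Rough intuition for the saving: pixel-frames per second of footage.
full = 1920 * 1080 * 30   # 1080p at 30 fps
small = 854 * 480 * 5     # ~480p at 5 fps
print(f"~{full / small:.0f}x fewer pixel-frames after preprocessing")
```

This is why skipping the pass inflates both processing time and API spend: the model sees roughly an order of magnitude more visual data per chunk.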

Who’s Affected

The tool is directly applicable to developers building video retrieval pipelines, security camera operators managing multi-hour footage archives, and dashcam users who need to locate specific incidents without manual scrubbing. The Gemini API path requires a Google Cloud account and incurs per-request charges tied to video token volume; the Qwen3-VL path eliminates per-query API cost but requires sufficient local compute to run the model.

Developers can invoke the CLI to batch-index entire footage directories and return scored results with extracted clips, suitable for downstream review or automated alerting systems. Installation is managed via uv, and the project includes a .env.example for API key configuration.

What’s Next

The repository has three open pull requests and one open issue as of early April 2026, indicating active community engagement. No formal development roadmap has been published. The project documentation does not include benchmark comparisons against transcript-based or captioning-based retrieval methods, so relative accuracy across approaches has not been formally established in the published materials.

Cost at scale remains an open practical question: Gemini Embedding 2 charges per token of video content, and high-resolution or high-framerate archives processed without the 480p preprocessing step could generate significant API spend. Operators evaluating the tool at scale will likely need to compare the Qwen3-VL local path against the API path for total cost of ownership.
