A developer has released SentrySearch, an open-source tool for semantic search across dashcam footage, built on the native video-processing capability of Google’s Gemini Embedding 2 model. The system can locate specific scenes in hours of footage in under a second by embedding video content directly, without transcription or per-frame captioning.
Developer ssrajadh built the tool to demonstrate Gemini Embedding 2’s ability to process raw video pixels directly into the same 768-dimensional vector space used for text queries. “A text query like ‘red truck at a stop sign’ is directly comparable to a 30-second video clip at the vector level,” the developer explains in the project documentation.
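Because text and video land in the same 768-dimensional space, retrieval reduces to a nearest-neighbor comparison between vectors. A minimal sketch of that comparison, using random placeholder vectors in place of real Gemini embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for real embeddings: one text-query vector and three video-clip
# vectors, all in the same 768-dimensional space (random placeholders here).
query = rng.standard_normal(768)
clips = rng.standard_normal((3, 768))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: the standard metric for comparing embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query, c) for c in clips]
best = int(np.argmax(scores))  # index of the best-matching clip
```

With real embeddings, the clip whose vector has the highest cosine similarity to the query vector is the scene returned to the user.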
SentrySearch works by splitting videos into overlapping chunks, embedding each chunk as raw video using Gemini’s model, and storing the resulting vectors in a local ChromaDB database. When users search with natural language queries like “red truck running a stop sign,” the system embeds the text query and matches it against stored video embeddings. The tool automatically trims and saves the best matching clip from the original footage.
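The first step of that pipeline, splitting footage into overlapping windows, can be sketched in a few lines. The function below is a hypothetical illustration (not the project’s actual code) using the documented defaults of 30-second chunks with 5 seconds of overlap:

```python
def chunk_bounds(total_seconds: float, chunk: float = 30.0, overlap: float = 5.0):
    """Return (start, end) windows covering the video; consecutive windows
    share `overlap` seconds so events on a boundary appear in two chunks."""
    stride = chunk - overlap  # with the defaults, starts advance 25s at a time
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk, total_seconds)  # clamp the final window
        bounds.append((start, end))
        if end >= total_seconds:
            break
        start += stride
    return bounds

windows = chunk_bounds(100.0)
# → [(0.0, 30.0), (25.0, 55.0), (50.0, 80.0), (75.0, 100.0)]
```

Each window is then cut from the source file, embedded as raw video, and written to ChromaDB keyed by its start/end timestamps, which is what lets the tool trim the matching clip later.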
The system includes preprocessing options to optimize performance, including configurable chunk duration (default 30 seconds), overlap settings (default 5 seconds), and video downscaling to 480p at 5 fps. Users can disable preprocessing to send raw video chunks directly to the embedding model, though full-resolution chunks increase upload time and API costs.
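The downscaling step maps naturally onto ffmpeg’s `scale` and `fps` filters. The helper below is a sketch of what such a preprocessing command could look like, not the project’s actual invocation; it only builds the argument list rather than running it:

```python
def downscale_cmd(src: str, dst: str, height: int = 480, fps: int = 5) -> list[str]:
    """Build an ffmpeg command that downscales video to `height`p and
    resamples it to `fps` frames per second before embedding."""
    return [
        "ffmpeg", "-i", src,
        # scale=-2:<h> picks a width that preserves aspect ratio and stays even
        "-vf", f"scale=-2:{height},fps={fps}",
        "-an",  # drop the audio track; only pixels feed the embedding model
        dst,
    ]

cmd = downscale_cmd("drive.mp4", "drive_480p.mp4")
```

Shrinking resolution and frame rate this way cuts the number of pixels the embedding model must process per chunk, which is the lever behind the speed/cost trade-off described above.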
The project has gained attention on Hacker News with 233 GitHub stars since its release. The tool requires a Gemini API key and ffmpeg for video processing, with the developer providing a command-line interface for indexing footage directories and performing searches. Search results include similarity scores and automatic clip extraction, making it practical for reviewing specific incidents in large video archives.
