Chroma has released Context-1, a 20-billion-parameter agentic search model designed to act as a specialized retrieval subagent for complex, multi-hop queries. Rather than expanding context windows in frontier models, the company behind the popular open-source vector database is taking what it calls “a different, more surgical approach” to retrieval-augmented generation systems.
Context-1 is derived from gpt-oss-20B, a Mixture-of-Experts (MoE) model that Chroma fine-tuned using Supervised Fine-Tuning (SFT) followed by Reinforcement Learning via CISPO in a staged curriculum. The model operates within an agent harness that lets it interact with tools including search_corpus (hybrid BM25 + dense search), grep_corpus (regex), and read_document.
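To make the harness concrete, here is a minimal sketch of how those three tools could be wired to a dispatcher. Everything below is an assumption for illustration: the in-memory CORPUS, the keyword-overlap ranking standing in for real hybrid BM25 + dense search, and the dispatch helper are all hypothetical; only the tool names come from the article.

```python
import re

# Hypothetical in-memory corpus standing in for a real document collection.
CORPUS = {
    "doc1": "Context-1 is a 20B agentic search model built on gpt-oss-20B",
    "doc2": "Self-Editing Context prunes irrelevant passages during search",
    "doc3": "Chroma maintains an open-source vector database",
}

def search_corpus(query: str, k: int = 2) -> list[str]:
    """Toy stand-in for hybrid BM25 + dense search: rank by keyword overlap."""
    terms = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda d: len(terms & set(CORPUS[d].lower().split())),
        reverse=True,
    )
    return scored[:k]

def grep_corpus(pattern: str) -> list[str]:
    """Return ids of documents whose text matches the regex pattern."""
    return [d for d, text in CORPUS.items() if re.search(pattern, text)]

def read_document(doc_id: str) -> str:
    """Fetch the full text of one document."""
    return CORPUS[doc_id]

TOOLS = {
    "search_corpus": search_corpus,
    "grep_corpus": grep_corpus,
    "read_document": read_document,
}

def dispatch(name: str, **kwargs):
    """The harness routes each model-emitted tool call to its implementation."""
    return TOOLS[name](**kwargs)
```

In a real harness the model emits structured tool calls each turn and the dispatcher returns results into the conversation; the sketch only shows that routing layer.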
The model’s core technical innovation is “Self-Editing Context,” which addresses context window degradation during multi-step searches. As Context-1 gathers information over multiple turns, it reviews its accumulated context and executes a prune_chunks command to discard irrelevant passages, achieving a pruning accuracy of 0.94. The model averages 2.56 tool calls per turn and maintains retrieval quality within a bounded 32k context window.
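A rough sketch of that pruning step, under stated assumptions: the article names the prune_chunks command and the 32k budget, but the chunk representation, the whitespace token count, and the helper functions below are hypothetical.

```python
# Hypothetical sketch of the Self-Editing Context loop: the model reviews its
# accumulated chunks, emits a prune_chunks call naming the ids to discard, and
# thereby keeps the working context inside a fixed token budget.

CONTEXT_BUDGET = 32_000  # tokens; Context-1's bounded window per the article

def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer (assumption for this sketch).
    return len(text.split())

def prune_chunks(context: dict[str, str], drop_ids: list[str]) -> dict[str, str]:
    """Discard the chunks the model judged irrelevant."""
    return {cid: txt for cid, txt in context.items() if cid not in drop_ids}

def context_size(context: dict[str, str]) -> int:
    """Total token footprint of the accumulated context."""
    return sum(count_tokens(t) for t in context.values())

def over_budget(context: dict[str, str]) -> bool:
    """Signal that the model should consider pruning before the next search."""
    return context_size(context) > CONTEXT_BUDGET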
According to the source material, “Context-1 doesn’t just hit a vector index once. It decomposes the high-level query into targeted subqueries, executes parallel tool calls, and iteratively searches the corpus.” This approach shifts responsibility for retrieval logic from developers to the model itself, representing what Chroma describes as “decoupling search from generation.”
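The fan-out described in that quote can be sketched as follows. In Context-1 the model itself generates the subqueries; here they are passed in by hand, and the keyword-match search, the DOCS list, and the answer_multi_hop helper are all assumptions of this sketch.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Tiny stand-in corpus; in Context-1 this would be the real search_corpus tool.
DOCS = [
    "The Danube flows through Vienna.",
    "Vienna is the capital of Austria.",
    "Austria joined the EU in 1995.",
]

def search(subquery: str) -> list[str]:
    """Keyword-overlap retrieval standing in for hybrid BM25 + dense search."""
    terms = set(re.findall(r"\w+", subquery.lower()))
    return [d for d in DOCS if terms & set(re.findall(r"\w+", d.lower()))]

def answer_multi_hop(subqueries: list[str]) -> dict[str, list[str]]:
    """Fan out the subqueries as parallel tool calls and gather the results."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, subqueries))
    return dict(zip(subqueries, results))
```

Each subquery becomes an independent tool call; running them concurrently is what makes the per-turn tool-call averages cited above cheap in wall-clock terms.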
Chroma developed a benchmark called “context-1-data-gen” to train and evaluate the model on multi-hop reasoning tasks that require multiple steps to reach a ground-truth answer. The company positions this as addressing the “lost in the middle” reasoning failures that occur when large numbers of tokens are inserted into prompts, an approach that also drives up latency and cost.
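For a sense of what a multi-hop item looks like, here is an entirely hypothetical record; the article does not publish context-1-data-gen's actual schema, so the field names and the question are invented for illustration.

```python
# Hypothetical multi-hop evaluation item: answering the question requires
# chaining two retrieval hops, each depending on the previous hop's result.
task = {
    "question": (
        "Which open-source database is maintained by the company "
        "that released Context-1?"
    ),
    "hops": [
        "Which company released Context-1?",                  # hop 1 -> Chroma
        "Which open-source database does Chroma maintain?",   # hop 2
    ],
    "answer": "Chroma's open-source vector database",
}
```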
