- Oppo’s Multi-X team released X-OmniClaw, an open-source AI agent for Android that uses the phone’s camera, screen, and voice input to operate apps directly on the device.
- X-OmniClaw runs on the physical phone, not in a virtualised cloud-phone instance, distinguishing it from RedFinger, Alibaba Wuying, and Tencent Cloud Phone.
- It processes gallery photos locally into a searchable text-based memory and learns by cloning user behaviour.
- A cloud LLM is called in only as “fuel” for higher-level reasoning, per Oppo’s technical report.
What Happened
Oppo’s Multi-X team has released X-OmniClaw, an open-source AI agent for Android that taps into the phone’s camera, screen, and voice to operate real apps on the physical device. The agent’s perception, control, and app-interaction logic run on the phone; a cloud language model is invoked only when higher-level reasoning is needed.
Why It Matters
X-OmniClaw represents a structural departure from the cloud-phone agent platforms that have emerged through 2025. RedFinger, Alibaba’s Wuying, and Tencent Cloud Phone run AI agents inside virtualised Android instances in remote data centres. That architecture cannot reach the handset’s own sensors and camera or the private data stored on it, a fundamental constraint when the agent’s task depends on what the user is currently seeing or holding.
By running on-device, X-OmniClaw can work with private data locally; only requests that need higher-level reasoning are sent to the cloud LLM. The trade-off is compute: on-device models are smaller and less capable than data-centre frontier models. Oppo’s compromise, a local model for perception and grounding plus a cloud LLM for hard reasoning, mirrors the hybrid pattern pursued by Apple Intelligence and by Google’s pairing of on-device Gemini Nano with larger cloud Gemini models.
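To make the local/cloud split concrete, here is a minimal Python sketch of how a hybrid agent might decide where a request runs. The `Task` fields, the `choose_route` function, and the three-step threshold are hypothetical; the report, as summarised here, does not describe X-OmniClaw’s actual escalation rule.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"  # on-device perception / grounding model
    CLOUD = "cloud"  # remote LLM for harder reasoning


@dataclass
class Task:
    description: str
    planned_steps: int                # hypothetical estimate of UI steps needed
    needs_open_ended_reasoning: bool  # e.g. comparing, summarising, planning


# Hypothetical threshold chosen for illustration only.
MAX_LOCAL_STEPS = 3


def choose_route(task: Task) -> Route:
    """Keep short, concrete UI tasks on the phone; escalate everything else."""
    if task.needs_open_ended_reasoning or task.planned_steps > MAX_LOCAL_STEPS:
        return Route.CLOUD
    return Route.LOCAL
```

A single tap such as opening the camera stays local, while a multi-app price comparison would be escalated. The design keeps the default on-device and treats the cloud as the exception, which matches the “fuel” framing in Oppo’s report.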
Technical Details
The agent bundles three perception channels (camera, screen, and voice) into one pipeline. A vision-language model first interprets the scene together with the user’s request before any action is triggered. In Oppo’s reference example, a user points the phone camera at a product and asks “How much does this cost on Taobao?”; the agent internally rephrases this to “price of Evian spray on Taobao” and hands the structured intent off for execution.
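The rephrasing step can be pictured as fusing the three channels into one structured intent. The sketch below stubs out the vision-language model with simple string rules so it runs without any model; `Perception`, `Intent`, `resolve_intent`, and the `KNOWN_APPS` list are illustrative names, not X-OmniClaw APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Perception:
    """One snapshot of the three input channels (field names are hypothetical)."""
    camera_label: Optional[str]  # what the VLM recognised in the viewfinder
    screen_app: Optional[str]    # foreground app, if any
    voice_text: str              # transcribed user request


@dataclass
class Intent:
    app: str
    query: str


# Hypothetical app list; a real system would let the VLM infer the target app.
KNOWN_APPS = ("Taobao", "JD", "Meituan")


def resolve_intent(p: Perception) -> Intent:
    """Stand-in for the VLM step: ground "this" in the camera label and
    pull the target app out of the spoken request (or the current screen)."""
    text = p.voice_text.lower()
    app = next((a for a in KNOWN_APPS if a.lower() in text),
               p.screen_app or "unknown")
    subject = p.camera_label or "item"
    if "how much" in text or "price" in text:
        return Intent(app=app, query=f"price of {subject} on {app}")
    return Intent(app=app, query=f"{subject} on {app}")
```

Given the camera label “Evian spray” and the spoken request from Oppo’s example, `resolve_intent(Perception("Evian spray", None, "How much does this cost on Taobao?"))` yields the query “price of Evian spray on Taobao”, ready to hand to an execution layer.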
For long-term memory, X-OmniClaw condenses local data into semantic entries. During idle time, gallery photos are processed into compact descriptions of objects, scenes, and events, which are stored in a Markdown file. Every entry passes through a privacy filter that strips sensitive information before it is persisted. Oppo’s technical report lists components including an on-device grounding model and OCR for detecting tappable UI elements; the specific local models are not named.
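A minimal sketch of the idle-time memory pass, assuming captions have already been produced by an on-device model: each description goes through a privacy filter, is written as a Markdown bullet, and can later be retrieved. The regex rules (phone numbers and e-mail addresses) and all function names are hypothetical, since the report does not say what the filter strips; plain keyword search is used only to keep the example self-contained.

```python
import re
from dataclasses import dataclass

# Hypothetical privacy rules: redact 11-digit mobile numbers and e-mail addresses.
_PRIVATE_PATTERNS = [
    re.compile(r"\b\d{3}[-\s]?\d{4}[-\s]?\d{4}\b"),
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
]


@dataclass
class MemoryEntry:
    photo_id: str
    description: str  # caption from an on-device model (stubbed in this sketch)


def privacy_filter(text: str) -> str:
    """Replace every match of a private-data pattern before persisting."""
    for pattern in _PRIVATE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def to_markdown(entries: list[MemoryEntry]) -> str:
    """Serialise filtered entries as a Markdown bullet list."""
    lines = ["# Gallery memory", ""]
    for e in entries:
        lines.append(f"- **{e.photo_id}**: {privacy_filter(e.description)}")
    return "\n".join(lines) + "\n"


def search(markdown: str, keyword: str) -> list[str]:
    """Naive case-insensitive keyword lookup over the memory bullets."""
    return [line for line in markdown.splitlines()
            if line.startswith("- ") and keyword.lower() in line.lower()]
```

Because filtering happens inside `to_markdown`, nothing unfiltered ever reaches the stored file, which mirrors the report’s ordering of filtering before persistence.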
Behaviour cloning is built in: X-OmniClaw can observe user actions and replicate them autonomously in subsequent invocations. In Oppo’s demos, the agent compared product prices captured on camera, acted as a floating assistant to solve exercises, and independently created themed photo albums from a user’s gallery.
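Behaviour cloning in the machine-learning sense trains a policy on recorded demonstrations; the sketch below substitutes a much simpler record-and-replay scheme to show the shape of the idea. `Action`, `Demonstration`, and the `$query` placeholder convention are hypothetical and not taken from X-OmniClaw’s code.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class Action:
    kind: str       # "tap" or "type" (hypothetical action vocabulary)
    target: str     # UI element label, e.g. as located by OCR/grounding
    text: str = ""  # text to type; may contain $placeholders


@dataclass
class Demonstration:
    name: str
    actions: list[Action] = field(default_factory=list)

    def record(self, action: Action) -> None:
        """Append one observed user action to the demonstration."""
        self.actions.append(action)

    def replay(self, **params: str) -> list[Action]:
        """Return the recorded steps with $placeholders filled in.
        A real agent would dispatch each step to the UI; here we only return them."""
        return [Action(a.kind, a.target, Template(a.text).substitute(params))
                for a in self.actions]
```

Recording a Taobao search once with a `$query` placeholder lets `replay(query="Evian spray")` reproduce the same tap, type, tap sequence for a new product without the user repeating the steps.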
Who’s Affected
Open-source Android-agent developers gain a reference implementation that ships full on-device perception, grounding, and execution rather than wrapping a cloud LLM. Oppo, one of the world’s largest smartphone vendors by shipments, gains public research credibility that strengthens its agentic-AI positioning relative to Samsung’s Bixby and Xiaomi’s HyperOS AI. Cloud-phone agent providers (RedFinger, Wuying, Tencent) face a credible architectural alternative that may split the market: cloud for compute-heavy multi-tenant workloads, on-device for privacy-sensitive personal automation.
What’s Next
The Multi-X team has not announced a productisation timeline; X-OmniClaw is released as open source for community evaluation and contribution. Expect engineers at Google, Apple, Samsung, and Xiaomi to benchmark X-OmniClaw against their own first-party on-device-agent stacks. Oppo’s broader strategy of competing on AI capabilities at the OEM layer is still taking shape, and further releases from the Multi-X team are anticipated through the rest of 2026.