Christopher Meiklejohn has documented a working system where Anthropic’s Claude performs automated quality assurance on his mobile application Zabriskie, driving both Android and iOS simulators to take screenshots, identify visual regressions, and file bug reports. The approach, detailed in a March 22 blog post, addresses a gap that many small development teams face: maintaining QA coverage across mobile platforms when resources only stretch to automated testing on the web.
Zabriskie’s web version already has over 150 end-to-end tests running through Playwright. The mobile versions — built as native apps for Android and iOS — had no equivalent automated testing. Manual QA across two platforms is time-consuming and inconsistent, making it a natural candidate for AI-assisted automation. Meiklejohn’s solution uses Claude to interact with mobile simulators, capture screenshots at each step, analyze the visual output for layout issues, missing elements, or rendering errors, and generate structured bug reports.
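The post does not publish the harness itself, but the screenshot-capture step can be sketched with the standard simulator tooling for each platform. In this illustrative Python sketch, `adb exec-out screencap -p` streams a PNG from a booted Android emulator and `xcrun simctl io booted screenshot` saves one from a booted iOS simulator; the function names and file paths are assumptions, not Meiklejohn's code.

```python
# Cross-platform screenshot capture sketch. Assumes a booted Android
# emulator (reachable via adb) and a booted iOS simulator; names and
# paths are illustrative.
import subprocess
from pathlib import Path

def android_screenshot_cmd() -> list[str]:
    # `adb exec-out screencap -p` writes a PNG of the emulator display
    # to stdout, which we capture and save ourselves.
    return ["adb", "exec-out", "screencap", "-p"]

def ios_screenshot_cmd(out: Path) -> list[str]:
    # `xcrun simctl io booted screenshot <path>` saves a PNG from the
    # currently booted simulator directly to a file.
    return ["xcrun", "simctl", "io", "booted", "screenshot", str(out)]

def capture(platform: str, out: Path) -> None:
    out.parent.mkdir(parents=True, exist_ok=True)
    if platform == "android":
        png = subprocess.run(android_screenshot_cmd(),
                             check=True, capture_output=True).stdout
        out.write_bytes(png)
    elif platform == "ios":
        subprocess.run(ios_screenshot_cmd(out), check=True)
    else:
        raise ValueError(f"unknown platform: {platform}")

# Usage (requires the respective simulator to be running):
#   capture("android", Path("shots/android/home.png"))
#   capture("ios", Path("shots/ios/home.png"))
```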
The setup process revealed significant differences between platforms. Android took approximately 90 minutes to configure, reflecting the relative simplicity of Android’s emulator tooling and debug bridge. iOS required over six hours, due to Xcode’s more complex simulator management, code signing requirements, and the additional tooling needed to enable programmatic interaction with iOS apps in a test environment.
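The asymmetry shows up in the boot-and-install sequences themselves. A hedged sketch of what each platform's bring-up might look like, using the standard `emulator`/`adb` and `xcrun simctl` CLIs: the AVD name, simulator UDID, app artifacts, and bundle identifier below are all placeholders, and the iOS path additionally assumes an app already built and signed for the simulator.

```python
# Illustrative simulator bring-up. Assumes the Android SDK and Xcode
# command-line tools are installed; all identifiers are placeholders.
import subprocess

ANDROID_BOOT = [
    # Headless emulator start, then wait for the device and install a debug build.
    ["emulator", "-avd", "Pixel_7_API_34", "-no-window", "-no-audio"],
    ["adb", "wait-for-device"],
    ["adb", "install", "-r", "app-debug.apk"],
]

IOS_BOOT = [
    # Simulators are addressed by UDID (or the "booted" alias); the .app
    # bundle must be built for the simulator before simctl can install it.
    ["xcrun", "simctl", "boot", "SIMULATOR_UDID"],
    ["xcrun", "simctl", "install", "booted", "Zabriskie.app"],
    ["xcrun", "simctl", "launch", "booted", "com.example.zabriskie"],
]

def run_all(commands: list[list[str]]) -> None:
    # Run each step, failing fast if any command exits non-zero.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```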
With the setup in place, Claude sweeps 25 screens daily on both platforms — a coverage level that would require a dedicated QA engineer working several hours per day if done manually. The system captures visual state, compares it against expected layouts, and flags anomalies that a human developer then reviews. This approach trades the precision of pixel-perfect visual regression testing for the flexibility of an AI that can identify issues a rule-based system would miss, such as text truncation, misaligned elements, or interaction states that look wrong but pass strict layout checks.
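The analysis step of such a sweep can be sketched as a request to Anthropic's Messages API, which accepts base64-encoded image blocks alongside a text instruction. Everything here is an assumption for illustration — the screen list, prompt wording, and model name are placeholders, and the post does not publish its actual harness.

```python
# Sketch of the per-screenshot review request. The request shape follows
# Anthropic's Messages API (image content block + text block); the screen
# list, prompt, and model name are hypothetical.
import base64

SCREENS = ["home", "settings", "profile"]  # placeholder; the real sweep covers 25

PROMPT = ("Review this screenshot for visual regressions: truncated text, "
          "misaligned elements, missing UI, or broken rendering. "
          "Reply with a structured bug report, or 'OK' if nothing is wrong.")

def review_request(png: bytes, screen: str) -> dict:
    # Builds the JSON body for a POST to https://api.anthropic.com/v1/messages
    # (sent with an x-api-key header, e.g. via the official anthropic SDK).
    return {
        "model": "claude-3-5-sonnet-latest",  # assumed; the post names no model
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(png).decode()}},
                {"type": "text", "text": f"Screen: {screen}\n{PROMPT}"},
            ],
        }],
    }
```

A nightly job would then loop over the screen list, capture a fresh screenshot per screen and platform, send each request, and collect any non-"OK" responses as bug reports for human review.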
The implementation is notable as a practical example of AI agents performing useful work in software development outside of code generation. While AI coding assistants have received the most attention, QA — particularly visual QA across multiple platforms — represents a category where AI’s ability to interpret screenshots and reason about expected behavior offers genuine productivity gains. The 150-test Playwright suite handles the web; Claude handles mobile. Between them, a solo developer maintains QA coverage that would traditionally require a small team.
