- Amazon employees are deliberately automating unnecessary tasks to inflate token consumption on internal AI leaderboards, per Financial Times reporting cited by The Decoder.
- Amazon’s internal tool “MeshClaw” lets employees create AI agents that trigger code deployments, triage emails, or interact with apps like Slack.
- Amazon set a target for over 80% of developers to use AI weekly and began tracking token consumption on internal leaderboards earlier in 2026.
- Meta employees have engaged in similar “tokenmaxxing” behaviour, per The Decoder.
What Happened
Amazon employees are gaming the company’s internal AI leaderboards by automating unnecessary tasks to inflate their token consumption, according to Financial Times reporting summarized by The Decoder on Tuesday. The behaviour, dubbed “tokenmaxxing,” is enabled by Amazon’s in-house tool MeshClaw, which lets staff create AI agents capable of triggering code deployments, triaging emails, and interacting with applications such as Slack.
Why It Matters
The pattern illustrates a Goodhart’s Law failure in enterprise AI rollouts: when a measure becomes a target, it ceases to be a good measure. Amazon set a goal for more than 80% of developers to use AI tools weekly and earlier in 2026 began tracking token consumption on internal leaderboards. Token volume, intended as a usage proxy, is now being deliberately inflated. Goodhart effects of this kind have been documented previously in software-engineering metrics — lines of code, commit count, story-point velocity — and surface predictably whenever individual contributors are tracked on raw activity metrics.
Technical Details
MeshClaw enables programmatic agent creation across Amazon’s internal toolchain. The same agent capabilities that let employees automate genuine work — code deployment, email triage, Slack interaction — also let them script repeated activity to consume tokens. Amazon has officially stated that the leaderboard data does not factor into performance reviews. But one Amazon employee, quoted by the Financial Times, said: “There is just so much pressure to use these tools. Some people are just using MeshClaw to maximise their token usage.” Another employee disagreed with the official line: “Managers are looking at it. When they track usage it creates perverse incentives and some people are very competitive about it.”
Who’s Affected
Amazon developers are the direct subjects of the leaderboard system. Other large employers tracking similar metrics, including Meta, per The Decoder, face the same incentive structure. Vendors of AI productivity tools, including GitHub, Cursor, Codeium, and Anthropic (maker of Claude Code), see one of their core sales metrics, token or seat consumption, exposed as gameable when used internally as a productivity proxy. CFOs and chief technology officers running large internal AI rollouts will read the case as evidence that token consumption is not a substitute for outcome-based metrics like PR-merge rate, ticket close-time, or revenue per developer.
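To make the gap between an activity proxy and an outcome metric concrete, the short Python sketch below ranks three hypothetical developers twice: once by token consumption and once by merged pull requests. The names, numbers, and the `Developer` and `rank_by` helpers are invented for illustration; none of them come from Amazon, MeshClaw, or the Financial Times reporting.

```python
from dataclasses import dataclass


@dataclass
class Developer:
    name: str
    tokens_used: int  # raw activity proxy (what a token leaderboard counts)
    prs_merged: int   # outcome-style metric (illustrative choice)


def rank_by(devs, key):
    """Return developer names sorted from highest to lowest by `key`."""
    return [d.name for d in sorted(devs, key=key, reverse=True)]


# Invented figures: dev_a burns the most tokens but merges the fewest PRs.
devs = [
    Developer("dev_a", tokens_used=9_000_000, prs_merged=3),
    Developer("dev_b", tokens_used=1_200_000, prs_merged=11),
    Developer("dev_c", tokens_used=2_500_000, prs_merged=7),
]

by_tokens = rank_by(devs, key=lambda d: d.tokens_used)
by_outcome = rank_by(devs, key=lambda d: d.prs_merged)

print(by_tokens)   # ['dev_a', 'dev_c', 'dev_b']
print(by_outcome)  # ['dev_b', 'dev_c', 'dev_a']
```

In this made-up data the two rankings are exact opposites, which is the situation a token leaderboard cannot detect on its own.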
What’s Next
Amazon has not announced changes to the leaderboard. AI-productivity measurement across the industry is likely to shift toward outcome-linked metrics (code-quality scores, test coverage, defect rates, post-deployment incident frequency) rather than raw token counts. Industry research from McKinsey, Gartner, and academic groups at Stanford and MIT will likely cite the Amazon case in 2026 reports on AI productivity measurement. For now, MeshClaw remains in use and tokenmaxxing remains, per the Financial Times’ sources, a documented pattern.