Here’s the thing about measuring AI adoption: The moment you make a metric a target, it stops being useful.
Amazon learned this the hard way. Employees at the company recently admitted to inflating their AI token consumption—a practice now called “tokenmaxxing”—to climb internal leaderboards tracking how much AI they use. Some workers were running Amazon’s in-house AI tool, MeshClaw, to automate meaningless or redundant tasks, not because it helped them work better, but because it drove up their usage numbers. Amazon had set a KPI requiring more than 80% of developers to use AI weekly, and per reporting from the Financial Times, managers were watching. Closely.
The company officially said the stats wouldn’t factor into performance reviews. Unofficially, one employee put it plainly: “When they track usage it creates perverse incentives and some people are very competitive about it.”
Amazon isn’t alone. Meta employees built an internal tool called “Claudeonomics” that ranked the company’s roughly 85,000 workers by token consumption. In a single 30-day window, usage on that dashboard exceeded 60 trillion tokens. The dashboard was taken down after it became public. The underlying pressure? Still very much there.
This isn’t a technology failure. It’s a management failure. And if you’re leading a team—or building a business—in 2026, it’s almost certainly happening in your organization too.
Why Good Leaders Build Bad AI Metrics
The reason tokenmaxxing exists has a name that dates back decades: Goodhart’s law. British economist Charles Goodhart articulated the idea in 1975, and it’s never been more relevant. The popular summary: When a measure becomes a target, it ceases to be a good measure.
You’ve seen this play out before. Sales teams that hit call volume targets by rushing customers off the phone. Customer service reps who close tickets fast but solve nothing. Developers who hit code-commit quotas by breaking tasks into fragments. The mechanism is always the same: People optimize for what is measured, not for what actually matters.
AI has just created an expensive new version of this problem. Leaders are under enormous pressure to show AI adoption—from boards, from competitors, from shareholders watching hyperscalers pour hundreds of billions into infrastructure. The easiest way to prove adoption? Count usage. Token consumption, logins, weekly active users. These numbers are easy to track and easy to present on a slide. They are also almost completely disconnected from whether AI is doing anything useful.
Grant Thornton’s 2026 AI Impact Survey of 950 C-suite and senior business leaders found that organizations with fully integrated AI were nearly four times as likely to report AI-driven revenue growth as those still in the piloting phase (58% versus 15%). The difference wasn’t who had the most usage. It was who had connected AI to real outcomes. Yet 78% of those same executives said they couldn’t pass an independent AI governance audit. The gap between activity and accountability is enormous.
Here’s what that means for you: If your AI strategy is built around getting people to use tools, rather than getting people to produce better outcomes, you’ve already set Goodhart’s trap.
The Real Question to Ask About AI
Before you decide what to measure, you need to decide what you’re actually trying to change. This sounds obvious. Most leaders skip it entirely.
Token consumption measures how much your team is prompting an AI. It tells you nothing about whether those prompts are making work faster, better or more profitable. It’s the equivalent of tracking how often employees open their laptops.
The right question isn’t, “How much AI are we using?” It’s, “What is changing because we’re using AI?” According to EY’s Global Consulting AI lead, as reported by Human Resources Director, the companies performing best on AI adoption are those in which employees understand why the technology is being deployed and what it’s expected to produce: “When leaders are transparent, employees lean in and performance follows.”
That transparency gap is where most AI rollouts collapse. People see a leaderboard. They feel the pressure—especially in an environment where the share of managers who think replacing employees with AI is a good idea has risen from 23% in 2025 to 35% in 2026. When survival feels like it depends on your token count, you find ways to inflate your token count.
How to Measure AI Adoption in a Way That Actually Works
The fix isn’t to stop measuring. It’s to measure the right things. Here’s a framework you can implement this week.
Start with output quality, not output volume. Pick one workflow where you’ve introduced AI—email drafting, client research, proposal writing, code review—and measure the downstream outcome. Are proposals winning more? Is research taking half the time? Are fewer errors making it to the next stage? Volume metrics tell you how busy your team looks. Quality metrics tell you whether AI is doing anything useful.
Connect AI usage to a specific business goal, not a general adoption target. “80% of developers should use AI weekly” is a behavior target. “Reduce average code review time from three days to one” is a business target. The second version gives your team a reason to use AI that doesn’t require gaming anything. They’ll adopt it because it helps them hit the number, not because someone is watching their dashboard.
Create a feedback loop, not a leaderboard. Leaderboards create competition around metrics. Feedback loops create learning around outcomes. The difference: A leaderboard tells your team who used the most AI last month. A feedback loop asks which AI-assisted workflows actually moved the needle and which ones your team found useless, confusing or more trouble than they were worth. That second conversation is where your real adoption strategy gets built.
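For teams that want to see the difference in practice, here is a minimal Python sketch of an outcome metric, using the code review example above. Everything in it is an assumption made for illustration: the Review record, its field names, the sample numbers and the one-day target are hypothetical, not data from any company mentioned here. The point is that the check looks at turnaround time, while token counts sit in the data without influencing the result.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    """One completed code review (hypothetical record shape)."""
    days_to_complete: float
    ai_assisted: bool
    tokens_used: int  # usage volume, kept only for contrast


def average_review_days(reviews, ai_assisted):
    """Average turnaround for reviews done with or without AI assistance."""
    durations = [r.days_to_complete for r in reviews if r.ai_assisted == ai_assisted]
    return mean(durations) if durations else None


def meets_target(reviews, target_days=1.0):
    """True if AI-assisted reviews average at or under the business target."""
    avg = average_review_days(reviews, ai_assisted=True)
    return avg is not None and avg <= target_days


# Made-up sample data for illustration only.
reviews = [
    Review(3.0, False, 0),
    Review(2.5, False, 0),
    Review(1.0, True, 40_000),
    Review(0.5, True, 900_000),  # heavy token use, similar outcome
    Review(1.5, True, 15_000),
]

baseline = average_review_days(reviews, ai_assisted=False)
with_ai = average_review_days(reviews, ai_assisted=True)
print(f"Baseline: {baseline:.2f} days, with AI: {with_ai:.2f} days")
print("Target met:", meets_target(reviews))
```

Notice that the review that burned 900,000 tokens counts exactly the same as the one that used 15,000. Inflating usage does nothing to this number, which is the property a leaderboard lacks.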
What This Means If You’re Not Amazon
You might be reading this and thinking: I don’t have 80,000 employees. I run a team of 12 or a consultancy of one.
The principle still applies. If you’re measuring your own AI use—tracking how many prompts you run, how many tools you pay for, how often you open your AI assistant—and you haven’t attached those behaviors to something that changed in your results, you’re tokenmaxxing too. Just voluntarily.
The question worth sitting with this week isn’t how much AI you’re using. It’s what’s different because you’re using it. Where did a deliverable get faster? Where did a decision get better? Where did a client get a better answer than they would have gotten otherwise?
That’s your AI adoption metric. Everything else is noise.
Featured image from Andrey_Popov/Shutterstock







