#3 LeadDev's The Shift: Tokenmaxxing is the new lines of code
Plus: Agentic engineering management and bosses have AI brain
We don’t learn, do we? After years of patiently explaining that lines of code is a terrible way to assess developer productivity, some of the biggest technology companies in the world are measuring developer productivity according to how many AI tokens people are burning through.
The big story
Meta may have closed down its internal token usage leaderboard, but the damage was already done.
This trend – called tokenmaxxing, in a play on the internet slang to maxx, or max out – has taken the industry by storm in recent weeks. It reminds me a lot of when Elon Musk bought Twitter and started fixating on lines of code, in that it highlights just how unsophisticated our understanding of AI-driven productivity gains still is three years into this shift.
Like lines of code, token usage is first and foremost a simple metric to track. Most AI tools have a dashboard built in, and it’s easy shorthand for usage, which is all executives seem to care about for now. Interestingly, venture capitalists seem to be very supportive of the practice 🤔.
The problem with token usage as a metric is a familiar one: it’s easy to game, tracks output rather than outcomes, and rewards waste over thoughtful usage.
I enjoyed Shwetank Kumar’s post on this topic, where he also highlighted the cyclical nature of this mistake:
In the 1990s, the metric was lines of code. Developers who wrote verbose, redundant code looked more productive than developers who solved the same problem in forty clean lines. And heaven forbid you try to improve a codebase by removing code. That made your numbers go down. Everyone knew it was a bad metric. But it was used anyway, because managers needed a number and lines of code were easy to count.
Story points in the 2010s had the same problem — teams inflated estimates, velocity went up, throughput didn’t.
[…]
Now the same pattern is playing out in AI. Kevin Roose reported in the New York Times that an OpenAI engineer logged 210 billion tokens in one week — the English Wikipedia thirty times over — and made the top of a company leaderboard for it. A single Anthropic employee spent $150,000 in one month on Claude Code. That’s more than most senior engineers earn in a year. Token budgets are now a line item in the benefits package, right alongside dental.
As Jellyfish head of research, Nicholas Arcolano, wrote, “while more tokens do correlate with more output, they come at a dramatically higher price point per unit.”
The finding is rooted in Jellyfish’s own data, which covers 12,000 developers across 200 companies in Q1 of this year. They found that token usage is top heavy. “The typical user (50th percentile) consumes about 51 million tokens per month on AI coding. Meanwhile, the 90th percentile user consumes more than seven times that amount, at roughly 380 million tokens per month.”
When they combined token usage data with merged pull requests, they found that power users are producing more output, but at a much higher cost.
“The median developer uses about 7 million tokens per PR, while developers in the top decile use roughly 69 million. That’s nearly ten times more tokens for about two times the throughput. Tokens, in this sense, behave less like a linear input and more like rocket fuel. Going faster is possible, but it requires exponentially more resources to do so.”
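As a quick sanity check on those multiples, here is a minimal sketch that recomputes the ratios from the figures quoted above. The constant names are my own, and the numbers are only the rounded values cited in this newsletter, not Jellyfish's underlying dataset.

```python
# Rounded figures quoted from the Jellyfish analysis, in millions of tokens.
MEDIAN_TOKENS_PER_MONTH = 51
P90_TOKENS_PER_MONTH = 380
MEDIAN_TOKENS_PER_PR = 7
TOP_DECILE_TOKENS_PER_PR = 69


def ratio(high: float, low: float) -> float:
    """Return how many times larger `high` is than `low`."""
    return high / low


monthly_ratio = ratio(P90_TOKENS_PER_MONTH, MEDIAN_TOKENS_PER_MONTH)
per_pr_ratio = ratio(TOP_DECILE_TOKENS_PER_PR, MEDIAN_TOKENS_PER_PR)

print(f"90th percentile vs median monthly usage: {monthly_ratio:.1f}x")
print(f"Top decile vs median tokens per PR: {per_pr_ratio:.1f}x")
```

With the rounded inputs this works out to roughly 7.5x for monthly usage and 9.9x for tokens per PR, which lines up with the "more than seven times" and "nearly ten times" in the quotes.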
Why this matters
Lines of code never cost as much money as tokens. As companies look to incentivize employees to maximize their use of AI tools in pursuit of major productivity gains, they need a way to measure that impact.
Engineering managers are going to have to get a handle on token spending in the short term, and find a better way to measure progress in the long term.
The former task will likely mean tighter restrictions on dev usage and better guardrails for runaway agents.
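To make the guardrail idea a little more concrete, here is a minimal sketch of a hard token cap for a single agent run. Everything in it is hypothetical: `TokenBudget`, `TokenBudgetExceeded`, and the one-million-token limit are illustrative names and values, not any vendor's API, and a real setup would more likely rely on the spending controls built into the tools themselves.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a spend would push a run past its token cap."""


class TokenBudget:
    """Tracks token spend for one agent run against a hard cap (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Record a spend and return the remaining budget.

        Refuses any spend that would exceed the cap, leaving `used` unchanged.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(
                f"would use {self.used + tokens} tokens, limit is {self.limit}"
            )
        self.used += tokens
        return self.limit - self.used


# Illustrative run: the third request would blow the cap, so the agent stops.
budget = TokenBudget(limit=1_000_000)
budget.charge(400_000)
budget.charge(500_000)
try:
    budget.charge(200_000)
except TokenBudgetExceeded as err:
    print(f"stopping agent: {err}")
```

The design choice here is to check before spending rather than after, so a runaway agent is stopped at the request that would break the budget instead of one request later.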
As Jellyfish’s Arcolano wrote, “The highest return doesn’t come from pushing a small group of developers to extreme levels of usage. Instead, it comes from getting more of the organization into the middle of the adoption curve, where usage is consistent, effective, and relatively efficient.”
Miklos Koren, professor of economics at the Central European University and founder of the MicroData research group, also predicts that things will shift “towards more decentralized LLMs and local source solutions” as token costs rise.
The latter task unearths a long-running industry debate over whether it is even possible to measure engineering productivity in the first place. McKinsey famously kicked this hornet’s nest back in 2023.
What we know is that any metric in isolation is useless, but we haven’t yet found the right, DORA-esque combination for AI-powered software development. Token usage should absolutely be part of that mix, especially as we see widespread irresponsible usage patterns, but the counterbalance metrics are less clear at this point.
Lauren Peate, founder and CEO of Multitudes, told me she has seen some engineers start to set their own personal token usage goals, “which isn’t the outcome anyone should be focusing on, since AI usage and token should just be a means to improving customer value delivered.”
That doesn’t make it a useless metric though. “Where token usage is useful is for thinking about how much AI adoption we’re seeing – but not for measuring whether AI usage is giving us the right outcomes. The reverse is more useful – maximum token usage isn’t a guarantee of a good thing, but if someone has no token usage, then that could be reason for concern – at a minimum, it’s a reason to ask questions about why someone isn’t using AI at all.”
A short conversation about tokenmaxxing with DORA lead, Nathen Harvey
When I knew tokenmaxxing would be the big story this week, I reached out to Nathen Harvey, who leads Google Cloud’s DORA team, to see what he made of things. I enjoyed the conversation so much that I am publishing a lightly edited version below.
SC: What do you think about tokenmaxxing?
Nathen: I am of a mixed opinion, especially leaderboards across a team and making that available. I think it can be helpful in the short term and I think it can be very, very unhealthy in the long term.
It is absolutely true that you can’t get any value out of any of these new workflows or tools if you’re not using them. So we want to know who’s using them. Adoption does not guarantee impact, but you can’t get any impact without adoption.
If we just want to know where the tools are being used we can see that through a token leaderboard. That can provide some healthy competition among the team. Like, “I want to get on the board. I want to make sure that I’m showing up there.” That’s good.
It can also help us identify who the top consumers are and maybe we can go and learn from them. When we look at the folks that are near the top of that leaderboard, it’s an opportunity for us to ask them to host a lunch and learn, give us a demo, share with your colleagues what you’re doing so that we can pick up some of these ideas. Maybe you’ve got some skills that you’ve built that should actually be distributed across the org.
I think it can also be at least somewhat motivating for those people that are on the bottom or not showing up on the leaderboard at all. Use that leaderboard to identify key adopters.
SC: Is this just Goodhart’s Law on steroids?
This is so obviously an implementation of Goodhart’s law that we have to be very careful. It can lead to real gaming. The flip side though to that gaming, and maybe it is happening in some places, is looking for optimization opportunities. Frankly I think there’s definitely ways that we can optimize how we’re utilizing those tokens.
But I think as an industry we’re far too early in this journey to worry about optimization right now. We are still in a heavy learning and experimentation mode. We’re not yet in the optimization mode.
As with a lot of metrics, we have the opportunity to look at them at the individual level and you can also look at them at a team level. I’d be much more interested in seeing a team that’s getting a lot of use out of AI versus a team that’s not. Maybe we can start to find some patterns there. Maybe one team is building mostly greenfield apps and the other is handling a legacy app.
It’s just like lines of code. It feels like as an industry we learned and finally agreed that lines of code is probably not a great measure [of productivity].
SC: Where does token usage fit within the DORA framework?
In our most recent annual research, we didn’t ask any questions about tokens. I suspect that in 2026, this question around tokenmaxxing and what measurements orgs have in place will be there.
I don’t think there are any agreed upon standards at an industry level for how to measure engineering productivity. So we’re always looking for the metrics that people are using and I suspect that we will be starting to identify tokens as a measure of adoption.
I think it is a fine measure of adoption, but a terrible measure of impact.
Hot links
Agentic Engineering Management by Péter Szász
A great post about how AI agents can be applied to the three key pillars of engineering management: execution, team dynamics, and personal development. Pretty much what this newsletter covers in a nutshell 🥜.
“Just like ICs slowly become managers of their own agents, accountable for agent output the way they’re accountable for their code today, the EM role shifts up an abstraction layer too: delegating more managerial tasks to agents, synthesizing and selecting where to go deeper. This is quite similar to what a director or VP does today.”
What I liked about this article is how it framed the result of this shift. If agents can free up engineering managers, what do they choose to do with that extra capacity? Be more hands on? More strategic? Expand your remit? Either way, being a pure operator is a difficult path to tread now.
Your AI-coding budget just got a lot more complicated - LeadDev
How Anthropic’s silence fueled a Claude Code trust crisis - LeadDev
GitHub Copilot is moving to usage-based billing
Three stories that highlight the other side of tokenmaxxing: outages, service degradation, and changing pricing models as vendors scramble to keep up with ballooning demand for their tools, especially token-heavy agentic capabilities.
Who Owns the Code Claude Wrote? - Legal Layer
A handy explainer of copyright law for AI-generated code by Sena Evren.
Does your boss have AI brain - Link in Bio
It’s not about engineering specifically, but I found Rachel Karten’s piece on the impact AI is having on marketing professionals really eye-opening.
“I heard many more examples just like this. That leadership feedback is now relegated to their AI chatbot of choice. Copy, briefs, and campaigns all run through AI. One person shared with me, ‘My managers keep taking my copy that’s with them for final approval, running it through AI, and going ‘here, use this.’ A huge waste of time and effort on everyone’s part, not to mention demoralizing as hell.’”
Upcoming events
LDX3 London
Our biggest event of the year, LDX3, kicks off next month in London and you can still be there. We’ve got Justin Reock, Michael Lopp, Maude Lemaire, and more speaking, as well as big debates, table talks, and most importantly, the chance to meet me in person (no, seriously).
What to do when there’s too much code to review
Code review has been a hot topic around here recently, so I was excited to pull this panel together for May 7 to discuss what to do when there’s too much code to review, with James Garrett, Pete Hodgson, and our friends at CodeRabbit.
One more thing
Thanks for reading and see you next week!





