Organizations are measuring AI productivity with the wrong metrics

As AI coding becomes the default, software engineers are no longer the primary authors of code. The scope of their role has fundamentally changed. For enterprises, the challenge is that the frameworks they rely on to measure engineering work were built for a pre-AI era and are losing relevance.

Harness' State of Engineering Excellence 2026 report found that organizations are reporting record productivity gains but no longer have the instruments to know whether those gains are real – or what they're costing.

The invisible tax

On the surface, AI coding tools seem to be delivering exactly what leaders hoped for: faster delivery, higher output, and more productive developers. But beneath those gains, a growing amount of manual and operational toil is emerging.

The report found that while 90% of engineering practitioners cite improvements in developer productivity, many now spend more of their day on manual work. In fact, 81% say they spend more time on code review since adopting AI tools.

This is the core tension. As AI generates more code, output metrics improve and cycle times shorten. Developers report feeling more productive because they create more output. Yet much of the effort required to deliver that output to the business and its customers remains largely invisible.

In practice, nearly a third of developer time is now spent validating outputs, fixing defects, reviewing suggestions, and context switching between tools and systems.

The result is an invisible tax on engineering time. AI isn’t reducing productivity, but the unit of work has changed faster than engineering organizations’ ability to measure it.
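One way to make that tax visible is to measure the share of logged time that goes to validation and rework rather than counting output alone. The following is a minimal sketch, assuming a hypothetical time log of (activity, hours) entries; the activity labels and the figures are illustrative and are not taken from the report.

```python
from collections import defaultdict

# Hypothetical labels for the "invisible" work: validating outputs,
# fixing defects, reviewing suggestions, and context switching.
TOIL_ACTIVITIES = {
    "validating_output",
    "fixing_defects",
    "reviewing_suggestions",
    "context_switching",
}


def toil_share(time_log):
    """Return the fraction of logged hours spent on toil activities."""
    hours_by_activity = defaultdict(float)
    for activity, hours in time_log:
        hours_by_activity[activity] += hours

    total = sum(hours_by_activity.values())
    if total == 0:
        return 0.0  # nothing logged, so no share to report

    toil = sum(h for a, h in hours_by_activity.items() if a in TOIL_ACTIVITIES)
    return toil / total


# Illustrative weekly log for one developer (assumed data).
example_log = [
    ("authoring", 15.0),
    ("design", 7.0),
    ("meetings", 6.0),
    ("validating_output", 4.0),
    ("fixing_defects", 3.0),
    ("reviewing_suggestions", 3.0),
    ("context_switching", 2.0),
]

print(f"Toil share of logged time: {toil_share(example_log):.1%}")
```

Reporting this share next to output metrics shows whether a rise in delivered work is being paid for with a rise in validation effort.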

Legacy frameworks weren’t built for AI

For years, engineering leaders have relied on frameworks like DORA metrics, cycle time, velocity, and developer experience surveys to understand engineering performance. These remain useful, but they were designed for a world where engineers primarily wrote code.

AI changes that dynamic completely. Developers are increasingly acting as reviewers, validators, orchestrators, and governors of machine-generated outputs rather than manually authoring every line of code themselves. Traditional productivity frameworks were never designed to capture that kind of work.

The data reflects how wide that disconnect has become. Among engineering practitioners, 94% acknowledge that factors like technical debt, validation effort, and developer burnout are missing from their current metrics. Meanwhile, only 6% believe their existing measurement systems are still fit for purpose.

These frameworks still capture delivery speed, but they miss much of the effort required to sustain it.

A problem of trust

The measurement gap is not just technical. It also reflects a growing disconnect between the people designing measurement systems and the people being evaluated by them.

Measurement frameworks are still largely built top-down by leadership, often without structured input from the engineers whose work they are intended to represent. When those frameworks reflect only the leadership perspective, they tend to understate the operational pressure developers are actually experiencing.

That gap shows up clearly in how differently managers and practitioners perceive the same systems. Only 4% of practitioners say they have no concerns about how AI productivity data might be used to evaluate them; among managers, that figure rises to 15%.

Concerns are also rising more broadly. More than half of respondents say they worry AI-generated metrics could be used for individual performance evaluation, while 46% highlight concerns around surveillance and unsustainable delivery pressure.

This creates a structural risk for organizations introducing AI measurement frameworks without developer involvement. Metrics only work when they are trusted. Without that trust, they stop reflecting reality and start shaping behavior instead – often to the detriment of the original objective.

Measuring what matters next

As AI tooling takes up an increasing share of engineering budgets, organizations can no longer rely on legacy ways of measuring productivity.

AI is changing the unit of engineering work itself. Measurement systems need to reflect that shift by capturing not only delivery outcomes but also the effort required to produce them. That means tracking validation work, code quality, cognitive load, technical debt accumulation, AI agent accuracy, and developer wellbeing alongside traditional delivery metrics.

This requires a more collaborative approach. Metrics need to be defined with developers, not just for them, with clarity on how the data will be used and what decisions it will inform.

It also means treating AI performance as a distinct layer of the system. AI agent accuracy, acceptance rates, and cost should be tracked separately from human engineering output, with a consistent definition of what “good” looks like across the organization.
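As a rough illustration of keeping the AI layer separate, the sketch below stores AI counters and human delivery counters in two distinct records and derives acceptance rate, an accuracy proxy, and cost per accepted suggestion from the AI record only. All class names, field names, and numbers are hypothetical, and "accepted without rework" is just one assumed definition of accuracy; an organization would need to agree on its own.

```python
from dataclasses import dataclass


@dataclass
class AiLayerStats:
    """Hypothetical counters for the AI layer only."""
    suggestions_offered: int
    suggestions_accepted: int
    accepted_without_rework: int  # assumed proxy for agent accuracy
    cost_usd: float

    def acceptance_rate(self) -> float:
        """Share of offered suggestions that developers accepted."""
        if self.suggestions_offered == 0:
            return 0.0
        return self.suggestions_accepted / self.suggestions_offered

    def accuracy(self) -> float:
        """Share of accepted suggestions that needed no later rework."""
        if self.suggestions_accepted == 0:
            return 0.0
        return self.accepted_without_rework / self.suggestions_accepted

    def cost_per_accepted(self) -> float:
        """Tooling spend divided by accepted suggestions."""
        if self.suggestions_accepted == 0:
            return 0.0
        return self.cost_usd / self.suggestions_accepted


@dataclass
class HumanDeliveryStats:
    """Traditional delivery counters, kept in a separate record."""
    changes_merged: int
    review_hours: float


# Illustrative figures only (assumed data).
ai = AiLayerStats(
    suggestions_offered=1000,
    suggestions_accepted=400,
    accepted_without_rework=300,
    cost_usd=200.0,
)
human = HumanDeliveryStats(changes_merged=120, review_hours=85.0)

print(f"AI acceptance rate: {ai.acceptance_rate():.0%}")
print(f"AI accuracy (no rework): {ai.accuracy():.0%}")
print(f"AI cost per accepted suggestion: ${ai.cost_per_accepted():.2f}")
print(f"Human review hours per merged change: {human.review_hours / human.changes_merged:.2f}")
```

Keeping the two records apart means a jump in AI acceptance rate cannot quietly inflate a human output metric, and the review hours it generates stay visible in their own column.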

For engineering leaders, the challenge now is not whether to adopt AI coding tools. That decision has already been made. The real question is whether the systems used to understand engineering work are evolving fast enough to keep up with the reality those tools are creating.

How are you measuring developer productivity in an AI-enabled environment – and what impact is that having on the value your team is unlocking from AI?