April 2, 2026
The AI pilot trap

Most organisations are measuring AI success on pilots, not people. That's the wrong way round.


A few weeks ago I sat in a steering committee for an AI rollout. The deck was the usual: green RAG, licence count climbing, prompts-per-seat trending in the right direction. Everyone was pleased.


I asked one question. What was actually different about how the work got done last month compared to the month before? The room went quiet. Nobody had measured it. Nobody had thought to.


The pilot was a success on paper. Underneath, nothing had moved.


The numbers we like to quote

The headline finding from the MIT NANDA Initiative's 2025 State of AI in Business report: 95% of generative AI pilots deliver zero measurable P&L impact. Most of the commentary uses the number to argue that AI is overhyped, the technology isn't ready, or that boards are wasting money.


Useful number. Wrong conclusion.


The pilots aren't failing because the technology doesn't work. They're failing because we're measuring the rollout instead of the shift.


AI doesn't arrive as a transformation

We know how to measure whether change has landed. On any major transformation, whether digital, ERP, or agile, we put readiness surveys in place. Pre and post. We track sentiment, capability, behaviour change. Not perfectly, not always honestly, but the infrastructure is there. We've understood for decades that the rollout and the change are different things.


AI is different. It usually arrives as a pilot rather than as a transformation programme. And pilots inherit pilot logic: deployment, usage, time saved per task in the controlled environment. The readiness work we'd do as standard on a transformation usually isn't there.


So we end up with something transformational in scope (it changes how knowledge work gets done, who does it, and what skills matter) being measured like a tool deployment.


Kotter wrote about this kind of failure in 1995. Around 70% of transformations fail, and the reason is almost always the same: organisations skimp on the human side because the structural side is easier to project-manage. Even where readiness surveys are in place, the discipline to act on them often isn't. AI pilots usually don't start with the surveys at all.


What the dashboards count

Pull up the steering pack for any AI programme and you'll see roughly the same metrics:

  1. Licences activated
  2. Active users this month
  3. Prompts run per seat
  4. Time saved per task in the controlled pilot
  5. Pulse scores from the pilot cohort


Deployment metrics. They tell you whether the tool got installed and whether people clicked on it. They tell you almost nothing about whether anyone is doing their job differently.


I've watched programmes hit every one of these targets and produce nothing that lasts. People log in once a fortnight, run a couple of prompts, get a passable output, and revert to the old way of working for everything that actually matters. The dashboard reports green. The behaviour underneath has not moved.


What the dashboards don't count

The harder things to instrument tend to get left off the slide:

  • Whether people trust the output enough to use it on work that matters
  • Whether they have the judgement to spot when it's wrong
  • Whether they're using AI on the parts of their job they care about, or only on the busywork
  • Whether the way they describe what AI is for has changed
  • Whether they're sharing what they've learned with colleagues
  • Where AI is conspicuously absent, and what that absence is telling you


Twenty-five years of organisational change work has convinced me of one thing above all: if you don't measure the human side, it isn't happening. Not because people are lazy or resistant, but because nothing else is signalling that it matters.


The perception gap

Here's the pattern that worries me most. Self-reported confidence in AI runs ahead of evidenced behaviour, often by a wide margin.


Ask a workforce how AI-ready they feel and you'll get one answer. Look at what they're actually doing with it day to day and you'll get another. Both are real data. The space between them is where the change work lives.
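
One way to make that space visible is to put both answers on the same scale and subtract one from the other. What follows is a minimal sketch in Python, not a validated instrument. It assumes a hypothetical 1-5 self-rating from a survey and a hypothetical 1-5 score from observed work (say, a review of real tasks). The field names, team names, and sample figures are illustrative assumptions, not data from any real programme.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Person:
    team: str
    self_reported: float  # hypothetical 1-5 survey self-rating
    evidenced: float      # hypothetical 1-5 score from observed work


def perception_gap(people: list[Person]) -> dict[str, float]:
    """Mean of (self-reported minus evidenced) per team.

    A positive value means confidence is running ahead of behaviour.
    """
    by_team: dict[str, list[float]] = {}
    for p in people:
        by_team.setdefault(p.team, []).append(p.self_reported - p.evidenced)
    return {team: round(mean(gaps), 2) for team, gaps in by_team.items()}


# Illustrative data only.
sample = [
    Person("finance", 4.0, 2.0),
    Person("finance", 5.0, 3.0),
    Person("legal", 3.0, 3.0),
]
print(perception_gap(sample))  # {'finance': 2.0, 'legal': 0.0}
```

The number itself matters less than the fact that both data sets sit side by side, so the gap reaches the people making the investment decision.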


The leaders making decisions about the next phase of investment usually only see the first answer. The pilot dashboard shows adoption climbing. The pulse survey shows confidence rising. The board pack reports progress. And meanwhile, the actual capability shift, the thing that was supposed to make any of it worth doing, isn't happening.


A recent piece from Harvard Business Review frames it well: leaders mistaking silence for alignment, dashboards for behaviour, surface engagement for genuine shift. There's no deliberate self-deception. They're looking at the only data anyone is collecting.


The perception gap is what makes pilot metrics so dangerous. They flatter to deceive.


Why people success has to come first

McKinsey's research on large-scale transformations found that organisations achieving most of their people-oriented goals were twice as likely to sustain the change for more than three years. Their AI work points the same way. Companies seeing significant financial returns from AI are twice as likely to have redesigned end-to-end workflows before selecting the modelling approach.


That sequencing matters more than any other single thing.


If you start with the pilot and bolt people onto it, you get the 95% outcome. If you start with the people, the workflows, the judgement, the genuine capability shift, the pilot tends to land. The pilot becomes a downstream consequence of the people work, not a substitute for it.


I understand the resistance to working this way. It's slower. Harder to dashboard. Doesn't generate the kind of demo a CIO can show the board in March. But in twenty-five years, I haven't seen a transformation produce lasting value any other way.


Upskill the workforce, not just the cohort

The other thing AI rollouts tend to get wrong: training the pilot users, and stopping there.


McKinsey's work on capability building is consistent on the point. Transformations that sustain are the ones that upskill broadly across the organisation (leaders, middle managers, the wider workforce), not the ones that train a small cohort of early users on a single tool.


If you train only the pilot cohort, you've created a capability island. The rest of the organisation watches from the outside, often with mistrust or quiet resentment. When the pilot tries to scale, it scales into a workforce that has none of the same context, language, or comfort with the technology. Most of the time it stalls.


If you upskill broadly first, building genuine AI literacy, judgement, and confidence across the population rather than delivering vendor-specific training tied to one tool, the pilot lands into something more capable. People who weren't in the cohort understand what's being attempted and why. They have the capability to absorb the change when it scales. They have opinions worth listening to.


Empower the workforce broadly and the pilot tends to stick. Train only the cohort and you've built an island. Islands don't scale.


Practical version

If you want to measure people success alongside (or before) pilot success, the moves aren't exotic:

  • Measure evidenced behaviour, not just self-reported confidence. What people say they do and what they actually do are two different data sets. You need both.
  • Measure capability shift, not licence count. Can people use AI on harder tasks this quarter than last? Can they spot a bad output? Are they teaching colleagues?
  • Measure where AI is being used, not just how often. Use on low-stakes admin is not the same signal as use on judgement-heavy work.
  • Measure the absence. Where is AI conspicuously not being used? What does that tell you about trust, capability, or fit?
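
The last two measures can be roughed out from a simple usage log. This is a sketch under stated assumptions rather than a recommended method: it supposes someone has labelled each AI-assisted piece of work with a task type and a "low" or "high" stakes tag, and that a list of everything the team does exists to compare against. The function name, labels, and sample entries are all hypothetical.

```python
from collections import Counter


def usage_profile(log: list[tuple[str, str]], task_types: set[str]) -> dict:
    """Summarise where AI is used, and where it is absent.

    log: one (task_type, stakes) pair per AI-assisted piece of work,
         with stakes labelled "low" or "high" (a hypothetical labelling).
    task_types: every task type the team performs, AI-assisted or not.
    """
    stakes = Counter(s for _, s in log)
    total = sum(stakes.values())
    high_share = stakes["high"] / total if total else 0.0
    used = {task for task, _ in log}
    return {
        "high_stakes_share": round(high_share, 2),
        "absent_from": sorted(task_types - used),
    }


# Illustrative data only.
log = [
    ("meeting notes", "low"),
    ("meeting notes", "low"),
    ("email drafts", "low"),
    ("contract review", "high"),
]
tasks = {"meeting notes", "email drafts", "contract review", "pricing decisions"}
print(usage_profile(log, tasks))
# {'high_stakes_share': 0.25, 'absent_from': ['pricing decisions']}
```

A prompts-per-seat count would score all four log entries the same. Here the low share of high-stakes use and the empty "pricing decisions" row are the signals worth a conversation.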


Build broad capability, not just pilot-cohort training: AI literacy, judgement, and confidence across the workforce. Then the pilot lands into something, rather than onto an island.


None of that is new. The same kind of measurement we'd apply to any workforce transition. We just don't apply it to AI, because the technology has its own metrics and the technology team owns the rollout.


Which is how we keep ending up in the 95%.


The bit nobody puts on the dashboard

If your AI programme is technically on track but somehow not producing the shift you were promised, look at what you're measuring. The dashboard might be telling you the pilot is fine. The behaviour underneath will tell you something different.


That gap is where the work is. And until we treat it as the actual work rather than something to come back to after the rollout, the failure rate isn't going to budge.


alterNOTION works with organisations treating AI as a workforce transition, not a tool deployment.