When AI deflects the easy contacts, your AHT goes up

Intermediate level · ~6 minute read

Introduction

Every business case for a chatbot or AI assistant promises the same thing: deflect a slice of contacts away from agents and cut headcount. The volume reduction is real. But there’s a second-order effect that rarely makes it into the model, and it quietly eats the savings: the contacts that remain are harder, so your average handle time (AHT) goes up. Plan for the volume drop alone and your staffing numbers will be wrong in a way that only shows up after go-live. This article explains why, and how to plan for it.

A worked illustration: the bot deflects 30% of contacts, almost all from the easy band. Volume falls 30%, but because the cheap, fast contacts are the ones that left, the agents’ average handle time rises. Workload falls by less than the contact count suggests.

Why the easy ones go first

Automation is good at exactly the contacts that are simple, repetitive and well-defined: balance checks, password resets, “where’s my order,” opening hours. Those are also, by definition, your shortest contacts. A virtual agent doesn’t deflect a random sample of your demand — it skims the quick, cheap interactions off the top and leaves the complex, emotional, multi-step ones for humans. So the residual contacts arriving at your agents are disproportionately the long ones: the exceptions, the complaints, the edge cases, the conversations a bot tried and failed to resolve.

The AHT effect, in numbers

The diagram makes it concrete. Suppose 1,000 daily contacts split 50% easy (180s), 30% medium (360s) and 20% hard (600s), for a blended AHT of 318 seconds (0.5 × 180 + 0.3 × 360 + 0.2 × 600). Deflect 30% of all contacts, nearly all from the easy band, and you’re left with 700 contacts that are now mostly medium and hard. Treating all 300 deflected contacts as easy, the residual mix is 200 easy, 300 medium and 200 hard, and the blended AHT of what remains climbs to around 377 seconds. Total workload (contacts multiplied by AHT) falls from 318,000 agent-seconds to about 264,000, a 17% reduction, not the 30% the headline deflection rate implied. Staff on the 30% and you’ll be roughly 16% short of the agent time you actually need.
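The arithmetic is easy to check by recomputing it from the stated mix and per-band handle times. The sketch below is illustrative only: it assumes every one of the 300 deflected contacts comes from the easy band (the text says “nearly all”), and the variable and function names are invented for the example.

```python
# Per-band handle times (seconds) and daily volumes from the example.
# Assumption: all 300 deflected contacts are taken from the easy band.
HANDLE_TIME_S = {"easy": 180, "medium": 360, "hard": 600}
before = {"easy": 500, "medium": 300, "hard": 200}
after = {"easy": 200, "medium": 300, "hard": 200}


def workload_seconds(volumes):
    """Total agent-seconds: volume x handle time, summed over bands."""
    return sum(volumes[band] * HANDLE_TIME_S[band] for band in volumes)


def blended_aht(volumes):
    """Volume-weighted average handle time in seconds."""
    return workload_seconds(volumes) / sum(volumes.values())


workload_before = workload_seconds(before)  # 318000
workload_after = workload_seconds(after)    # 264000

volume_drop = 1 - sum(after.values()) / sum(before.values())
workload_drop = 1 - workload_after / workload_before

# Naive plan: cut agent time in line with the volume drop.
naive_budget = workload_before * (1 - volume_drop)
# Shortfall expressed as a share of the agent time actually required.
shortfall = 1 - naive_budget / workload_after

print(f"AHT before: {blended_aht(before):.0f}s, after: {blended_aht(after):.0f}s")
print(f"Volume drop: {volume_drop:.0%}, workload drop: {workload_drop:.0%}")
print(f"Shortfall if staffed on volume: {shortfall:.0%}")
```

Changing the `after` mix to move a few deflected contacts into the medium band shows how sensitive the result is to which contacts the bot actually takes.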

The two traps in the business case

The first trap is staffing on deflected volume instead of deflected workload. Always convert to workload (volume × AHT) before you size anything, because that’s what actually consumes agent time. The second is using your old AHT for the post-automation world. The moment a bot is in front of your queue, your historical AHT is obsolete — it was measured on a contact mix that no longer arrives. You need a fresh AHT estimate for the residual mix, which means modelling the contact types the bot will and won’t take, not just a single deflection percentage.

It also reshapes the rest of your plan

Harder contacts don’t just take longer; they need more experienced agents, richer skilling, and longer ramp times, and they tend to come with more after-call work and higher emotional load. A queue stripped of its easy wins is a tougher place to work, which can lift attrition if it isn’t managed. Service-level behaviour shifts too: a queue of long, variable contacts is “peakier” and less forgiving than a queue padded with quick ones, so the same service target needs a slightly bigger staffing cushion than before.
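One way to see the service-level effect is with the standard Erlang C queueing formula, which the article does not name and which is used here purely as an illustration. It assumes no abandonment and exponentially distributed handle times, so it captures only the effect of longer contacts, not of more variable ones. The volumes, the 24-agent figure and the 80%-in-20-seconds target are hypothetical.

```python
import math


def erlang_c_wait_probability(agents: int, offered_load: float) -> float:
    """Erlang C: probability that a contact has to queue."""
    if agents <= offered_load:
        return 1.0  # unstable queue: everyone waits
    term = 1.0   # running value of A^k / k!
    total = 1.0  # sum of A^k / k! for k = 0 .. agents - 1
    for k in range(1, agents):
        term *= offered_load / k
        total += term
    # A^N / N! scaled by N / (N - A)
    top = term * offered_load / agents * agents / (agents - offered_load)
    return top / (total + top)


def service_level(agents, contacts_per_hour, aht_s, threshold_s=20):
    """Share of contacts answered within threshold_s seconds."""
    offered_load = contacts_per_hour * aht_s / 3600  # erlangs
    if agents <= offered_load:
        return 0.0
    p_wait = erlang_c_wait_probability(agents, offered_load)
    return 1 - p_wait * math.exp(-(agents - offered_load) * threshold_s / aht_s)


def agents_needed(contacts_per_hour, aht_s, threshold_s=20, target=0.80):
    """Smallest agent count that meets the target service level."""
    agents = int(contacts_per_hour * aht_s / 3600) + 1
    while service_level(agents, contacts_per_hour, aht_s, threshold_s) < target:
        agents += 1
    return agents


# Two hypothetical queues with the same workload (20 erlangs):
# many short contacts versus half as many contacts that are twice as long.
sl_short = service_level(24, contacts_per_hour=400, aht_s=180)
sl_long = service_level(24, contacts_per_hour=200, aht_s=360)
need_short = agents_needed(400, 180)
need_long = agents_needed(200, 360)

print(f"Service level with 24 agents: short {sl_short:.1%}, long {sl_long:.1%}")
print(f"Agents for 80/20: short {need_short}, long {need_long}")
```

Under these assumptions the longer-contact queue scores a lower service level for the same agents and the same workload, because the answer-time threshold is shorter relative to each handle time.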

How to plan for it

Decompose your demand by contact reason and tag each as bot-suitable or not. Model the deflection at the reason level, then rebuild the residual volume and the residual AHT from what’s left — never from a single top-line percentage. Convert to workload and only then size the team. And measure AHT obsessively in the months after launch: containment rates drift, the bot’s scope expands, and your residual mix keeps moving, so a one-off forecast won’t hold. The planners who get this right will be the ones who treated automation as a change to the shape of demand, not just its size.
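The steps above can be sketched as a small reason-level model. Everything in it is hypothetical: the contact reasons, volumes, handle times and containment rates are invented for illustration. It also assumes that contacts still reaching agents keep their historical per-reason AHT, which is optimistic if failed bot conversations arrive longer than average.

```python
from dataclasses import dataclass


@dataclass
class ContactReason:
    name: str
    daily_volume: float
    aht_s: float
    containment: float  # assumed share the bot resolves; 0.0 = not bot-suitable


# Hypothetical demand decomposition; replace with your own data.
reasons = [
    ContactReason("password reset", 250, 150, 0.70),
    ContactReason("where's my order", 300, 200, 0.50),
    ContactReason("billing dispute", 250, 420, 0.05),
    ContactReason("complaint", 200, 600, 0.00),
]


def plan(reasons, with_bot):
    """Return (volume, blended AHT in seconds, workload in agent-seconds)."""
    volume = 0.0
    workload = 0.0
    for r in reasons:
        # Contacts that still reach an agent for this reason.
        remaining = r.daily_volume * (1 - r.containment) if with_bot else r.daily_volume
        volume += remaining
        workload += remaining * r.aht_s
    return volume, workload / volume, workload


base_volume, base_aht, base_workload = plan(reasons, with_bot=False)
res_volume, res_aht, res_workload = plan(reasons, with_bot=True)

print(f"Volume:   {base_volume:.0f} -> {res_volume:.1f} "
      f"({1 - res_volume / base_volume:.0%} lower)")
print(f"AHT:      {base_aht:.0f}s -> {res_aht:.0f}s")
print(f"Workload: {base_workload:.0f} -> {res_workload:.0f} agent-seconds "
      f"({1 - res_workload / base_workload:.0%} lower)")
```

Re-running the same model with measured containment rates after launch gives a simple way to track how the residual mix is drifting.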

Related reading: what planners get wrong about gen-AI, and the demand decomposition method this approach relies on. See also forecasting AHT and why AHT targets backfire.