Forecasting when a chatbot sits in front of your agents

Intermediate level · ~6 minute read

Introduction

When a chatbot or virtual agent answers first, your agent forecast stops being a forecast of total demand and becomes a forecast of what the bot doesn’t handle. That sounds like a small change. It isn’t. A single “containment rate” pulled from a vendor slide is one of the fastest ways to mis-staff a contact centre, because containment is neither stable nor clean. This article sets out how to forecast agent demand properly when automation sits in front of the queue.

[Figure: Agents see the residual, not the demand. Total demand of 1,000 chats; the bot tries first. Truly contained: 550 (resolved, never comes back). Escalates now: 350 (arrives pre-annoyed, longer AHT). Bounces back later: 100 (a fresh contact tomorrow).]
“Containment” hides three different outcomes. Only the genuinely resolved contacts truly leave your demand; escalations arrive harder, and a slice bounces back as fresh volume. Your agent forecast has to model all three.

Containment is not one number

A reported containment rate of, say, 55% bundles together outcomes that matter very differently to a planner. Some contacts are genuinely resolved — the customer got what they needed and won’t be back. Some are abandoned in frustration without being resolved, which counts as “contained” on the bot dashboard but reappears as a call tomorrow. And the contacts that escalate to an agent arrive having already spent effort failing to self-serve, so they’re longer and more emotionally charged. A single containment percentage tells you none of this, yet all of it lands on your queue.

Forecast at the contact-reason level

The only robust way to forecast agent demand behind a bot is to decompose demand by contact reason and estimate containment for each. The bot might resolve 85% of password resets but only 10% of complex complaints. A blended rate applied to total volume will be wrong for every individual reason and “right” at the aggregate only by accident. It will also mis-predict your mix, which is what actually drives agent AHT. Forecast total demand by reason, apply a reason-level containment rate, and what remains is your real agent volume and your real agent mix.
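As a rough illustration of the reason-level approach, the sketch below applies a separate containment rate to each contact reason and derives the residual agent volume and mix-weighted AHT. The 85% and 10% containment rates come from the example above; the volumes, the AHT values, and the names `Reason` and `agent_forecast` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reason:
    name: str
    volume: float       # forecast total contacts for the period (assumed)
    containment: float  # share the bot genuinely resolves, 0 to 1
    agent_aht_s: float  # agent handle time in seconds (assumed)


def agent_forecast(reasons):
    """Return (agent volume, mix-weighted agent AHT, residual volume per reason)."""
    residual = {r.name: r.volume * (1.0 - r.containment) for r in reasons}
    total = sum(residual.values())
    if total == 0:
        return 0.0, 0.0, residual
    aht = sum(residual[r.name] * r.agent_aht_s for r in reasons) / total
    return total, aht, residual


reasons = [
    Reason("password_reset", volume=600, containment=0.85, agent_aht_s=240),
    Reason("complex_complaint", volume=400, containment=0.10, agent_aht_s=900),
]

agent_volume, agent_aht, residual = agent_forecast(reasons)

# What a blended approach would assume: the same residual volume,
# but at the pre-bot average AHT across all demand.
pre_bot_aht = sum(r.volume * r.agent_aht_s for r in reasons) / sum(r.volume for r in reasons)

print(f"agent volume: {agent_volume:.0f}")
print(f"agent AHT (reason-level mix): {agent_aht:.0f} s")
print(f"agent AHT (pre-bot blended):  {pre_bot_aht:.0f} s")
```

With these made-up inputs the aggregate containment works out to about 55%, so a blended rate would get the agent volume right. It would still understate the workload, because the residual is dominated by complaints and the mix-weighted AHT is far above the pre-bot average.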

Model the escalations honestly

Escalated contacts are not average contacts. They’ve already failed once, so they tend to take longer, carry more frustration, and sometimes need a senior or specialist agent. Build a higher AHT for the escalated stream than your old blended figure, and remember that the residual mix is weighted toward your hardest reasons — the same effect that pushes agent AHT up whenever automation deflects the easy work. If your routing sends bot escalations to a particular skill, plan that skill specifically rather than the team as a whole.
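One way to make the escalation effect concrete is to apply an AHT uplift to the escalated stream and total the workload per skill rather than per team. This is a minimal sketch under stated assumptions: the 15% uplift, the volumes, the base AHTs, the skill names, and the function `workload_by_skill` are all hypothetical, and the real uplift should be measured from your own escalated contacts.

```python
from collections import defaultdict

# (reason, escalated volume, direct-to-agent AHT in seconds, routed skill)
# All values are illustrative assumptions, not benchmarks.
ESCALATIONS = [
    ("password_reset", 90, 240, "general"),
    ("complex_complaint", 360, 900, "complaints"),
]

ESCALATION_UPLIFT = 0.15  # assumed: escalated contacts run 15% longer


def workload_by_skill(rows, uplift):
    """Sum escalated workload in hours for each skill, with the AHT uplift applied."""
    hours = defaultdict(float)
    for _reason, volume, base_aht_s, skill in rows:
        hours[skill] += volume * base_aht_s * (1.0 + uplift) / 3600.0
    return dict(hours)


hours = workload_by_skill(ESCALATIONS, ESCALATION_UPLIFT)
for skill, h in sorted(hours.items()):
    print(f"{skill}: {h:.1f} workload hours")
```

Keeping the uplift as an explicit parameter makes it easy to replace the assumed figure with a measured one, or to use a different uplift per reason if the data supports it.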

Account for the bounce-back

The slice of contacts the bot “contained” by frustrating the customer doesn’t disappear; it returns, often through a different channel and often angrier. Track repeat-contact rates after bot interactions and feed them back into your volume forecast as genuine demand. The cleanest way to keep yourself honest is to measure resolution at the customer level — did the issue actually get solved — rather than relying on the bot’s own containment metric, which has every incentive to look good.
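The gap between the bot's own metric and customer-level resolution can be sketched with the figures from the diagram near the top of this article (1,000 chats, 550 resolved, 350 escalated, 100 bouncing back). The function names and the assumption that every bounce-back lands in the next period are hypothetical simplifications; in practice the repeat window and channel would come from your own repeat-contact tracking.

```python
def bot_outcomes(total, resolved, escalated):
    """Split bot contacts into the rates a planner cares about."""
    bounced = total - resolved - escalated  # "contained" but not resolved
    return {
        "dashboard_containment": (total - escalated) / total,
        "true_resolution": resolved / total,
        "bounce_back_rate": bounced / total,
    }


def next_period_volume(fresh_forecast, prior_bot_contacts, bounce_back_rate):
    """Add expected repeat contacts to the fresh-demand forecast.

    Assumes all bounce-backs return in the following period.
    """
    return fresh_forecast + prior_bot_contacts * bounce_back_rate


rates = bot_outcomes(total=1000, resolved=550, escalated=350)
volume = next_period_volume(
    fresh_forecast=1000,
    prior_bot_contacts=1000,
    bounce_back_rate=rates["bounce_back_rate"],
)

print(rates)
print(f"next-period demand including bounce-back: {volume:.0f}")
```

In this illustration the dashboard would report 65% containment while only 55% of customers were actually resolved; the ten-point difference is demand that comes back.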

Expect it to move

Bot containment is a moving target. Models are retrained, scope expands, the knowledge base improves, and customer behaviour shifts as people learn what the bot can and can’t do. A containment rate that’s accurate today will drift within months, so treat it as a tracked, regularly re-forecast input rather than a fixed assumption baked into an annual plan. The discipline is the same one good planners already apply to volume and AHT: decompose, model each part, measure honestly, and refresh often.
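Treating containment as a tracked input could look something like the sketch below, which smooths weekly observed containment and flags when the planning assumption has drifted too far. Simple exponential smoothing is one option among many and is not prescribed by this article; the weekly observations, the smoothing factor `alpha`, the 3-point tolerance, and both function names are assumptions for illustration.

```python
def smoothed_containment(observations, alpha=0.3):
    """Simple exponential smoothing over a series of observed containment rates."""
    if not observations:
        raise ValueError("need at least one observation")
    estimate = observations[0]
    for obs in observations[1:]:
        estimate = alpha * obs + (1.0 - alpha) * estimate
    return estimate


def needs_reforecast(planned, current, tolerance=0.03):
    """True when the tracked rate has moved beyond the tolerance from the plan."""
    return abs(current - planned) > tolerance


planned_rate = 0.55                          # rate baked into the current plan (assumed)
weekly_observed = [0.55, 0.56, 0.58, 0.61, 0.63]  # hypothetical weekly measurements

current_rate = smoothed_containment(weekly_observed)
print(f"smoothed containment: {current_rate:.3f}")
print(f"re-forecast needed: {needs_reforecast(planned_rate, current_rate)}")
```

The same check can be run per contact reason, since a retrained model or expanded scope usually moves containment for some reasons and not others.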
