Measuring accuracy honestly

Deep-dive lesson · about 12 minutes · short quiz at the end

If you can’t score it fairly, you can’t improve it.

The big idea

“The forecast was good” isn’t a measurement.

You can’t improve what you don’t measure, and how you measure changes what you optimise for. Pick the wrong accuracy metric and you’ll cheerfully tune your forecast in the wrong direction.

The starting point

Error = forecast − actual.

Per interval, the raw error is simple. The art is in how you summarise thousands of these into a number that means something — without letting the small intervals or the cancelling signs mislead you.
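A minimal sketch of why the signs matter, using made-up interval volumes (the numbers below are illustrative, not from the lesson). The signed errors net to zero even though every interval missed, which is why a plain sum of errors can't be the summary.

```python
# Signed error per interval: forecast minus actual (positive = over-forecast).
# The volumes are hypothetical, chosen only to show the cancelling effect.
forecast = [12, 95, 210, 180]
actual = [10, 100, 200, 187]

errors = [f - a for f, a in zip(forecast, actual)]

net_error = sum(errors)                         # signs cancel each other
total_abs_error = sum(abs(e) for e in errors)   # signs removed before summing

print(errors)           # [2, -5, 10, -7]
print(net_error)        # 0
print(total_abs_error)  # 24
```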

Metric 1

MAPE — mean absolute percentage error.

Average the absolute error as a percentage of actual, across all intervals. Intuitive and widely quoted: “we’re running at 12% MAPE.”

But it has a nasty flaw in a contact centre.

The MAPE trap

Small intervals dominate.

Being 5 contacts off when 10 arrived is a 50% error. Being 50 off when 2,000 arrived is 2.5%. MAPE treats the quiet 8am interval as far more important than the busy midday one, the opposite of what you care about.
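The trap is easy to reproduce. Here is a rough sketch of MAPE with each percentage taken against actual volume, as described above. The function name `mape` and the choice to skip zero-volume intervals are assumptions for illustration; other conventions exist.

```python
def mape(forecast, actual):
    """Mean absolute percentage error, returned as a fraction.

    Assumption: intervals with zero actual are skipped, because the
    percentage is undefined there.
    """
    pct_errors = [abs(f - a) / a for f, a in zip(forecast, actual) if a != 0]
    if not pct_errors:
        raise ValueError("no intervals with non-zero actual volume")
    return sum(pct_errors) / len(pct_errors)


# Quiet interval: 5 off against 10 actual. Busy interval: 50 off against 2,000.
forecast = [5, 1950]
actual = [10, 2000]

print(round(mape(forecast, actual) * 100, 2))  # 26.25
```

The quiet interval contributes 50 of the 52.5 percentage points being averaged, so it sets the score almost on its own.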

Metric 2

WAPE — weighted absolute percentage error.

Sum the absolute errors, divide by the total actual volume. It weights every interval by how busy it was, so the intervals that matter for staffing drive the score.

For most planning, WAPE is the fairer headline number.
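As a sketch, WAPE is one division over totals rather than an average of per-interval percentages. The function name `wape` is an assumption, and the two intervals are the same illustrative pair used for the MAPE trap (5 off against 10 actual, 50 off against 2,000 actual).

```python
def wape(forecast, actual):
    """Weighted absolute percentage error, returned as a fraction."""
    total_abs_error = sum(abs(f - a) for f, a in zip(forecast, actual))
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("total actual volume is zero")
    return total_abs_error / total_actual


forecast = [5, 1950]
actual = [10, 2000]

# 55 contacts of error against 2,010 actual contacts.
print(round(wape(forecast, actual) * 100, 2))  # 2.74
```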

The one that matters most

Bias — are you always wrong the same way?

MAPE and WAPE take the absolute error, so they can’t see direction. Bias keeps the sign: it’s the average of forecast minus actual, so a negative value means you under-forecast and a positive one means you over-forecast. It answers a different, crucial question: do you consistently over- or under-forecast?
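A small sketch of bias as the mean signed error. The volume-weighted percentage variant (`bias_pct`) is an assumption added here because it sits on the same scale as WAPE; the lesson itself only defines the plain average. The volumes are made up.

```python
def bias(forecast, actual):
    """Mean signed error in contacts per interval (forecast minus actual)."""
    errors = [f - a for f, a in zip(forecast, actual)]
    return sum(errors) / len(errors)


def bias_pct(forecast, actual):
    """Net signed error as a fraction of total actual volume (assumed variant)."""
    return (sum(forecast) - sum(actual)) / sum(actual)


# Hypothetical forecast that runs short in every interval.
forecast = [90, 190, 285, 380]
actual = [100, 200, 300, 400]

print(bias(forecast, actual))                     # -13.75
print(round(bias_pct(forecast, actual) * 100, 1))  # -5.5
```

Negative output means under-forecasting under this sign convention.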

Why bias is worse than scatter

Two forecasts, same WAPE, very different harm.

Scatter around zero averages out across a week. A consistent lean does not.

Why it compounds

Bias accumulates; scatter cancels.

Random over- and under-forecasts roughly net off — some overstaffed intervals, some understaffed, broadly balanced. A persistent under-forecast understaffs every interval, every day, draining service level and burning agents with no relief.
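To make that concrete, here is a sketch with two invented forecasts against the same flat actuals. Both miss by 10 contacts in every interval, so WAPE is identical, but the running total of signed error behaves very differently. All names and numbers are illustrative assumptions.

```python
from itertools import accumulate

actual = [100, 100, 100, 100, 100, 100]
noisy = [110, 90, 110, 90, 110, 90]   # alternates over and under
short = [90, 90, 90, 90, 90, 90]      # always 10 under


def wape(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / sum(actual)


def running_net_error(forecast, actual):
    """Cumulative sum of signed errors (forecast minus actual)."""
    return list(accumulate(f - a for f, a in zip(forecast, actual)))


for name, fc in (("noisy", noisy), ("short", short)):
    print(name, wape(fc, actual), running_net_error(fc, actual))
# noisy 0.1 [10, 0, 10, 0, 10, 0]
# short 0.1 [-10, -20, -30, -40, -50, -60]
```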

Reading them together

Track WAPE and bias side by side.

WAPE tells you how big the typical miss is. Bias tells you whether the misses point one way. A low-WAPE, near-zero-bias forecast is healthy. Low WAPE with strong bias is a forecast quietly hurting you.
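One way to read the two numbers together is a simple two-threshold check. The limits below (10% WAPE, 2% bias) are hypothetical placeholders, not recommendations from this lesson; set them from your own history. Bias is expressed here as net error over total actual so it shares a scale with WAPE, which is also an assumption.

```python
WAPE_LIMIT = 0.10  # hypothetical threshold
BIAS_LIMIT = 0.02  # hypothetical threshold


def read_together(forecast, actual):
    """Return (wape, bias_pct, verdict) for one forecast."""
    total_actual = sum(actual)
    wape = sum(abs(f - a) for f, a in zip(forecast, actual)) / total_actual
    bias_pct = (sum(forecast) - total_actual) / total_actual
    if wape > WAPE_LIMIT:
        verdict = "high WAPE"
    elif abs(bias_pct) > BIAS_LIMIT:
        verdict = "low WAPE, strong bias"
    else:
        verdict = "healthy"
    return wape, bias_pct, verdict


actual = [100, 200, 300, 400]
print(read_together([104, 194, 306, 396], actual)[2])  # healthy
print(read_together([95, 190, 285, 380], actual)[2])   # low WAPE, strong bias
```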

The level question

Measure at the level you act on.

A forecast can look superb at monthly level and fall apart by interval — the errors hide inside the totals. Score accuracy at the grain you actually staff to, not the grain that flatters you.
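The effect of grain can be sketched by rolling the same intervals up into bigger buckets before scoring. The six volumes and the helper names (`aggregate`, `wape`) are invented for illustration; the forecast total matches the actual total exactly, yet the interval-level score is far from zero.

```python
def aggregate(values, size):
    """Sum consecutive intervals into buckets of `size`."""
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]


def wape(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / sum(actual)


actual = [60, 140, 400, 300, 120, 80]
forecast = [100, 120, 350, 330, 100, 100]

for size in (1, 2, 6):
    score = wape(aggregate(forecast, size), aggregate(actual, size))
    print(f"bucket of {size}: WAPE {round(score * 100, 1)}%")
# bucket of 1: WAPE 16.4%
# bucket of 2: WAPE 3.6%
# bucket of 6: WAPE 0.0%
```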

Closing the loop

A metric you don’t act on is decoration.

Measurement only pays off if it changes something: a persistent bias prompts an adjustment, a spike in WAPE on Mondays prompts a look at the Monday profile. Accuracy tracking is the start of the improvement loop, not the end.
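A sketch of how a Monday problem might surface: score each weekday separately and look at the worst one first. The record layout `(weekday, forecast, actual)` and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical interval records: (weekday, forecast, actual).
records = [
    ("Mon", 400, 500), ("Mon", 300, 360),
    ("Tue", 410, 400), ("Tue", 290, 300),
    ("Wed", 395, 400), ("Wed", 305, 300),
]


def score_by_day(records):
    """Return {weekday: {"wape": ..., "bias": ...}} as fractions of actual."""
    abs_err = defaultdict(int)
    net_err = defaultdict(int)
    volume = defaultdict(int)
    for day, forecast, actual in records:
        abs_err[day] += abs(forecast - actual)
        net_err[day] += forecast - actual
        volume[day] += actual
    return {
        day: {"wape": abs_err[day] / volume[day], "bias": net_err[day] / volume[day]}
        for day in volume
    }


scores = score_by_day(records)
worst_day = max(scores, key=lambda day: scores[day]["wape"])
for day, s in scores.items():
    print(day, f"WAPE {s['wape']:.1%}", f"bias {s['bias']:+.1%}")
print("look first at:", worst_day)  # look first at: Mon
```

In this invented data Monday is both the largest miss and a one-directional one, which points at the Monday profile rather than at random noise.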

The numbers, side by side

Why MAPE and WAPE disagree

An 8am interval forecast at 10, actual 15: 5 off against 15 actual is a 33% miss. A midday interval forecast at 2,000, actual 1,950: a 2.6% miss. MAPE averages those to a scary ~18%, driven by the interval nobody staffs to.

WAPE weights by volume: 55 contacts of error against 1,965 actual is under 3%. Same forecast, and WAPE is telling the truth about the day that matters.

The takeaway

WAPE for size, bias for direction — at staffing grain.

Prefer WAPE over MAPE so busy intervals count. Watch bias hardest, because a forecast that’s always short hurts far more than one that’s simply noisy. Then act on what you find.

Slides done? Here’s the same idea in a bit more depth — the part worth keeping.

In depth: score it fairly or tune it wrong

“The forecast was good” is an opinion, not a measurement — and you can’t improve what you don’t measure. Worse, how you measure changes what you optimise for, so the wrong accuracy metric will have you cheerfully tuning the forecast in the wrong direction. The raw material is simple: error is forecast minus actual, per interval. The whole craft is in how you summarise thousands of those into a number that means something, without letting the tiny intervals or the cancelling signs mislead you.

MAPE flatters the wrong intervals; WAPE doesn’t

MAPE, the average absolute percentage error, is intuitive and widely quoted, but it has a nasty flaw in a contact centre: it treats every interval as equally important. Being five contacts short when ten arrived is a 50% error; being fifty short when two thousand arrived is 2.5%. MAPE makes the quiet 8am interval count for far more than the busy midday one, precisely backwards from what staffing cares about. WAPE fixes this by summing the absolute errors and dividing by total actual volume, so every interval is weighted by how busy it was. For most planning, WAPE is the fairer headline number.

Bias is the error that actually hurts

Both MAPE and WAPE take the absolute error, so neither can see direction — and direction is the thing that does the damage. Bias keeps the sign: it’s the average of forecast minus actual, and it answers whether you’re consistently over- or under-forecasting. Two forecasts can share an identical WAPE and do very different harm. Random scatter around zero roughly nets off across a week — some intervals over, some under. A persistent lean does not: a forecast that’s always short under-staffs every interval, every day, draining service level and burning agents with no relief. So track WAPE and bias side by side, measure them at the grain you actually staff to rather than the monthly total that flatters you, and remember a metric you don’t act on is just decoration.

The principle to remember: WAPE for size, bias for direction, both at staffing grain — then act on what they tell you. A forecast that’s always short hurts far more than one that’s simply noisy.
