Forecast accuracy metrics that matter
Why MAPE on its own is not enough
Most contact centre forecast accuracy reports begin and end with MAPE — Mean Absolute Percentage Error. The metric is familiar, easy to compute, and easy to explain. It is also incomplete. MAPE alone tells you the average magnitude of the error but says nothing about whether the forecast was systematically over or under, whether the errors are concentrated in specific intervals or spread evenly, or whether short and long forecast horizons are equally well served. Planning teams that report only MAPE end up with a sense of how accurate they are on average, and almost no information about where they need to improve. This article walks through the metrics that complete the picture, what each one is designed to surface, and what a balanced accuracy report should include.
MAPE: the familiar starting point
MAPE is the mean of the absolute percentage error across all observations — for each interval, the absolute difference between forecast and actual, divided by actual, then averaged across the period. Its strength is its intuitive interpretation: a MAPE of 8% means the forecast is, on average, eight percent off. Its weaknesses are well known. MAPE blows up when actuals are small — one quiet interval with two contacts forecast and zero actual produces an infinite or undefined error that distorts the average. MAPE also weights small intervals the same as large ones, which means a 50% error in a 10-contact interval has the same impact on the headline number as a 50% error in a 1,000-contact interval. For a planner whose job is staffing real volume, that weighting is misleading.
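To make the arithmetic concrete, here is a minimal sketch in Python; the function and the interval-level figures are illustrative rather than taken from any particular tool. Note the guard for zero actuals: skipping those intervals is a common workaround, but it silently drops exactly the intervals where MAPE misbehaves.

```python
def mape(forecasts, actuals):
    """Mean Absolute Percentage Error across intervals, in percent.

    Intervals with a zero actual are skipped to avoid division by
    zero -- a common workaround, though it silently drops exactly
    the intervals where MAPE misbehaves.
    """
    errors = [abs(f - a) / a for f, a in zip(forecasts, actuals) if a != 0]
    return 100 * sum(errors) / len(errors)

# Illustrative interval-level volumes (contacts per interval)
forecast = [110, 95, 1050, 12]
actual = [100, 100, 1000, 10]
print(f"MAPE: {mape(forecast, actual):.1f}%")  # 10.0%
```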
WAPE: weighted by the volume that matters
Weighted Absolute Percentage Error solves the MAPE weighting problem by weighting each interval by its actual volume: the calculation is the total absolute error across all intervals divided by the total actual volume across the period. A 50% error on a 10-contact interval therefore contributes far less than a 50% error on a 1,000-contact interval. WAPE is almost always a better headline metric than MAPE for contact centres because it reflects the staffing risk the planner is actually managing. The downside is that WAPE is slightly harder to explain than MAPE; the upside is that planning teams grasp quickly that a service-level miss in a busy interval matters more than one in a quiet interval, and WAPE weights the error in exactly that way.
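Running the same illustrative data through a WAPE sketch shows the reweighting at work: the headline drops from 10.0% to 5.5% because the large interval, which was forecast well, now dominates the calculation.

```python
def wape(forecasts, actuals):
    """Weighted Absolute Percentage Error: total absolute error
    divided by total actual volume, so busy intervals carry
    proportionally more weight."""
    total_error = sum(abs(f - a) for f, a in zip(forecasts, actuals))
    return 100 * total_error / sum(actuals)

# Same illustrative data as the MAPE sketch above
forecast = [110, 95, 1050, 12]
actual = [100, 100, 1000, 10]
# The 20% miss on the 10-contact interval barely moves the result
print(f"WAPE: {wape(forecast, actual):.1f}%")  # 5.5%
```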
sMAPE: handling the zero-actuals problem
Symmetric MAPE divides the absolute error by the average of forecast and actual, rather than by actual alone. The result is bounded between 0% and 200%, which means a zero actual no longer produces an infinite error. The metric is useful when an operation has many low-volume intervals (back-office work, niche skills, off-peak hours) where MAPE becomes unstable. sMAPE has its own quirks: because the forecast appears in the denominator, it penalises an under-forecast more heavily than an over-forecast of the same absolute size. For most planning use, though, it is a sensible complement to WAPE.
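A minimal sketch, with a zero-actual interval included to show the bounding behaviour (the figures are again illustrative):

```python
def smape(forecasts, actuals):
    """Symmetric MAPE: absolute error over the mean of forecast
    and actual, bounded between 0% and 200%."""
    terms = [
        abs(f - a) / ((f + a) / 2)
        for f, a in zip(forecasts, actuals)
        if f + a != 0  # both zero means a perfect forecast; skip
    ]
    return 100 * sum(terms) / len(terms)

# A zero actual no longer produces an undefined error: the first
# interval contributes the 200% cap rather than a division by zero
print(f"sMAPE: {smape([2, 100], [0, 100]):.1f}%")  # 100.0%
```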
MAE: when percentages mislead
Mean Absolute Error is the mean of the absolute difference between forecast and actual, in raw units. For a planner reporting to operations, MAE in “average contacts off per interval” can be more meaningful than a percentage. It is particularly useful when actual volumes vary across a wide range and the percentage error in tiny intervals dominates the percentage error in large ones. Most accuracy reports benefit from including both a percentage metric (WAPE) and an absolute metric (MAE) so the audience can interpret the error in whichever frame they find more useful.
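A sketch in the same style; "contacts per interval" stands in for whatever unit the operation actually plans in:

```python
def mae(forecasts, actuals):
    """Mean Absolute Error in raw units (here, contacts per interval)."""
    total = sum(abs(f - a) for f, a in zip(forecasts, actuals))
    return total / len(actuals)

forecast = [110, 95, 1050, 12]
actual = [100, 100, 1000, 10]
print(f"MAE: {mae(forecast, actual):.1f} contacts per interval")  # 16.8
```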
Bias: the metric people skip
Of all the accuracy metrics, bias is the one most often missing and the one that usually matters most for the planning conversation. Bias is the average signed error — forecast minus actual, not absolute — expressed as a percentage of actual. A bias of zero means the forecast is right on average, even if individual intervals are off in either direction. A bias of +5% means the forecast is systematically high; a bias of −5% means it is systematically low. A planning team can have a 7% MAPE and a 5% bias at the same time, which means most of the absolute error is in one direction and the operation is consistently over-staffing or under-staffing. That is a structural issue, not random noise, and the fix is methodological rather than operational. Bias should be in every accuracy report.
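One reasonable formulation, sketched below, takes the total signed error as a percentage of total actual volume, which makes it a volume-weighted counterpart to WAPE; averaging interval-level signed percentage errors is an equally defensible variant. The figures are illustrative.

```python
def forecast_bias(forecasts, actuals):
    """Signed error as a percentage of total actual volume.
    Positive means the forecast runs systematically high."""
    signed = sum(f - a for f, a in zip(forecasts, actuals))
    return 100 * signed / sum(actuals)

forecast = [110, 95, 1050, 12]
actual = [100, 100, 1000, 10]
# Errors in opposite directions partly cancel, so bias can sit
# well below WAPE while still revealing a one-sided lean
print(f"Bias: {forecast_bias(forecast, actual):+.1f}%")  # +4.7%
```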
What “good” looks like
Benchmarks vary by industry, channel, and queue volatility. As a rough guide for a stable voice queue with twelve or more months of history, a daily WAPE of 5–10% and a weekly WAPE of 3–7% represent mature performance. Small queues, multi-skill operations, and digital channels will sit higher; back-office work with weekly aggregation can sit lower. Bias should stay within plus or minus 2% over any rolling thirteen-week period; a sustained bias beyond that signals a systematic forecasting error that needs methodology work. The exact numbers matter less than the principle: set the target in advance, track it honestly, and review it when the trend moves materially in either direction.
Reporting at multiple horizons
A single accuracy number across all forecast horizons hides where the planning team is genuinely strong or weak. A useful accuracy report shows the same metric at several horizons — one day ahead, one week ahead, one month ahead, three months ahead. The pattern is informative: most operations are accurate close in and degrade quickly past four weeks; if the degradation is sharper than expected, the underlying model is brittle; if accuracy at one week is worse than at three weeks, something is wrong with the most recent inputs. Reporting at multiple horizons converts “accuracy” from a single noisy number into a structured signal about where the methodology needs work.
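A sketch of how a horizon-split report can be assembled, assuming each forecast snapshot is stored with the lead time at which it was made; the record layout and figures are invented for illustration, and the output shows the typical degradation pattern.

```python
from collections import defaultdict

# Invented records: (horizon at which the forecast was made,
# forecast, actual). In practice each forecast snapshot is stored
# with its lead time and scored against the same actuals.
records = [
    ("1 day", 102, 100), ("1 day", 980, 1000),
    ("1 week", 108, 100), ("1 week", 940, 1000),
    ("1 month", 115, 100), ("1 month", 900, 1000),
]

totals = defaultdict(lambda: [0, 0])  # horizon -> [abs error, volume]
for horizon, f, a in records:
    totals[horizon][0] += abs(f - a)
    totals[horizon][1] += a

for horizon, (err, vol) in totals.items():
    print(f"{horizon:>8}: WAPE {100 * err / vol:.1f}%")
# 1 day: 2.0%, 1 week: 6.2%, 1 month: 10.5%
```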
Tracking accuracy over time
Single accuracy snapshots are vulnerable to single-week noise. Trend matters more than the latest reading. Track the rolling four-week and rolling thirteen-week WAPE alongside the weekly number, and look at the trend before reacting to any single observation. Discuss accuracy in a fixed slot every week, with rolling averages on the agenda, and use the discussion to feed methodology improvements back into the next cycle. The point is not to celebrate good weeks or apologise for bad ones; it is to spot patterns early enough to do something useful about them.
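A sketch of the rolling view using pandas, with invented weekly totals. The detail worth noting is that rolling WAPE is the ratio of rolling sums rather than the average of weekly WAPE values, so busy weeks keep their proportional weight; the thirteen-week version is the same call with a window of 13.

```python
import pandas as pd

# Invented weekly totals of absolute error and actual volume
df = pd.DataFrame({
    "abs_error": [420, 380, 510, 450, 390, 610, 400, 430],
    "actual": [5200, 5400, 5100, 5600, 5300, 5000, 5500, 5400],
})
df["weekly_wape"] = 100 * df["abs_error"] / df["actual"]
# Ratio of rolling sums, not the mean of the weekly WAPE column
df["rolling_4wk_wape"] = (
    100 * df["abs_error"].rolling(4).sum() / df["actual"].rolling(4).sum()
)
print(df.round(2))
```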
Common mistakes
Three patterns recur. The first is over-reliance on MAPE without WAPE, leaving the headline number distorted by low-volume intervals. The second is reporting accuracy without bias, which hides the systematic error that is usually the most actionable signal. The third is treating accuracy as a single number for the whole operation when the operation has multiple queues, channels, and skills with very different volatility — the planning team improves much faster when each segment is tracked separately and the conversation is segment-specific.
Conclusion
A balanced accuracy report shows WAPE alongside MAE, bias alongside both, and the same numbers at multiple horizons with rolling-average context. That report tells the planner where the methodology is working and where it is not, gives the operations team a frame for variance conversations, and provides senior management with a credible measure of how well the planning function is delivering. It costs little to set up and pays back continuously. Operations that get this right find that accuracy is not just a metric to report; it is a conversation that makes the planning function visibly better year after year.
Pair this with the beginner’s guide to forecasting for the foundation, and using AI for contact centre forecasting for how accuracy benchmarking should drive technology adoption decisions.