Interval-level accuracy: why a good daily number hides bad days

Forecasting · ~6 minute read

“98% accurate” — at what grain?

It’s one of the most reassuring sentences a planner can say and one of the most misleading: “our forecast was 98% accurate last month.” Accurate at what grain? Almost always, that headline is a daily or monthly number — and a forecast that nails the daily total can be wildly wrong inside the day, over-forecasting the morning and under-forecasting the afternoon by amounts that cancel out in the total but wreck the roster. Because you staff at interval grain, not daily grain, the interval error is the one that actually hits service. Daily accuracy is the number that looks good in the pack; interval accuracy is the number that decides whether customers waited.

The aggregation trap

Errors cancel when you add them up. Take two intervals with similar volume: a forecast that's 20% too high at ten and 20% too low at two produces a daily total that's perfect, and a day that was over-staffed all morning and underwater all afternoon. The more you aggregate, the better the forecast looks, because opposite errors quietly net off; report monthly and you can hide a multitude of bad days inside a flattering headline. This is why the grain of the metric matters as much as the metric itself. WAPE (the sum of absolute errors divided by the sum of actuals) measured on daily totals flatters, because the over- and under-forecasts inside the day have already netted off before the absolute value is taken. The same WAPE measured interval by interval tells the truth: each interval's miss is made absolute before it is summed, so there's nothing left to cancel against. If you only ever look at the aggregated number, you are systematically blind to the errors the roster is built on.
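To make the cancellation concrete, here is a minimal Python sketch, assuming a hypothetical day of six hourly intervals where the forecast runs 20% high through the morning and 20% low through the afternoon. The `wape` helper is illustrative rather than taken from any particular tool; it computes sum(|actual − forecast|) / sum(actual), first on the daily total and then interval by interval.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: sum(|actual - forecast|) / sum(actual)."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when total actual volume is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total_actual


# Hypothetical hourly volumes for one day, 09:00 to 14:00.
actual = [80, 100, 120, 120, 100, 80]
# 20% high for the first three hours, 20% low for the last three.
forecast = [96, 120, 144, 96, 80, 64]

daily_wape = wape([sum(actual)], [sum(forecast)])  # errors net off before abs()
interval_wape = wape(actual, forecast)              # abs() applied per interval

print(f"daily WAPE:    {daily_wape:.0%}")     # daily WAPE:    0%
print(f"interval WAPE: {interval_wape:.0%}")  # interval WAPE: 20%
```

Both numbers come from the same formula on the same data; the only thing that changes is the grain at which the absolute value is taken.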

Two curves with identical daily totals and opposite interval errors. The daily number says “perfect”; the day says otherwise. Measure where you staff.

Measure where you act

The fix is a principle: measure forecast accuracy at the grain you make decisions on. Since you roster by interval, track interval-level error (WAPE across intervals, plus the distribution of interval misses), not just the daily or weekly total. Keep the aggregate number too, because it still serves the longer-horizon capacity view, but never let it stand in for the operational one; they answer different questions.

Watch the shape of the error, not only its size. A forecast that consistently mis-times the peak, over in the same intervals and under in the same intervals day after day, is a different and more fixable problem than one that's evenly noisy, with misses that flip sign from one day to the next. You'll only see the difference at interval grain.

And be honest in the pack about which number you're quoting. A “98% accurate” that quietly means “at the daily total” isn't a measure of how well you staffed; it's a measure of how much aggregation you did. The forecast does its real work interval by interval, so that's where it has to be judged.
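One way to see the shape of the error is to group interval misses by time slot across several days and compare the average signed error (bias) with the average absolute error (MAE). A minimal sketch, assuming actuals and forecasts are stored as hypothetical per-day lists of interval volumes in the same slot order: when a slot's misses share a sign day after day, |bias| approaches MAE and the slot is flagged as systematic; when they flip sign, bias shrinks toward zero and the slot reads as noise. The `slot_profile` and `classify` helpers, the 0.8 threshold, and the data are illustrative assumptions, not a standard method.

```python
from statistics import mean


def slot_profile(actuals, forecasts):
    """Per-slot (bias, MAE) across days.

    actuals, forecasts: lists of days; each day is a list of interval volumes
    in the same slot order. bias = mean(forecast - actual),
    MAE = mean(|forecast - actual|).
    """
    n_slots = len(actuals[0])
    profile = []
    for s in range(n_slots):
        errors = [f[s] - a[s] for a, f in zip(actuals, forecasts)]
        profile.append((mean(errors), mean(abs(e) for e in errors)))
    return profile


def classify(bias, mae, threshold=0.8):
    """'systematic' if misses mostly share a sign (|bias| close to MAE), else 'noisy'.

    The 0.8 cut-off is an arbitrary illustrative choice.
    """
    if mae == 0:
        return "exact"
    return "systematic" if abs(bias) / mae >= threshold else "noisy"


# Hypothetical data: three days, four hourly slots.
slots = ["09:00", "10:00", "11:00", "12:00"]
actuals = [
    [50, 80, 120, 90],
    [55, 85, 125, 95],
    [45, 75, 115, 85],
]
# The forecast puts the peak at 10:00; the real peak is at 11:00.
forecasts = [
    [55, 110, 95, 85],
    [50, 115, 100, 100],
    [52, 105, 90, 80],
]

for label, (bias, mae) in zip(slots, slot_profile(actuals, forecasts)):
    print(f"{label}  bias={bias:+6.1f}  MAE={mae:5.1f}  {classify(bias, mae)}")
```

In this made-up data the forecast peaks an hour early, so 10:00 is consistently over and 11:00 consistently under, while 09:00 and 12:00 just wobble. Each day's total is within about 2% of actual, so a daily WAPE would barely register the mis-timed peak.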

Pair this with the accuracy metrics that matter, the forecast accuracy calculator, and data quality for forecasting.