Why your forecast accuracy is probably better than you think

The benchmark you inherited is probably wrong for you

Most planning teams are held to a forecast accuracy benchmark that arrived in the operation without anyone asking whether it actually fits. “10% MAPE at interval” gets quoted as if the maths is universal. It isn’t. Operations of different sizes, complexities and volatilities have mathematically different accuracy ceilings. A team pushed to a benchmark beyond its structural limit ends up looking worse than it is, while a team handed an easier benchmark looks better than it deserves. This article walks through how to compute the fair benchmark for your specific operation, why doing so usually shows the team performing better than the inherited target suggests, and the conversation with finance and operations that gets easier once you have the right number.

The three structural ceilings

1. The Poisson floor. See why understanding Poisson matters for planners. For voice arrivals, the standard deviation of the natural noise around any forecast equals the square root of the mean volume. A 25-call interval has a 20% coefficient of variation (CoV); a 250-call interval has 6%; a 2,500-call interval has 2%. The smaller the interval volume, the worse the maths makes you look.

2. The volatility ceiling. Some queues are inherently volatile — weather-driven, marketing-driven, event-driven. The forecasting model can’t predict away volatility that’s genuinely random. A stable steady-state queue can reach 5% WAPE; a volatile event-driven queue is doing well at 12%.

3. The complexity ceiling. Multi-channel, multi-skill, multi-product operations have more dimensions to forecast across and more interaction effects. A single-skill voice queue has a different accuracy ceiling than the same volume split across six skills.
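The Poisson figures in the list above, and the effect of splitting one queue into several, can be checked with a few lines of Python. This is a minimal sketch: the function name poisson_cov is illustrative, and the 240-call queue split evenly across six independent skills is a made-up example rather than data from a real operation.

```python
import math


def poisson_cov(mean_volume: float) -> float:
    """Coefficient of variation of a Poisson count: sqrt(mean) / mean."""
    if mean_volume <= 0:
        raise ValueError("mean_volume must be positive")
    return math.sqrt(mean_volume) / mean_volume


# Interval volumes quoted in the Poisson floor item above.
for volume in (25, 250, 2500):
    print(f"{volume:>5} calls per interval: CoV = {poisson_cov(volume):.1%}")

# Hypothetical illustration of the complexity ceiling: 240 calls forecast
# as one queue versus the same 240 calls split evenly across six skills
# (assumed to arrive independently of each other).
single_queue = poisson_cov(240)
per_skill = poisson_cov(240 / 6)
print(f"Single queue of 240: CoV = {single_queue:.1%}")
print(f"Each of six skills at 40: CoV = {per_skill:.1%}")
```

Under these assumptions each 40-call skill carries well over twice the relative noise of the combined queue, which is one reason the same total volume is harder to forecast once it is split.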

Knowing your specific ceiling lets you separate genuine forecasting performance from the structural floor your operation is fighting.

The 10% MAPE benchmark beloved of finance is structurally easy for large stable operations and impossible for small volatile ones. The right number is your operation’s achievable floor, not the industry headline.

How to compute your fair benchmark

Three steps and two days of work replace an unfair benchmark with a defensible one for the rest of the operation’s life.

Step 1: Compute the Poisson floor for your typical interval. Pull the mean interval volume for a representative week. Compute the standard deviation as the square root of the mean. The natural CoV is sqrt(mean)/mean, which simplifies to 1/sqrt(mean). That’s the floor on interval-level WAPE the maths allows.

Step 2: Add the model error. Pull a quarter of historical forecast vs actual. Compute WAPE at daily and weekly grain (where natural noise is smaller). The residual after natural noise is your model’s actual error.

Step 3: Build the fair benchmark. Natural floor + model error + judgement margin (10–15% for the unforeseen). That’s the target the planning team can credibly hit.
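The three steps can be sketched in Python as follows. Treat this as one possible reading rather than a definitive method: the function names and sample volumes are invented for illustration, daily-grain WAPE is used as the stand-in for model error, and the judgement margin is assumed to be a relative uplift on the sum of floor and model error. One caveat: WAPE is built from absolute errors, and for roughly normal noise the expected absolute deviation is about 0.8 times the standard deviation, so the CoV is a slightly conservative stand-in for the floor. The sketch keeps the article's convention and uses the CoV.

```python
import math


def wape(forecast, actual):
    """Weighted absolute percentage error: sum(|F - A|) / sum(A)."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("actual volumes sum to zero")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / total_actual


def poisson_floor(interval_volumes):
    """Step 1: natural CoV at the mean interval volume."""
    mean_volume = sum(interval_volumes) / len(interval_volumes)
    return math.sqrt(mean_volume) / mean_volume


def fair_benchmark(floor, model_error, margin=0.10):
    """Step 3: floor plus model error, uplifted by a relative margin.

    Assumption: the 10-15% judgement margin is applied as a relative
    uplift, not as extra percentage points.
    """
    return (floor + model_error) * (1 + margin)


# Made-up sample data, for illustration only.
interval_volumes = [72, 85, 80, 91, 76, 83, 73]      # mean of 80 contacts
daily_forecast = [3800, 4100, 3950, 4000, 3700]
daily_actual = [3900, 4000, 4100, 3850, 3800]

floor = poisson_floor(interval_volumes)               # Step 1
model_error = wape(daily_forecast, daily_actual)      # Step 2 (daily grain)
benchmark = fair_benchmark(floor, model_error, 0.10)  # Step 3

print(f"Poisson floor:  {floor:.1%}")
print(f"Model error:    {model_error:.1%}")
print(f"Fair benchmark: {benchmark:.1%}")
```

In practice the inputs would be a representative week of interval volumes and a quarter of forecast-versus-actual history, as the steps describe.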

Most teams running this exercise find their fair benchmark is 1–3pp easier than their inherited target, and that they’re hitting the fair benchmark more often than the inherited one. The number doesn’t change reality — but it changes the conversation.

The conversation with finance and operations

Three sentences that move finance and operations off the wrong benchmark.

1. “The industry MAPE benchmark of 10% works for operations whose intervals carry 2,000+ contacts. Our intervals carry 80 contacts, where the maths puts the natural floor at 11%. Holding us to 10% is statistically impossible.”

2. “Our genuinely fair benchmark is 14% WAPE at interval, 6% at day, 3% at week. We’re hitting all three. The team is performing well.”

3. “We’ll continue to drive improvement, but the headline should be the level we’re actually achieving against a fair target, not a level the maths doesn’t allow.”
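The gap between the interval, day and week figures in the second sentence follows from the same square-root rule: a sum of independent Poisson counts is itself Poisson, so natural noise shrinks as volume is aggregated. The sketch below is illustrative only. It assumes 48 half-hour intervals per day, seven days per week, and the same 80-contact mean in every interval, which no real operation will match exactly.

```python
import math


def poisson_cov(mean_volume: float) -> float:
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return 1 / math.sqrt(mean_volume)


INTERVAL_MEAN = 80       # from the first sentence above
INTERVALS_PER_DAY = 48   # assumption: half-hour intervals, 24-hour day
DAYS_PER_WEEK = 7        # assumption: seven-day operation

# Mean volume at each reporting grain under the flat-profile assumption.
grains = {
    "interval": INTERVAL_MEAN,
    "day": INTERVAL_MEAN * INTERVALS_PER_DAY,
    "week": INTERVAL_MEAN * INTERVALS_PER_DAY * DAYS_PER_WEEK,
}

floors = {grain: poisson_cov(volume) for grain, volume in grains.items()}

for grain, floor in floors.items():
    print(f"{grain:>8}: {grains[grain]:>6} contacts, natural CoV {floor:.1%}")
```

On these assumptions the natural noise at day and week grain is small, so most of a daily or weekly benchmark is model error and margin rather than Poisson noise, while at interval grain the floor dominates.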

That conversation, delivered calmly with the maths to back it up, usually changes how the planning function is held to account — in the team’s favour, accurately.

The regulatory dimension

For regulated operations — FCA, Ofgem, Ofcom — SLA targets are binary points, not ranges. The forecast accuracy benchmark is internal; the SLA commitment is external. Both should exist, but the planning function should be clear which is which. SLA at 90/20 is a board commitment; forecast accuracy at 8% WAPE is an internal performance measure. Mixing them produces conversations that go in circles.

What “better than you think” looks like in practice

Operations that have done this exercise report three patterns. First, planning teams that thought they were under-performing discover they’re actually meeting a fair benchmark consistently. Second, the morale lift is real — analysts working hard on a target they can’t reach burn out; analysts hitting a target they can defend stay engaged. Third, the conversation with finance gets easier because the planning team has the maths and isn’t guessing.

The work isn’t to lower the bar. It’s to put the bar where the maths actually puts it. Most operations are doing better than their inherited benchmark suggests; the planning team’s job is to show that without sounding defensive.

Conclusion

Forecast accuracy benchmarks should be computed, not inherited. The Poisson floor, the volatility ceiling, and the complexity ceiling together set the achievable level for any specific operation. Most planning teams find their fair benchmark is easier than the target they’ve been held to, and that they’re hitting the fair benchmark more often than expected. The conversation with finance and operations gets easier once the maths is on the table. Done well, this exercise is a one-time investment that pays back for the life of the planning function.

Pair with Poisson and natural noise, forecast accuracy metrics that matter, forecast with ranges, and showing planning team success.