Using AI for contact centre forecasting
A topic that has lost its measure
There is no shortage of vendors claiming AI will transform contact centre forecasting, and there is no shortage of planners quietly suspecting they are oversold. Both groups are partly right. AI — by which most people mean a mixture of machine learning, deep learning, and the new generation of foundation models for time series — does add real value to contact centre forecasting in specific places. It also fails to add value in many other places where it is enthusiastically marketed. Telling the two apart is the most useful thing a workforce planner can learn about the topic, and this article is an honest attempt to set out what AI is genuinely good at, what it is not, and how to start using it without burning credibility on a project that quietly under-delivers.
What “AI for forecasting” actually means
The term covers a spectrum. At the lightweight end sit the classical machine-learning models — gradient boosting (XGBoost, LightGBM), random forests, regularised regression — which take a feature table (volume, day-of-week, holiday flags, marketing campaigns, weather) and learn the weights that best predict the target. These are well understood, fast to train, and explainable. In the middle sit the time-series-specific approaches — Prophet, ETS state-space models, hierarchical reconciliation methods — which combine statistical rigour with some learned components. At the heavy end sit the deep-learning architectures — LSTMs, temporal fusion transformers, N-BEATS, PatchTST — and the new foundation models like Chronos, TimeGPT, and Lag-Llama, which aim to apply LLM-style pre-training to forecasting tasks across many time series at once.
Each of these has different data requirements, training complexity, accuracy profiles, and explainability characteristics. Talking about “AI forecasting” as a single thing — as vendors and trade press often do — confuses the discussion before it starts. The right tool depends on the data you have, the question you are answering, and the stakeholders who will sign off on the answer.
Where AI genuinely helps
The cleanest win for AI in contact centre forecasting is multivariate regression at scale. A planner armed with a good driver dataset — historical volume, day-of-week, holidays, marketing campaign sends, product launches, weather, competitor outage data, even social media sentiment — can use a gradient-boosted model to combine these into a single forecast that consistently outperforms a univariate Holt-Winters baseline. The gain is typically a few percentage points of MAPE on stable queues, and substantially more on queues that respond strongly to drivers that the time-series-only model has no way to see.
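To make the driver-based idea concrete, here is a minimal sketch of the workflow: one feature table, one model, one forecast. Plain least squares stands in for gradient boosting so the example runs without any libraries, and every driver column and number below is invented for illustration.

```python
# Minimal sketch of driver-based forecasting: one feature table, one model.
# Least squares stands in for gradient boosting; all driver values invented.

def fit_ols(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    a = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    for col in range(k):
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        b[col] /= p
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
                b[r] -= f * b[col]
    return b  # fitted coefficients

# Columns: intercept, seasonal baseline, holiday flag, campaign sends (000s)
X = [[1, 900, 0, 0], [1, 950, 0, 5], [1, 980, 1, 0],
     [1, 1000, 0, 2], [1, 940, 1, 4], [1, 1020, 0, 3]]
# Offered volume, generated here from known driver weights (no noise)
y = [905.0, 1027.5, 861.0, 1030.0, 883.0, 1064.0]

beta = fit_ols(X, y)
# Forecast a future day: baseline 970, no holiday, 3k campaign sends
forecast = sum(b_i * x_i for b_i, x_i in zip(beta, [1, 970, 0, 3]))
print(round(forecast, 1))  # → 1016.5 (exact recovery, since y is noiseless)
```

In practice you would reach for LightGBM or XGBoost rather than hand-rolled least squares, but the shape of the task is the same: assemble the drivers into a table, fit, and predict the rows you have drivers for but no actuals.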
A second win is hierarchical reconciliation. Operations with many sub-queues — channels, skills, languages, regions — face a perennial problem: the bottom-up sum doesn’t equal the top-down number. Modern hierarchical methods (the MinT family being the most cited) reconcile forecasts across the hierarchy using machine-learned weights that respect the data, and the answer is usually better than either purely top-down or purely bottom-up alternatives.
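A far simpler relative of MinT — scaling the bottom-level forecasts so they sum to the independently produced total, often called forecast proportions — is enough to show the coherence constraint the whole reconciliation family enforces. The queue names and numbers below are illustrative.

```python
# Minimal sketch of forecast reconciliation: proportionally scale the
# sub-queue forecasts so they sum to the top-level forecast. This is the
# simple forecast-proportions approach, not MinT, but it enforces the same
# coherence constraint. Queue names and values are invented.

def reconcile_proportional(total_forecast, bottom_forecasts):
    bottom_sum = sum(bottom_forecasts.values())
    return {q: f * total_forecast / bottom_sum for q, f in bottom_forecasts.items()}

bottom = {"voice": 620.0, "chat": 240.0, "email": 180.0}  # sums to 1040
top = 1000.0                                              # top-down forecast
coherent = reconcile_proportional(top, bottom)
print({q: round(v, 1) for q, v in coherent.items()})
# The reconciled sub-queue forecasts now sum exactly to the total.
```

MinT improves on this by weighting the adjustment using the error covariance of the base forecasts rather than simple proportions, but the output contract is identical: one coherent set of numbers across the hierarchy.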
A third win is anomaly detection on the inputs to the forecast. ML models trained to spot unusual arrival patterns, AHT shifts, or interval-level outliers can flag issues for the planner faster than manual inspection, freeing up time for the parts of the job that genuinely need human judgement.
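A rule-based version of this idea fits in a few lines: flag any interval whose volume sits far outside a robust band built from a trailing window. A trained model would replace the rule with something learned, but the flag-and-review workflow is the same; the window, threshold, and volumes below are illustrative.

```python
# Minimal sketch of interval-level anomaly flagging using a robust z-score:
# flag points more than `z` scaled median-absolute-deviations from the
# trailing-window median. Window, threshold, and data are invented.
import statistics

def flag_anomalies(series, window=8, z=3.0):
    flags = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        med = statistics.median(hist)
        # MAD scaled by 1.4826 approximates a standard deviation
        mad = statistics.median(abs(x - med) for x in hist) * 1.4826
        if mad > 0 and abs(series[i] - med) / mad > z:
            flags.append(i)
    return flags

volumes = [100, 104, 98, 101, 99, 103, 97, 102, 250, 100, 101]
print(flag_anomalies(volumes))  # → [8]: the 250 spike is flagged
```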
A fourth, increasingly real win comes from foundation models. Models like Chronos, TimeGPT, and Lag-Llama are pre-trained on enormous corpora of time series and can produce competent forecasts on unseen series with little or no fine-tuning. The accuracy is often comparable to a well-tuned Holt-Winters out of the box, with no model-selection effort. For operations with many small or new queues that lack the history to fit a bespoke model, foundation models are a genuine step forward.
A fifth, often overlooked win is explainability tooling. SHAP values, partial dependence plots, and feature-importance reports turn an ML model into a story about why the forecast moved — which is the conversation finance and operations actually want to have, and one that traditional time-series models support poorly.
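SHAP itself requires the shap library and a fitted model, but the underlying idea — perturb one feature and watch the error move — can be sketched with permutation importance in plain Python. The toy model and data below are invented, and a deterministic rotation stands in for random shuffling so the result is reproducible.

```python
# Minimal sketch of permutation importance, a lightweight stand-in for SHAP:
# permute one feature at a time and measure how much the error worsens.
# The toy model and data are invented for illustration.

def mape(actual, pred):
    return sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)

def permutation_importance(predict, X, y):
    base = mape(y, [predict(row) for row in X])
    scores = {}
    for j in range(len(X[0])):
        perturbed = [row[:] for row in X]
        col = [row[j] for row in perturbed]
        col = col[1:] + col[:1]  # deterministic rotation as the permutation
        for row, v in zip(perturbed, col):
            row[j] = v
        scores[j] = mape(y, [predict(row) for row in perturbed]) - base
    return scores  # bigger error increase => more important feature

# Toy model: volume driven mostly by feature 0 (baseline), weakly by feature 1
predict = lambda row: 0.9 * row[0] + 5.0 * row[1]
X = [[1000, 2], [950, 0], [1100, 4], [900, 1], [1050, 3]]
y = [predict(row) for row in X]  # noiseless, so base error is zero

scores = permutation_importance(predict, X, y)
print(scores[0] > scores[1])  # → True: the baseline feature matters more
```

SHAP gives per-prediction attributions rather than one global score, which is what makes it good for the "why did Tuesday move" conversation, but the intuition is the same perturbation logic.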
Where AI does not help — or actively hurts
The accuracy gain from AI is real but smaller than vendor literature suggests, and several conditions can flip the comparison the other way entirely. The most common is small data. Modern ML methods need history to learn from; queues with less than two years of clean data are usually better served by a well-tuned classical model. The training process consumes degrees of freedom that small datasets simply don’t have, and the resulting model overfits patterns that won’t repeat.
Highly non-stationary regimes are also dangerous. After a pandemic, a major restructure, a product launch, or a regulatory change, the statistical relationships the model learned no longer apply. ML models cope with this badly because they have absorbed assumptions a planner might never have written down explicitly. A simple model with a planner’s manual override is often more robust through these moments than a sophisticated model that nobody can override cleanly.
Black-box risk matters more than vendors admit. In regulated industries, in operations subject to internal audit, or in any conversation where the forecast must be defended at a senior level, a model that nobody on the team can explain is a liability. The same forecast accuracy from a less explainable model is usually a worse outcome, not an equivalent one. Governance teams will tolerate Holt-Winters with a known limitation more readily than they will tolerate a deep-learning model with unknown failure modes.
Operational complexity is the final cost. An ML pipeline that retrains weekly, watches its own drift, manages its features, and reconciles its hierarchy requires significantly more infrastructure than a spreadsheet plus exponential smoothing. The cost is real and recurring. Operations that adopt the technology without resourcing the supporting engineering find that the model degrades silently, and the analyst who built it spends more time keeping it alive than analysing its output.
A practical adoption path
The path that consistently works is to treat AI as an addition to a clean foundation, not a replacement for it. Start by getting the basics right: clean data, structured driver inputs, a defensible classical baseline, and a measurement framework that tracks accuracy honestly across multiple horizons. Without those four, layering AI on top will mostly automate the existing problems faster.
Once the foundation is in place, the highest-return first step is usually a gradient-boosted regression that combines the time-series baseline with structured drivers. This is well understood, explainable, fast to train, and easy to govern. If it cuts MAPE by two or three percentage points in benchmarking, the case for going further is strong. If it doesn’t, the case is weak and the project should stop there.
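The benchmarking step itself should be boring and symmetric: the same holdout actuals and the same metric for both models. A sketch, with invented numbers:

```python
# Minimal sketch of honest benchmarking: candidate and baseline scored
# against the same holdout actuals with the same metric. Numbers invented.

def mape(actual, forecast):
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

actual    = [1000, 1100, 950, 1020, 980]
baseline  = [940, 1180, 900, 1090, 1040]   # e.g. tuned Holt-Winters
candidate = [980, 1130, 935, 1045, 1000]   # e.g. gradient-boosted with drivers

b, c = mape(actual, baseline), mape(actual, candidate)
print(f"baseline {b:.1f}%  candidate {c:.1f}%  improvement {b - c:.1f} pts")
```

The comparison only means something if the baseline is the best classical model you can actually build, not a naive placeholder — which is the straw-man trap discussed below under common mistakes.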
The second step, where the data justifies it, is hierarchical reconciliation across channels and skills. The third, more speculative step is exploring foundation models for queues where bespoke training is hard — small queues, new queues, or queues with unstable history.
Throughout, hold to two principles. The first is that the planner is augmented, not replaced. The model produces a baseline; the planner overlays event knowledge, applies judgement, and signs off on the published forecast. The second is that explainability is not optional. Every published forecast should be defensible by a human who can articulate the drivers, the assumptions, and the known limitations.
Governance, drift, and the maintenance burden
ML forecasts decay. A model trained on last year’s data will gradually misalign with this year’s reality, and the misalignment can creep in long before it becomes obvious. Operations adopting AI for forecasting must plan for ongoing monitoring — accuracy tracked weekly, automated alerts when drift exceeds a threshold, scheduled retraining, and clear ownership of the decision to retrain or roll back. This is engineering work, and a planning team that takes it on without engineering support typically ends up running an unmaintained model that nobody fully trusts within twelve months.
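The alerting piece can start very simply: compare recent rolling accuracy to the accuracy the model was accepted at, and raise an alert when the gap exceeds a tolerance. The window and threshold below are illustrative policy choices, not recommendations.

```python
# Minimal sketch of a weekly drift check: alert when the mean of the last
# `window` weekly MAPEs (in %) exceeds the accepted MAPE by more than
# `tolerance` points. Window, tolerance, and history values are invented.

def drift_alert(weekly_mape, accepted_mape, window=4, tolerance=2.0):
    if len(weekly_mape) < window:
        return False
    recent = sum(weekly_mape[-window:]) / window
    return recent - accepted_mape > tolerance

history = [5.1, 4.8, 5.3, 5.0, 6.2, 7.5, 8.1, 8.4]
print(drift_alert(history, accepted_mape=5.0))  # → True: recent weeks drifted
```

The harder part is not the check but the ownership: someone has to decide, when the alert fires, whether to retrain, roll back, or override — which is why this belongs to engineering-supported process rather than a side task for one analyst.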
Governance follows the same logic. A policy document covering model lineage, training data provenance, validation methodology, drift monitoring, and override authority is not glamorous work but it is what lets the model be trusted by audit, finance, and senior management. Without it, the first time the model is wrong in a way the planner cannot quickly explain, confidence collapses and the team retreats to spreadsheets — usually with more bitterness than they had before.
Common mistakes
Three patterns recur. The first is leading with the model rather than the data: the team adopts an impressive technology before they have a clean dataset to feed it, and the impressive technology produces unimpressive results. The second is benchmarking against straw men: an ML model that beats a naive last-year-same-week comparison is not the same as a model that beats a properly tuned Holt-Winters with event overrides. Honest benchmarking matters. The third is treating AI as a way to remove the planner. The planner’s value is the judgement layer on top of any model; removing them removes the judgement, and the forecast quality follows.
Conclusion
AI for contact centre forecasting is real, useful, and overhyped. The practical opportunity is meaningful but specific: better use of structured drivers, cleaner reconciliation across hierarchies, faster anomaly detection, and viable forecasts for queues that previously had none. Capturing it well requires a clean foundation, honest benchmarking, sustained engineering investment, and a clear-eyed view of what the technology cannot yet do. Captured carelessly, it produces a more sophisticated version of the existing problems plus a new layer of opacity. The planners who get the most from AI are the ones who use it to augment a discipline they already do well, not as a replacement for one they have not yet built.
Pair this with the Excel paradox for the tooling-maturity perspective and the beginner’s guide to forecasting for the foundation any AI work should sit on top of.