Cleaning your history
In depth: the step that decides everything before you pick a method
Every forecasting method, from naive to neural net, makes the same assumption: that your history is a fair guide to the future. That assumption is only as good as the data behind it. Leave a freak day in the record and the model treats a one-off as a pattern to repeat; strip out a real recurring event and you’ve deleted genuine seasonality. This is why cleaning the history is where most of the accuracy is quietly won or lost — long before anyone argues about which model to use.
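Here is a small illustration of a one-off being repeated, in Python. It uses the seasonal naive method (next week's forecast is a copy of last week), which is the simplest method that carries history forward. The volumes are invented, and `seasonal_naive` is a hypothetical helper written for this sketch, not a library function.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical daily contact volumes for two weeks, Monday first.
# Week 2's Wednesday (2,400) is a one-off spike from a marketing email.
history = [
    1200, 1150, 1120, 1100, 1050, 600, 400,  # week 1
    1210, 1160, 2400, 1090, 1060, 610, 390,  # week 2
]


def seasonal_naive(series, season=7):
    """Forecast the next season by repeating the most recent one."""
    return series[-season:]


# Uncleaned: the spike is copied straight into next Wednesday's forecast.
raw_forecast = seasonal_naive(history)

# Cleaned: swap the one-off for the same weekday in the previous week.
cleaned = history.copy()
cleaned[9] = history[2]  # index 9 = week 2 Wednesday, index 2 = week 1 Wednesday
clean_forecast = seasonal_naive(cleaned)

print("raw:  ", dict(zip(DAYS, raw_forecast)))
print("clean:", dict(zip(DAYS, clean_forecast)))
```

More sophisticated methods would not copy the spike one-for-one, but they still learn from whatever the history contains.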
Two kinds of bad day, three things to do
There are two kinds of anomaly to hunt. Demand anomalies are real contacts driven by something that won’t repeat on schedule: a marketing email, a recall, a storm, a competitor outage. Measurement anomalies are where the number itself is wrong: a telephony outage, a misrouted queue, a data-feed gap.
For each one you have three choices.
Keep it if it recurs the same way. Payday spikes, statement runs and Black Friday are pattern, not noise.
Adjust it if it’s real but irregular, flagging the day with an event marker so the method knows it was special and why.
Strip and replace it if it’s a true one-off or a corrupt reading. Replace rather than delete, because most methods struggle with gaps; the same interval from neighbouring weeks usually does the job.
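The keep / adjust / strip choice and the neighbouring-weeks replacement can be sketched in Python as below. This is a minimal sketch under stated assumptions: the names `Action`, `choose_action` and `replace_interval`, the three yes/no questions, and the use of exactly one week on either side are all hypothetical choices for illustration, not a standard rule. Real data needs more care, for example when a neighbouring week is itself a holiday.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"                            # recurring pattern: leave as is
    ADJUST = "adjust"                        # real but irregular: keep it, add an event marker
    STRIP_AND_REPLACE = "strip_and_replace"  # one-off or corrupt: replace the value


def choose_action(is_measurement_error, recurs_on_schedule, may_recur_irregularly):
    """Hypothetical rule of thumb mirroring the keep / adjust / strip choice."""
    if is_measurement_error:
        return Action.STRIP_AND_REPLACE      # the number itself is wrong
    if recurs_on_schedule:
        return Action.KEEP                   # payday, statement run, Black Friday
    if may_recur_irregularly:
        return Action.ADJUST                 # storm, marketing email: flag it
    return Action.STRIP_AND_REPLACE          # a true one-off


def replace_interval(weeks, week, interval, flagged=frozenset()):
    """Return a copy of `weeks` with weeks[week][interval] replaced by the mean
    of the same interval in the neighbouring weeks (week - 1 and week + 1).

    `weeks` holds one row per week for a single weekday (e.g. every Monday)
    and one column per interval. `flagged` is a set of (week, interval) pairs
    that are themselves anomalous and must not be borrowed from.
    """
    neighbours = [
        weeks[w][interval]
        for w in (week - 1, week + 1)
        if 0 <= w < len(weeks) and (w, interval) not in flagged
    ]
    if not neighbours:
        raise ValueError("no clean neighbouring week to borrow from")
    cleaned = [row.copy() for row in weeks]  # leave the raw history untouched
    cleaned[week][interval] = sum(neighbours) / len(neighbours)
    return cleaned


# Hypothetical Monday volumes: three weeks, three intervals each.
# Week 1, interval 1 reads 0 because of a telephony outage.
mondays = [
    [100, 120, 90],
    [105, 0, 92],
    [110, 130, 95],
]
action = choose_action(is_measurement_error=True,
                       recurs_on_schedule=False,
                       may_recur_irregularly=False)
mondays_clean = (replace_interval(mondays, week=1, interval=1)
                 if action is Action.STRIP_AND_REPLACE else mondays)
print(action, mondays_clean[1])
```

Returning a copy rather than editing in place keeps the raw reading available for the audit trail, and the `flagged` set stops one bad week from being used to repair another.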
Document it, and don’t over-clean
Cleaning is judgement, and judgement needs an audit trail. Keep a log of what you changed, when and why, so next year’s planner knows the spike on the 14th was a recall, not demand — untracked manual edits are how forecasts quietly turn into fiction. And resist the urge to over-clean: smooth away every bump and you teach the model the world is calmer than it really is, leaving it caught out by perfectly normal variation. Remove what won’t recur; respect the noise that will.
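An audit trail can be as simple as one structured record per edit. The sketch below assumes a hypothetical `CleaningEdit` record and writes the log to CSV with Python's standard library. The field names and example entries are illustrative only, so match them to whatever your planning tool actually stores.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date, datetime


@dataclass(frozen=True)
class CleaningEdit:
    """One entry in a hypothetical history-cleaning audit log."""
    day: date            # the history date that was changed
    interval: str        # interval label, e.g. "09:30"
    action: str          # "keep", "adjust" or "strip_and_replace"
    old_value: float
    new_value: float
    reason: str          # why the call was made
    edited_by: str
    edited_at: datetime


def log_to_csv(edits):
    """Serialise the edits to CSV text, one row per edit."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CleaningEdit)])
    writer.writeheader()
    for edit in edits:
        writer.writerow(asdict(edit))
    return buf.getvalue()


edits = [
    CleaningEdit(date(2024, 3, 14), "09:30", "adjust", 2400.0, 2400.0,
                 "product recall (event marker added)", "planner_a",
                 datetime(2024, 4, 2, 10, 5)),
    CleaningEdit(date(2024, 3, 18), "11:00", "strip_and_replace", 0.0, 125.0,
                 "telephony outage (mean of neighbouring weeks)", "planner_a",
                 datetime(2024, 4, 2, 10, 12)),
]
print(log_to_csv(edits))
```

Note that an "adjust" entry records no change in value; the reason column is what tells next year's planner that the day was special.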
The principle to remember: find the anomalies, sort recurring from one-off, replace bad data rather than deleting it, and log every call. The method you pick matters far less than the history you feed it.
Quick quiz
Five questions, each followed by its answer.
1. Why does cleaning history matter so much?
Leave a one-off in and the model repeats it. Most accuracy is won or lost here, before you pick a method.
2. What are the three choices for an anomaly?
Keep (it recurs), adjust (it’s real but irregular, so flag it), or strip and replace (a one-off or bad data).
3. A payday spike that returns every month should be…
It’s real seasonality — keep it. Removing recurring patterns is its own error.
4. When you strip a corrupt day, what should you do with the gap?
Replace, don’t delete — use neighbouring weeks or the day-type average so there’s no hole.
5. What’s the risk of over-cleaning?
Over-smoothing hides real variability, so the model gets caught out by perfectly normal swings. Remove what won’t recur, but respect the noise that will.