Cleaning your history

Deep-dive lesson · about 10 minutes · short quiz at the end

ccPlanning academy · forecasting · deep dive

Garbage in, confident garbage out. The unglamorous step that decides everything.

The big idea

A forecast learns from the past you give it.

Every method — naive to neural net — assumes your history is a fair guide to the future. Leave a freak day in, and the model treats a one-off as a pattern to repeat.

Cleaning is where most of the accuracy is won or lost, long before you pick a method.

What you’re hunting

Two kinds of bad day.

Demand anomalies — a marketing email, a product recall, a storm, a competitor outage. Real contacts, but driven by something that won’t repeat on schedule.

Measurement anomalies — a telephony outage, a misrouted queue, a data-feed gap. The number is wrong, not just unusual.

Spotting them

Eyeball first, then test.

A chart catches most of it. Then flag any point more than a few standard deviations from the mean for a closer look.
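As a rough sketch of that second step, the Python below flags any point that sits more than a chosen number of standard deviations from the mean of its series. The function name `flag_outliers`, the threshold of 3 and the sample volumes are assumptions made up for illustration, not part of the lesson. In practice you would compare like with like (for example, only Mondays at 10:00) and then inspect each flagged point by eye.

```python
from statistics import mean, pstdev

def flag_outliers(volumes, threshold=3.0):
    """Return the indices of points more than `threshold` standard
    deviations from the mean. A flag means "look closer", not "delete"."""
    centre = mean(volumes)
    spread = pstdev(volumes)
    if spread == 0:  # a perfectly flat series has nothing to flag
        return []
    return [i for i, v in enumerate(volumes)
            if abs(v - centre) / spread > threshold]

# Hypothetical contact volumes for the same weekday over 15 weeks.
volumes = [1000, 1020, 980, 1010, 990, 1005, 995, 1015,
           985, 1000, 1010, 990, 1000, 3000, 1000]

flagged = flag_outliers(volumes)
print(flagged)  # index 13 is the only point flagged
```

One caveat: a big spike inflates the very standard deviation it is measured against, so on a short series even an obvious outlier can slip under the threshold. That is one more reason to eyeball the chart first.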

The decision

Keep, adjust, or strip.

Three choices for every anomaly — and choosing well matters more than any clever code:

Keep: It’ll recur the same way

Adjust: Real, but needs flagging

Strip: A one-off or bad data

Keep

If it repeats on a schedule, it’s a pattern.

A payday spike, a monthly statement run, Black Friday — these come back. Don’t strip them; keep them and model them as recurring events. Removing real seasonality is its own forecasting error.

Adjust

Tag it so the method handles it.

A real but irregular driver — a one-week campaign, a snow week — can be flagged with an event marker rather than deleted, so the model knows that day was special and why. You keep the information without letting it pollute the baseline.
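One way to picture an event marker is as an extra field stored next to each day's volume. The sketch below is only the bookkeeping, under assumed names (`days`, `baseline_average`, `event_uplift`) and invented numbers: the campaign week stays in the history, the baseline is computed from untagged days only, and the tag lets you measure how much the event added.

```python
from statistics import mean

# Hypothetical Monday volumes; `event` is None on ordinary days.
days = [
    {"week": 1, "volume": 1000, "event": None},
    {"week": 2, "volume": 1040, "event": None},
    {"week": 3, "volume": 1600, "event": "campaign"},
    {"week": 4, "volume": 960,  "event": None},
]

def baseline_average(days):
    """Average volume over days that carry no event tag."""
    return mean(d["volume"] for d in days if d["event"] is None)

def event_uplift(days, event):
    """Ratio of the average tagged volume to the untagged baseline."""
    tagged = [d["volume"] for d in days if d["event"] == event]
    return mean(tagged) / baseline_average(days)

baseline = baseline_average(days)        # campaign week excluded
uplift = event_uplift(days, "campaign")  # campaign week relative to baseline
print(baseline, uplift)
```

Nothing is deleted here: all four weeks are still in `days`, and the uplift figure is available the next time a similar campaign is planned.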

Strip & replace

Replace, don’t just delete.

For a true one-off or a corrupt reading, don’t leave a hole — most methods hate gaps. Replace it with a sensible estimate: the same interval from neighbouring weeks, or the day-type average. Now the series is continuous and honest.
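A minimal sketch of "same interval from neighbouring weeks", assuming the series holds one value per week for a single interval (say, Tuesday 14:00) and that the neighbouring weeks are themselves clean. The function name `replace_with_neighbours`, the `span` parameter and the numbers are illustrative assumptions.

```python
from statistics import mean

def replace_with_neighbours(series, bad_index, span=2):
    """Return a copy of `series` in which the value at `bad_index` is
    replaced by the mean of up to `span` weeks on either side."""
    neighbours = [series[i]
                  for i in range(bad_index - span, bad_index + span + 1)
                  if 0 <= i < len(series) and i != bad_index]
    if not neighbours:
        raise ValueError("no neighbouring weeks to estimate from")
    cleaned = list(series)  # leave the original history untouched
    cleaned[bad_index] = mean(neighbours)
    return cleaned

# Hypothetical contacts for the same interval over five weeks;
# week 3 reads 0 because of a data-feed gap.
history = [120, 126, 0, 118, 124]
cleaned = replace_with_neighbours(history, bad_index=2)
print(cleaned)  # the gap is filled with the neighbours' mean, 122
```

Returning a copy rather than overwriting the input keeps the raw history available, which makes it easier to record the original value alongside its replacement.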

The discipline

Document every change.

Keep a log: what you changed, when, and why. Cleaning is judgement, and judgement needs an audit trail — so next year’s planner (or you) knows the spike on the 14th was a recall, not demand.
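What such a log might look like, as a sketch: one row per decision, written to CSV so it can sit beside the history file. The field names, the `CleaningLogEntry` class and the sample entry (including its date and volumes) are hypothetical; the lesson only asks that what, when and why are recorded.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date

@dataclass
class CleaningLogEntry:
    affected_date: date    # the day in the history that was touched
    queue: str
    action: str            # "keep", "adjust" or "strip"
    original_value: float
    new_value: float
    reason: str            # the "why"
    changed_by: str
    changed_on: date       # the "when"

def write_log(entries, handle):
    """Write the entries as CSV with a header row."""
    names = [f.name for f in fields(CleaningLogEntry)]
    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))

# Invented example: the spike on the 14th, tagged rather than changed.
entry = CleaningLogEntry(
    affected_date=date(2024, 3, 14), queue="billing", action="adjust",
    original_value=3120, new_value=3120,
    reason="Product recall, tagged as an event",
    changed_by="planner", changed_on=date(2024, 3, 18),
)

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_log([entry], buffer)
log_text = buffer.getvalue()
print(log_text)
```

Logging the original value next to the new one means any edit can be reversed later if the judgement turns out to be wrong.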

Untracked manual edits are how forecasts quietly become fiction.

The balance

Clean enough — not too much.

Over-cleaning is a real risk. Smooth away every bump and you teach the model the world is calmer than it is, and it then gets caught out by normal variation. Remove what won’t recur; respect the noise that will.
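To make the "calmer than it is" point concrete, the sketch below uses a centred moving average as a stand-in for heavy-handed cleaning (an assumption for illustration; the lesson does not prescribe any smoothing technique) and compares the spread of the series before and after. The sample values are invented and contain no anomalies at all, only ordinary variation.

```python
from statistics import pstdev

def centred_moving_average(series, window=3):
    """Average each point with its neighbours; the end points, which
    lack a full window, are dropped."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

# Hypothetical daily volumes with normal day-to-day variation.
raw = [100, 112, 94, 108, 90, 110, 96, 106]
smoothed = centred_moving_average(raw)

raw_spread = pstdev(raw)
smoothed_spread = pstdev(smoothed)
print(raw_spread, smoothed_spread)  # the smoothed spread is the smaller one
```

A model trained on the smoothed series would see far less variation than the operation really experiences, which is exactly the trap described above.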

One worked call

The spike on the 14th

Volume triples for one afternoon. Was it a payday (keep: it recurs), a product recall (adjust: real but irregular, so tag it), or a misrouted queue dumping another team’s calls into yours (strip: the number is simply wrong)?

Same spike, three different actions. The judgement call — not the technique — is the whole job.

The takeaway

Decide keep, adjust or strip — and write it down.

Find the anomalies, sort the recurring from the one-off, replace bad data rather than deleting it, and log every call. The method you pick matters far less than the history you feed it.

Slides done? Here’s the same idea in a bit more depth — the part worth keeping.

In depth: the step that decides everything before you pick a method

Every forecasting method, from naive to neural net, makes the same assumption: that your history is a fair guide to the future. That assumption is only as good as the data behind it. Leave a freak day in the record and the model treats a one-off as a pattern to repeat; strip out a real recurring event and you’ve deleted genuine seasonality. This is why cleaning the history is where most of the accuracy is quietly won or lost — long before anyone argues about which model to use.

Two kinds of bad day, three things to do

There are two kinds of anomaly to hunt. Demand anomalies are real contacts driven by something that won’t repeat on schedule: a marketing email, a recall, a storm, a competitor outage. Measurement anomalies are where the number itself is wrong: a telephony outage, a misrouted queue, a data-feed gap. For each one you have three choices. Keep it if it recurs the same way (payday spikes, statement runs and Black Friday are pattern, not noise). Adjust it if it’s real but irregular, flagging the day with an event marker so the method knows it was special and why. Strip and replace it if it’s a true one-off or a corrupt reading, and replace rather than delete, because most methods hate gaps; the same interval from neighbouring weeks usually does the job.

Document it, and don’t over-clean

Cleaning is judgement, and judgement needs an audit trail. Keep a log of what you changed, when and why, so next year’s planner knows the spike on the 14th was a recall, not demand — untracked manual edits are how forecasts quietly turn into fiction. And resist the urge to over-clean: smooth away every bump and you teach the model the world is calmer than it really is, leaving it caught out by perfectly normal variation. Remove what won’t recur; respect the noise that will.

The principle to remember: find the anomalies, sort recurring from one-off, replace bad data rather than deleting it, and log every call. The method you pick matters far less than the history you feed it.
