ccPlanning Academy · Quality track
Sampling honestly
Slides done? Here’s the same idea in a bit more depth — the part worth keeping.
In depth: how much QA sampling is enough
The most common quiet failure in QA is sample size. Scoring two or three contacts per agent per month feels like measurement, but statistically it is noise: from such a tiny sample you cannot reliably distinguish a strong agent from a weak one, and the score lurches month to month on the luck of which contacts were drawn. Worse, those few are often the convenient ones — recent, short, cleanly recorded — which biases the picture towards the easy corner of the work. A representative sample spans contact types, times of day, lengths and outcomes, so the number reflects the job as it really is.
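To put a rough number on that noise, here is a minimal sketch. It assumes each contact is simply scored pass or fail and that contacts are drawn at random; the two pass rates (90% and 75%) and the function names are hypothetical, chosen only for illustration. It calculates exactly how often the weaker agent's sample score ties or beats the stronger agent's.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k passes in n independent contacts with pass rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_weaker_not_behind(n: int, p_strong: float, p_weak: float) -> float:
    """Chance the weaker agent's sample score ties or beats the stronger agent's."""
    total = 0.0
    for s in range(n + 1):         # passes scored by the stronger agent
        for w in range(s, n + 1):  # weaker agent scores at least as many
            total += binom_pmf(s, n, p_strong) * binom_pmf(w, n, p_weak)
    return total


# Hypothetical true pass rates: a strong agent at 90%, a weak one at 75%.
for n in (3, 10, 50, 200):
    chance = prob_weaker_not_behind(n, p_strong=0.90, p_weak=0.75)
    print(f"{n:>3} contacts each: weaker agent looks as good or better {chance:.1%} of the time")
```

Under these assumptions, three contacts each leave the weaker agent looking as good as or better than the stronger one about 54% of the time, which is barely different from a coin toss. The figure falls steadily as the sample grows. Real QA scores are usually more granular than pass or fail, so treat this as an illustration of the effect, not a sizing rule.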
Two different questions
It helps to be explicit about what the sample is for. A per-agent sample exists to coach an individual, and even a small one has value as a conversation starter — provided nobody pretends it is a precise ranking. Understanding whether the operation is delivering quality, and feeding that into planning, is a different question that wants a larger pooled sample across everyone, which is far more stable than any individual’s handful. Automated scoring changes the economics again: when a machine can evaluate every contact, the sampling problem disappears for the dimensions it can judge, freeing human reviewers to concentrate on the contacts and qualities that need a person. Either way, the discipline is the same — sample enough, sample fairly, and know which question you are answering.
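The gap in stability between one agent's handful and the pooled sample can be sketched with the usual margin-of-error formula for a proportion. Everything numeric here is an assumption for illustration: a hypothetical team of 40 agents, three scored contacts each, and an 85% pass rate. The normal approximation is also crude at n = 3, so read the per-agent figure as an order of magnitude only.

```python
from math import sqrt


def margin_of_error(pass_rate: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% half-width for an observed pass rate (normal approximation)."""
    return z * sqrt(pass_rate * (1 - pass_rate) / n)


AGENTS = 40               # hypothetical team size
CONTACTS_PER_AGENT = 3    # hypothetical monthly QA quota per agent
ASSUMED_PASS_RATE = 0.85  # hypothetical underlying pass rate

per_agent = margin_of_error(ASSUMED_PASS_RATE, CONTACTS_PER_AGENT)
pooled = margin_of_error(ASSUMED_PASS_RATE, AGENTS * CONTACTS_PER_AGENT)

print(f"one agent (n={CONTACTS_PER_AGENT}): +/- {per_agent:.0%}")
print(f"whole operation (n={AGENTS * CONTACTS_PER_AGENT}): +/- {pooled:.0%}")
```

With these numbers the per-agent score is uncertain by roughly 40 percentage points either way, while the pooled score is good to about 6. The same scoring effort that cannot rank individuals can still say something useful about the operation, provided the pooled contacts were drawn fairly. Contacts from the same agent tend to resemble each other, so the true pooled margin is probably somewhat wider than this simple formula suggests.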
The principle to remember: a handful of cherry-picked calls is an anecdote, not a measurement. Sample enough to beat the noise, across the real spread of work, and be clear whether you’re coaching an agent or measuring the operation.
Quick quiz
Five questions. Pick an answer to each, then check your score.
1. What’s wrong with scoring two or three contacts per agent per month?
From a handful of calls the noise swamps the signal.
2. What does a representative sample span?
Sample the real spread, not the convenient corner.
3. Why does it matter whether you’re coaching an agent or measuring the operation?
A larger pooled sample answers the operation-level question reliably.
4. How does automated scoring change sampling?
Scoring everything sidesteps sampling for the machine-judgeable dimensions.
5. A score from a tiny, convenient sample is best described as…
Too small and too biased to be a real measurement.