ccPlanning Academy · Quality track

Sampling honestly

Micro lesson · about 5 minutes · short quiz at the end

Two calls per agent per month tells you almost nothing — here’s why.

The problem

The standard sample is too small to mean anything.

Most programmes score a handful of contacts per agent per month. From two or three calls you cannot reliably tell an agent whose true quality is 90% from one at 70%: the noise swamps the signal, and the score swings on which calls happened to be picked.

A number that small isn’t a measurement; it’s an anecdote with a percentage on it.
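
To put a rough number on that, here is a minimal sketch. It rests on a simplification the lesson does not state: each scored contact either passes or fails, independently of the others, and an agent's "quality" is their true pass rate. Under that assumption it computes how often a true 70% agent scores at least as well as a true 90% agent when both have the same number of contacts scored. The function names are illustrative only.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k passes in n independent contacts with pass rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_weaker_ties_or_beats(n, p_strong=0.9, p_weak=0.7):
    """Chance the weaker agent passes at least as many of n contacts as the stronger one.

    Assumption: pass/fail scoring and independent contacts (a simple binomial model).
    """
    total = 0.0
    for strong in range(n + 1):
        for weak in range(strong, n + 1):
            total += binom_pmf(strong, n, p_strong) * binom_pmf(weak, n, p_weak)
    return total


if __name__ == "__main__":
    for n in (3, 10, 30, 100):
        print(n, round(prob_weaker_ties_or_beats(n), 3))
```

With three contacts each, the weaker agent ties or beats the stronger one a little under half the time (about 0.47) under these assumptions. The figure falls as the number of scored contacts grows.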

Representative, not convenient

Sample the spread, not the easy ones.

If you only score the calls that are easy to grab — recent, short, recorded cleanly — you measure a biased slice. Sample across contact types, times of day, lengths and outcomes, so the score reflects the real work, not the convenient corner of it.

Cherry-picked samples flatter or punish; neither tells the truth.
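
One way to sample the spread is to group contacts by the dimension you care about and draw at random within each group. The sketch below is illustrative, not a prescribed method: the `contact_type` field, the function name and the "at least one per group" rule are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(contacts, key, total, seed=None):
    """Draw roughly `total` contacts, spread across groups in proportion to their size.

    `key` maps a contact to its group (for example contact type or time band).
    Every non-empty group contributes at least one contact, so rounding can make
    the result slightly larger or smaller than `total`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for contact in contacts:
        strata[key(contact)].append(contact)

    picked = []
    for group in strata.values():
        share = round(total * len(group) / len(contacts))
        k = min(len(group), max(1, share))
        picked.extend(rng.sample(group, k))  # random within the group, not "easiest first"
    return picked


if __name__ == "__main__":
    # Hypothetical contact records; real ones would come from your recording system.
    contacts = (
        [{"id": i, "contact_type": "phone"} for i in range(60)]
        + [{"id": 100 + i, "contact_type": "chat"} for i in range(30)]
        + [{"id": 200 + i, "contact_type": "email"} for i in range(10)]
    )
    sample = stratified_sample(contacts, key=lambda c: c["contact_type"], total=10, seed=1)
    print(sorted(c["contact_type"] for c in sample))
```

The same function can be run with a composite key, such as contact type plus time of day, to cover more than one dimension of the spread at once.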

What you’re measuring

Are you scoring the agent, or the process?

A small per-agent sample is for coaching an individual. To understand whether the operation is delivering quality — and to feed the plan — you want a larger pooled sample across everyone, which is far more stable than any one agent’s handful.

Decide which question you’re answering before you set the sample.
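
The stability gain from pooling can be sketched with the standard error of a proportion. The figures here are assumptions for illustration only: a true pass rate of 80%, 50 agents, three scored contacts each, pass/fail scoring, and contacts treated as independent random draws (which ignores real differences between agents).

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of an observed pass rate from n independent contacts."""
    return sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    assumed_pass_rate = 0.8   # illustrative, not from the lesson
    contacts_per_agent = 3
    agents = 50               # illustrative team size

    per_agent = standard_error(assumed_pass_rate, contacts_per_agent)
    pooled = standard_error(assumed_pass_rate, contacts_per_agent * agents)
    print(f"one agent, {contacts_per_agent} contacts: +/- {per_agent:.3f}")
    print(f"pooled, {contacts_per_agent * agents} contacts: +/- {pooled:.3f}")
```

Under these assumptions the typical error is about 23 percentage points for one agent's three contacts and about 3 points for the pooled 150, which is why the pooled figure can support an operation-level view while the individual one cannot support a ranking.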

The AI option

Scoring everything changes the maths.

Automated scoring can evaluate every contact, not a sample — which removes the sampling problem entirely for the things a machine can judge. That doesn’t make sampling obsolete; it shifts human effort to the contacts and dimensions where judgement is needed (the subject of a later lesson).

The takeaway

Sample enough, across the real spread.

Two calls a month is an anecdote. Sample enough to separate signal from noise, across contact types and times, and be clear whether you’re coaching an agent or measuring the operation.

Slides done? Here’s the same idea in a bit more depth — the part worth keeping.

In depth: how much QA sampling is enough

The most common quiet failure in QA is sample size. Scoring two or three contacts per agent per month feels like measurement, but statistically it is noise: from such a tiny sample you cannot reliably distinguish a strong agent from a weak one, and the score lurches month to month on the luck of which contacts were drawn. Worse, those few are often the convenient ones — recent, short, cleanly recorded — which biases the picture towards the easy corner of the work. A representative sample spans contact types, times of day, lengths and outcomes, so the number reflects the job as it really is.
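
One common way to turn "enough" into a number is the textbook normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e². The sketch below applies it; it is a rule of thumb rather than the lesson's own method, and it assumes pass/fail scoring, a simple random sample, a guessed pass rate `p` and a chosen margin of error `e`.

```python
from math import ceil


def sample_size_for_margin(p, margin, z=1.96):
    """Contacts needed so the pass-rate estimate is within +/- margin.

    z = 1.96 corresponds to roughly 95% confidence under the normal approximation.
    """
    return ceil(z * z * p * (1 - p) / (margin * margin))


if __name__ == "__main__":
    for margin in (0.10, 0.05):
        n = sample_size_for_margin(0.8, margin)
        print(f"margin +/- {margin:.2f}: about {n} contacts")
```

For an assumed 80% pass rate this gives about 62 contacts for a margin of ten points and about 246 for five points. Those volumes are realistic for a pooled operational sample and far beyond two or three contacts per agent.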

Two different questions

It helps to be explicit about what the sample is for. A per-agent sample exists to coach an individual, and even a small one has value as a conversation starter — provided nobody pretends it is a precise ranking. Understanding whether the operation is delivering quality, and feeding that into planning, is a different question that wants a larger pooled sample across everyone, which is far more stable than any individual’s handful. Automated scoring changes the economics again: when a machine can evaluate every contact, the sampling problem disappears for the dimensions it can judge, freeing human reviewers to concentrate on the contacts and qualities that need a person. Either way, the discipline is the same — sample enough, sample fairly, and know which question you are answering.

The principle to remember: a handful of cherry-picked calls is an anecdote, not a measurement. Sample enough to beat the noise, across the real spread of work, and be clear whether you’re coaching an agent or measuring the operation.

Quick quiz

Five questions. Pick an answer to each, then check your score.

1. What’s wrong with scoring two or three contacts per agent per month?

From a handful of calls the noise swamps the signal.

2. What does a representative sample span?

Sample the real spread, not the convenient corner.

3. Why does it matter whether you’re coaching an agent or measuring the operation?

A larger pooled sample answers the operation-level question reliably.

4. How does automated scoring change sampling?

Scoring everything sidesteps sampling for the machine-judgeable dimensions.

5. A score from a tiny, convenient sample is best described as…

Too small and too biased to be a real measurement.