How many contacts should you monitor? QA sampling that means something

Quality · ~7 minute read

“Five a month” is a habit, not a method

Walk into almost any contact centre and the quality programme runs on a round number: five contacts per agent per month, or four, or ten. Nobody can quite say where the number came from — it’s what the last QA lead did, and the one before that. The trouble is that a round habit produces a precise-looking score, and people then act on the score as if it were solid. An agent is coached, ranked, or passed over on the strength of a percentage built from a handful of contacts, and almost nobody asks the only question that matters: how much would that percentage move if we’d happened to pick five different contacts?

The answer, for small samples, is “a lot.” A score from five contacts has so much room to wobble that the gap between a “92%” agent and an “84%” agent can be pure luck of the draw. If you want a quality number that means something — one you can put in front of an agent, a union rep, or a regulator and defend — the sample has to be sized deliberately, to a stated confidence level and a stated margin of error. That isn’t statistical vanity; it’s the difference between measuring and guessing.

What a sample size actually buys you

Every score from a sample is an estimate of a true, unknowable rate — the quality the agent would score if you could evaluate everything they do. The sample gives you a best guess and a band of uncertainty around it. Two things set the width of that band. The confidence level is how sure you want to be that the true rate falls inside the band — 90%, 95%, 99%. The margin of error is how wide the band is — “85%, give or take ten points” is a margin of error of ten. A bigger sample buys you a narrower band, or more confidence, or both. The whole game is deciding how precise you need to be, and then buying exactly that much precision and no more.
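To make the band concrete, here is a minimal sketch of how its half-width shrinks as the sample grows. It assumes a simple pass/fail score per contact, a random sample, and the textbook normal approximation (which is rough at very small samples); the function names are illustrative, not taken from any particular tool.

```python
import math
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(n: int, pass_rate: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation band around a pass rate.

    Assumption: each contact is an independent pass/fail observation.
    pass_rate defaults to 0.5, the widest (worst) case.
    """
    return z_score(confidence) * math.sqrt(pass_rate * (1 - pass_rate) / n)

# Print the band half-width, in percentage points, for a few sample sizes.
for n in (5, 20, 50, 100, 400):
    print(f"n={n:>3}: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions, five contacts leave a band of more than forty points either side, while a hundred contacts bring it to roughly ten.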

[Figure: the same two agents, small sample vs sized sample. With 5 contacts each, the bands for Agent A and Agent B overlap and the gap is noise; with a sized sample, the bands separate and the gap is real.]
With five contacts each, the uncertainty bands around two agents overlap so heavily that the “gap” between them is mostly noise. Size the sample and the bands tighten until a genuine difference — if there is one — becomes something you can act on.
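The overlap can be checked with a few lines of arithmetic. The sketch below uses the Wilson score interval, a common choice for small samples, and invented figures: two agents scored pass/fail on 5 contacts each, then the same two agents at 92% and 84% on a hypothetical 400 contacts each. Treating "bands do not overlap" as "the gap is real" is a deliberately cautious rule of thumb, not a formal significance test.

```python
import math
from statistics import NormalDist

def wilson_interval(passes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a pass rate.

    Better behaved than the plain normal approximation when n is small
    or the observed rate is close to 0% or 100%.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = passes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def bands_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two (low, high) intervals share any ground."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical scores: (passes, contacts) for Agent A and Agent B.
small_a, small_b = wilson_interval(5, 5), wilson_interval(4, 5)
sized_a, sized_b = wilson_interval(368, 400), wilson_interval(336, 400)

for label, a, b in (("5 contacts each", small_a, small_b),
                    ("400 contacts each", sized_a, sized_b)):
    print(f"{label}: A {a[0]:.0%} to {a[1]:.0%}, B {b[0]:.0%} to {b[1]:.0%}, "
          f"overlap={bands_overlap(a, b)}")
```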

What drives the number

Three inputs move the sample size, and the intuition behind each is worth carrying.

Precision is expensive, fast. Halving the margin of error roughly quadruples the sample. The relationship is quadratic, not linear, which is why “just be a bit more accurate” costs far more analyst time than people expect.

Confidence costs less than you’d think. Going from 90% to 95% confidence adds roughly 40% to the sample; pushing on to 99% adds roughly another 70% on top of that, for a sureness most internal QA programmes don’t need.

The pass rate matters, because uncertainty peaks in the middle. A process that’s either very good or very bad is easier to pin down than one hovering near 50%, so if you genuinely don’t know the rate, assume the worst case of 50% and you’ll never under-sample.
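All three effects fall out of the standard sample-size formula for a proportion, n = z² · p(1 − p) / e², where z comes from the confidence level, p is the expected pass rate, and e is the margin of error. A minimal sketch, assuming pass/fail scoring and a population large enough that no correction is needed; the function name and the example figures are illustrative.

```python
import math
from statistics import NormalDist

def sample_size(margin: float, confidence: float = 0.95, pass_rate: float = 0.5) -> int:
    """Contacts needed to estimate a pass rate to +/- margin.

    Uses n = z^2 * p * (1 - p) / e^2, rounded up. pass_rate defaults to
    0.5, the worst case, so the result never under-samples.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * pass_rate * (1 - pass_rate) / margin ** 2)

# Precision: halving the margin roughly quadruples the sample.
print("margin 10 points:", sample_size(0.10))
print("margin  5 points:", sample_size(0.05))

# Confidence: each step up costs more than the last.
for confidence in (0.90, 0.95, 0.99):
    print(f"confidence {confidence:.0%}:", sample_size(0.10, confidence=confidence))

# Pass rate: a rate far from 50% is cheaper to pin down.
print("pass rate 90%:", sample_size(0.10, pass_rate=0.90))
```

At 95% confidence and the worst-case pass rate, a ten-point margin works out at 97 contacts and a five-point margin at 385.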

There’s a fourth lever the textbooks gloss over: the size of the population itself. An agent only handles so many contacts in a month, and when the sample starts to become a meaningful fraction of everything they did, the finite-population correction pulls the required number down. For a low-volume specialist team, that correction is the difference between a feasible programme and an impossible one.
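One common form of the finite-population correction is n = n₀ / (1 + (n₀ − 1) / N), where n₀ is the uncorrected sample and N is the number of contacts handled in the period. The sketch below applies it under the same assumptions as before (pass/fail scoring, simple random sampling without replacement); the monthly volumes are invented for illustration.

```python
import math
from statistics import NormalDist

def base_sample(margin: float, confidence: float = 0.95, pass_rate: float = 0.5) -> float:
    """Uncorrected sample size n0 (left unrounded for the next step)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * z * pass_rate * (1 - pass_rate) / margin ** 2

def corrected_sample(population: int, margin: float,
                     confidence: float = 0.95, pass_rate: float = 0.5) -> int:
    """Sample size after the finite-population correction, rounded up."""
    n0 = base_sample(margin, confidence, pass_rate)
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Hypothetical monthly contact volumes for one agent.
for volume in (120, 400, 1000, 100_000):
    print(f"{volume:>7} contacts a month -> sample {corrected_sample(volume, 0.10)}")
```

With these figures, an agent handling 120 contacts a month needs 54 evaluated rather than 97, while at very high volumes the correction all but disappears.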

Per agent, or per process?

The most consequential design choice is the level you sample at. If the purpose of QA is to coach individuals, the sample has to be sized per agent — and that is genuinely demanding, because you need a defensible number for every person, every period. If the purpose is to monitor the process — is the operation as a whole getting better or worse, is this new journey landing — you can pool across agents and the total sample needed is dramatically smaller, because you’re estimating one rate instead of fifty. Many programmes quietly want the second but score as if they’re doing the first, then act on per-agent numbers that were never built to carry the weight. Decide which job QA is doing, and size for that job.
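A rough comparison shows the scale of the difference. The sketch assumes a hypothetical team of 50 agents handling 400 contacts each per month, a ten-point margin at 95% confidence, the worst-case pass rate, and, for the pooled case, a simple random sample drawn across every contact the team handled.

```python
import math
from statistics import NormalDist

def required_sample(population: int, margin: float,
                    confidence: float = 0.95, pass_rate: float = 0.5) -> int:
    """Sample size with the finite-population correction, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * pass_rate * (1 - pass_rate) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Hypothetical team, for illustration only.
AGENTS = 50
CONTACTS_PER_AGENT = 400
MARGIN = 0.10

# Per agent: one defensible estimate for every person.
per_agent = required_sample(CONTACTS_PER_AGENT, MARGIN)
per_agent_total = per_agent * AGENTS

# Per process: one estimate for the whole operation.
pooled_total = required_sample(AGENTS * CONTACTS_PER_AGENT, MARGIN)

print(f"per agent:   {per_agent} each, {per_agent_total} in total")
print(f"per process: {pooled_total} in total")
```

On these assumptions, sizing per agent needs 3,900 evaluations a month against 96 for the pooled process view, which says nothing reliable about any individual.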

Sizing it without pretending to be a statistician

You don’t need to derive the formula to use it well. Pick the decision the score will drive, and let it set the precision: coaching conversations can live with a wider margin than pay or ranking, which had better be tight. Choose 95% confidence as a sensible default, set the margin of error you can defend, and read off the sample. Then do the part most programmes skip — multiply it out across the team and the period to see the analyst hours it costs, because that’s the number that decides whether the programme is real or aspirational. A QA plan that ignores its own capacity cost is the one that quietly slips back to “five a month” by March. Our QA monitoring sample-size calculator does the arithmetic and the capacity total for you; the scorecard and calibration tracker covers what happens once the contact is in front of an evaluator.
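The multiply-it-out step is simple enough to sketch. Everything numeric here is an assumption for illustration: a 15-point margin for coaching, a 5-point margin for pay or ranking, 50 agents, and 15 minutes per evaluation. The sketch also ignores the finite-population correction, so it overstates the cost for low-volume agents.

```python
import math
from statistics import NormalDist

def sample_size(margin: float, confidence: float = 0.95, pass_rate: float = 0.5) -> int:
    """Contacts per agent needed for +/- margin, worst-case pass rate by default."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * pass_rate * (1 - pass_rate) / margin ** 2)

def analyst_hours(sample_per_agent: int, agents: int, minutes_per_evaluation: float) -> float:
    """Total evaluation time per period, in hours."""
    return sample_per_agent * agents * minutes_per_evaluation / 60

# Hypothetical inputs: adjust to your own team and scorecard.
AGENTS = 50
MINUTES_PER_EVALUATION = 15
MARGINS = {"coaching": 0.15, "pay or ranking": 0.05}

plan = {}
for decision, margin in MARGINS.items():
    n = sample_size(margin)
    plan[decision] = (n, analyst_hours(n, AGENTS, MINUTES_PER_EVALUATION))
    print(f"{decision}: {n} contacts per agent, {plan[decision][1]:.1f} analyst hours")

# For comparison, the "five a month" habit on the same team.
print(f"five a month: {analyst_hours(5, AGENTS, MINUTES_PER_EVALUATION):.1f} analyst hours")
```

Seeing the hours next to the margin is the point: if the tighter plan costs more analyst time than the team has, the honest options are a wider margin, a longer period, or sampling per process instead of per agent.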

The reframe is simple: a quality score is a measurement, and measurements come with error bars. Size the sample so the error bars are small enough to support the decision you’re about to make — and no smaller, because precision you don’t need is just monitoring you can’t afford. That’s the whole discipline.

Pair this with designing a meaningful QA programme, calibration done well, and linking QA scores to the plan.