Composite metrics that hide the truth
The metric that feels rigorous and isn’t
Composite metrics — quality scores built from twelve weighted sub-metrics, balanced scorecards, weighted customer experience indices, “overall performance” numbers — look scientific and feel rigorous. They’re also the single most common mistake in contact centre MI design. When the composite moves, nobody can tell which of the components is responsible. The conversation drifts into defending the score rather than improving the operation. The score becomes a target, then a ceiling, then a distraction. This article walks through where composites destroy signal, the two questions every composite should answer before it earns its place, the four traps composites usually fall into, when a composite is genuinely the right call, and how to retrofit decomposition into an existing composite without ripping the whole thing out.
Where composites destroy signal
A composite metric averages several sub-metrics into a single number. It works when the underlying sub-metrics move together; it fails when they don’t. In contact centres they rarely move together. A “quality score” built from accuracy, soft skills, compliance, knowledge, and process can show a stable headline while accuracy collapses and compliance compensates. The composite stays at 87%; the operation is in serious trouble; nobody knows because the composite hasn’t moved.
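The masking effect is easy to demonstrate with arithmetic. The sketch below uses hypothetical component scores and illustrative equal weights (the specific numbers are invented for the example): accuracy collapses by twenty points between two months, compliance and knowledge drift up, and the weighted headline does not move at all.

```python
# Hypothetical five-component "quality score" with illustrative equal weights.
weights = {"accuracy": 0.2, "soft_skills": 0.2, "compliance": 0.2,
           "knowledge": 0.2, "process": 0.2}

month_1 = {"accuracy": 0.90, "soft_skills": 0.85, "compliance": 0.85,
           "knowledge": 0.88, "process": 0.87}
month_2 = {"accuracy": 0.70, "soft_skills": 0.86, "compliance": 0.97,
           "knowledge": 0.92, "process": 0.90}

def composite(scores, weights):
    """Weighted average of the component scores."""
    return sum(weights[k] * scores[k] for k in weights)

print(f"month 1: {composite(month_1, weights):.0%}")  # 87%
print(f"month 2: {composite(month_2, weights):.0%}")  # 87% -- accuracy fell 20 points
```

Both months print 87%. Anyone watching only the headline sees a stable operation; anyone watching the components sees accuracy in free fall.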
The deeper issue is that composites turn diagnosis into archaeology. By the time somebody notices the composite has drifted, the planning team has to dig back through weeks of sub-component data to find what changed. The same time spent looking at the sub-metrics directly would have caught the issue weeks earlier.
The two questions every composite must answer
Before adding a composite metric to the pack, ask two questions. Do the components move together? If yes, the composite is a useful summary. If no, the composite hides information. Most contact centre composites fail this test — accuracy doesn’t move with compliance, AHT doesn’t move with CSAT, adherence doesn’t move with QA.
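Question one can be tested directly rather than argued about. A minimal sketch, using invented weekly scores for two components (in practice you would pull these from the QA system): if the correlation between components is low or negative, averaging them throws information away.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative weekly scores: accuracy trending down, compliance trending up.
accuracy   = [0.91, 0.88, 0.84, 0.79, 0.75]
compliance = [0.84, 0.87, 0.90, 0.93, 0.95]

r = pearson(accuracy, compliance)
print(f"accuracy vs compliance: r = {r:+.2f}")
if r < 0.5:
    print("components do not move together; a composite would hide information")
```

Here the correlation is strongly negative, which is exactly the regime in which one component can compensate for another inside a stable headline.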
If the composite moves, can the audience act? If the audience would still need to look at the sub-components to know what to do, the composite is decoration; the sub-components are the actual MI. Keep the sub-components. Drop the composite — or accept that it’s a marketing number, not an operating one.
The four traps composites fall into
The weight-shuffle trap. Composites are built by weighting their components. The weights are usually picked once, by a stakeholder, with limited rigour. Six months later, the operation has changed, but the weights haven’t. The score becomes increasingly disconnected from reality, and the people who set the weights have moved on.
The Goodhart trap. Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. Composites are especially vulnerable because gaming them is easy — improve any component, the score improves, the underlying operation hasn’t necessarily improved. Composites that become targets quickly stop reflecting the truth.
The diagnostic-loss trap. Composites smooth over the variance that diagnosis depends on. A stable composite with one component collapsing is the most dangerous state in MI — the dashboard is green; the operation is on fire; the planning team will be blamed when somebody finally notices.
The defensibility trap. Composites built by committee are politically robust — every stakeholder gets a component, every weight is a compromise — and operationally useless. The composite survives because nobody wants to fight to remove anyone’s favourite metric, not because it tells anyone anything useful.
When composites are genuinely the right call
Composites aren’t always wrong. Three situations justify them.
External communication. A board or an external regulator wants a single number; a composite is the only way to give them one without throwing twenty metrics at them. The composite is for communication, not operating, and the operating team uses the components.
Trend smoothing. If the components are noisy individually and the composite is the genuinely important quantity (e.g. a customer-experience index calibrated against actual customer behaviour), the composite is the legitimate measure. The components are diagnostic.
Goal-setting alignment. Sometimes the composite is a useful North Star for the team — a single direction that aligns disparate sub-targets. This works when the composite is accompanied by visible accountability for the sub-components, so nobody can hit the composite while a component fails.
How to design a composite that works
If the composite is genuinely warranted, five design rules raise the odds it stays useful.
Always publish the decomposition. The composite never appears without its components. Side by side, every time. The audience that wants the headline gets it; the audience that wants the diagnosis gets that too.
Set thresholds per component, not just on the composite. Each component has its own target, its own trigger condition, its own owner. The composite trips when any component breaches, not just when the weighted total crosses a line.
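The threshold rule can be sketched in a few lines. The scores, floors, and weights below are invented for illustration: the weighted total sits at a comfortable 87%, but one component has breached its own floor, so the status goes red anyway.

```python
# Per-component floors: each threshold has its own owner and trigger.
thresholds = {"accuracy": 0.85, "soft_skills": 0.80, "compliance": 0.90,
              "knowledge": 0.80, "process": 0.80}
weights = {k: 0.2 for k in thresholds}  # illustrative equal weights

scores = {"accuracy": 0.70, "soft_skills": 0.86, "compliance": 0.97,
          "knowledge": 0.92, "process": 0.90}

headline = sum(weights[k] * scores[k] for k in scores)
breaches = [k for k, floor in thresholds.items() if scores[k] < floor]

# The pack flags RED when ANY component breaches, not just the weighted total.
status = "RED" if breaches else "GREEN"
print(f"composite {headline:.0%} | status {status} | breaches: {breaches}")
```

This catches the composite-green, component-collapsing state described above: the headline alone would never have tripped.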
Review the weights annually. Weights drift in relevance as the operation changes. The annual review forces the conversation about what the composite is for and whether the weights still serve it.
Use simple weights. Equal weights, or simple round numbers (20/30/30/20). The illusion of precision in “0.347 on quality, 0.218 on adherence” is worse than honest equal weighting.
Audit for gaming. Once a year, ask the question: how could a team hit the composite while not improving the operation? If the answer is “easily,” the composite needs redesigning or retiring.
Retrofitting an existing composite
Most operations already have a composite metric they regret but can’t easily remove. Three retrofits help without requiring a full reset.
Add the decomposition to the dashboard. The composite stays, but the components appear next to it. The audience gets used to looking at both. Over time, the components become the operative metric and the composite quietly fades.
Add the threshold rule. Even if the composite stays, individual components get their own thresholds. The dashboard goes red when any component breaches, not just when the weighted total does. This catches the most dangerous state — composite green, component collapsing — before it does damage.
Time-box the composite. Announce that the composite will be reviewed at year-end with a real option to remove it. The conversation about what it’s actually for, started in advance, is usually more productive than a retrospective post-mortem after it fails.
Conclusion
Composite metrics look scientific. Most of them aren’t. They average information away, hide diagnosis, invite gaming, and become political compromises that survive long after their usefulness has gone. The two-question test — do the components move together, and can the audience act on the composite alone — kills most composites before they make it onto the pack. The few that survive deserve their place, but only with their decomposition visible, their components individually targeted, and their weights revisited annually. The operations that take this seriously discover that what looked like rigour was usually compression — and that the components on their own tell a better story than the composite ever did.
Pair this with designing meaningful MI in contact centres, leading vs lagging indicators, and building causal chains into your MI.