The QM data model — design before dashboards

Quality · ~7 minute read

Once a QM programme moves beyond scoring calls into diagnostic, predictive and automated work, the data model underneath determines what’s easy, what’s hard, and what’s silently wrong. Most quality functions discover this too late — after the dashboard is built on a foundation that cannot answer the questions that matter.

Four fact types, four grains

A QM programme produces four distinct kinds of fact: scoring events (one row per call per rater, against a specific scorecard version), coaching events (one row per coaching conversation, anchored on the scoring it responds to), calibration events (one row per session, with linked rows per call discussed), and routing events (one row per insight routed to another function, with its downstream actions).
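One way to make those four grains explicit is to declare them as keys in the schema itself. The sketch below uses Python's built-in sqlite3 module; every table and column name is hypothetical, chosen only to illustrate the grains described above, and routing's downstream actions are left out for brevity.

```python
import sqlite3

# Hypothetical names throughout; only the grains come from the text above.
SCHEMA = """
CREATE TABLE fact_scoring (
    call_id              TEXT NOT NULL,
    rater_id             TEXT NOT NULL,
    agent_id             TEXT NOT NULL,
    scorecard_version_id TEXT NOT NULL,
    total_score          REAL NOT NULL,
    PRIMARY KEY (call_id, rater_id)          -- grain: call x rater
);
CREATE TABLE fact_coaching (
    coaching_id TEXT PRIMARY KEY,            -- grain: one coaching conversation
    call_id     TEXT NOT NULL,
    rater_id    TEXT NOT NULL,
    -- anchor: the scoring event this conversation responds to
    FOREIGN KEY (call_id, rater_id) REFERENCES fact_scoring (call_id, rater_id)
);
CREATE TABLE fact_calibration_session (
    session_id TEXT PRIMARY KEY              -- grain: one calibration session
);
CREATE TABLE fact_calibration_call (
    session_id TEXT NOT NULL REFERENCES fact_calibration_session (session_id),
    call_id    TEXT NOT NULL,
    PRIMARY KEY (session_id, call_id)        -- grain: session x call discussed
);
CREATE TABLE fact_routing (
    routing_id TEXT PRIMARY KEY,             -- grain: one routed insight
    insight    TEXT NOT NULL,
    routed_to  TEXT NOT NULL
);
"""


def build_model() -> sqlite3.Connection:
    """Create the four fact tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn
```

With the grain declared as a primary key, a second row for the same call and rater is rejected at load time, and a coaching row that points at a scoring event that does not exist fails the foreign-key check instead of becoming an orphan.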

Each has its own grain. Mixing them into a single ‘QM events’ table — the pattern most operations drift into — means every aggregation becomes a special case, every analytical query needs hand-crafted filtering, and the linkages that make advanced QM possible quietly disappear.

The dimensions that make joins possible

Around the facts sit the conformed dimensions: agent and rater (modelled separately — a person can be both at different times, and both change over time, so version them), scorecard version and item, contact (channel, reason, segment, interval), customer where governance allows, and date and interval as in any contact-centre model.

The scorecard-version dimension is the one most often skipped, and the one that breaks the most analysis. ‘Empathy’ in May 2025 means something different from ‘empathy’ in May 2026 if the item was redefined at the annual review. Analysis that crosses a refresh without versioning treats different items as the same — and reports a trend that is actually a definition change.
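A minimal sketch of what versioning changes in practice, assuming a type-2 style item dimension with validity dates. The keys, dates and scores below are invented for illustration; the point is only that scores are grouped by the item version in force when the call was scored, not by the item's name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical versioned rows: (item_version_key, item_name, valid_from, valid_to)
ITEM_VERSIONS = [
    ("empathy_v1", "Empathy", date(2025, 1, 1), date(2025, 12, 31)),
    ("empathy_v2", "Empathy", date(2026, 1, 1), date(9999, 12, 31)),
]

# Invented item scores: (item_name, scored_on, score)
SCORES = [
    ("Empathy", date(2025, 5, 10), 4.0),
    ("Empathy", date(2025, 5, 20), 4.5),
    ("Empathy", date(2026, 5, 12), 3.0),
    ("Empathy", date(2026, 5, 25), 3.5),
]


def item_version_on(item_name: str, scored_on: date) -> str:
    """Return the key of the item version that was in force on scored_on."""
    for key, name, valid_from, valid_to in ITEM_VERSIONS:
        if name == item_name and valid_from <= scored_on <= valid_to:
            return key
    raise LookupError(f"no version of {item_name!r} valid on {scored_on}")


def mean_by_version(scores) -> dict:
    """Average scores per item version, so a redefinition is never averaged across."""
    buckets = defaultdict(list)
    for item_name, scored_on, score in scores:
        buckets[item_version_on(item_name, scored_on)].append(score)
    return {key: mean(values) for key, values in buckets.items()}
```

Grouped by name alone, these rows would read as one item falling from 4.25 to 3.25. Grouped by version, they are two different items, each with its own average, and whether they can be compared becomes an explicit decision.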

The grain mistakes that break QM at scale

Five failures recur. First, the QM mart of doom: one wide table mixing all event types, with nullable columns flagging which kind each row is. Second, mixed grain: per-item scores duplicated into per-call rows in a way that loses the link between them. Third, no scorecard versioning. Fourth, orphaned coaching events: conversations that don’t reference the scoring they anchor on, making follow-up effectiveness untrackable. Fifth, weak contact dimensions: operational context coded inline rather than held in properly joinable dimension tables.

Each looks like a shortcut at build time. Each turns a five-minute query into a custom rebuild later. One operation we describe in the book ran QM from a 60-column flat table for years; diagnostic queries took hours and coaching-effectiveness analysis was simply impossible because the linkage had never been preserved.
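Two of these failures are cheap to test for before anything is built on the data. The helpers below are a sketch, assuming rows arrive as plain dictionaries with hypothetical column names such as call_id, rater_id and coaching_id: one finds rows that break a declared grain, the other finds coaching events with no scoring event behind them.

```python
from collections import Counter


def grain_violations(rows, grain):
    """Return the grain keys that occur more than once in rows.

    grain is the tuple of column names that should identify a row uniquely.
    """
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def orphaned_coaching(coaching_rows, scoring_rows):
    """Return ids of coaching events whose scoring anchor is missing."""
    scored = {(row["call_id"], row["rater_id"]) for row in scoring_rows}
    return [
        row["coaching_id"]
        for row in coaching_rows
        if (row.get("call_id"), row.get("rater_id")) not in scored
    ]
```

Run against a scoring extract with ("call_id", "rater_id") as the grain, an empty result from the first function means the declared grain holds. Any id returned by the second is a conversation whose follow-up effect cannot be tracked.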

What a clean model buys you

With the four facts and conformed dimensions in place, the hard questions become joins. Inter-rater drift over time is scoring events joined to calibration events. Coaching effectiveness is commitments tracked to subsequent scoring on the same agent and item. Predictive QM gets its training features by joining scoring, coaching, contact context and outcome facts such as complaints and repeat contacts.
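As an illustration of "the hard questions become joins", here is a rough before-and-after comparison for coaching, again with sqlite3. It assumes a hypothetical item-level scoring table and a coaching table that records the agent and item version each conversation addressed; the data is invented, and a real analysis would need more care than a simple before/after average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables and invented rows, for illustration only.
CREATE TABLE fact_item_score (
    call_id TEXT, agent_id TEXT, item_version_key TEXT, scored_on TEXT, score REAL
);
CREATE TABLE fact_coaching (
    coaching_id TEXT, agent_id TEXT, item_version_key TEXT, held_on TEXT
);
INSERT INTO fact_item_score VALUES
    ('c1', 'a1', 'empathy_v2', '2026-05-01', 2.0),
    ('c2', 'a1', 'empathy_v2', '2026-05-20', 4.0),
    ('c3', 'a1', 'empathy_v2', '2026-05-27', 3.0);
INSERT INTO fact_coaching VALUES ('k1', 'a1', 'empathy_v2', '2026-05-10');
""")

# Join each coaching conversation to scores for the same agent and item version,
# then split the scores by whether they fall before or after the conversation.
# ISO-format date strings compare correctly as text.
QUERY = """
SELECT c.coaching_id,
       AVG(CASE WHEN s.scored_on <  c.held_on THEN s.score END) AS before_avg,
       AVG(CASE WHEN s.scored_on >= c.held_on THEN s.score END) AS after_avg
FROM fact_coaching AS c
JOIN fact_item_score AS s
  ON s.agent_id = c.agent_id
 AND s.item_version_key = c.item_version_key
GROUP BY c.coaching_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('k1', 2.0, 3.5)]
```

The query is only possible because the coaching row carries the agent and item keys. In a flat table where that linkage was never stored, no amount of SQL recovers it.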

Speech-analytics integration depends on the same substrate: automated scores aligned to the same dimensions as human scores, so like-for-like comparison and calibration overlap are queries rather than projects. The model is the substrate of everything advanced QM does — designed cleanly, every analysis becomes possible; designed badly, every analysis is a custom rebuild.

Designing it deliberately

The method is short: identify the four fact types and give each its own table; declare and document the grain of each; build the conformed dimensions; version the scorecard; link the facts so coaching references scoring, routing references the originating insight, and calibration references both scoring and items.

Then validate against the questions you intend to ask. Can the model answer “which agents’ calibration-anchor scores diverged most from the QM team’s?” Can it track a routed insight to downstream scorecard movement? If the answer is a join, the model is right. If the answer is a spreadsheet exercise, fix the model before building anything on top of it.
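The first validation question can be sketched as a single join. The example assumes calibration scores are stored per session, call and participant, with the QM team's anchor score held per session and call; the table names, the participant_id column and the numbers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables and invented rows, for illustration only.
CREATE TABLE fact_calibration_anchor (
    session_id TEXT, call_id TEXT, anchor_score REAL,
    PRIMARY KEY (session_id, call_id)
);
CREATE TABLE fact_calibration_score (
    session_id TEXT, call_id TEXT, participant_id TEXT, score REAL,
    PRIMARY KEY (session_id, call_id, participant_id)
);
INSERT INTO fact_calibration_anchor VALUES
    ('s1', 'c1', 80.0), ('s1', 'c2', 60.0);
INSERT INTO fact_calibration_score VALUES
    ('s1', 'c1', 'p1', 82.0), ('s1', 'c2', 'p1', 58.0),
    ('s1', 'c1', 'p2', 70.0), ('s1', 'c2', 'p2', 75.0);
""")

# Mean absolute gap between each participant's score and the anchor score,
# largest divergence first.
QUERY = """
SELECT s.participant_id,
       AVG(ABS(s.score - a.anchor_score)) AS mean_abs_gap
FROM fact_calibration_score AS s
JOIN fact_calibration_anchor AS a
  ON a.session_id = s.session_id
 AND a.call_id = s.call_id
GROUP BY s.participant_id
ORDER BY mean_abs_gap DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('p2', 12.5), ('p1', 2.0)]
```

If a question like this needs an export and a pivot table instead of a query of roughly this size, that is the signal to go back to the model.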

Figure: The QM data model

Four fact types
▸ Scoring events (call × rater)
▸ Coaching events (conversation)
▸ Calibration (session × call)
▸ Routing events (insight)
▸ Conformed dimensions around all
▸ Scorecard versioned (SCD2)
▸ Facts linked to each other

Grain mistakes
▸ The QM mart of doom
▸ Mixed grain in one table
▸ No scorecard versioning
▸ Orphaned coaching events
▸ Weak contact dimension
▸ Context coded inline
▸ Linkage never preserved

Designed cleanly, every analysis is a join · designed badly, every analysis is a rebuild

The closing principle

Design the model before the dashboard. The QM functions that do diagnostic, predictive and automated work well are the ones whose data layer was built for it — four fact types, declared grains, versioned scorecards, linked facts. Everything else in advanced quality stands on this.
