Planning for async messaging — WhatsApp, in-app and social DMs

Intermediate level · ~6 minute read

Introduction

Live chat broke some of the assumptions voice planning relies on. Asynchronous messaging — WhatsApp, in-app messaging, social DMs — breaks the rest. A conversation can stay open for hours or days, a customer replies whenever they feel like it, and an agent juggles many threads at once with no clear “start” and “end.” The Erlang maths that sizes a voice queue simply doesn’t describe this. This article sets out how async actually behaves and how to plan for it.

An async conversation can stay open for days while consuming only a few minutes of agent effort, delivered in bursts. Counting “open conversations” as concurrent workload will over-staff by a wide margin, because most open threads are waiting on the customer at any given moment; you have to plan the actual work and the response-time promise.

Why async is different

On a live channel, an agent is committed to a conversation for its whole duration; on async, they are not. The customer might reply in two minutes or two days, and in between the agent does other work. So the unit that matters isn’t the conversation length — it’s the total active handling time a conversation consumes, spread across however many turns it takes, plus the promise you’ve made about how quickly each reply comes back. Plan async like live chat and you’ll either drown agents or pay for idle capacity, because the concurrency assumptions don’t transfer.
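To make the distinction concrete, here is a minimal Python sketch that compares active handling time with elapsed time for one conversation. The turn log, its timestamps and the function names are hypothetical, invented for illustration; a real messaging platform will expose this data in its own shape.

```python
from datetime import datetime, timedelta

def active_handling_minutes(turns):
    """Sum the time the agent actually spent working, across all turns."""
    return sum((end - start).total_seconds() for start, end in turns) / 60.0

def elapsed_minutes(turns):
    """Time from the first agent turn starting to the last one ending."""
    first_start = min(start for start, _ in turns)
    last_end = max(end for _, end in turns)
    return (last_end - first_start).total_seconds() / 60.0

# Hypothetical conversation: three short agent turns spread over 26 hours.
t0 = datetime(2024, 1, 8, 9, 0)
turns = [
    (t0, t0 + timedelta(minutes=2)),
    (t0 + timedelta(hours=5), t0 + timedelta(hours=5, minutes=3)),
    (t0 + timedelta(hours=26), t0 + timedelta(hours=26, minutes=1)),
]

active = active_handling_minutes(turns)   # 6 minutes of agent work
elapsed = elapsed_minutes(turns)          # 1,561 minutes on the clock
print(f"active: {active:.0f} min, elapsed: {elapsed:.0f} min")
```

In this made-up case the thread is open for more than a day but costs six minutes of work, and it is the six minutes that belongs in the plan.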

The metric that replaces service level

For voice and live chat, the target is “answer quickly” — service level in seconds. For async, the equivalent promise is a response-time SLA: we’ll reply to each message within, say, 15 minutes, or one hour, or by end of next business day, depending on the channel and the brand. That target completely changes the staffing shape. A tight reply promise behaves almost like live chat and needs agents continuously available; a generous one (reply within a few hours) lets you smooth work across the day and even hold a backlog deliberately, more like email and back-office planning than like a live queue.
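A response-time SLA can be measured per message rather than per conversation. The sketch below is one simple way to do that, assuming you can pair each customer message with the agent reply that answered it; the sample delays and the function name are hypothetical. It uses plain clock time, so a target such as “end of next business day” would need business-calendar logic that is not shown here.

```python
from datetime import datetime, timedelta

def sla_attainment(reply_pairs, target):
    """Share of customer messages answered within `target` (a timedelta).

    `reply_pairs` is a list of (received_at, replied_at) datetimes.
    Returns None when there are no messages to score.
    """
    if not reply_pairs:
        return None
    within = sum(1 for received, replied in reply_pairs
                 if replied - received <= target)
    return within / len(reply_pairs)

# Hypothetical sample: four messages answered after 5, 12, 40 and 90 minutes.
base = datetime(2024, 1, 8, 10, 0)
pairs = [(base, base + timedelta(minutes=m)) for m in (5, 12, 40, 90)]

tight = sla_attainment(pairs, timedelta(minutes=15))  # 2 of 4 within 15 min
loose = sla_attainment(pairs, timedelta(hours=1))     # 3 of 4 within 1 hour
print(f"15-minute target: {tight:.0%}, 1-hour target: {loose:.0%}")
```

The same replies score very differently against the two targets, which is why the promise has to be fixed before any staffing is calculated.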

How to size it

Start from work, not conversations. Estimate the active handling minutes per conversation (the sum of all the agent’s turns, not the elapsed time) and multiply by conversation volume to get a workload in agent-hours, the same offered-load logic as any other channel. Then layer the response-time SLA on top to decide how that workload must be distributed across the day: a strict SLA forces capacity to track the inbound-message curve closely; a loose one lets you flatten it. Because async messages arrive in bursty, non-Poisson patterns and a single conversation can span many planning intervals, simulation models this far better than a closed-form formula.
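The two steps can be sketched in Python as follows. This is a rough illustration, not the stochastic simulation recommended above: it is a deterministic first-in-first-out backlog model with made-up hourly figures, and the function names are hypothetical. It reports how many whole intervals the oldest piece of work waits before it is finished, which is only a proxy for the reply delay a customer would see.

```python
from collections import deque

def workload_agent_hours(conversations, active_minutes_per_conversation):
    """Workload = volume x active handling time, expressed in agent-hours."""
    return conversations * active_minutes_per_conversation / 60.0

def simulate_backlog(arrivals, capacity):
    """Work through arriving workload in FIFO order, interval by interval.

    `arrivals` and `capacity` are agent-hours per interval.
    Returns (max_delay, leftover): the longest wait in whole intervals
    before a batch of work is completed, and the work still unfinished.
    """
    queue = deque()  # each entry is [interval_arrived, remaining_work]
    max_delay = 0
    for i, (arrived, cap) in enumerate(zip(arrivals, capacity)):
        if arrived > 0:
            queue.append([i, arrived])
        while cap > 1e-9 and queue:
            done = min(cap, queue[0][1])
            queue[0][1] -= done
            cap -= done
            if queue[0][1] <= 1e-9:
                max_delay = max(max_delay, i - queue[0][0])
                queue.popleft()
    leftover = sum(work for _, work in queue)
    return max_delay, leftover

# Step 1: hypothetical volume of 1,200 conversations at 6 active minutes each.
daily_workload = workload_agent_hours(1200, 6)   # 120 agent-hours

# Step 2: a hypothetical eight-hour inbound curve, in agent-hours per hour.
arrivals = [2, 6, 10, 6, 2, 0, 0, 0]
tracking = list(arrivals)   # strict SLA: capacity follows the curve, peak of 10
flat = [4] * 8              # loose SLA: constant capacity, peak of 4

print(simulate_backlog(arrivals, tracking))   # (0, 0)
print(simulate_backlog(arrivals, flat))       # (3, 0)
```

With these invented numbers, the flat schedule clears all the work with a peak of 4 agent-hours per hour instead of 10, at the cost of some work waiting up to three hours. Whether that trade is acceptable depends entirely on the reply promise.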

The traps

Three traps catch people out. First, counting open conversations as concurrent load: an agent can “hold” dozens of open async threads because most are idle at any moment, so concurrency figures from live chat are meaningless here. Second, blending async into the live-chat forecast: the two have different arrival patterns, different handling profiles and different SLAs, and must be planned separately. Third, ignoring the customer-side delay: a conversation that’s open for a day isn’t a productivity problem, it’s the nature of the channel, so don’t manage agents on conversation duration or you’ll push them to close threads prematurely.
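The first trap can be put in numbers with Little’s law, which says the average number of items in a system equals the arrival rate multiplied by the average time each item spends there. The Python sketch below applies it twice, once to open threads and once to threads being actively worked; the arrival rate and durations are hypothetical figures chosen for illustration.

```python
def average_in_system(arrivals_per_hour, mean_minutes_in_system):
    """Little's law: average count = arrival rate x average time in system."""
    return arrivals_per_hour * mean_minutes_in_system / 60.0

# Hypothetical channel: 50 new conversations an hour.
arrivals_per_hour = 50
open_minutes = 24 * 60   # a thread stays open for a day on average
active_minutes = 6       # but consumes 6 minutes of agent work in total

open_threads = average_in_system(arrivals_per_hour, open_minutes)
busy_agents = average_in_system(arrivals_per_hour, active_minutes)

print(f"open threads on average: {open_threads:.0f}")       # 1200
print(f"agents busy on average:  {busy_agents:.0f}")        # 5
```

Under these assumptions roughly 1,200 threads are open at any time, yet the work in them keeps about five agents busy on average, before any allowance for shrinkage, burstiness or the reply promise.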

The takeaway

Async messaging is its own channel with its own physics. Plan it on active handling time and a response-time SLA, keep it separate from live chat, lean on simulation rather than Erlang, and resist the instinct to manage it like a live queue. Done well, its flexibility is a gift to a planner — a generous reply promise is one of the few levers that lets you genuinely smooth demand across the day.

Related: staffing for live chat and planning non-real-time channels.