What it is
The Live Chat QA Scorecard is a weighted, working spreadsheet for evaluating the quality of live-chat conversations. An evaluator reads a transcript, rates the agent on each criterion in a highlighted column on a 0-1 scale (1 = fully met, 0.5 = partially met, 0 = not met), and the workbook converts those ratings into a single 0-100 quality score. Each criterion carries a fixed weight, the weights sum to 100, and weighted points equal rating times weight — so a criterion rated 0.5 earns half its weight. Response-time discipline carries real weight in the model because, in chat, dead air is the fastest way to lose a conversation.
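The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the workbook's actual formula: the criterion names and individual weights below are hypothetical placeholders, and the only facts taken from the description are the 0 / 0.5 / 1 rating scale, weights that sum to 100, and weighted points equal to rating times weight.

```python
# Hypothetical criteria and weights (assumption: the real workbook's list is not
# shown in this description). The one constraint from the text is that the
# weights sum to 100.
WEIGHTS = {
    "greeting": 10,
    "response_time": 25,
    "accuracy": 30,
    "tone": 15,
    "resolution": 20,
}

ALLOWED_RATINGS = {0.0, 0.5, 1.0}  # not met, partially met, fully met


def quality_score(ratings, weights=WEIGHTS):
    """Return the 0-100 score: the sum of rating * weight over all criteria."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    total = 0.0
    for name, weight in weights.items():
        rating = ratings[name]
        if rating not in ALLOWED_RATINGS:
            raise ValueError(f"rating for {name!r} must be 0, 0.5, or 1")
        total += rating * weight
    return total


example = {
    "greeting": 1,
    "response_time": 0.5,  # earns half of its 25 points
    "accuracy": 1,
    "tone": 1,
    "resolution": 0.5,
}
print(quality_score(example))  # 77.5
```

In this sketch a half rating on the 25-point response-time criterion costs 12.5 points, which shows how a heavily weighted criterion moves the final score.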
What makes this scorecard chat-specific rather than a generic QA form is its compliance auto-fail logic and its concurrency awareness. Two criteria — identity verification and consent/PII handling — are scored as compliant or breach, and if either is a breach the final score is capped at a failing 65 and the band shows a compliance fail no matter how strong the rest of the chat was. This mirrors how real QA programs treat compliance: a great conversation that mishandled a customer's data is still a failed conversation. The Score a Chat sheet also captures the chat header (agent, evaluator, chat ID) so every evaluation is traceable.
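One plausible reading of the auto-fail rule, sketched in Python: a breach on either compliance criterion caps the final score at 65 and raises a breach flag. The function and parameter names are hypothetical, and the choice to leave scores that are already below 65 unchanged is an assumption about what "capped" means here.

```python
COMPLIANCE_CAP = 65  # failing cap stated in the scorecard description


def final_score(raw_score, identity_verified, pii_handled):
    """Apply the compliance gate to a raw 0-100 score.

    Returns (score, breach). Assumption: a breach caps the score at 65,
    so a raw score already below 65 is left as it is.
    """
    breach = not (identity_verified and pii_handled)
    if breach:
        return min(raw_score, COMPLIANCE_CAP), True
    return raw_score, False


print(final_score(96, identity_verified=True, pii_handled=False))  # (65, True)
print(final_score(88, identity_verified=True, pii_handled=True))   # (88, False)
```

Treating the gate as a step applied after the weighted total, rather than as another weighted criterion, is what prevents a strong conversation from outscoring a breach.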
Beyond a single chat, a Team Rollup sheet turns individual scores into a calibration view. You enter each agent's average quality score, the number of chats evaluated this month, and their average concurrency, and the workbook computes a per-agent coaching status and a chat-weighted team average. That average is fairer than a simple mean because each agent counts in proportion to how many chats were reviewed. The concurrency column is deliberately visible: the scorecard's guidance notes that the lowest scorers are often the ones running the highest concurrency, which signals that workload may need tuning rather than only an individual performance problem.
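To make the difference between a chat-weighted average and a simple mean concrete, here is a small Python sketch. The agent figures are made-up illustration values and the function name is hypothetical; only the weighting idea comes from the description.

```python
def chat_weighted_average(rows):
    """rows: iterable of (average_score, chats_evaluated) pairs, one per agent."""
    rows = list(rows)
    total_chats = sum(chats for _, chats in rows)
    if total_chats == 0:
        raise ValueError("at least one evaluated chat is required")
    return sum(score * chats for score, chats in rows) / total_chats


# Illustrative team: (average quality score, chats evaluated this month)
team = [(90, 8), (70, 4), (80, 4)]

simple_mean = sum(score for score, _ in team) / len(team)
print(simple_mean)                  # 80.0
print(chat_weighted_average(team))  # 82.5
```

The agent with eight reviewed chats pulls the weighted figure toward 90, whereas the simple mean treats all three agents as equally well sampled.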
What it's used for
A QA scorecard exists to make 'good chat' an objective, coachable standard rather than a matter of opinion. By turning ratings into a weighted score with bands and a compliance gate, it gives managers a consistent way to evaluate and improve agents. Teams use it to:
- ✓ Score individual chats on weighted criteria to produce a single 0-100 quality number, with response-time discipline carrying real weight because dead air kills chats.
- ✓ Enforce a compliance auto-fail so any breach of identity verification or PII handling caps the score at a failing 65, regardless of how good the rest was.
- ✓ Sort chats into bands — 90-100 Excellent, 80-89 Meets standard, 70-79 Needs coaching, below 70 Improvement plan — so coaching is targeted at the right people.
- ✓ Roll a whole team into a calibration view with a chat-weighted team average that fairly reflects how many chats each agent was evaluated on.
- ✓ Surface the concurrency-versus-quality relationship by showing each agent's average concurrency next to their score, flagging where high load is dragging quality.
- ✓ Correlate QA scores with CSAT to separate a genuinely bad chat from a single bad day, and to find macros or topics that consistently underperform.
- ✓ Run a disciplined QA cadence — typically 4-8 chats per agent per month with monthly evaluator calibration — so scores stay consistent across reviewers.
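The band thresholds listed above can be expressed as a simple lookup. This is a sketch under two assumptions that the description does not settle: each band's lower bound is treated as inclusive (so a fractional score such as 89.5 falls in "Meets standard"), and the exact "Compliance fail" label text is a guess at how the workbook words it.

```python
def band(score, compliance_breach=False):
    """Map a 0-100 score to its coaching band.

    Assumptions: lower bounds are inclusive, and a compliance breach
    overrides the numeric band with a "Compliance fail" label.
    """
    if compliance_breach:
        return "Compliance fail"
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Meets standard"
    if score >= 70:
        return "Needs coaching"
    return "Improvement plan"


print(band(92))                          # Excellent
print(band(77.5))                        # Needs coaching
print(band(65, compliance_breach=True))  # Compliance fail
```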
Who uses it
QA scoring connects evaluators, managers, and agents around one shared rubric, and each role uses the scorecard differently.
Context & good to know
Quality in chat is different from quality on the phone, and the scorecard is built around those differences. Response time carries explicit weight because dead air (long silent gaps while a visitor waits) is the single fastest way to lose a chat, even when the eventual answer is correct. Concurrency complicates evaluation: an agent running four simultaneous chats has less attention per conversation than one running two, so the scorecard surfaces average concurrency alongside the score to keep evaluations honest about the conditions the agent was working in.
The compliance auto-fail is the part teams most often get wrong without a tool like this. It's tempting to average compliance in with everything else, but that lets a polished, friendly chat that mishandled a customer's data still 'pass.' Real QA programs treat compliance as a gate, not a line item — so the scorecard caps any chat with a verification or PII breach at a failing 65 and marks a compliance fail in the band. This makes the consequence of a data-handling mistake unambiguous and keeps the program defensible.
A scorecard is only as good as its calibration. The guidance recommends evaluating 4-8 chats per agent per month and calibrating evaluators monthly on the same transcripts, because without calibration two reviewers will score the same chat differently and the whole program loses credibility. The chat-weighted team average in the Team Rollup reinforces fairness at the aggregate level — an agent reviewed on more chats counts proportionally more — so a single lucky or unlucky review doesn't distort the team picture.
The scorecard is most powerful when it doesn't stand alone. Correlating QA scores with CSAT separates a 'bad chat' from a 'bad day' and shows whether a low score reflects a genuine skill gap or a one-off. It also exposes systemic issues: if certain macros or topics consistently produce low QA scores, the fix is rewriting the macro or the workflow, not coaching the agent. Used this way, alongside the macro library, the tone guide, and CSAT data, the scorecard becomes the feedback engine that keeps a chat team improving rather than just being graded.
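Neither analysis is built into the workbook as described, but both are easy to run on exported data. The Python sketch below assumes each evaluated chat is available as a record with hypothetical `topic`, `qa`, and `csat` fields; it computes a plain Pearson correlation between QA and CSAT and an average QA score per topic. The sample records are invented for illustration only.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_qa_by_topic(chats):
    """Average QA score per topic, to spot topics that consistently score low."""
    buckets = defaultdict(list)
    for chat in chats:
        buckets[chat["topic"]].append(chat["qa"])
    return {topic: sum(scores) / len(scores) for topic, scores in buckets.items()}


# Invented sample records (field names are assumptions, not a workbook export format)
chats = [
    {"topic": "refund", "qa": 60, "csat": 2},
    {"topic": "refund", "qa": 70, "csat": 3},
    {"topic": "billing", "qa": 90, "csat": 5},
    {"topic": "billing", "qa": 85, "csat": 4},
]

print(mean_qa_by_topic(chats))
print(pearson([c["qa"] for c in chats], [c["csat"] for c in chats]))
```

A topic whose average stays low across several agents points at the macro or workflow rather than at any one person, which is the distinction the paragraph above draws.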