Live Chat QA Scorecard (2026)

1. Tell us where to send it
Your name and work email — nothing more.
2. Check your inbox
Your scorecard arrives in seconds, not days.
3. Use it with your team
Editable and ready to share — make it your own.

A peek inside

See exactly what you're getting

Free Excel template

Spotsaas · 2026

Live Chat QA Scorecard

✓ Instructions

✓ Score a Chat

✓ Team Rollup


What Is the Live Chat QA Scorecard?

The Live Chat QA Scorecard is a weighted, working spreadsheet for evaluating the quality of live-chat conversations. An evaluator reads a transcript, rates the agent on each criterion in a highlighted column on a 0-1 scale (1 = fully met, 0.5 = partially met, 0 = not met), and the workbook converts those ratings into a single 0-100 quality score. Each criterion carries a fixed weight, the weights sum to 100, and weighted points equal rating times weight — so a criterion rated 0.5 earns half its weight. Response-time discipline carries real weight in the model because, in chat, dead air is the fastest way to lose a conversation.
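The rating-times-weight arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the workbook's actual formulas: the criterion names and individual weights below are hypothetical, chosen only so that they sum to 100.

```python
# Hypothetical criteria and weights for illustration only; the template's
# real criteria are not listed in this description. Weights must sum to 100.
WEIGHTS = {
    "greeting": 10,
    "response_time": 25,
    "accuracy": 30,
    "tone": 15,
    "resolution": 20,
}

ALLOWED_RATINGS = {0, 0.5, 1}  # not met, partially met, fully met


def weighted_score(ratings, weights=WEIGHTS):
    """Return the 0-100 quality score: the sum of rating * weight."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if set(ratings) != set(weights):
        raise ValueError("every criterion needs exactly one rating")
    for name, rating in ratings.items():
        if rating not in ALLOWED_RATINGS:
            raise ValueError(f"rating for {name!r} must be 0, 0.5, or 1")
    return sum(ratings[name] * weights[name] for name in weights)


example = {
    "greeting": 1,
    "response_time": 0.5,  # earns half of its 25 points
    "accuracy": 1,
    "tone": 1,
    "resolution": 0.5,     # earns half of its 20 points
}
print(weighted_score(example))  # 10 + 12.5 + 30 + 15 + 10 = 77.5
```

Because the weights total 100, a chat rated 1 on every criterion scores exactly 100, and no rescaling step is needed.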

What makes this scorecard chat-specific rather than a generic QA form is its compliance auto-fail logic and its concurrency awareness. Two criteria — identity verification and consent/PII handling — are scored as compliant or breach, and if either is a breach the final score is capped at a failing 65 and the band shows a compliance fail no matter how strong the rest of the chat was. This mirrors how real QA programs treat compliance: a great conversation that mishandled a customer's data is still a failed conversation. The Score a Chat sheet also captures the chat header (agent, evaluator, chat ID) so every evaluation is traceable.
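As a rough sketch, the compliance gate could be expressed like this in Python. The function and parameter names are hypothetical, and the code assumes that "capped" means taking the lower of the raw score and 65, so a chat that already scored below 65 keeps its lower score.

```python
COMPLIANCE_CAP = 65  # failing cap stated in the scorecard description


def apply_compliance_gate(raw_score, identity_verified, pii_handled):
    """Return (final_score, compliance_fail) after the auto-fail gate.

    Assumption: the cap is min(raw_score, 65); the description does not
    say what happens to chats that already scored below 65.
    """
    compliance_fail = not (identity_verified and pii_handled)
    if compliance_fail:
        return min(raw_score, COMPLIANCE_CAP), True
    return raw_score, False


# A strong chat with a PII breach is capped and flagged.
print(apply_compliance_gate(94, identity_verified=True, pii_handled=False))  # (65, True)
# The same chat with both checks compliant keeps its score.
print(apply_compliance_gate(94, identity_verified=True, pii_handled=True))   # (94, False)
```

Treating the gate as a separate step after the weighted sum keeps compliance out of the weighting entirely, which matches the "gate, not a line item" framing.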

Beyond a single chat, a Team Rollup sheet turns individual scores into a calibration view. You enter each agent's average quality score, the number of chats evaluated this month, and their average concurrency, and the workbook computes a per-agent coaching status and a chat-weighted team average — fairer than a simple mean because each agent counts in proportion to how many chats were reviewed. The concurrency column is deliberately visible: the scorecard's guidance notes that the lowest scorers are often the ones running the highest concurrency, which is a tuning signal, not just an individual performance problem.
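To see why a chat-weighted average differs from a simple mean, here is a small Python sketch. The two agents and their numbers are invented for illustration, and the function name is hypothetical rather than anything defined in the workbook.

```python
def chat_weighted_average(rows):
    """rows: list of (average_score, chats_evaluated) pairs, one per agent."""
    total_chats = sum(chats for _, chats in rows)
    if total_chats <= 0:
        raise ValueError("at least one evaluated chat is required")
    # Each agent's average counts in proportion to the chats reviewed.
    return sum(score * chats for score, chats in rows) / total_chats


# Invented example: one agent reviewed on 8 chats, another on 4.
rows = [(88, 8), (76, 4)]

simple_mean = sum(score for score, _ in rows) / len(rows)
print(simple_mean)                  # 82.0
print(chat_weighted_average(rows))  # (88*8 + 76*4) / 12 = 84.0
```

The weighted figure sits closer to 88 because that agent's average rests on twice as many reviewed chats.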

What the QA Scorecard Is Used For

A QA scorecard exists to make 'good chat' an objective, coachable standard rather than a matter of opinion. By turning ratings into a weighted score with bands and a compliance gate, it gives managers a consistent way to evaluate and improve agents. Teams use it to:

✓ Score individual chats on weighted criteria to produce a single 0-100 quality number, with response-time discipline carrying real weight because dead air kills chats.
✓ Enforce a compliance auto-fail so any breach of identity verification or PII handling caps the score at a failing 65, regardless of how good the rest was.
✓ Sort chats into bands — 90-100 Excellent, 80-89 Meets standard, 70-79 Needs coaching, below 70 Improvement plan — so coaching is targeted at the right people.
✓ Roll a whole team into a calibration view with a chat-weighted team average that fairly reflects how many chats each agent was evaluated on.
✓ Surface the concurrency-versus-quality relationship by showing each agent's average concurrency next to their score, flagging where high load is dragging quality.
✓ Correlate QA scores with CSAT to separate a genuinely bad chat from a single bad day, and to find macros or topics that consistently underperform.
✓ Run a disciplined QA cadence — typically 4-8 chats per agent per month with monthly evaluator calibration — so scores stay consistent across reviewers.
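The band thresholds in the list above map onto a simple lookup. The sketch below is an illustration in Python with a hypothetical function name. It assumes fractional scores fall into the band whose lower bound they reach (so 89.5 is "Meets standard"), which the list does not spell out.

```python
def band(score):
    """Map a 0-100 quality score to its coaching band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Meets standard"
    if score >= 70:
        return "Needs coaching"
    return "Improvement plan"


for s in (95, 89.5, 70, 65):
    print(s, band(s))
# 95 Excellent
# 89.5 Meets standard
# 70 Needs coaching
# 65 Improvement plan
```

Note that the compliance cap of 65 always lands in the lowest band, which is what makes it a failing score.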

Who Uses the QA Scorecard

QA scoring connects evaluators, managers, and agents around one shared rubric. Each role uses the scorecard differently:

QA evaluators / analysts
They read transcripts and enter per-criterion ratings and compliance flags on the Score a Chat sheet, calibrating monthly against other evaluators on the same chats.

Support / CX managers
They use the Team Rollup to spot who needs coaching, track the chat-weighted team average, and tie QA results to the 30-60-90 ramp and CSAT.

Team leads / coaches
They turn band results and weighted criterion scores into specific feedback, for example dead-air response-time misses or weak resolution language.

Compliance and risk owners
They rely on the auto-fail gate to ensure that verification and PII-handling breaches surface immediately, no matter how strong the rest of the chat reads.

Workforce / operations leads
They watch the concurrency column to tell whether a low score reflects the agent or an unrealistic chat load, and tune concurrency targets accordingly.

QA Scorecard: Context & Good to Know

Quality in chat is different from quality on the phone, and the scorecard is built around those differences. Response time carries explicit weight because dead air — long silent gaps while a visitor waits — is the single fastest way to lose a chat, even when the eventual answer is correct. Concurrency complicates everything: an agent running four simultaneous chats has less attention per conversation than one running two, so the scorecard surfaces average concurrency alongside the score to keep evaluations honest about the conditions the agent was working in.

The compliance auto-fail is the part teams most often get wrong without a tool like this. It's tempting to average compliance in with everything else, but that lets a polished, friendly chat that mishandled a customer's data still 'pass.' Real QA programs treat compliance as a gate, not a line item — so the scorecard caps any chat with a verification or PII breach at a failing 65 and marks a compliance fail in the band. This makes the consequence of a data-handling mistake unambiguous and keeps the program defensible.

A scorecard is only as good as its calibration. The guidance recommends evaluating 4-8 chats per agent per month and calibrating evaluators monthly on the same transcripts, because without calibration two reviewers will score the same chat differently and the whole program loses credibility. The chat-weighted team average in the Team Rollup reinforces fairness at the aggregate level — an agent reviewed on more chats counts proportionally more — so a single lucky or unlucky review doesn't distort the team picture.

The scorecard is most powerful when it doesn't stand alone. Correlating QA scores with CSAT separates a 'bad chat' from a 'bad day' and reveals when a low score reflects a genuine skill gap versus a one-off. It also exposes systemic issues: if certain macros or topics consistently produce low QA scores, the fix is rewriting the macro or the workflow, not coaching the agent. Used this way, alongside the macro library, the tone guide, and CSAT data, the scorecard becomes the feedback engine that keeps a chat team improving rather than just being graded.
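One simple way to start correlating QA scores with CSAT is a Pearson correlation over per-agent (or per-chat) pairs. The sketch below is an assumption about how a team might do this outside the workbook, not a feature of the template. The sample figures are invented and assume CSAT on a 1-5 scale.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Invented per-agent averages: QA score (0-100) and CSAT (assumed 1-5 scale).
qa_scores = [92, 85, 78, 66, 88]
csat = [4.8, 4.5, 4.1, 3.2, 4.6]

r = pearson(qa_scores, csat)
print(f"QA vs CSAT correlation: {r:.2f}")
```

A value near 1 suggests the rubric tracks what customers experience. A weak or negative value is a prompt to look at individual chats, macros, or topics before drawing conclusions, since a handful of data points cannot establish much on their own.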

✓ Independent · vendors can't pay to rank

Built on verified data, not vendor spin

Every Spotsaas resource draws on the SpotScore — a blend of verified review ratings, review volume, and feature depth across 113 live chat software tools. Refreshed regularly; data as of June 2026.

FAQ

Questions, answered

What is a live chat QA scorecard?

It's a quality-assurance tool for evaluating chat conversations. An evaluator rates an agent on weighted criteria from a transcript, and the workbook converts those ratings into a single 0-100 quality score with a band and a pass/fail. It standardizes what 'good chat' means so coaching is consistent and defensible.

How does the weighted scoring work?

Each criterion has a fixed weight and the weights total 100. Ratings are entered on a 0-1 scale (1 = fully met, 0.5 = partial, 0 = not met), and weighted points equal rating times weight — so a criterion rated 0.5 earns half its weight. The sum of all weighted points is the chat's score out of 100 before the compliance gate is applied.

What is the compliance auto-fail?

Two criteria — identity verification and consent/PII handling — are scored as compliant or breach. If either is a breach, the final score is capped at a failing 65 and the band shows a compliance fail, no matter how strong the rest of the chat was. This mirrors how real QA programs treat data-handling mistakes as disqualifying.

What are the scoring bands?

90-100 is Excellent, 80-89 Meets standard, 70-79 Needs coaching, and below 70 triggers an improvement plan. The bands turn a raw number into an action — they tell a manager whether to recognize, calibrate, coach, or formally support an agent.

Why does response time carry so much weight in chat QA?

Because dead air is the fastest way to lose a chat. A visitor waiting in silence assumes they've been forgotten, even if the eventual answer is perfect. The scorecard gives response-time discipline real weight so agents are rewarded for narrating waits and keeping the conversation alive, not just for the final resolution.

How many chats should I evaluate per agent?

A mature program evaluates 4-8 chats per agent per month and calibrates evaluators monthly on the same transcripts. That volume is enough to spot patterns without overwhelming evaluators, and the monthly calibration keeps different reviewers scoring consistently.

How does the team rollup stay fair across agents?

It uses a chat-weighted team average, meaning each agent counts in proportion to how many chats were evaluated rather than as a simple equal-weight mean. An agent reviewed on more chats has more influence on the average, which prevents a single review from skewing the team picture.

Why does the scorecard show agent concurrency?

Because concurrency affects quality. An agent running four chats at once has less attention per conversation than one running two, and the scorecard's guidance notes that the lowest scorers are often the highest-concurrency agents. Seeing concurrency next to the score helps you tell an agent problem from a workload problem and tune concurrency targets.

How does QA relate to CSAT?

They measure different things: QA is the evaluator's view of process quality, CSAT is the customer's view of the outcome. Correlating them separates a 'bad chat' from a 'bad day' and reveals whether a low score reflects a real skill gap. Where a specific macro or topic consistently underperforms on both, the fix is rewriting the workflow, not coaching the agent.

Can I use this scorecard with any chat platform?

Yes. The scorecard works from a transcript, so it's independent of whether your team runs Intercom, LiveChat, Olark, or another tool. You just need access to the chat transcript and the relevant header details (agent, chat ID) to evaluate, and the workbook handles the scoring, banding, and team rollup.
