Intercom
Intercom launches Monitors, a QA tool for AI support agents with custom scorecards
· release · feature · platform · intercom.com ↗

Addressing the AI Black Box

As AI support agents like Fin become more capable and handle higher volumes of conversations—Fin now resolves close to 2 million queries weekly with a 67% resolution rate across 8,000 customers—observability has become critical. Support leaders struggle to confidently answer basic questions about what their agents are doing: whether they're delivering good experiences, handling complex issues completely, and representing the brand consistently. Traditional QA approaches like CSAT scores and manual sampling don't scale effectively, leaving teams flying blind.

Two Core Components

Monitors consists of two integrated parts:

  • Monitors define which conversations get reviewed through targeted criteria—either flagging high-risk edge cases (like "customer showed signs of financial vulnerability") or creating consistent benchmarking samples. Teams can combine multiple filters based on customer data, channel, or Fin-specific metrics, moving beyond random sampling.

  • Custom Scorecards let teams define what "good" looks like for their specific business and turn that into measurable quality scores. Rather than applying generic rubrics, teams define criteria, set weights, and mark critical failures that automatically fail evaluations. Criteria can be scored by AI, humans, or both within the same scorecard (a sketch of how the two components fit together follows this list).
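
As a rough illustration of how these two components might fit together, here is a minimal Python sketch. The class names, fields, and scoring rule are hypothetical and are not Intercom's actual data model or API; they only mirror the concepts above: filters that select conversations, weighted criteria, critical failures that fail an evaluation outright, and criteria scored by AI, humans, or both.

```python
# Hypothetical sketch only: class and field names are illustrative,
# not Intercom's data model or API.
from dataclasses import dataclass, field
from enum import Enum


class Scorer(Enum):
    """Who evaluates a criterion: AI, a human reviewer, or both."""
    AI = "ai"
    HUMAN = "human"
    BOTH = "both"


@dataclass
class Criterion:
    """One line on a scorecard, with a weight and an optional critical-failure flag."""
    name: str
    weight: float            # relative contribution to the overall score
    scored_by: Scorer
    critical: bool = False   # a failure here fails the whole evaluation


@dataclass
class Scorecard:
    """A team's definition of 'good', expressed as weighted criteria."""
    name: str
    criteria: list[Criterion] = field(default_factory=list)

    def score(self, results: dict[str, float]) -> float:
        """Weighted score in [0, 1]; returns 0.0 if any critical criterion fails."""
        for c in self.criteria:
            if c.critical and results.get(c.name, 0.0) < 1.0:
                return 0.0
        total_weight = sum(c.weight for c in self.criteria) or 1.0
        return sum(c.weight * results.get(c.name, 0.0) for c in self.criteria) / total_weight


@dataclass
class Monitor:
    """Selects which conversations enter review: filters plus an attached scorecard."""
    name: str
    filters: dict[str, str]   # e.g. customer attributes, channel, Fin-specific metrics
    scorecard: Scorecard


# Example: flag a high-risk edge case instead of sampling at random.
vulnerability_monitor = Monitor(
    name="Financial vulnerability",
    filters={"channel": "chat", "flag": "customer showed signs of financial vulnerability"},
    scorecard=Scorecard(
        name="Sensitive conversations",
        criteria=[
            Criterion("Escalated to a human", weight=1.0, scored_by=Scorer.AI, critical=True),
            Criterion("Empathetic tone", weight=0.5, scored_by=Scorer.HUMAN),
        ],
    ),
)
```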

Integrated Workflow

Flagged conversations flow into a Review Queue where they're automatically assigned to the right reviewer with the scorecard attached and review status tracked (Not Reviewed, Reviewed, Needs a Fix, Fix Complete). This replaces ad-hoc sampling and spreadsheet-driven QA with a system that scales with conversation volume.
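
The four review statuses can be read as a small state machine. The sketch below is hypothetical: the status names come from the announcement, but the transition rules and the `advance` helper are assumptions for illustration only.

```python
# Hypothetical sketch only: status names are from the announcement,
# but these transitions are illustrative, not Intercom's implementation.
from enum import Enum


class ReviewStatus(Enum):
    NOT_REVIEWED = "Not Reviewed"
    REVIEWED = "Reviewed"
    NEEDS_A_FIX = "Needs a Fix"
    FIX_COMPLETE = "Fix Complete"


# One plausible set of allowed moves through the review queue.
ALLOWED = {
    ReviewStatus.NOT_REVIEWED: {ReviewStatus.REVIEWED, ReviewStatus.NEEDS_A_FIX},
    ReviewStatus.REVIEWED: set(),
    ReviewStatus.NEEDS_A_FIX: {ReviewStatus.FIX_COMPLETE},
    ReviewStatus.FIX_COMPLETE: set(),
}


def advance(current: ReviewStatus, target: ReviewStatus) -> ReviewStatus:
    """Move a queued conversation to a new status, rejecting invalid jumps."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


# A flagged conversation that needed a fix, now resolved.
status = ReviewStatus.NOT_REVIEWED
status = advance(status, ReviewStatus.NEEDS_A_FIX)
status = advance(status, ReviewStatus.FIX_COMPLETE)
```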

Monitors completes Intercom's observability suite alongside Insights (which measures overall sentiment and topics) and Recommendations (which surfaces improvement opportunities). Together, they close the gap between traditional QA approaches and the scale of modern AI-driven support operations.