Monitoring and Alerting
Semantic rules that detect safety-relevant conversation patterns using embedding similarity rather than keyword matching.
The monitoring system evaluates conversations against semantic rules in real time. Rather than scanning for exact keywords, it uses embedding similarity to detect when a conversation is approaching topics that require attention. This means the system catches paraphrases, indirect references, and contextual mentions that keyword matching would miss.
Monitor Concepts
A monitor concept is a semantic rule that defines a topic or behavior pattern the system should watch for. Each concept has:
A description of what it detects (e.g., "caller is asking for medical advice about dosing")
A detection threshold that controls sensitivity (how similar the conversation content must be to the concept's embedding to trigger a match)
An action that specifies what happens when the concept fires (alert, escalate, log, or a custom response)
Monitor concepts are configured per workspace. A scheduling-focused deployment might have concepts for "caller requesting clinical advice" (escalate) and "caller expressing frustration with wait times" (alert). A clinical follow-up deployment might add concepts for "patient reporting worsening symptoms" (escalate) and "patient mentioning medication side effects" (flag for review).
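The anatomy above can be sketched as a simple data structure. This is an illustrative model, not the platform's actual schema; field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class MonitorConcept:
    """A semantic rule the monitoring system evaluates conversations against."""
    description: str  # what the concept detects
    threshold: float  # minimum embedding similarity needed to fire
    action: str       # "alert", "escalate", "log", or a custom response

# Hypothetical per-workspace configuration for a scheduling deployment
scheduling_concepts = [
    MonitorConcept("caller requesting clinical advice", 0.85, "escalate"),
    MonitorConcept("caller expressing frustration with wait times", 0.80, "alert"),
]
```

Keeping concepts as plain data like this is what makes them easy to configure per workspace: each deployment carries its own list.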
Setting Thresholds
Detection thresholds control the trade-off between catching real signals and generating false positives:
Too low (e.g., 0.70) - The concept fires frequently, including on conversations that are only tangentially related. The review queue fills with false positives, and operators learn to ignore alerts.
Too high (e.g., 0.95) - The concept only fires on near-exact matches to its description. Indirect references and paraphrases are missed.
Recommended starting point - 0.80 to 0.85. Tune based on review queue data after the first week of production calls.
Thresholds should be reviewed regularly. As call patterns change (new services, seasonal variations, updated workflows), the right threshold may shift.
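To make the trade-off concrete, here is a minimal sketch of how a similarity score interacts with the threshold. The 3-dimensional vectors are toy stand-ins for real embedding-model output, which typically has hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

concept_vec = [1.0, 0.0, 0.0]    # toy embedding of the concept description
utterance_vec = [0.8, 0.6, 0.0]  # toy embedding of a caller utterance

score = cosine_similarity(concept_vec, utterance_vec)  # 0.8

fires_loose = score >= 0.70        # low threshold: this utterance fires
fires_strict = score >= 0.85       # within the recommended band: it does not
```

The same utterance fires at 0.70 but not at 0.85, which is why review-queue data, not intuition, should drive where in that band each concept sits.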
Dual-Layer Detection Architecture
The monitoring system uses two detection layers working in sequence:
Layer 1: Embedding similarity. Every utterance is compared against the workspace's configured monitor concepts using vector similarity. This is fast and catches most cases, but it can produce false positives on semantically similar but contextually different statements.
Layer 2: LLM judge. When embedding similarity exceeds the detection threshold, an LLM judge evaluates the flagged content in the full conversation context. The judge determines whether the flag represents a genuine safety concern or a false positive. Only flags confirmed by the judge trigger the configured action (alert, escalate, log).
This two-layer approach keeps detection sensitive (low false negatives from the embedding layer) while keeping the review queue clean (low false positives from the LLM judge layer).
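The two layers above can be sketched as a short pipeline. The `similarity` and `judge` callables are hypothetical stand-ins for the real embedding model and LLM judge; the stubs below exist only to make the control flow runnable:

```python
def evaluate(utterance, concepts, similarity, judge):
    """Dual-layer check: fast embedding filter, then judge confirmation."""
    actions = []
    for c in concepts:
        score = similarity(utterance, c["description"])   # layer 1: cheap, sensitive
        if score >= c["threshold"]:                       # candidate flag
            if judge(utterance, c["description"]):        # layer 2: context check
                actions.append(c["action"])               # only confirmed flags act
    return actions

concepts = [{"description": "medication dosing question",
             "threshold": 0.80, "action": "escalate"}]

# Stubs: similarity keys on a shared word; the "judge" rejects veterinary context.
fake_similarity = lambda utt, desc: 0.9 if "dose" in utt else 0.1
fake_judge = lambda utt, desc: "dog" not in utt

confirmed = evaluate("what dose should I take?", concepts, fake_similarity, fake_judge)
rejected = evaluate("what dose for my dog?", concepts, fake_similarity, fake_judge)
```

Both utterances clear layer 1, but only the first survives layer 2, which is exactly the false-positive filtering the judge layer exists for.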
Regulation Templates
Regulation templates are pre-built sets of monitor concepts designed for specific compliance requirements. Rather than building safety rules from scratch, you apply a template and get a baseline set of concepts that you can then customize.
For healthcare deployments, templates cover areas such as:
HIPAA-related patterns - Detecting when conversations involve PHI in contexts that need extra care
Clinical boundary monitoring - Identifying when the agent is being asked to provide advice beyond its configured scope
Crisis detection - Recognizing indicators of mental health crisis, self-harm, or emergency medical situations
Medication safety - Flagging conversations about dosing, drug interactions, or adverse reactions
Templates are composable. You can apply multiple templates to a single workspace. When a template is applied, it creates individual monitor concepts that you can modify, disable, or extend without affecting the template itself.
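A sketch of that copy-on-apply behavior, assuming a simple dict representation (names and fields are illustrative, not the platform's API):

```python
def apply_template(workspace_concepts, template):
    """Copy a template's concepts into a workspace. Workspace copies are
    independent, so editing them never mutates the template itself."""
    for concept in template["concepts"]:
        workspace_concepts.append(dict(concept))  # shallow copy per concept
    return workspace_concepts

hipaa = {"name": "hipaa-baseline", "concepts": [
    {"description": "conversation involves PHI", "threshold": 0.85, "action": "alert"}]}
crisis = {"name": "crisis-detection", "concepts": [
    {"description": "caller indicates self-harm risk", "threshold": 0.80, "action": "escalate"}]}

workspace = []
apply_template(workspace, hipaa)
apply_template(workspace, crisis)   # templates compose on one workspace

workspace[0]["threshold"] = 0.90    # customize the workspace copy...
# ...the template's own definition is untouched.
```

The copy step is the design choice that makes templates safe to reuse: a workspace tweak can never silently change every other workspace that applied the same template.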
The Review Queue
When a monitor concept fires during a conversation, the event is logged and, depending on the concept's configured action, may be added to the review queue. The review queue is where operators and supervisors examine flagged interactions.
Each item in the review queue includes:
The conversation transcript with the triggering segment highlighted
Which monitor concept fired and its similarity score
The caller's emotional state at the time of the trigger
The agent's response to the triggering content
The review queue serves two purposes: immediate response (was the escalation handled correctly?) and long-term tuning (should this concept's threshold be adjusted?).
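The fields listed above map naturally onto a record type. This is a hypothetical shape for illustration, not the platform's schema:

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:
    """One flagged interaction awaiting operator or supervisor review."""
    transcript: str          # full conversation, triggering segment highlighted
    highlighted_segment: str
    concept_name: str        # which monitor concept fired
    similarity_score: float
    caller_emotion: str      # caller's emotional state at the trigger
    agent_response: str      # how the agent handled the triggering content

item = ReviewItem(
    transcript="...full call transcript...",
    highlighted_segment="I'd like to double my dose",
    concept_name="medication safety",
    similarity_score=0.87,
    caller_emotion="anxious",
    agent_response="I can't advise on dosing; let me connect you with a nurse.",
)
```

Keeping the similarity score on each item is what enables the long-term tuning use: items that were reviewed as false positives at a given score suggest where the threshold should move.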
How Monitors Trigger Operator Escalation
When a monitor concept with an escalation action fires during a live call, the following happens:
The platform checks whether an operator is available
If an operator is available, they receive a notification with the call context and the reason for escalation
The operator can join the call in listen mode to assess the situation
If the operator determines intervention is needed, they switch to takeover mode
If no operator is available when an escalation triggers, the event is logged, the agent adjusts its behavior (increased caution, more frequent confirmation-seeking), and the conversation is flagged for post-call review.
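The branching above can be sketched as follows. The `hooks` callbacks are hypothetical; the real platform wires these steps into its own notification and logging systems:

```python
def handle_escalation(operator_available, hooks):
    """Escalation path: always log, then either notify an operator or
    fall back to cautious-agent mode plus post-call review."""
    hooks["log_event"]("escalation_triggered")
    if operator_available:
        # Operator receives call context and the escalation reason,
        # joins in listen mode, and may switch to takeover mode.
        hooks["notify_operator"]("call context + escalation reason")
    else:
        hooks["adjust_agent"]()      # increased caution, more confirmation-seeking
        hooks["flag_for_review"]()   # queued for post-call review

# Record which hooks fire, using stub callbacks.
calls = []
hooks = {name: (lambda *args, name=name: calls.append(name)) for name in
         ["log_event", "notify_operator", "adjust_agent", "flag_for_review"]}

handle_escalation(False, hooks)  # no operator: log, adjust, flag
```

Note that logging happens unconditionally, before the availability check, so the event survives for post-call review even when no operator ever sees it live.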
The monitoring system does not replace operator judgment. It surfaces situations that may need attention. The operator decides what to do about them.
Developer Guide - For API endpoints, SDK examples, and integration details, see Safety & Monitoring in the developer guide.