Operators and Escalation
Human operators monitor live interactions, take over when needed, and hand control back to the agent. Escalation is conference-based, so the handoff happens without disrupting the call.
Some situations require clinical judgment, emotional sensitivity, or authority that only a human can provide. The operator system handles the handoff between agent and human.
The core design principle is that escalation should never disrupt the patient. When an operator joins a voice call, they enter the same conference as the caller and the agent. There is no hold music, no transfer, and no dropped audio, so the caller does not notice that anything has changed.
For text sessions, operators monitor in real time and can take over the conversation thread when needed.
When Escalation Happens
The platform escalates to operators automatically based on three categories of triggers:
Safety rules fire when the monitoring system detects content requiring human review. A patient mentioning self-harm, an adverse drug reaction pattern, or a compliance-relevant disclosure will trigger escalation immediately.
Patient request fires when the caller explicitly asks to speak with a person. The agent does not argue or deflect.
Agent uncertainty fires when the agent's confidence in its own understanding drops below a configured threshold. Rather than guessing, the agent hands off to someone who can help.
Operators receive a notification with full context: who the patient is, what the conversation has covered, and why the escalation was triggered.
Soft vs. Hard Escalation
Not all escalations require the agent to stop talking. The platform distinguishes two escalation modes:
Soft escalation - The agent triggers the escalation (notifying operators, adding the call to the queue) but continues the conversation. The caller stays engaged while waiting for an operator. This is the default for patient requests - the caller asked to speak with someone, but the agent can still be helpful while the operator connects.
Hard escalation - The agent suspends its output and waits for the operator. Used for safety-critical situations where the agent should not continue speaking (crisis indicators, clinical boundary violations). The agent finishes its current sentence before suspending.
Safety rules default to hard escalation. Patient requests and high-risk triggers default to soft escalation. These defaults can be overridden per monitor concept.
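The default mapping from trigger to escalation mode can be sketched as a small lookup with an optional override. This is a minimal illustration, not the platform's implementation: the names `Trigger`, `Mode`, and `resolve_mode` are hypothetical, and the override is modeled as a single optional value per monitor concept.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    SAFETY_RULE = "safety_rule"
    PATIENT_REQUEST = "patient_request"
    HIGH_RISK = "high_risk"


class Mode(Enum):
    SOFT = "soft"  # agent keeps talking while an operator connects
    HARD = "hard"  # agent finishes its current sentence, then suspends output


# Defaults as stated in the text above.
DEFAULT_MODES = {
    Trigger.SAFETY_RULE: Mode.HARD,
    Trigger.PATIENT_REQUEST: Mode.SOFT,
    Trigger.HIGH_RISK: Mode.SOFT,
}


def resolve_mode(trigger: Trigger, concept_override: Optional[Mode] = None) -> Mode:
    """Return the per-concept override if one is set, otherwise the default."""
    if concept_override is not None:
        return concept_override
    return DEFAULT_MODES[trigger]
```

For example, a monitor concept that should always stop the agent could pass `Mode.HARD` as its override, even when the trigger would otherwise default to soft escalation.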
Takeover
The takeover interface is a single screen built for rapid triage. Queue monitoring, call details, and operator controls all sit in one view.
Priority Queue
The left panel shows all active calls ranked by urgency. Each call displays the caller's name, current context graph state, wait time, turn count, risk score, and current emotion. Urgency levels (critical, high, medium, low) are computed from the call's risk score and whether an escalation is active. Each escalated call shows its escalation type - safety, high risk, caller distressed, or stuck in loop - so operators can instantly distinguish a safety crisis from a routine transfer request.
The queue updates in real time. New calls appear as they enter the system, and urgency indicators change as risk scores shift during the conversation. When no escalations are pending, the queue panel shows a system health dashboard with escalations handled today, operators online, and average wait time.
When a call escalates, operators receive browser push notifications (even when the console tab is in the background), an audio alert, and persistent in-app toast notifications that remain visible until dismissed.
Operator Modes
Operators work in one of two modes and can switch between them instantly:
Listen mode - The operator hears the full conversation but is muted at the telephony level. The caller does not know the operator is present. The AI agent continues handling the conversation normally. Used for quality monitoring, observing how the agent handles specific scenarios, and waiting for the right moment to intervene.
Takeover mode - The operator is unmuted and speaks directly with the caller. The agent's audio output is suppressed, but its processing loop continues running in the background. When the operator finishes and switches back to listen mode or leaves the call, the agent resumes immediately with full context of what happened during the takeover. There is no re-initialization or context loss.
During takeover, the operator's speech is captured through a dedicated per-participant STT stream and recorded as operator turns in the transcript. The complete call record includes everything the operator said, not just the agent and caller portions.
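The two modes can be modeled as a small piece of per-call state. The sketch below is illustrative only: `ConferenceCall` and its methods are hypothetical names, and it assumes operator speech is recorded in the transcript only while the operator is unmuted.

```python
from typing import List, Optional, Tuple


class ConferenceCall:
    """Tracks the operator's mode and what it implies for audio routing."""

    MODES = ("listen", "takeover")

    def __init__(self) -> None:
        self.operator_mode: Optional[str] = None  # None until an operator joins
        self.transcript: List[Tuple[str, str]] = []

    def set_mode(self, mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.operator_mode = mode

    @property
    def operator_muted(self) -> bool:
        # Operators are muted unless they have taken over.
        return self.operator_mode != "takeover"

    @property
    def agent_audio_suppressed(self) -> bool:
        # The agent keeps processing, but its audio is silenced during takeover.
        return self.operator_mode == "takeover"

    def record_operator_speech(self, text: str) -> bool:
        """Append an operator turn; ignored while the operator is muted."""
        if self.operator_muted:
            return False
        self.transcript.append(("operator", text))
        return True
```

Because the mode is the only thing that changes, switching back to listen mode needs no re-initialization: the agent's state was never torn down.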
Connection Methods
Operators connect to calls through one of two methods:
Phone (PSTN) - The platform dials the operator's phone number. When the operator answers, they are added to the conference. This method works from any phone and requires no special software. Higher latency due to the PSTN round trip. Best for remote operators or situations where a desktop is not available.
Browser (WebRTC) - The operator connects through a web browser using the voice SDK. Audio travels directly over WebRTC, bypassing the phone network entirely. Lower latency than PSTN. Best for operators working at a desktop with a headset. The browser connection flow: request an access token via the API, register the operator for the call, and connect using the voice SDK with the provided token.
Only one operator can be active on a call at a time. If a second operator attempts to join the same call, they receive a conflict error. The same operator joining the same call again receives the cached response (the join is idempotent).
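The single-operator rule and the idempotent join can be sketched with an in-memory registry. The class, exception, and response fields here are assumptions made for illustration; they are not the platform's API.

```python
from typing import Dict


class OperatorConflictError(Exception):
    """Raised when a different operator already holds the call."""


class JoinRegistry:
    def __init__(self) -> None:
        # call_id -> cached join response for the operator holding the call
        self._joins: Dict[str, dict] = {}

    def join(self, call_id: str, operator_id: str) -> dict:
        existing = self._joins.get(call_id)
        if existing is not None:
            if existing["operator_id"] == operator_id:
                # Idempotent: the same operator gets the cached response back.
                return existing
            raise OperatorConflictError(f"call {call_id} already has an operator")
        response = {"call_id": call_id, "operator_id": operator_id}
        self._joins[call_id] = response
        return response

    def leave(self, call_id: str, operator_id: str) -> None:
        # Only the operator holding the call can release it.
        if self._joins.get(call_id, {}).get("operator_id") == operator_id:
            del self._joins[call_id]
```

A client that retries a join after a network timeout therefore sees the same response rather than a conflict, while a second operator is rejected until the first one leaves.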
AI Briefing
When an operator selects a call from the queue, the system generates an AI briefing that summarizes the situation before the operator joins:
Situation summary - What the caller needs and where the conversation stands
Patient context - Relevant background from the world model
Risk assessment - Current risk level and contributing factors
Key issues - Specific problems identified during the call
Recommended actions - Suggested next steps for the operator
Call history - Prior interactions if applicable
The operator reads the briefing in seconds and joins with full context, rather than listening to minutes of conversation to piece together the situation.
Guidance Injection
Operators in listen mode can send text guidance to the agent without taking over the call. The guidance is injected into the active session and the agent processes it as an instructional event - interrupting its current speech to act on the guidance immediately.
This is useful when an operator sees the conversation going in the wrong direction and wants to steer the agent without the caller knowing a human intervened. For example, an operator monitoring a scheduling call could send "Ask for their insurance ID before confirming the appointment" and the agent would work that into its next response naturally.
Guidance messages are distinct from external events. External events carry factual information ("The appointment has been confirmed") and queue behind the agent's current speech. Guidance carries instructions ("Ask about their insurance") and interrupts because instructions are time-sensitive.
Both event types flow through the same injection system and work regardless of where the call is running in the platform.
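The difference in delivery between the two event types can be sketched as follows. This is a simplified model under stated assumptions: `AgentSession`, `Injected`, and the `kind` strings are hypothetical, and "handling" an item is reduced to writing a log entry.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Injected:
    kind: str  # "guidance" or "external_event"
    text: str


class AgentSession:
    def __init__(self) -> None:
        self.speaking = False
        self.pending: Deque[Injected] = deque()
        self.log: List[str] = []

    def inject(self, item: Injected) -> None:
        if item.kind == "guidance":
            # Instructions are time-sensitive: cut off current speech, act now.
            if self.speaking:
                self.speaking = False
                self.log.append("interrupt")
            self.log.append(f"handle:{item.text}")
        elif self.speaking:
            # Factual events wait until the agent finishes speaking.
            self.pending.append(item)
        else:
            self.log.append(f"handle:{item.text}")

    def finish_speaking(self) -> None:
        self.speaking = False
        while self.pending:
            self.log.append(f"handle:{self.pending.popleft().text}")
```

Both kinds enter through the same `inject` call; only the scheduling decision differs.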
Risk Scoring
Risk scoring and conversation monitoring initialize in the background when a voice session starts, running in parallel with the greeting. This deferred setup ensures that session startup is not delayed by monitoring initialization while guaranteeing that scoring is active before the first caller transcript needs evaluation.
The platform computes a composite risk score on every conversational turn, combining three signals:
Emotion (40%) - Negative valence combined with high arousal, deteriorating emotional trend, barge-in frequency, and consecutive short responses from the caller.
Loop detection (30%) - How many times the agent has revisited the same state. Repeated state visits suggest the conversation is going in circles without progress.
Duration (30%) - Time elapsed relative to the expected call length. Risk ramps after the expected duration is exceeded.
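The weighting above amounts to a weighted sum. The sketch below assumes each signal has already been normalized to the range 0 to 1, which the text does not specify; the function and key names are illustrative.

```python
# Weights as listed above: emotion 40%, loop detection 30%, duration 30%.
WEIGHTS = {"emotion": 0.40, "loop": 0.30, "duration": 0.30}


def composite_risk(emotion: float, loop: float, duration: float) -> float:
    """Combine three normalized signals (assumed 0..1) into one score."""
    signals = {"emotion": emotion, "loop": loop, "duration": duration}
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in signals.items())
```

Under this normalization, a call with a maximal emotion signal and no loop or duration pressure would score 0.4, so emotion alone cannot drive the score above its own weight.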
Risk Levels
The composite score maps to four levels:
Normal - Agent operates with full autonomy within its context graph.
Monitor - Internal flag raised. Agent behavior is unchanged, but the call is flagged for closer post-call review.
Alert - Available operators receive a notification. Agent continues, but with increased caution.
Escalate - Automatic escalation to an operator if one is available.
Per-State Threshold Overrides
Individual context graph states can override the default thresholds. A medication verification state might have a lower escalation threshold than a general scheduling state, because errors in that context carry higher clinical risk. Routine scheduling operates with standard thresholds. Clinical data collection tightens them.
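Mapping a score to a level with per-state overrides can be sketched as a threshold lookup. The numeric thresholds below are placeholders chosen for illustration; the platform's actual values are not given here, and `risk_level` is a hypothetical name.

```python
from typing import Dict, Optional

# Hypothetical defaults; the real thresholds are not documented in this section.
DEFAULT_THRESHOLDS = {"monitor": 0.30, "alert": 0.50, "escalate": 0.70}


def risk_level(score: float, overrides: Optional[Dict[str, float]] = None) -> str:
    """Return the highest level whose threshold the score meets."""
    thresholds = {**DEFAULT_THRESHOLDS, **(overrides or {})}
    for level in ("escalate", "alert", "monitor"):
        if score >= thresholds[level]:
            return level
    return "normal"
```

With these placeholder numbers, a score of 0.55 is an alert in a scheduling state, but the same score escalates in a medication verification state that overrides `escalate` to 0.5.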
Silence Management
If the caller stops speaking, the silence monitor checks in at exponentially increasing intervals:
After 10 seconds of silence, the agent asks "Are you still there?"
If no response, the next check-in waits 20 seconds
Third check-in at 40 seconds
After three unanswered check-ins, the agent ends the call with a message offering to have someone call back
Check-in utterances have a 5-second staleness window. If the caller speaks during the check-in filler, the filler is discarded and the conversation continues normally. This prevents awkward overlaps where the agent says "Are you still there?" just as the caller starts talking.
The silence monitor and risk scorer are independent systems. A call can escalate due to high risk while the caller is actively speaking, or the silence monitor can end a call that has normal risk scores but no active participant.
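The check-in schedule can be sketched as a doubling sequence. This reads each figure in the list as the wait before that check-in, which is one interpretation of the text; the function names and the action strings are illustrative.

```python
from typing import List


def check_in_waits(initial_seconds: float = 10.0, max_check_ins: int = 3) -> List[float]:
    """Seconds of silence to wait before each check-in, doubling each time."""
    return [initial_seconds * (2 ** i) for i in range(max_check_ins)]


def next_action(unanswered_check_ins: int, max_check_ins: int = 3) -> str:
    """Decide what the silence monitor does once the current wait expires."""
    if unanswered_check_ins >= max_check_ins:
        return "end_call_offer_callback"
    return "check_in"
```

Any caller speech would reset the unanswered count to zero, so the schedule starts again from the initial wait.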
Speaker Resolution
With three participants in a conference (caller, agent, operator), the system resolves who is speaking at any given moment using a priority chain:
Operator in takeover mode - Highest priority. Agent audio is suppressed.
Caller - Barge-in detection applies. If the caller speaks during agent output, the agent stops.
Agent - Speaks when neither the operator nor the caller is active.
Humans always take precedence over the agent: an operator in takeover mode has the highest priority, and caller speech interrupts any agent output in progress.
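The priority chain can be sketched as an ordered series of checks. The function name and boolean inputs are assumptions made for illustration; a real system would derive them from telephony state and voice activity detection.

```python
from typing import Optional


def resolve_speaker(
    operator_takeover: bool,
    caller_speaking: bool,
    agent_has_output: bool,
) -> Optional[str]:
    """Return who holds the floor, or None if nobody has anything to say."""
    if operator_takeover:
        return "operator"  # highest priority; agent audio is suppressed
    if caller_speaking:
        return "caller"  # barge-in: any agent output in progress stops
    if agent_has_output:
        return "agent"  # only when no human is active
    return None
```

Because the checks run in priority order, the agent is selected only when the first two conditions are both false.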
Warm Hand-Off
When a patient asks to speak with a human and the workspace has warm transfer enabled, the agent initiates a three-phase conference handoff rather than a cold transfer. The patient never repeats themselves because the operator receives full context before taking over.
Phase 1: Normal
The agent and patient are in conversation as usual. The escalation is triggered by a patient request, safety rule, or agent uncertainty.
Phase 2: Briefing
The agent dials the operator into the existing conference. All three parties are connected - the caller hears the briefing alongside the operator. The agent delivers full conversation context: situation summary, patient background, risk assessment, and recommended actions. This transparency means the patient knows the operator is up to speed and does not need to repeat anything.
Barge-in is automatically disabled during the briefing phase so the operator's speech does not interrupt the agent's context transfer.
Phase 3: Connected
Once the operator is ready, the agent removes itself from the conference. The operator takes over with full context of everything that was discussed.
Warm transfer is the default for all forwarding configurations. Cold transfers (immediate forwarding without briefing) remain available for scenarios where speed is more important than context transfer.
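The three phases of a warm hand-off form a one-way state machine. The sketch below is illustrative: `Phase` and `WarmHandoff` are hypothetical names, and the two properties encode the behaviors described above (barge-in disabled during the briefing, agent gone once connected).

```python
from enum import Enum


class Phase(Enum):
    NORMAL = "normal"
    BRIEFING = "briefing"
    CONNECTED = "connected"


# Phases only move forward: normal -> briefing -> connected.
NEXT_PHASE = {Phase.NORMAL: Phase.BRIEFING, Phase.BRIEFING: Phase.CONNECTED}


class WarmHandoff:
    def __init__(self) -> None:
        self.phase = Phase.NORMAL

    @property
    def barge_in_enabled(self) -> bool:
        # Disabled while the agent delivers the briefing.
        return self.phase is not Phase.BRIEFING

    @property
    def agent_in_conference(self) -> bool:
        # The agent removes itself once the operator is connected.
        return self.phase is not Phase.CONNECTED

    def advance(self) -> Phase:
        nxt = NEXT_PHASE.get(self.phase)
        if nxt is None:
            raise RuntimeError("hand-off already complete")
        self.phase = nxt
        return nxt
```

A cold transfer would skip the briefing phase entirely, which is why it trades context for speed.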
Deferred Transfer
When the agent initiates a call transfer (for example, forwarding to a clinic's front desk), the transfer is deferred until the agent's goodbye message finishes playing. This prevents the caller from being redirected mid-sentence. If the caller speaks during the goodbye (barge-in), the transfer is cancelled and the conversation continues. If an operator joins the call during this window, the transfer is also cancelled.
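The cancellation rules for a deferred transfer can be sketched as a pending action with two cancel paths. `DeferredTransfer` and its method names are hypothetical; the sketch only captures the ordering logic, not the telephony calls.

```python
class DeferredTransfer:
    """A transfer that runs only if the goodbye finishes uninterrupted."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.state = "pending"  # pending -> executed | cancelled

    def _cancel(self) -> None:
        if self.state == "pending":
            self.state = "cancelled"

    def on_caller_barge_in(self) -> None:
        # The caller spoke during the goodbye: keep the conversation going.
        self._cancel()

    def on_operator_joined(self) -> None:
        # An operator joined during the window: the operator handles the call.
        self._cancel()

    def on_goodbye_finished(self) -> bool:
        """Execute the transfer if it is still pending; return whether it ran."""
        if self.state == "pending":
            self.state = "executed"
            return True
        return False
```

Once the state leaves `pending` it never changes again, so a late barge-in after the transfer has executed has no effect.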
Operator Dashboard
Operators register with a profile that includes their name, skills, connection method (phone or browser), and role. Their status is tracked in real time: offline, available, on-call, busy, or unavailable.
The operator dashboard provides:
Active call list - Currently escalated calls with context summaries
Escalation statistics - Volume and type of escalations over time
Performance metrics - Total escalations handled and average handle time per operator
Audit log - Complete history of operator actions (join, mode switch, leave) for compliance review
Escalation as Collaboration
The agent handles the routine part. The operator handles the part that needs a human. Both are participants in the same conversation, both reading from the same patient record, both contributing to the same outcome. When the operator finishes, the agent can resume with full context of what happened.
This changes the staffing model. Instead of staffing every line with a human who occasionally gets AI assistance, you staff a small operator team that handles the fraction of interactions requiring human judgment. The platform tracks escalation rates, operator response times, and handle times so you can right-size that team over time.
Developer Guide - For API endpoints and integration details, see the Operators reference in the developer guide.

