# Operators and Escalation

Some situations require clinical judgment, emotional sensitivity, or authority that only a human can provide. The operator system handles the handoff between agent and human.

The core design principle: escalation should never disrupt the patient. When an operator joins a voice call, they enter the same conference as the caller and the agent. There is no hold music, no transfer, no dropped audio. The caller does not notice that anything has changed.

For text sessions, operators monitor in real time and can take over the conversation thread when needed.

{% @mermaid/diagram content="sequenceDiagram
participant Caller
participant Agent
participant Operator

Note over Caller,Agent: Normal conversation
Caller->>Agent: Speech audio
Agent->>Caller: Response audio

Note over Agent,Operator: Escalation triggered
Agent->>Operator: AI briefing + full context
Operator->>Agent: Joins conference (listen mode)

Note over Caller,Operator: Operator listening
Caller->>Agent: Speech audio
Agent->>Caller: Response audio
Operator-->>Agent: Inject guidance (caller unaware)

Note over Caller,Operator: Operator takes over
Operator->>Caller: Speaks directly
Note over Agent: Audio suppressed, context preserved

Note over Caller,Agent: Operator leaves
Agent->>Caller: Resumes with full context" %}

## When Escalation Happens

The platform escalates to operators automatically based on three categories of triggers:

* **Safety rules** fire when the monitoring system detects content requiring human review. A patient mentioning self-harm, an adverse drug reaction pattern, or a compliance-relevant disclosure will trigger escalation immediately.
* **Patient request** fires when the caller explicitly asks to speak with a person. The agent does not argue or deflect.
* **Agent uncertainty** fires when the agent's confidence in its own understanding drops below a configured threshold. Rather than guessing, the agent hands off to someone who can help.

Operators receive a notification with full context: who the patient is, what the conversation has covered, and why the escalation was triggered.

### Soft vs. Hard Escalation

Not all escalations require the agent to stop talking. The platform distinguishes two escalation modes:

* **Soft escalation** - The agent triggers the escalation (notifying operators, adding the call to the queue) but continues the conversation. The caller stays engaged while waiting for an operator. This is the default for patient requests - the caller asked to speak with someone, but the agent can still be helpful while the operator connects.
* **Hard escalation** - The agent suspends its output and waits for the operator. Used for safety-critical situations where the agent should not continue speaking (crisis indicators, clinical boundary violations). The agent finishes its current sentence before suspending.

Safety rules default to hard escalation. Patient requests and high-risk triggers default to soft escalation. These defaults can be overridden per monitor concept.
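The defaults and the per-concept override can be pictured as a small lookup. This is only an illustrative sketch: the trigger labels, `EscalationMode`, and `resolve_mode` are hypothetical names, not platform identifiers.

```python
from enum import Enum
from typing import Optional


class EscalationMode(Enum):
    SOFT = "soft"  # agent keeps talking while the operator connects
    HARD = "hard"  # agent finishes its current sentence, then suspends output


# Defaults as described above; the trigger labels are illustrative only.
DEFAULT_MODES = {
    "safety_rule": EscalationMode.HARD,
    "patient_request": EscalationMode.SOFT,
    "high_risk": EscalationMode.SOFT,
}


def resolve_mode(
    trigger: str, concept_override: Optional[EscalationMode] = None
) -> EscalationMode:
    """Return the per-concept override if one is set, else the trigger default."""
    if concept_override is not None:
        return concept_override
    return DEFAULT_MODES[trigger]
```

For example, `resolve_mode("patient_request")` yields the soft default, while passing `concept_override=EscalationMode.HARD` forces a hard escalation for that monitor concept.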

## Takeover

The takeover interface is a single screen built for rapid triage. Queue monitoring, call details, and operator controls share one view.

### Priority Queue

The left panel shows all active calls ranked by urgency. Each call displays the caller's name, current context graph state, wait time, turn count, risk score, and current emotion. Urgency levels (critical, high, medium, low) are computed from the call's risk score and whether an escalation is active. Each escalated call shows its escalation type - safety, high risk, caller distressed, or stuck in loop - so operators can instantly distinguish a safety crisis from a routine transfer request.

The queue updates in real time. New calls appear as they enter the system, and urgency indicators change as risk scores shift during the conversation. When no escalations are pending, the queue panel shows a system health dashboard with escalations handled today, operators online, and average wait time.

When a call escalates, operators receive browser push notifications (even when the console tab is in the background), an audio alert, and persistent in-app toast notifications that remain visible until dismissed.

### Operator Modes

Operators work in one of two modes and can switch between them instantly:

* **Listen mode** - The operator hears the full conversation but is muted at the telephony level. The caller does not know the operator is present. The AI agent continues handling the conversation normally. Used for quality monitoring, observing how the agent handles specific scenarios, and waiting for the right moment to intervene.
* **Takeover mode** - The operator is unmuted and speaks directly with the caller. The agent's audio output is suppressed, but its processing loop continues running in the background. When the operator finishes and switches back to listen mode or leaves the call, the agent resumes immediately with full context of what happened during the takeover. There is no re-initialization or context loss.

During takeover, the operator's speech is captured through a dedicated per-participant STT stream and recorded as operator turns in the transcript. The complete call record includes everything the operator said, not just the agent and caller portions.

### Connection Methods

Operators connect to calls through one of two methods:

* **Phone (PSTN)** - The platform dials the operator's phone number. When the operator answers, they are added to the conference. This method works from any phone and requires no special software. Higher latency due to the PSTN round trip. Best for remote operators or situations where a desktop is not available.
* **Browser (WebRTC)** - The operator connects through a web browser using the voice SDK. Audio travels directly over WebRTC, bypassing the phone network entirely. Lower latency than PSTN. Best for operators working at a desktop with a headset. The browser connection flow: request an access token via the API, register the operator for the call, and connect using the voice SDK with the provided token.

Only one operator can be active on a call at a time. If a second operator attempts to join the same call, they receive a conflict error. The same operator joining the same call again receives the cached response (the join is idempotent).
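The single-operator rule and the idempotent join can be sketched as follows. This is a minimal in-memory model under stated assumptions: `JoinRegistry`, `OperatorConflictError`, and the response fields are hypothetical and do not reflect the actual API shape.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class OperatorConflictError(Exception):
    """Raised when a different operator is already active on the call."""


@dataclass
class JoinRegistry:
    # call_id -> (operator_id, cached join response)
    _active: Dict[str, Tuple[str, dict]] = field(default_factory=dict)

    def join(self, call_id: str, operator_id: str) -> dict:
        existing = self._active.get(call_id)
        if existing is not None:
            active_operator, cached_response = existing
            if active_operator == operator_id:
                # Same operator, same call: return the cached response.
                return cached_response
            raise OperatorConflictError(
                f"call {call_id} already has an active operator"
            )
        # Hypothetical response shape; the real payload is not specified here.
        response = {"call_id": call_id, "operator_id": operator_id}
        self._active[call_id] = (operator_id, response)
        return response

    def leave(self, call_id: str, operator_id: str) -> None:
        # Only the active operator can free the slot.
        existing = self._active.get(call_id)
        if existing is not None and existing[0] == operator_id:
            del self._active[call_id]
```

Keying the cache on the call and comparing operator identity keeps retries safe: a browser client that re-sends the join after a network hiccup gets the same answer instead of a conflict.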

### AI Briefing

When an operator selects a call from the queue, the system generates an AI briefing that summarizes the situation before the operator joins:

* **Situation summary** - What the caller needs and where the conversation stands
* **Patient context** - Relevant background from the world model
* **Risk assessment** - Current risk level and contributing factors
* **Key issues** - Specific problems identified during the call
* **Recommended actions** - Suggested next steps for the operator
* **Call history** - Prior interactions if applicable

The operator reads the briefing in seconds and joins with full context, rather than listening to minutes of conversation to piece together the situation.

### Guidance Injection

Operators in listen mode can send text guidance to the agent without taking over the call. The guidance is injected into the active session and the agent processes it as an instructional event - interrupting its current speech to act on the guidance immediately.

This is useful when an operator sees the conversation going in the wrong direction and wants to steer the agent without the caller knowing a human intervened. For example, an operator monitoring a scheduling call could send "Ask for their insurance ID before confirming the appointment" and the agent would work that into its next response naturally.

Guidance messages are distinct from external events. External events carry factual information ("The appointment has been confirmed") and queue behind the agent's current speech. Guidance carries instructions ("Ask about their insurance") and interrupts because instructions are time-sensitive.

Both event types flow through the same injection system and work regardless of where the call is running in the platform.
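The difference in delivery between the two event types can be illustrated with a toy model. The class and field names below (`InjectedEvent`, `AgentSession`, `pending`, `handled`) are hypothetical; the sketch only shows that guidance interrupts while external events queue behind current speech.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class InjectedEvent:
    kind: str  # "guidance" (instruction) or "external" (fact)
    text: str


@dataclass
class AgentSession:
    speaking: bool = False
    pending: Deque[InjectedEvent] = field(default_factory=deque)
    handled: List[str] = field(default_factory=list)

    def inject(self, event: InjectedEvent) -> None:
        if event.kind == "guidance":
            # Instructions are time-sensitive: stop current speech and act now.
            self.speaking = False
            self.handled.append(event.text)
        elif event.kind == "external":
            if self.speaking:
                # Facts wait until the agent finishes what it is saying.
                self.pending.append(event)
            else:
                self.handled.append(event.text)
        else:
            raise ValueError(f"unknown event kind: {event.kind}")

    def finish_speech(self) -> None:
        # When speech ends, drain queued external events in arrival order.
        self.speaking = False
        while self.pending:
            self.handled.append(self.pending.popleft().text)
```

In this model, an external event injected while the agent is speaking stays in `pending`, whereas a guidance message injected a moment later is handled first.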

## Risk Scoring

Risk scoring and conversation monitoring initialize in the background when a voice session starts, running in parallel with the greeting. This deferred setup ensures that session startup is not delayed by monitoring initialization while guaranteeing that scoring is active before the first caller transcript needs evaluation.


The platform computes a composite risk score on every conversational turn, combining three signals:

| Signal             | Weight | What It Measures                                                                                                                                |
| ------------------ | ------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **Emotion**        | 40%    | Negative valence combined with high arousal, deteriorating emotional trend, barge-in frequency, and consecutive short responses from the caller |
| **Loop detection** | 30%    | How many times the agent has revisited the same state. Repeated state visits suggest the conversation is going in circles without progress.     |
| **Duration**       | 30%    | Time elapsed relative to the expected call length. Risk ramps after the expected duration is exceeded.                                          |
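As a worked illustration of the weighting, the sketch below combines the three signals as a weighted sum. It assumes each signal has already been normalized to the range 0 to 1, which this page does not state; `composite_risk` and `WEIGHTS` are hypothetical names.

```python
# Weights from the table above.
WEIGHTS = {"emotion": 0.40, "loop": 0.30, "duration": 0.30}


def _clamp(value: float) -> float:
    """Keep a signal inside the assumed 0..1 range."""
    return max(0.0, min(1.0, value))


def composite_risk(emotion: float, loop: float, duration: float) -> float:
    """Weighted sum of the three signals (each assumed to be in 0..1)."""
    signals = {"emotion": emotion, "loop": loop, "duration": duration}
    return sum(WEIGHTS[name] * _clamp(value) for name, value in signals.items())
```

Under these assumptions, a call with moderate emotional distress (0.5), a fully saturated loop signal (1.0), and no duration overrun (0.0) scores 0.40 × 0.5 + 0.30 × 1.0 + 0.30 × 0.0 = 0.5.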

### Risk Levels

The composite score maps to four levels:

| Level        | What Happens                                                                                         |
| ------------ | ---------------------------------------------------------------------------------------------------- |
| **Normal**   | Agent operates with full autonomy within its context graph.                                          |
| **Monitor**  | Internal flag raised. Agent behavior unchanged, but the call is flagged for closer post-call review. |
| **Alert**    | Available operators receive a notification. Agent continues but with increased caution.              |
| **Escalate** | Automatic escalation to an operator if one is available.                                             |

### Per-State Threshold Overrides

Individual context graph states can override the default thresholds. A medication verification state might have a lower escalation threshold than a general scheduling state, because errors in that context carry higher clinical risk. Routine scheduling operates with standard thresholds. Clinical data collection tightens them.
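One way to picture per-state overrides is a default threshold table that a state can partially replace. The threshold values below are invented purely for illustration (the platform's actual defaults are not given on this page), and `risk_level` is a hypothetical helper.

```python
from typing import Dict, Optional

# Hypothetical defaults on a 0..1 composite score; not the platform's values.
DEFAULT_THRESHOLDS = {"monitor": 0.3, "alert": 0.5, "escalate": 0.7}


def risk_level(score: float, overrides: Optional[Dict[str, float]] = None) -> str:
    """Map a composite score to a level, applying any per-state overrides."""
    thresholds = {**DEFAULT_THRESHOLDS, **(overrides or {})}
    if score >= thresholds["escalate"]:
        return "escalate"
    if score >= thresholds["alert"]:
        return "alert"
    if score >= thresholds["monitor"]:
        return "monitor"
    return "normal"
```

With these illustrative numbers, a score of 0.6 is an alert in a scheduling state, but a medication verification state that sets `{"escalate": 0.55}` would escalate at the same score.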

### Silence Management

If the caller stops speaking, the silence monitor checks in with exponential backoff:

1. After **10 seconds** of silence, the agent asks "Are you still there?"
2. If no response, the next check-in waits **20 seconds**
3. Third check-in at **40 seconds**
4. After three unanswered check-ins, the agent ends the call with a message offering to have someone call back

Check-in utterances have a 5-second staleness window. If the caller speaks during the check-in filler, the filler is discarded and the conversation continues normally. This prevents awkward overlaps where the agent says "Are you still there?" just as the caller starts talking.

The silence monitor and risk scorer are independent systems. A call can escalate due to high risk while the caller is actively speaking, or the silence monitor can end a call that has normal risk scores but no active participant.
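The check-in schedule described above doubles the wait after each unanswered prompt. A minimal sketch, with hypothetical function names and the 10-second base and three-check-in limit taken from the steps above:

```python
from typing import List, Optional, Tuple


def check_in_delays(initial_seconds: float = 10.0, max_check_ins: int = 3) -> List[float]:
    """Silence duration before each check-in, doubling every time."""
    return [initial_seconds * (2 ** i) for i in range(max_check_ins)]


def next_action(
    unanswered_check_ins: int,
    initial_seconds: float = 10.0,
    max_check_ins: int = 3,
) -> Tuple[str, Optional[float]]:
    """Decide what the silence monitor does next.

    Returns ("check_in", seconds_to_wait) or ("end_call", None).
    """
    if unanswered_check_ins >= max_check_ins:
        return ("end_call", None)
    return ("check_in", initial_seconds * (2 ** unanswered_check_ins))
```

Calling `check_in_delays()` reproduces the 10, 20, and 40 second waits; once three check-ins go unanswered, `next_action(3)` reports that the call should end.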

## Speaker Resolution

With three participants in a conference (caller, agent, operator), the system resolves who is speaking at any given moment using a priority chain:

1. **Operator in takeover mode** - Highest priority. Agent audio is suppressed.
2. **Caller** - Barge-in detection applies. If the caller speaks during agent output, the agent stops.
3. **Agent** - Speaks when neither the operator nor the caller is active.

Humans always take precedence over the agent: an operator in takeover mode suppresses the agent's audio, and the caller's speech interrupts the agent's output.
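The priority chain reduces to a short ordered check. The function below is a hypothetical sketch, not the platform's implementation, and it treats the two inputs as already-detected boolean states.

```python
def resolve_speaker(operator_in_takeover: bool, caller_speaking: bool) -> str:
    """Return which participant holds the floor under the priority chain."""
    if operator_in_takeover:
        # Highest priority: agent audio is suppressed for the whole takeover.
        return "operator"
    if caller_speaking:
        # Barge-in: the caller's speech stops any agent output.
        return "caller"
    # The agent speaks only when no human is active.
    return "agent"
```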

## Warm Hand-Off

When a patient asks to speak with a human and the workspace has warm transfer enabled, the agent initiates a three-phase conference handoff rather than a cold transfer. The patient never repeats themselves because the operator receives full context before taking over.

### Phase 1: Normal

The agent and patient are in conversation as usual. The escalation is triggered by a patient request, safety rule, or agent uncertainty.

### Phase 2: Briefing

The agent dials the operator into the existing conference. All three parties are connected - the caller hears the briefing alongside the operator. The agent delivers full conversation context: situation summary, patient background, risk assessment, and recommended actions. This transparency means the patient knows the operator is up to speed and does not need to repeat anything.

Barge-in is automatically disabled during the briefing phase so the operator's speech does not interrupt the agent's context transfer.

### Phase 3: Connected

Once the operator is ready, the agent removes itself from the conference. The operator takes over with full context of everything that was discussed.

{% @mermaid/diagram content="sequenceDiagram
participant Patient
participant Agent
participant Operator

Note over Patient,Agent: Phase 1: Normal
Patient->>Agent: 'Can I speak to someone?'

Note over Patient,Operator: Phase 2: Briefing
Agent->>Operator: Dials in, all three connected
Agent->>Patient: AI briefing (patient hears)
Agent->>Operator: AI briefing (operator hears)

Note over Patient,Operator: Phase 3: Connected
Note over Agent: Removed from call
Patient->>Operator: Direct conversation" %}

Warm transfer is the default for all forwarding configurations. Cold transfers (immediate forwarding without briefing) remain available for scenarios where speed is more important than context transfer.
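The three warm hand-off phases can be modeled as a linear state machine. This sketch uses hypothetical names (`HandoffPhase`, `advance`, `barge_in_enabled`, `agent_in_conference`) and only encodes what the phases above describe: barge-in is off during the briefing, and the agent leaves the conference once the operator is connected.

```python
from enum import Enum


class HandoffPhase(Enum):
    NORMAL = 1
    BRIEFING = 2
    CONNECTED = 3


# Phases only move forward: Normal -> Briefing -> Connected.
_NEXT_PHASE = {
    HandoffPhase.NORMAL: HandoffPhase.BRIEFING,
    HandoffPhase.BRIEFING: HandoffPhase.CONNECTED,
}


def advance(phase: HandoffPhase) -> HandoffPhase:
    """Move to the next phase; CONNECTED is terminal."""
    if phase not in _NEXT_PHASE:
        raise ValueError(f"{phase.name} is the final phase")
    return _NEXT_PHASE[phase]


def barge_in_enabled(phase: HandoffPhase) -> bool:
    # Disabled during the briefing; moot once the agent has left the call.
    return phase is HandoffPhase.NORMAL


def agent_in_conference(phase: HandoffPhase) -> bool:
    # The agent removes itself when the operator is connected.
    return phase is not HandoffPhase.CONNECTED
```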

## Deferred Transfer

When the agent initiates a call transfer (for example, forwarding to a clinic's front desk), the transfer is deferred until the agent's goodbye message finishes playing. This prevents the caller from being redirected mid-sentence. If the caller speaks during the goodbye (barge-in), the transfer is cancelled and the conversation continues. If an operator joins the call during this window, the transfer is also cancelled.
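The cancellation rules for a deferred transfer can be sketched as a small object that only fires when the goodbye finishes uninterrupted. `DeferredTransfer` and its method names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeferredTransfer:
    destination: str
    cancelled: bool = False
    executed: bool = False

    def _cancel(self) -> None:
        # Cancelling after the transfer has already run has no effect.
        if not self.executed:
            self.cancelled = True

    def on_caller_barge_in(self) -> None:
        # Caller spoke during the goodbye: keep the conversation going.
        self._cancel()

    def on_operator_joined(self) -> None:
        # An operator joined during the goodbye: no need to forward the call.
        self._cancel()

    def on_goodbye_finished(self) -> bool:
        """Run the transfer if still pending; return True if it ran."""
        if self.cancelled or self.executed:
            return False
        self.executed = True
        return True
```

Holding the transfer as pending state, rather than scheduling it on a timer, means either interrupting event can cancel it cleanly before the goodbye ends.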

## Operator Dashboard

Operators register with a profile that includes their name, skills, connection method (phone or browser), and role. Their status is tracked in real time: offline, available, on-call, busy, or unavailable.

The operator dashboard provides:

* **Active call list** - Currently escalated calls with context summaries
* **Escalation statistics** - Volume and type of escalations over time
* **Performance metrics** - Total escalations handled and average handle time per operator
* **Audit log** - Complete history of operator actions (join, mode switch, leave) for compliance review

## Escalation as Collaboration

The agent handles the routine part. The operator handles the part that needs a human. Both are participants in the same conversation, both reading from the same patient record, both contributing to the same outcome. When the operator finishes, the agent can resume with full context of what happened.

This changes the staffing model. Instead of staffing every line with a human who occasionally gets AI assistance, you staff a small operator team that handles the fraction of interactions requiring human judgment. The platform tracks escalation rates, operator response times, and handle times so you can right-size that team over time.

{% hint style="info" %}
**Developer Guide** - For API endpoints and integration details, see the [Operators](https://docs.amigo.ai/developer-guide/platform-api/platform-api/operators) reference in the developer guide.
{% endhint %}

