# Operators

Operators are human agents who can monitor, join, and take over live voice calls. The operator system provides escalation management, real-time safety monitoring, call transcripts, and a full audit log - all built on the platform's [world model](https://docs.amigo.ai/developer-guide/platform-api/platform-api/data-world-model) with dual-entity event writes for independent state tracking.

## Operator Lifecycle

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#D4E2E7", "primaryTextColor": "#100F0F", "primaryBorderColor": "#083241", "lineColor": "#575452", "textColor": "#100F0F"}}}%%
stateDiagram-v2
\[*] --> Available: Register
Available --> Listening: Join Call (listen)
Available --> OnCall: Join Call (takeover)
Listening --> OnCall: Switch to Takeover
OnCall --> Listening: Switch to Listen
OnCall --> Available: Leave Call
Listening --> Available: Leave Call
Available --> \[*]: Deregister" %}

* **Listen mode**: Operator monitors the call silently. The AI agent continues handling the conversation. Operator is muted at the telephony level.
* **Takeover mode**: AI agent's audio output is muted (speaker suppressed); operator speaks directly with the caller. The agent's processing loop continues running - when the operator leaves or switches back to listen, the agent resumes immediately with no re-initialization.

**Mode switching**: Toggling between listen and takeover is instantaneous - it modifies the telephony mute state and the agent speaker mute, not the underlying session. Operator transcripts during takeover are captured as `human_segment` turns via a dedicated per-participant STT stream.
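The mode toggle above can be sketched as a pure function: only the two mute flags change between listen and takeover, and the session itself is untouched. The `MuteState` shape below is illustrative, not the platform's actual type.

```typescript
type OperatorMode = "listen" | "takeover";

interface MuteState {
  operatorMutedAtTelephony: boolean; // is the operator inaudible to the caller?
  agentSpeakerMuted: boolean;        // is the agent's audio output suppressed?
}

// Listen: operator muted, agent speaks. Takeover: operator live, agent speaker
// suppressed (its processing loop keeps running either way).
function resolveMuteState(mode: OperatorMode): MuteState {
  return mode === "listen"
    ? { operatorMutedAtTelephony: true, agentSpeakerMuted: false }
    : { operatorMutedAtTelephony: false, agentSpeakerMuted: true };
}
```

Because switching only rewrites these two flags, toggling is instantaneous and the agent resumes with no re-initialization.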

## Connection Methods

Operators can join calls via two methods, each using the [conference architecture](https://docs.amigo.ai/developer-guide/platform-api/voice-agent#conference-architecture):

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#D4E2E7", "primaryTextColor": "#100F0F", "primaryBorderColor": "#083241", "lineColor": "#575452", "textColor": "#100F0F", "clusterBkg": "#F1EAE7", "clusterBorder": "#D7D2D0"}}}%%
flowchart TB
subgraph Phone\["Phone Connection"]
P1\["POST /operator-join\n(phone number)"] --> P2\["Platform dials\noperator's phone"]
P2 --> P3\["Operator answers\n→ joins conference"]
P3 --> P4\["Dedicated STT stream\nfor operator audio"]
end

subgraph Browser\["Browser WebRTC Connection"]
B1\["POST /operator-access-token"] --> B2\["Returns JWT +\nconnect_params"]
B2 --> B3\["POST /operator-join\n(caches metadata)"]
B3 --> B4\["Frontend: Voice SDK\ndevice.connect()"]
B4 --> B5\["Browser audio joins\nconference directly"]
end" %}

| Method      | Audio Transport                 | STT                              | Latency                   | Use Case                                |
| ----------- | ------------------------------- | -------------------------------- | ------------------------- | --------------------------------------- |
| **Phone**   | PSTN (telephony dials operator) | Dedicated per-participant stream | Higher (PSTN round-trip)  | Remote operators, PSTN connectivity     |
| **Browser** | WebRTC (direct browser audio)   | Browser-handled                  | Lower (direct connection) | Desktop operators, real-time monitoring |

**Idempotent join**: The same operator joining the same call again receives the cached response rather than triggering a second join. A *different* operator attempting to join the same call receives a conflict error - only one operator per call.
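The idempotent-join rule can be sketched as a lookup against the call's current operator: a replay by the same operator returns the cached result, while a different operator gets a conflict. The `JoinResult` shape and registry are hypothetical stand-ins for the platform's internal state.

```typescript
type JoinResult =
  | { status: "joined" | "cached" }
  | { status: "conflict"; reason: string };

function joinCall(
  registry: Map<string, string>, // callSid → operatorId currently joined
  callSid: string,
  operatorId: string,
): JoinResult {
  const existing = registry.get(callSid);
  if (existing === undefined) {
    registry.set(callSid, operatorId);
    return { status: "joined" };
  }
  if (existing === operatorId) {
    return { status: "cached" }; // idempotent replay: same operator, same call
  }
  return { status: "conflict", reason: "only one operator per call" };
}
```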

### Browser Connection Flow

1. Request an access token via `POST /operator-access-token` → returns a JWT and `connect_params`
2. Call `POST /operator-join` → caches operator metadata (no telephony API call for browser)
3. Frontend creates a Voice SDK device with the token, registers, and calls `device.connect({ params: connectParams })`
4. The telephony system calls the voice agent's TwiML endpoint, which detects a browser operator via the client identity in the token
5. The endpoint returns conference-join TwiML - browser audio joins the existing conference directly
6. For listen-mode joins, the operator is muted at the TwiML level

## Three-Party Speaker Resolution

When an operator is on the call, three participants produce audio simultaneously. Speaker attribution uses a priority chain:

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#D4E2E7", "primaryTextColor": "#100F0F", "primaryBorderColor": "#083241", "lineColor": "#575452", "textColor": "#100F0F", "clusterBkg": "#F1EAE7", "clusterBorder": "#D7D2D0"}}}%%
flowchart TB
A\["Audio received"] --> B{"Operator STT\ndetects speech?"}
B -->|Yes| C\["Speaker = Operator"]
B -->|No| D{"Caller STT\ndetects speech?"}
D -->|Yes| E\["Speaker = Caller"]
D -->|No| F\["Speaker = Caller\n(default)"]" %}

Every turn in the call record carries `speaker_id` and `speaker_role` - ensuring accurate transcript attribution even in three-party conversations.
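The priority chain can be written as a pure function over the two per-stream speech-detection flags. Parameter names are illustrative; note that both "no operator speech" branches resolve to the caller, matching the diagram.

```typescript
type Speaker = "operator" | "caller";

function resolveSpeaker(operatorSttActive: boolean, callerSttActive: boolean): Speaker {
  if (operatorSttActive) return "operator"; // operator STT takes priority
  if (callerSttActive) return "caller";     // caller STT detected speech
  return "caller";                          // default attribution
}
```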

## Escalations

When the system detects a situation requiring human intervention, it triggers an escalation. Three sources:

| Source               | Trigger                                                                                                                                                                      | Confidence                    |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------- |
| **Auto-escalation**  | [Conversation monitor](https://docs.amigo.ai/developer-guide/platform-api/voice-agent#conversation-monitor) detects safety-critical content via semantic matching + AI judge | System-assessed risk          |
| **Agent-requested**  | Context graph logic determines a human is needed                                                                                                                             | Based on context graph design |
| **Caller-requested** | Caller explicitly asks to speak with a person                                                                                                                                | High (explicit request)       |

### Escalation Lifecycle

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#D4E2E7", "primaryTextColor": "#100F0F", "primaryBorderColor": "#083241", "lineColor": "#575452", "textColor": "#100F0F"}}}%%
stateDiagram-v2
\[*] --> Requested: Trigger (auto/agent/caller)
Requested --> Connected: Operator joins call
Connected --> Handback: Operator returns to listen
Handback --> Completed: Call ends or operator completes
Connected --> Completed: Call ends or operator completes
Requested --> \[*]: Call ends (no operator available)" %}

**Dual-entity event writes**: Each escalation event is written to BOTH the call entity AND the operator entity - enabling independent state recomputation for each. The call entity tracks its escalation history; the operator entity tracks their availability and performance.

**Event chain**: Each escalation event supersedes the previous (`escalation.requested` → `escalation.connected` → `escalation.handback` → `escalation.completed`), forming a version chain with full temporal history.

| Event                  | Written When                               | Key Data                                           |
| ---------------------- | ------------------------------------------ | -------------------------------------------------- |
| `escalation.requested` | Trigger fires                              | Source (auto/agent/caller), risk context           |
| `escalation.connected` | Operator joins and takes over              | Operator entity ID, connection type, response time |
| `escalation.handback`  | Operator returns to listen mode            | Agent resumes                                      |
| `escalation.completed` | Call ends or operator explicitly completes | `handle_time_seconds`, outcome                     |
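The supersession chain can be encoded as an allowed-transition table derived from the lifecycle diagram above. This is a sketch, not the platform's actual validator; it simply rejects any event order the diagram does not show.

```typescript
type EscalationEvent =
  | "escalation.requested"
  | "escalation.connected"
  | "escalation.handback"
  | "escalation.completed";

// Which event types may supersede each event type, per the lifecycle diagram.
const allowedNext: Record<EscalationEvent, EscalationEvent[]> = {
  "escalation.requested": ["escalation.connected"],
  "escalation.connected": ["escalation.handback", "escalation.completed"],
  "escalation.handback": ["escalation.completed"],
  "escalation.completed": [], // terminal: nothing supersedes a completed escalation
};

function canSupersede(prev: EscalationEvent, next: EscalationEvent): boolean {
  return allowedNext[prev].includes(next);
}
```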

### Conversation Monitor (Safety Detection)

The [voice agent's conversation monitor](https://docs.amigo.ai/developer-guide/platform-api/voice-agent#conversation-monitor) is the primary source of auto-escalations. It uses a two-stage pipeline:

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#D4E2E7", "primaryTextColor": "#100F0F", "primaryBorderColor": "#083241", "lineColor": "#575452", "textColor": "#100F0F", "clusterBkg": "#F1EAE7", "clusterBorder": "#D7D2D0"}}}%%
flowchart TB
A\["Every caller turn"] --> B\["Embed transcript\n(vector)"]
B --> C\["Matrix multiply:\ncosine similarity vs\nall concept vectors\n(<1ms)"]
C --> D{"Above concept\nthreshold?"}
D -->|"≥ standalone\nthreshold (0.85)"| E\["Immediate escalation\n(no judge needed)"]
D -->|"Above standard\nthreshold"| F\["AI Judge evaluates\nwith structured output"]
D -->|"Below"| G\["No action"]
F --> H{"Judge decision"}
H -->|hard\_escalate| I\["Interrupt agent +\nimmediate escalation"]
H -->|soft\_escalate| J\["Escalate after\ncurrent turn"]
H -->|alert| K\["Log event only"]
H -->|ignore| G" %}

**Default safety concepts** (5, always active):

* Suicidal ideation
* Self harm
* Domestic violence
* Adverse drug reaction
* Post-discharge red flag

**Custom concepts**: Workspaces can add domain-specific safety concepts with pre-computed embeddings - loaded from the world model at session start.

**Standalone fallback**: If semantic similarity exceeds a high threshold (0.85), escalation triggers immediately without the AI judge - providing a safety net even if the judge model is unavailable.
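The two-stage routing can be sketched as follows: a similarity at or above the 0.85 standalone threshold escalates immediately with no judge call, a score above the concept's own threshold goes to the AI judge, and anything lower is ignored. The 0.85 cutoff comes from this page; the per-concept threshold value in the test is illustrative.

```typescript
type MonitorAction = "immediate_escalation" | "send_to_judge" | "no_action";

const STANDALONE_THRESHOLD = 0.85;

function routeSimilarity(similarity: number, conceptThreshold: number): MonitorAction {
  if (similarity >= STANDALONE_THRESHOLD) {
    return "immediate_escalation"; // safety net: judge bypassed entirely
  }
  if (similarity >= conceptThreshold) {
    return "send_to_judge"; // judge decides: hard_escalate / soft_escalate / alert / ignore
  }
  return "no_action";
}
```

Because the standalone branch never touches the judge, a judge-model outage cannot block the highest-confidence escalations.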

## Operator Flow (Complete)

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"actorBkg": "#083241", "actorTextColor": "#FFFFFF", "actorBorder": "#083241", "signalColor": "#575452", "signalTextColor": "#100F0F", "labelBoxBkgColor": "#F1EAE7", "labelBoxBorderColor": "#D7D2D0", "labelTextColor": "#100F0F", "loopTextColor": "#100F0F", "noteBkgColor": "#F1EAE7", "noteBorderColor": "#D7D2D0", "noteTextColor": "#100F0F", "activationBkgColor": "#E8E2EB", "activationBorderColor": "#083241", "altSectionBkgColor": "#F1EAE7", "altSectionColor": "#100F0F"}}}%%
sequenceDiagram
participant Op as Operator
participant API as Platform API
participant VA as Voice Agent
participant Caller

Note over Caller,VA: Active call in progress

Op->>API: POST /operator-join
API->>VA: Add to conference
VA->>VA: Create operator STT stream
Note over Op: Listen mode (muted)

Op->>API: POST /operator-mode (takeover)
API->>VA: Unmute operator, mute agent speaker
Note over VA: Agent processing continues (muted)

Op-->>Caller: Operator speaks directly
Caller-->>Op: Caller responds
Note over VA: Captures human_segment turns

Op->>API: POST /operator-mode (listen)
API->>VA: Mute operator, unmute agent speaker
Note over VA: Agent resumes speaking

Note over Op: Listen mode - monitoring

Op->>API: POST /send-guidance
API->>VA: Inject guidance event
Note over VA: Interrupts agent speech
Note over VA: Agent acts on guidance
Note over Caller: Caller unaware of guidance

Op->>API: POST /operator-leave
API->>VA: Remove from conference
Note over VA: Persist human transcripts
Note over VA: Clean up operator STT
Note over VA: Write escalation.completed" %}

## Operator Guidance

Operators in listen mode can send text guidance to the agent without taking over the call:

```
POST /v1/{workspace_id}/operators/{operator_id}/send-guidance
Authorization: Bearer <api_key>
```

```json
{
  "call_sid": "CA1234...",
  "message": "Ask for their insurance ID before confirming the appointment"
}
```

The guidance is injected into the active session as a `guidance` event type - it interrupts the agent's current speech and the agent acts on the instruction immediately. This is distinct from operator takeover: the caller does not know a human intervened.

**Permission**: Requires `Operator:Update` on the operator entity. The operator must belong to the same workspace as the call.

**How it works**: The platform API proxies the guidance to the voice agent's [session event injection](https://docs.amigo.ai/developer-guide/platform-api/voice-agent#session-event-injection) system via `VoiceAgentClient.inject_event()`. The operator's identity is recorded in the call transcript as the event sender.

**Delivery status**: The response includes a `status` field - `"delivered"` confirms the session received the guidance, `"queued_no_subscriber"` indicates the call may have ended or the session is not yet listening.
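Composing the request from the endpoint and body shown above might look like the following sketch. `buildGuidanceRequest` and the `GuidanceRequest` shape are hypothetical helpers for illustration; the URL path, header, and JSON fields come from this page.

```typescript
interface GuidanceRequest {
  method: "POST";
  url: string;
  headers: { Authorization: string; "Content-Type": "application/json" };
  body: string;
}

function buildGuidanceRequest(
  workspaceId: string,
  operatorId: string,
  apiKey: string,
  callSid: string,
  message: string,
): GuidanceRequest {
  return {
    method: "POST",
    url: `/v1/${workspaceId}/operators/${operatorId}/send-guidance`,
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ call_sid: callSid, message }),
  };
}
```

After sending, check the `status` field on the response: treat anything other than `"delivered"` as a signal that the call may have ended.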

## Dashboard & Analytics

| Endpoint               | Description                                                                                    |
| ---------------------- | ---------------------------------------------------------------------------------------------- |
| **Dashboard**          | Composite view: operator status counts, active escalations, daily stats, review queue depth    |
| **Active Escalations** | Paginated queue of pending escalations with risk context                                       |
| **Escalation Stats**   | Aggregation by period (`day`, `week`, `month`) and dimension (`status`, `trigger`, `operator`) |
| **Performance**        | Operator rankings by response time, resolution rate, and average handle time                   |
| **Call Transcript**    | Human-segment events for a specific operator's call participation                              |

## Audit Log

Every operator action is recorded as a world model event, providing a complete audit trail:

| Action              | Event Written           | Entities Updated                       |
| ------------------- | ----------------------- | -------------------------------------- |
| Join call           | `operator.joined`       | Call + Operator (status → busy)        |
| Switch to takeover  | `operator.mode_changed` | Call (audit)                           |
| Switch to listen    | `operator.mode_changed` | Call (audit)                           |
| Leave call          | `operator.left`         | Call + Operator (status → online)      |
| Complete escalation | `escalation.completed`  | Call + Operator (handle time recorded) |

**Dual-entity writes**: Every escalation event is written to both entities to enable independent queries - "show me all escalations for this call" and "show me all escalations this operator handled" are both O(1) entity reads, not cross-entity joins.
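A dual-entity write can be sketched as appending one escalation event to both the call's and the operator's event streams, so each entity replays its own history without a cross-entity join. The `EventStore` and record shape are illustrative, not the world model's actual storage.

```typescript
interface EscalationEventRecord {
  type: string;       // e.g. "escalation.completed"
  callId: string;
  operatorId: string;
  at: number;         // epoch millis
}

class EventStore {
  private byEntity = new Map<string, EscalationEventRecord[]>();

  // Write the same event under BOTH entity keys: call and operator.
  writeDual(event: EscalationEventRecord): void {
    for (const entityId of [event.callId, event.operatorId]) {
      const events = this.byEntity.get(entityId) ?? [];
      events.push(event);
      this.byEntity.set(entityId, events);
    }
  }

  // Per-entity read: one map lookup, no join across entities.
  eventsFor(entityId: string): EscalationEventRecord[] {
    return this.byEntity.get(entityId) ?? [];
  }
}
```

Both "escalations for this call" and "escalations this operator handled" are now single-key reads against `byEntity`.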

## Entity Types

### Operator Entity

Projected state from operator events:

| Field                     | Description                                  |
| ------------------------- | -------------------------------------------- |
| `status`                  | Current availability (online, busy, offline) |
| `profile`                 | Name, contact info                           |
| `escalation_count`        | Total escalations handled                    |
| `avg_handle_time_seconds` | Average escalation duration                  |
| `last_active_at`          | Last activity timestamp                      |

### Call Entity (Escalation Fields)

The call entity's projected state includes escalation-specific fields:

| Field                | Description                                                                |
| -------------------- | -------------------------------------------------------------------------- |
| `escalation_status`  | Current escalation state (none, requested, connected, handback, completed) |
| `escalation_history` | Full timeline of escalation events                                         |
| `human_segments`     | Operator speech transcripts captured during takeover                       |
| `audit_summary`      | All operator actions with timestamps                                       |

## API Reference

* [Operators](https://docs.amigo.ai/api-reference/readme/platform/operators)
