# Runtime Safety

Safety in Amigo is a hard constraint, not a feature you enable. The architecture makes unsafe behavior structurally difficult to produce. Every layer - from how data enters the world model to how the agent forms a response to how changes reach the EHR - includes controls that prevent harm before it happens.

The safety architecture has two phases: runtime protection during live conversations (monitoring, triage, risk scoring, escalation) and pre-production validation that catches problems before they reach real calls (simulation testing, staged rollout, version set promotion).

At the data layer, write scope isolation constrains what the agent can modify during an interaction. The agent's tool writes are limited to specific entity types and confidence levels, preventing conversation-extracted data from overwriting authoritative records. System services like the connector runner operate outside this scope because they handle verified, confidence-gated data. These write constraints apply equally across voice and text modalities.

## Conversation Monitoring

During every conversation, the platform evaluates the agent's behavior, the caller's state, and the agent's confidence on every turn. When something falls outside expected parameters, the system adjusts the agent's autonomy or escalates to a human operator. These checks run on both voice and text modalities.

### Content Monitoring

The monitoring system evaluates what is being said against semantic rules called monitor concepts. These rules detect when the conversation touches topics that require special handling: clinical advice, medication dosing, mental health crisis indicators, insurance disputes, or any domain-specific concern configured for the workspace.

### Emotional Tracking

The emotion detection system tracks the caller's emotional state throughout the call. Sustained distress, escalating frustration, or signs of crisis feed into the safety picture. A caller who is becoming increasingly agitated triggers a different response than one who is calmly asking routine questions.

### Confidence Spectrum

The agent maintains a measure of how well it understands the current situation. Confidence is not a binary on/off switch - it is a spectrum, and the agent's autonomy adjusts proportionally:

| Confidence Level | Agent Behavior                                                                                                                              |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| **High**         | Full autonomy. Agent navigates the context graph, executes actions, and responds without restriction.                                       |
| **Moderate**     | Agent continues but hedges responses, seeks confirmation, and avoids committing to actions it is unsure about.                              |
| **Low**          | Agent acknowledges uncertainty and escalates to an operator if available. If no operator is available, it offers to have someone call back. |

Confidence also gates what the agent is willing to do. Scheduling actions require moderate confidence or higher. Clinical data extraction writes events at voice confidence (0.5) regardless of agent confidence - the confidence gates in the connector runner handle verification before anything reaches the EHR. Information delivery adjusts tone: high confidence produces direct statements, lower confidence produces qualified ones.
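As a rough illustration, the spectrum-to-autonomy mapping can be sketched as follows. The cutoff values, enum names, and `can_schedule` helper are hypothetical, chosen only to mirror the table above; the platform's actual confidence representation is internal:

```python
from enum import Enum

class Confidence(Enum):
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"

def confidence_level(score: float) -> Confidence:
    # Hypothetical cutoffs for mapping a raw score onto the spectrum.
    if score >= 0.8:
        return Confidence.HIGH
    if score >= 0.5:
        return Confidence.MODERATE
    return Confidence.LOW

def can_schedule(level: Confidence) -> bool:
    # Scheduling actions require moderate confidence or higher.
    return level in (Confidence.HIGH, Confidence.MODERATE)
```

The key design point the sketch captures is that autonomy degrades gradually: a drop in confidence narrows what the agent will commit to before it ever reaches the point of escalating.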

## Monitor Concepts

A monitor concept is a semantic rule that uses embedding similarity rather than keyword scanning to detect when a conversation approaches topics that require attention. This catches paraphrases, indirect references, and contextual mentions that keyword matching would miss.

{% @mermaid/diagram content="flowchart LR
U\[Utterance] --> L1\[Embedding Similarity]
L1 -->|Below threshold| PASS\[No action]
L1 -->|Above threshold| L2\[Contextual Review]
L2 -->|False positive| PASS
L2 -->|Confirmed| ACT{Action}
ACT --> LOG\[Log]
ACT --> ALERT\[Alert]
ACT --> ESC\[Escalate to Operator]" %}

Each concept has:

* **A description** of what it detects (e.g., "caller is asking for medical advice about dosing")
* **A detection threshold** that controls sensitivity (how close the conversation content must be to the concept's embedding to trigger a match)
* **An action** that specifies what happens when the concept fires (alert, escalate, log, or a custom response)

Monitor concepts are configured per workspace. A scheduling-focused deployment might have concepts for "caller requesting clinical advice" (escalate) and "caller expressing frustration with wait times" (alert). A clinical follow-up deployment might add concepts for "patient reporting worsening symptoms" (escalate) and "patient mentioning medication side effects" (flag for review).
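A minimal sketch of the concept structure, using the scheduling-deployment examples above. The field and class names are illustrative, not the platform's configuration schema:

```python
from dataclasses import dataclass

@dataclass
class MonitorConcept:
    description: str   # what the concept detects, in natural language
    threshold: float   # embedding-similarity level that triggers a match
    action: str        # "alert", "escalate", "log", or a custom response

# A scheduling-focused workspace might configure:
concepts = [
    MonitorConcept("caller requesting clinical advice", 0.82, "escalate"),
    MonitorConcept("caller expressing frustration with wait times", 0.82, "alert"),
]
```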

### Dual-Layer Detection

The system uses two detection layers in sequence:

1. **Embedding similarity** - Every utterance is compared against configured monitor concepts using vector similarity. Fast, catches most cases, but can produce false positives on semantically similar but contextually different statements.
2. **Contextual review** - When similarity exceeds the threshold, the flagged content is evaluated in the full conversation context. Only confirmed flags trigger the configured action.

This keeps detection sensitive (low false negatives from the embedding layer) while keeping the action queue clean (low false positives from the contextual review).
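The two-layer flow can be sketched as a small pipeline. This is a simplified model, not the production implementation: embeddings are stand-in vectors, and the contextual review is abstracted as a callable that confirms or rejects the flag:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def evaluate(utterance_vec, concept_vec, threshold, contextual_review):
    # Layer 1: fast embedding similarity against the concept.
    if cosine(utterance_vec, concept_vec) < threshold:
        return "pass"  # below threshold: no action
    # Layer 2: slower review in full conversation context
    # filters out false positives before any action fires.
    return "fire" if contextual_review() else "pass"
```

The design choice mirrored here is that the cheap layer runs on every utterance and errs toward flagging, while the expensive layer runs only on flagged content and errs toward dismissing.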

### Threshold Calibration

Detection thresholds control the trade-off between catching real signals and generating false positives. Start at 0.80-0.85 and tune from there:

* **Too low** (e.g., 0.70) - Fires frequently on tangential conversations. Operators learn to ignore alerts.
* **Too high** (e.g., 0.95) - Only catches near-exact matches. Indirect references and paraphrases slip through.
* **Calibration cycle** - Deploy, observe review queue volume for a week, review false positives (if operators dismiss more than 30%, raise threshold 0.02-0.03 at a time), spot-check unflagged conversations, and re-evaluate after workflow changes.
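The dismissal-driven part of the calibration cycle can be expressed as a simple rule. The function name and the 0.95 ceiling are assumptions for illustration; the 30% dismissal trigger and the 0.02-0.03 step come from the guidance above:

```python
def adjust_threshold(threshold: float, dismissal_rate: float,
                     step: float = 0.025, ceiling: float = 0.95) -> float:
    # If operators dismiss more than 30% of flags as false positives,
    # raise the threshold by one small step (0.02-0.03 per cycle).
    if dismissal_rate > 0.30:
        return min(threshold + step, ceiling)
    return threshold
```

Tuning in small steps with a week of observation between adjustments avoids overshooting into the "too high" regime where indirect references slip through.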

### Regulation Templates

Pre-built sets of monitor concepts for specific compliance requirements. Apply a template to get a baseline, then customize. Healthcare templates cover HIPAA-related patterns, clinical boundary monitoring, crisis detection, and medication safety. Templates are composable - apply multiple to a single workspace. Each creates individual concepts you can modify independently.

### The Review Queue

When a monitor concept fires, the event is logged and may be added to the review queue depending on the concept's configured action. Each item includes the conversation transcript with the triggering segment highlighted, which concept fired and its similarity score, the caller's emotional state at the time, and the agent's response.

The review queue serves two purposes: immediate response (was the escalation handled correctly?) and long-term tuning (should this concept's threshold be adjusted?). The review queue correction rate is a useful proxy for calibration quality - a high correction rate suggests the automated pipeline is missing nuances that human reviewers catch.

### How Monitors Trigger Operator Escalation

When a monitor concept with an escalation action fires during a live call:

1. The platform checks whether an operator is available
2. If available, the operator receives a notification with call context and the reason for escalation
3. The operator can join the call in listen mode to assess the situation
4. If the operator determines intervention is needed, they switch to takeover mode

If no operator is available, the event is logged, the agent adjusts its behavior (increased caution, more frequent confirmation-seeking), and the conversation is flagged for post-call review. The monitoring system does not replace operator judgment. It surfaces situations that may need attention. The operator decides what to do about them.

## Regulatory Triage

The triage system evaluates every turn against regulation-specific frameworks. Unlike monitor concepts (which use embedding similarity for broad topic detection), triage runs structured assessment with graduated concern levels.

### Concern Levels

| Level | Meaning                                                                 |
| ----- | ----------------------------------------------------------------------- |
| 0     | No concern detected                                                     |
| 1     | Low concern - log for review                                            |
| 2     | Moderate concern - increase agent caution, notify operator if available |
| 3     | High concern - immediate escalation                                     |

### Built-In Templates

**Joint Commission NPSG 15 (Suicide Risk)** - Detects indicators including farewell language, references to giving away possessions, sudden calm after distress, and oblique references to "not being around."

**VAWA (Domestic Violence)** - Detects references to unexplained injuries, controlling behavior by partners, isolation from support systems, and mentions of threats at home.

**FDA MedWatch (Adverse Drug Reactions)** - Detects new symptoms after medication changes, descriptions of swelling/rash/breathing difficulty, unexpected bleeding, and symptoms correlating temporally with medication changes.

Each template includes triage hints - specific linguistic and behavioral patterns the LLM watches for beyond direct statements. Custom templates can be created for organization-specific requirements.

### Accumulation

Concern signals accumulate across turns within a single conversation. This catches patterns that emerge gradually rather than in a single statement. A patient who mentions feeling "tired of dealing with everything" (level 1), then says "I just want it to be over" (level 1), then falls silent for an extended period - no single turn would trigger escalation, but the accumulated pattern does.

Two parameters control accumulation: the mild threshold (minimum concern level that counts) and the fast-track level (concern level that bypasses accumulation and triggers action immediately, typically level 3). When accumulated concern crosses the configured threshold, the system treats it as equivalent to a single detection at the fast-track level.
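A sketch of the accumulation logic under those two parameters. The function shape and the accumulation threshold of 3 are illustrative assumptions; the mild-threshold and fast-track defaults match the configuration table below:

```python
def process_turn(accumulated: list[int], new_level: int,
                 mild_threshold: int = 1, fast_track: int = 3,
                 accumulation_threshold: int = 3) -> str:
    # Fast-track detections bypass accumulation entirely.
    if new_level >= fast_track:
        return "escalate"
    # Only levels at or above the mild threshold count toward accumulation.
    if new_level >= mild_threshold:
        accumulated.append(new_level)
    # Crossing the accumulated threshold is treated like a single
    # detection at the fast-track level.
    if sum(accumulated) >= accumulation_threshold:
        return "escalate"
    return "continue"
```

With these assumed values, the example above plays out as three level-1 signals: the first two continue, and the third crosses the accumulated threshold and escalates.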

### Triage Configuration

Triage is configured at the workspace level through safety policy settings:

| Setting              | What It Controls                                             | Default                           |
| -------------------- | ------------------------------------------------------------ | --------------------------------- |
| **Templates**        | Which regulatory frameworks are active                       | None (must be explicitly enabled) |
| **History window**   | How many recent turns are included in the evaluation context | 10 turns                          |
| **Accumulation**     | Whether concern signals accumulate across turns              | Enabled                           |
| **Mild threshold**   | Minimum concern level that counts toward accumulation        | Level 1                           |
| **Fast-track level** | Concern level that triggers immediate action                 | Level 3                           |

Templates are composable. You can apply multiple templates to a single workspace. Each template creates an independent triage evaluation, so a call can trigger both a suicide risk concern and an adverse drug reaction concern simultaneously.

Triage runs in parallel with the agent's normal processing. It does not add latency to the response. If the triage evaluation takes longer than the configured timeout, the agent responds without waiting for the result.
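Putting the settings together, a workspace-level triage configuration might look like the following. The key names, template identifiers, and timeout value are hypothetical, not the platform API; the defaults mirror the table above:

```python
# Hypothetical configuration shape; key names are illustrative.
triage_config = {
    "templates": ["npsg_15_suicide_risk", "fda_medwatch"],  # none active by default
    "history_window": 10,     # recent turns included in the evaluation context
    "accumulation": True,     # concern signals accumulate across turns
    "mild_threshold": 1,      # minimum level that counts toward accumulation
    "fast_track_level": 3,    # triggers immediate action, bypassing accumulation
    "timeout_ms": 1500,       # hypothetical: agent responds without waiting past this
}
```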

### How Triage Connects to Escalation

Triage does not act alone. It feeds into the broader safety response:

1. **Triage evaluates** each turn and returns a concern level (0-3)
2. **Risk scoring** incorporates triage results alongside emotion signals, loop detection, and call duration into a composite risk score
3. **Escalation** triggers when risk thresholds are exceeded
4. **Review queue** captures the triage event with full context for compliance audit and threshold tuning

A single triage detection at level 3 can trigger immediate escalation. Lower-level detections (1-2) contribute to the composite risk score, where they may combine with other signals to cross the escalation threshold.

## Risk Scoring

The agent computes a composite risk score on every turn from three weighted signals:

* **Caller emotion** (40%) - From the emotion detection system
* **Loop detection** (30%) - Whether the conversation is stuck repeating
* **Interaction duration** (30%) - Relative to expected length for this service type

The score maps to four levels:

| Level        | Meaning                                                                    |
| ------------ | -------------------------------------------------------------------------- |
| **Normal**   | Conversation proceeding within expected parameters. No intervention.       |
| **Monitor**  | Elevated signals. Agent continues, operator dashboard highlights the call. |
| **Alert**    | Operator receives active notification. Agent increases caution.            |
| **Escalate** | Operator is prompted to join the call. Agent narrows its behavior.         |

Thresholds for each level are configurable per workspace. Individual context graph states can override the default thresholds when certain conversation phases carry higher clinical risk (e.g., a symptom triage state may have a lower escalation threshold than a scheduling confirmation state). Risk scoring applies across both voice and text modalities.
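The weighting and level mapping can be sketched as follows. The 40/30/30 split comes from the list above; the level boundaries in `risk_level` are placeholder values, since real thresholds are configured per workspace and can be overridden per context graph state:

```python
def risk_score(emotion: float, loop: float, duration: float) -> float:
    # Each signal assumed normalized to [0, 1]; weights per the documented split.
    return 0.40 * emotion + 0.30 * loop + 0.30 * duration

def risk_level(score: float, thresholds=(0.25, 0.50, 0.75)) -> str:
    # Placeholder boundaries; actual values are workspace-configurable.
    monitor, alert, escalate = thresholds
    if score >= escalate:
        return "escalate"
    if score >= alert:
        return "alert"
    if score >= monitor:
        return "monitor"
    return "normal"
```

A state-level override, in this sketch, would amount to passing a lower `thresholds` tuple for high-risk conversation phases such as symptom triage.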

### How Escalation Works

When the platform determines a call needs human involvement:

1. **Notification** - The operator on duty receives context: caller identity, conversation summary, reason for escalation, and relevant signals.
2. **Join** - The operator joins the live conference call in listen mode.
3. **Decision** - The operator decides whether to take over or let the agent continue with monitoring.
4. **Takeover** (if needed) - The operator switches to takeover mode. The agent's audio is suppressed and the operator speaks directly with the caller.
5. **Return** - The operator leaves or switches back to listen mode. The agent resumes with full context.

The caller experiences this as a single continuous call. No transfer, no hold, no disruption.

Multiple signals can trigger escalation: a monitor concept fires, the caller explicitly asks for a person, confidence drops below threshold, or sustained caller distress is detected. These signals are not evaluated in isolation - a single borderline signal may not trigger escalation, but two moderate signals together may.

If no operator is available when an escalation triggers, the event is logged, the agent adjusts its behavior (increased caution, more frequent confirmation-seeking), and the conversation is flagged for post-call review.

## Deployment Safety

Pre-production validation catches problems before they reach live calls. No configuration change - whether a new context graph, updated safety rules, modified voice settings, or a new action - should go to production without structured testing and staged rollout.

The runtime and deployment phases form a continuous loop. A problem detected by runtime monitoring (e.g., the agent consistently struggling with a specific type of insurance question) feeds back into deployment safety (new simulation scenarios are created to cover that case) and compliance (the audit trail shows when the problem started and how it was addressed).

### Simulation Testing

New agent configurations are tested against synthetic scenarios. Each simulation runs a complete conversation through the agent's pipeline, including context graph navigation, tool execution, world model queries, and response generation.

**Personas** define simulated callers with specific characteristics: demographics, what the caller is trying to accomplish, how they respond to the agent (cooperative, confused, frustrated, in a hurry), and edge cases (unusual requests, ambiguous phrasing, mid-conversation topic changes). Personas are designed to cover the range of real interactions your deployment handles. A scheduling deployment might have personas for straightforward rescheduling, insurance questions mid-call, callers who cannot remember their date of birth, and callers who request medical advice the agent should decline.

**Scenarios** define conversation flow and expected outcomes: setup data in the world model before the call starts, the sequence of caller utterances, expected agent behaviors at each stage (navigate to the correct state, call the right tool, escalate when appropriate), and measurable success criteria (correct appointment booked, escalation triggered at the right moment, no safety monitor violations). A typical pre-deployment validation runs hundreds of scenarios across dozens of personas.

### Version Set Promotion

Version sets control how agent configurations move from development to production. Each promotion step requires passing quality gates:

| Gate                    | What It Checks                                             | Failure Behavior                |
| ----------------------- | ---------------------------------------------------------- | ------------------------------- |
| **Safety tests**        | All safety-related simulations pass (100% required)        | Blocks promotion                |
| **Regression tests**    | Full suite passes with no new failures vs. current release | Blocks promotion                |
| **Safety monitors**     | No new monitor violations vs. current baseline             | Blocks promotion                |
| **Performance metrics** | Latency, escalation rate, task completion meet thresholds  | Warning or block (configurable) |
| **Quality score**       | Composite quality across simulations meets minimum         | Warning or block (configurable) |

Safety gates are hard blocks with no override. Performance and quality gates can be configured as warnings or blocks depending on your organization's risk tolerance.
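The gate semantics can be sketched as a small evaluation function. The gate keys and result shape are assumptions for illustration; the hard-block-versus-configurable split matches the table above:

```python
def evaluate_gates(results: dict, soft_gates_block: bool = False) -> str:
    # Safety-related gates are hard blocks with no override.
    hard = ["safety_tests", "regression_tests", "safety_monitors"]
    if not all(results[g] for g in hard):
        return "blocked"
    # Performance and quality gates warn or block depending on configuration.
    soft = ["performance_metrics", "quality_score"]
    if not all(results[g] for g in soft):
        return "blocked" if soft_gates_block else "warning"
    return "promote"
```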

### Rollback

If a problem is found in production after promotion, the previous release version set is available for immediate rollback. Rollback is a configuration change, not a code deployment - the agent uses the previous version set on the next call. In-progress calls complete with the version set they started with. Because version sets include the full agent configuration (context graph, dynamic behaviors, safety rules, voice settings), rollback restores the exact prior state.

### Behavioral Validation

Beyond pass/fail simulation results, deployment safety includes qualitative review of agent behavior:

* **Conversation quality** - Do responses sound natural and appropriate? Are they too verbose, too terse, or off-tone?
* **Escalation judgment** - Does the agent escalate at the right moments? Too often (disrupting operations) or too rarely (missing situations that need review)?
* **Edge case handling** - When the conversation goes off-script, does the agent recover gracefully or get stuck?
* **Safety boundary compliance** - Does the agent stay within its configured scope? When asked to do something outside its capabilities, does it decline appropriately?

The simulation framework produces transcripts and recordings that reviewers examine as part of the staging review before every promotion to release.

## Per-Service Configuration

Safety monitoring is enabled by default on all services but can be toggled per service. When safety filters are disabled, the monitoring system (concepts, triage, accumulation) is bypassed while independent risk scoring remains active. This allows non-clinical services to operate without clinical safety rule overhead while maintaining baseline risk detection.

## What Safety Is Not

Safety in this context does not mean:

* **Content filtering on outputs.** The agent's behavior is governed by its context graph, safety monitors, and escalation rules - not by a post-hoc filter that scans generated text.
* **A separate module added to the side.** Safety controls are embedded in the architecture: in how the world model resolves conflicts, in how the connector runner gates writes, in how the agent engine decides when to escalate.
* **A guarantee of perfect outcomes.** The system is designed to minimize harm and maximize the probability of correct behavior. When it cannot be confident in the right course of action, it escalates to a human.

{% hint style="warning" %}
Triage and monitoring are detection aids, not clinical assessment tools. They surface patterns that may warrant human attention. The operator or clinician makes the clinical judgment. Organizations should establish clear protocols for how escalated events are handled by staff.
{% endhint %}

{% hint style="info" %}
**Developer Guide** - For API endpoints, SDK examples, and integration details, see [Safety & Monitoring](https://docs.amigo.ai/developer-guide/platform-api/platform-api/safety-and-monitoring) in the developer guide.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.amigo.ai/safety-and-compliance/runtime-safety.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
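As a minimal sketch of this mechanism using only the Python standard library (the helper names are our own; the endpoint and `ask` parameter are as documented above):

```python
import urllib.parse
import urllib.request

BASE = "https://docs.amigo.ai/safety-and-compliance/runtime-safety.md"

def build_ask_url(question: str) -> str:
    # URL-encode the natural-language question into the `ask` parameter.
    return f"{BASE}?ask={urllib.parse.quote(question)}"

def ask_docs(question: str) -> str:
    # Perform the GET and return the response body as text.
    with urllib.request.urlopen(build_ask_url(question)) as resp:
        return resp.read().decode("utf-8")
```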
