# Continuous Improvement

<figure><img src="/files/xbL2yDSUEY4t6uJQFuTR" alt="Continuous improvement pipeline: measure (metric evaluation, baselines), identify (pattern discovery, root cause, candidates), test and promote (simulation, A/B comparison, staged rollout)"><figcaption></figcaption></figure>

Every interaction your agents handle generates data: which context graph paths led to successful outcomes, which tool sequences resolved issues fastest, where patients dropped off, which escalation triggers fired too early or too late. The platform turns this operational data into measurable improvements without manual tuning.

## How It Works

The improvement loop has three stages that run continuously:

**1. Measure** - The platform instruments every decision point in every interaction. For each call or text session, it records which context graph states were visited, which tools were called (and whether they succeeded), how the patient responded emotionally, whether the outcome met quality thresholds, and how long each step took. This produces a structured record of what happened and why.
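
As a simplified illustration, such a record might look like the sketch below. The field names are hypothetical, not the platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str            # tool identifier
    succeeded: bool      # did the call achieve its goal?
    duration_ms: int     # how long the step took

@dataclass
class InteractionRecord:
    interaction_id: str
    states_visited: list[str]        # context graph states, in visit order
    tool_calls: list[ToolCall]       # tools invoked and their outcomes
    emotion_trajectory: list[float]  # per-turn sentiment estimates
    met_quality_thresholds: bool     # did the outcome pass quality checks?
    total_duration_s: float
```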

**2. Identify** - The analytics layer compares outcomes across thousands of interactions to find patterns. It might discover that a specific scheduling flow works well for new patients but causes confusion for returning patients. Or that a particular escalation threshold fires too aggressively for one clinic but not aggressively enough for another. These findings are specific and actionable - not generic recommendations, but precise configuration changes with predicted impact.
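
A minimal sketch of the kind of comparison involved, reusing the illustrative `InteractionRecord` shape above and a caller-supplied segmentation function:

```python
from collections import defaultdict

def success_rate_by_segment(records, segment_of):
    """Group interactions by segment (e.g., new vs. returning patients)
    and compare how often each segment lands inside quality thresholds."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [successes, total]
    for record in records:
        bucket = totals[segment_of(record)]
        bucket[0] += int(record.met_quality_thresholds)
        bucket[1] += 1
    return {segment: ok / n for segment, (ok, n) in totals.items()}
```

A persistent gap between segments is the raw signal behind a finding like "this scheduling flow confuses returning patients."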

**3. Test and promote** - Candidate improvements are tested in [simulation](/testing/testing/simulations.md) before reaching production. The platform runs the proposed change against realistic scenarios, measures whether it actually improves the target metric without degrading others, and only promotes changes that pass. [Agent Forge](/reference/agent-forge.md) manages the promotion pipeline with versioning and rollback.
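
A hedged sketch of what such a gate might check, assuming per-metric scores from simulation where higher is better (the function and metric conventions are illustrative, not Agent Forge's actual API):

```python
def should_promote(baseline: dict, candidate: dict,
                   target_metric: str, tolerance: float = 0.0) -> bool:
    """Promote only if the candidate improves the target metric in
    simulation without degrading any other metric beyond `tolerance`.
    Both dicts map metric name -> score."""
    if candidate[target_metric] <= baseline[target_metric]:
        return False  # no improvement on the metric being optimized
    return all(candidate[m] >= baseline[m] - tolerance
               for m in baseline if m != target_metric)
```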

## What Gets Optimized

The platform optimizes at the system level - how components are configured and composed - rather than at the model level.

| Area                        | What the Platform Learns                                                      | Example                                                                                       |
| --------------------------- | ----------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| **Context graph paths**     | Which state transitions produce the best outcomes for different patient types | Returning patients do better when the greeting state skips identity verification              |
| **Tool selection**          | Which tools to call in which order, and when to skip optional steps           | Insurance lookup before scheduling reduces rebooking rates by 40%                             |
| **Escalation thresholds**   | When to involve a human operator vs. continue autonomously                    | Emergency department callers benefit from lower escalation thresholds than routine scheduling |
| **Dynamic behavior tuning** | Which runtime behaviors to activate for different conversation contexts       | Empathy behaviors should activate earlier for callers whose emotion baseline trends negative  |
| **Memory retrieval**        | How aggressively to pull historical context into the conversation             | Medication review calls need deeper history; appointment confirmations need less              |
| **Confidence thresholds**   | How much verification data needs before the platform trusts it for EHR writeback | Insurance data from voice capture needs stricter review than data from photo uploads          |

## Multi-Objective Optimization

Enterprise success is never a single metric. A call that scores high on clinical accuracy but takes 45 minutes has failed. A call that books an appointment quickly but misses an insurance issue has failed differently. The platform optimizes across all objectives simultaneously.

For each workspace, success is defined by an acceptance region - a set of thresholds that must all be satisfied:

* Clinical accuracy above threshold
* Patient satisfaction above threshold
* Safety violations at zero
* Call duration within range
* Cost per interaction within budget

An interaction that succeeds on accuracy but fails on empathy is outside the acceptance region. The platform finds configurations that reliably land inside the region across all dimensions, not just on average but in worst-case scenarios.
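
Conceptually, the region is a conjunction of per-metric checks rather than a weighted average. A minimal sketch, with metric names and threshold values invented for illustration:

```python
# All thresholds must hold simultaneously; names and limits are examples,
# not the platform's actual configuration keys.
ACCEPTANCE_REGION = {
    "clinical_accuracy":    lambda v: v >= 0.95,
    "patient_satisfaction": lambda v: v >= 4.2,
    "safety_violations":    lambda v: v == 0,
    "duration_minutes":     lambda v: v <= 20.0,
    "cost_per_interaction": lambda v: v <= 1.50,
}

def inside_acceptance_region(metrics: dict) -> bool:
    # No averaging: one failed dimension puts the interaction outside.
    return all(check(metrics[name])
               for name, check in ACCEPTANCE_REGION.items())
```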

Optimizing a single metric in isolation tends to degrade the others. The system discovers the real trade-offs (deeper reasoning improves accuracy but increases duration) and finds the configurations that balance them for your specific priorities.

## The Recursive Improvement Loop

The measurement layer also improves itself. The functions that score data quality, evaluate source reliability, and measure composition outcomes are themselves subject to the same improvement cycle.

Here is how this works in practice:

1. The platform measures which tool compositions produce the best outcomes
2. Those measurements reveal that certain data sources are more reliable than others
3. The source reliability scores are updated, which changes how confidence gates evaluate incoming data
4. Better confidence scoring means the world model is more accurate
5. More accurate world model data means agents make better decisions
6. Better decisions produce better outcomes, which generate better measurements
7. Better measurements improve the next round of source reliability scoring
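
A toy sketch of steps 2-3 and their effect on the confidence gates in step 3, assuming outcomes can be attributed to individual data sources (the smoothing factor and the `voice_capture` source name are invented):

```python
def update_source_reliability(reliability: dict, attributed_outcomes: list) -> dict:
    """Shift each source's reliability score toward its recent outcomes;
    the updated scores feed the confidence gates on incoming data."""
    updated = dict(reliability)
    for source, succeeded in attributed_outcomes:  # e.g. ("voice_capture", False)
        prior = updated.get(source, 0.5)
        updated[source] = 0.9 * prior + 0.1 * (1.0 if succeeded else 0.0)
    return updated

def passes_confidence_gate(source: str, reliability: dict, floor: float = 0.8) -> bool:
    # Data from less reliable sources needs extra verification upstream.
    return reliability.get(source, 0.0) >= floor
```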

Each layer - perception, reasoning, action, memory, measurement - is both a consumer and producer of the platform's analytical capabilities. Each new analytical capability enables compositions that reveal the need for capabilities that did not exist before.

## Governance Prevents Runaway Optimization

Every step in the recursive loop is governed:

* **Permissioned** - Only authorized roles can promote configuration changes to production
* **Audited** - Every change, test result, and promotion decision is recorded in the [audit trail](/safety-and-compliance/compliance.md)
* **Versioned** - Every configuration has a version history with rollback capability
* **Bounded** - Safety constraints are hard limits, not optimization targets. The system cannot trade safety for performance.

The governance layer is not bolted on after the fact. The improvement system runs on top of it. A configuration change cannot reach production without passing through the same safety verification pipeline that governs every other platform operation.
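
The difference between a hard limit and an optimization target is easy to state in code. In this sketch (metric names assumed), a candidate with any safety violation is discarded before scoring, so no amount of improvement elsewhere can buy it back:

```python
def is_admissible(metrics: dict) -> bool:
    """Safety is a hard constraint, not a weighted objective."""
    # Missing data is treated as a violation (fail closed).
    return metrics.get("safety_violations", 1) == 0

def best_admissible(candidates, score):
    """Maximize `score` only over candidates that clear the safety limit."""
    safe = [c for c in candidates if is_admissible(c)]
    return max(safe, key=score, default=None)
```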

## Compounding Returns

The practical effect is that your deployment gets better with use. A workspace that has been running for six months has had thousands of interactions measured, hundreds of patterns identified, and dozens of improvements tested and promoted. The system understands your patient population, your scheduling constraints, your EHR's behavior, and your operators' preferences in ways that a fresh deployment cannot.

This compounds in three ways:

* **Within a workflow**: Improving the scheduling flow produces better appointment data, which improves the pre-visit outreach flow, which produces better intake data, which improves the next scheduling interaction.
* **Across workflows**: Patterns discovered in one service (e.g., how to handle frustrated callers) transfer to other services in the same workspace.
* **Across the measurement system**: Better measurements produce better improvements, which produce better measurements. The system gets better at getting better.

Organizations that start earlier build a compounding advantage. The improvements from month one make month two's improvements faster and more precise, and so on.

{% hint style="info" %}
For the testing and simulation infrastructure that powers the improvement loop, see [Testing Overview](/testing/testing.md). For the CLI tools that manage configuration promotion, see [Agent Forge](/reference/agent-forge.md).
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.amigo.ai/agent/pattern-discovery-and-reuse.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
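
For example, a minimal client in Python (the question text is illustrative):

```python
import urllib.parse
import urllib.request

question = "Which metrics define the acceptance region?"
url = ("https://docs.amigo.ai/agent/pattern-discovery-and-reuse.md?ask="
       + urllib.parse.quote(question))

with urllib.request.urlopen(url) as response:
    print(response.read().decode())  # direct answer plus excerpts and sources
```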
