Design Principles
Four engineering principles that guide every architecture decision - measurement-first, composable, risk-calibrated, and goal-oriented.
Four principles guide every architecture decision in the Amigo platform. They are not aspirational statements. They are constraints that the engineering team enforces in code.
1. Measurement-First Decisions
Every agent behavior is measurable before it ships. We do not deploy a new workflow and hope it works. We instrument it, run it through simulations, and compare results against baselines with statistical rigor.
Measurement is not optional in healthcare - it is essential because healthcare data is unreliable by nature. Clinical notes are free-text dumps shaped by decades of legacy documentation habits. EHR inputs vary wildly in completeness and accuracy across systems. Scribe outputs reflect what was said, not necessarily what is true. The one exception is billing and revenue cycle management (RCM) data, which is structured and verified because money depends on it. Everything else requires skepticism. You cannot make sound decisions without knowing how much to trust what you know, and you cannot know that without measuring it.
This means:
Every context graph action has associated metrics that define success
Changes are tested against thousands of synthetic conversations before reaching real patients
Promotion from dev to staging to production requires passing quantitative thresholds
Post-deployment, every call generates structured data that feeds back into measurement
Data quality itself is measured - confidence scores track how much each input should be trusted
The practical effect: when someone asks "did this change improve outcomes?", the answer is a number, not an opinion. When a behavior degrades, the system detects it from metric drift before anyone files a complaint. And when the underlying data is bad - which it often is - the system knows that too, and adjusts accordingly.
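The promotion gate described above can be sketched in a few lines. This is an illustrative example, not the platform's actual API - the names (`MetricRun`, `passes_gate`) and the specific thresholds are assumptions chosen for the sketch.

```python
# Hypothetical sketch of a promotion gate: a change ships only when its
# simulation metrics beat the baseline within quantitative thresholds.
from dataclasses import dataclass

@dataclass
class MetricRun:
    success_rate: float      # fraction of simulated conversations that met the goal
    escalation_rate: float   # fraction escalated to a human
    n_conversations: int     # sample size behind the numbers

def passes_gate(candidate: MetricRun, baseline: MetricRun,
                min_lift: float = 0.0, max_escalation_delta: float = 0.02,
                min_samples: int = 1000) -> bool:
    """Promotion from dev to staging to production requires passing thresholds."""
    if candidate.n_conversations < min_samples:
        return False  # not enough synthetic conversations to trust the numbers
    lift = candidate.success_rate - baseline.success_rate
    escalation_delta = candidate.escalation_rate - baseline.escalation_rate
    return lift >= min_lift and escalation_delta <= max_escalation_delta

baseline = MetricRun(success_rate=0.91, escalation_rate=0.05, n_conversations=5000)
candidate = MetricRun(success_rate=0.93, escalation_rate=0.06, n_conversations=5000)
print(passes_gate(candidate, baseline))  # True: positive lift, escalation within bounds
```

The point of the sketch is that the gate is a pure function of measured numbers - "did this change improve outcomes?" has a boolean answer derived from metrics, not an opinion.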
2. Composable Capabilities
Agents are assembled from independent, reusable components that can be swapped, audited, and tested in isolation.
The building blocks:
Context graphs define what the agent should accomplish (goals, transitions, constraints)
Dynamic behaviors add conditional logic that activates based on context (time of day, patient history, caller emotion)
Memory gives agents access to past interactions and learned preferences
Actions/Skills let agents perform work in external systems
These components compose freely. A scheduling context graph can combine with an insurance verification behavior, a patient memory layer, and an EHR booking skill to create a complete workflow. Swap the context graph, and the same components serve a different use case.
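The composition described above can be made concrete with a small sketch. The structure and names (`Workflow`, the component strings) are illustrative assumptions, not the platform's actual data model.

```python
# Illustrative sketch: independent components assembled into a workflow.
# Swap the context graph and the same behaviors, memory, and skills
# serve a different use case.
from dataclasses import dataclass, field

@dataclass
class Workflow:
    context_graph: str                            # what the agent should accomplish
    behaviors: list = field(default_factory=list)  # conditional logic by context
    memory: str = ""                              # past interactions, learned preferences
    skills: list = field(default_factory=list)     # actions in external systems

scheduling = Workflow(
    context_graph="scheduling",
    behaviors=["insurance_verification"],
    memory="patient_memory",
    skills=["ehr_booking"],
)

# Same components, different context graph: a different complete workflow.
refill = Workflow(
    context_graph="prescription_refill",
    behaviors=scheduling.behaviors,
    memory=scheduling.memory,
    skills=scheduling.skills,
)
```

Because each component is a separate, named object, it can be inspected, audited, and tested in isolation - which is what the patient safety argument below depends on.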
This matters for healthcare because patient safety reviews demand the ability to inspect and audit each piece independently. When a regulator asks "why did the agent do that?", you can trace the decision to a specific context graph state, a specific behavior rule, and a specific memory input.
3. Risk-Calibrated Autonomy
Not all agent decisions carry the same risk. The platform adjusts how much autonomy the agent has based on the confidence level of the data and the stakes of the action.
This principle exists because the underlying data in healthcare is unreliable. Patients provide incomplete information - sometimes by accident, sometimes deliberately. External systems return stale or inconsistent results. EHR records may be months out of date. The agent cannot assume that what it has been told is accurate, and the platform is built around that reality. When confidence is low - because the source is a patient's verbal claim, or because two systems disagree - the agent gathers more information or escalates to a human. When confidence is high - because a fact has been verified against an authoritative system - the agent acts.
Confidence hierarchy - Higher-confidence sources always win regardless of recency. A verified EHR record (1.0) will not be overwritten by something a caller mentioned on a phone call (0.5).
The confidence hierarchy:
1.0 Authoritative: Direct system integration, verified API writes. The data comes from a system of record through a structured API.
0.7 Browser scrape: Portal data captured through UI automation. Likely accurate, but not guaranteed - UI layouts change, fields may be misread.
0.5 Voice: Data extracted from a phone conversation. Subject to mishearing, patient confusion, and deliberate inaccuracy.
0.3 Agent inference: Data the agent derived from context. Useful as a hypothesis, not as a basis for action.
0.0 Rejected: Contradicted by a higher-confidence source, or otherwise untrustworthy. Discarded.
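The hierarchy's merge rule can be sketched directly. This is a minimal illustration with assumed names (`CONFIDENCE`, `merge`); one detail the sketch assumes, since the text does not specify it, is that ties in confidence go to the newer observation.

```python
# Confidence hierarchy as data: the label names mirror the tiers above.
CONFIDENCE = {
    "authoritative": 1.0,   # direct system integration, verified API writes
    "browser_scrape": 0.7,  # portal data captured through UI automation
    "voice": 0.5,           # extracted from a phone conversation
    "inference": 0.3,       # derived by the agent from context
    "rejected": 0.0,        # contradicted by a higher-confidence source
}

def merge(current: dict, value, source: str) -> dict:
    """Overwrite a stored fact only when the new source is at least as trusted.
    (Assumption: equal-confidence ties go to the newer observation.)"""
    new_conf = CONFIDENCE[source]
    if new_conf >= current["confidence"]:
        return {"value": value, "confidence": new_conf, "source": source}
    return current  # lower confidence never overwrites, regardless of recency

dob = {"value": "1980-03-14", "confidence": 1.0, "source": "authoritative"}
dob = merge(dob, "1980-03-15", "voice")  # caller misstates DOB on the phone
print(dob["value"])  # still "1980-03-14": voice (0.5) cannot beat EHR (1.0)
```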
The tool execution tiers follow the same logic:
Low-risk lookups execute instantly with no human involvement
Appointment bookings execute with automated verification
Clinical orders require explicit human approval before execution
The system does not treat every action the same. A patient lookup and a prescription change are fundamentally different operations. The platform encodes that difference structurally, not just in prompts. And external system throughput is uneven - a booking API that responds in 200ms on Tuesday may time out on Monday morning. The platform accounts for this, retrying and adapting rather than failing on the first attempt.
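A sketch of the tier gating makes the "structural, not prompt-level" point concrete. The names (`Tier`, `TOOL_TIERS`, `execute`) are assumptions for illustration, not the platform's actual interfaces.

```python
# Illustrative risk-tier gate: the tier is attached to the tool itself,
# so the difference between a lookup and a clinical order is encoded
# structurally rather than in a prompt.
from enum import Enum

class Tier(Enum):
    LOW = "low"            # lookups: execute instantly, no human involvement
    VERIFIED = "verified"  # bookings: execute with automated verification
    APPROVAL = "approval"  # clinical orders: explicit human approval first

TOOL_TIERS = {
    "patient_lookup": Tier.LOW,
    "book_appointment": Tier.VERIFIED,
    "submit_clinical_order": Tier.APPROVAL,
}

def execute(tool: str, run, verify=None, request_approval=None):
    tier = TOOL_TIERS[tool]
    if tier is Tier.APPROVAL and not request_approval():
        return "blocked: awaiting human approval"
    result = run()
    if tier is Tier.VERIFIED and not verify(result):
        return "rolled back: verification failed"
    return result

# A low-risk lookup runs immediately; a clinical order cannot run without approval.
print(execute("patient_lookup", run=lambda: {"patient": "found"}))
```

Because the gate lives in code rather than in instructions to the model, a prompt change cannot accidentally let a clinical order skip human approval.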
4. Information Sufficiency, Not Script Completion
Agents pursue goals, not fixed scripts. A context graph defines what needs to happen (verify identity, find available slots, confirm booking) and the constraints around it (what data is required, when to escalate). The agent figures out how to get there based on what the caller actually says. Real conversations are unpredictable - a patient might answer the identity verification question and immediately ask about their billing. A goal-oriented system handles the transition because the goals remain the same even when the conversation path changes.
The context graph is a hierarchical state machine, not a decision tree. States represent objectives, not dialogue turns. Transitions happen when objectives are met, not when specific words are spoken.
More specifically, agents track information sufficiency - what they know, what they still need, and how confident they are in each piece. Healthcare is fundamentally an information-gathering process. Most of the work in a front desk call, a triage interaction, or an intake workflow is collecting enough data to act. Once information sufficiency is reached, the actual decision (schedule the appointment, route to the nurse, flag for prior auth) is usually straightforward and fast.
A scheduling agent does not follow a fixed list of questions. If the patient volunteers their date of birth and reason for visit in the first sentence, the agent does not ask those questions again. If the patient's stated insurance does not match what the EHR shows, the agent resolves the discrepancy before proceeding. Inbound information from patients is unreliable - people misremember their provider's name, confuse their insurance plan with their employer, name a pharmacy that closed two years ago. An information-sufficiency model cross-references, verifies, and keeps gathering until the data is trustworthy enough to act on.
The world model is what makes this practical. The agent tracks information sufficiency using the world model's confidence scores directly. When a patient provides their date of birth, the agent checks it against the entity projection. If confidence is already high (verified against the EHR at 1.0), the agent confirms and moves on. If confidence is low (a prior agent inference at 0.3), the agent asks a clarifying question or attempts verification through an external system.
The practical mechanics:
Each context graph state defines what information is required and at what confidence level
The agent tracks a running information state - what is known, what is missing, what conflicts
Transitions fire when sufficiency conditions are met, not when a question count is reached
If a piece of data cannot be verified (the EHR is down, the patient cannot confirm), the agent records the gap and escalates rather than guessing
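The mechanics above can be sketched as a sufficiency check. This is a minimal illustration under assumed names (`REQUIRED`, `sufficiency_gaps`) and field shapes, not the platform's actual representation.

```python
# A state lists the facts it requires and the confidence each must reach;
# the transition fires when every requirement is met, not after N questions.
REQUIRED = {  # per-state requirements: fact -> minimum confidence
    "verify_identity": {"name": 0.7, "date_of_birth": 1.0},
}

def sufficiency_gaps(state: str, known: dict) -> list:
    """Return the facts still missing or under-confident for this state."""
    gaps = []
    for fact, min_conf in REQUIRED[state].items():
        if known.get(fact, {"confidence": 0.0})["confidence"] < min_conf:
            gaps.append(fact)
    return gaps

known = {
    "name": {"value": "A. Rivera", "confidence": 0.5},            # voice only
    "date_of_birth": {"value": "1980-03-14", "confidence": 1.0},  # EHR-verified
}
print(sufficiency_gaps("verify_identity", known))  # ['name']: verify before the transition fires
```

When the gap list is empty, the transition fires; when a gap cannot be closed, the agent records it and escalates rather than guessing.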
The result: workflows that are as short as the data allows and as thorough as the situation demands. A simple reschedule with a verified patient takes thirty seconds. A new patient with conflicting insurance information takes longer - not because the script is longer, but because sufficiency has not been reached.