
# World Model

Every capability on the platform - the agent engine, connector runner, operator workflows, outbound campaigns, analytics - depends on one thing: a shared, trustworthy picture of reality. The world model is that picture. It is not a database the agent queries. It is closer to memory. The agent knows its patient the way you know your own name - automatically, without effort, without a lookup step.

This removes a boundary that most healthcare systems treat as fundamental: the separation between the "data layer" and the "intelligence layer." In traditional architectures, data sits in a store and an application queries it. In the world model, context flows into the agent automatically, and the agent's reasoning flows back as structured events. Data and intelligence are the same loop.

The practical consequence is that dirty data becomes useful. The world model does not clean data at ingestion. It accepts raw events - EHR feeds, voice transcripts, third-party data sources, manual imports - each tagged with provenance and confidence. Clean entity state is not a precondition; it is a computed output. A deterministic projection function reads all events for an entity and produces the current state. The high-quality core is what allows the messy periphery to work. You do not need perfect inputs. You need a system that can compute the right answer from imperfect ones.

{% hint style="info" %}
This is an event-sourced architecture. If you have worked with event sourcing in other systems, the principles are the same. If you have not, the core idea is straightforward: instead of updating records in place, you append facts to a log. The current state of any entity is derived by replaying the relevant facts.
{% endhint %}
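As a rough illustration of the append-and-replay idea, here is a minimal Python sketch. The `Event` shape and the `replay` helper are hypothetical names chosen for this example, not the platform's API, and this generic version lets the latest entry in the log win, whereas the world model ranks by confidence as described under the fourth invariant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable fact about an entity (hypothetical shape)."""
    entity_id: str
    field: str
    value: str


def replay(log, entity_id):
    """Derive current state by replaying the entity's facts in log order."""
    state = {}
    for event in log:
        if event.entity_id == entity_id:
            state[event.field] = event.value
    return state


# Facts are appended; nothing is updated in place.
log = []
log.append(Event("patient-1", "phone", "555-0100"))
log.append(Event("patient-1", "pharmacy", "Main Street"))
log.append(Event("patient-1", "phone", "555-0199"))

state = replay(log, "patient-1")
```

The earlier phone number is still in the log after the replay, which is what makes audit and correction possible later.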

## Data Pipeline

<figure><img src="/files/6M4hFpfEaupN74V0i82m" alt="Data pipeline: external systems through connectors and unification to world model event store, entity resolution, and projections, with verification gates and output to metric store, analytics, and outbound sync"><figcaption></figcaption></figure>

Data enters from external systems through connectors, passes through the unification engine into the event store, and gets resolved into entity projections. Automated review gates verify extracted data before it reaches full confidence. Verified projections feed the metric store and analytics, while outbound sync writes confirmed data back to external systems. Agent conversations also generate events that flow back into the world model, closing the loop.

## Why Event Sourcing for Healthcare

Healthcare data is fundamentally unreliable. The world model's architecture is designed to produce reliable state from unreliable inputs.

**Most clinical data is low quality.** Outside of billing, revenue cycle management, and some operational data, the information in healthcare systems is far from clean. Clinical notes are unstructured free text. Documentation outputs vary in accuracy depending on the model, source quality, and complexity of the encounter. EHR inputs are frequently copy-pasted templates carried forward from visit to visit with minor edits, making it hard to distinguish current facts from stale ones. Billing and RCM data is structured because money depends on it. Clinical data does not have the same forcing function, and the quality reflects that.

**Inbound data from patients is not trustworthy by default.** Callers give wrong dates of birth, confuse medication names, misremember their doctor's name, or provide incomplete details. Some calls are pranks. Some are from people who are confused, stressed, or in pain. You cannot treat patient-provided information as verified fact. It is input that needs to be scored, compared against existing records, and promoted or discarded based on corroboration.

**External systems have uneven reliability and throughput.** EHRs, FHIR stores, practice management systems, and insurance verification services all behave differently. Response times vary. A write that succeeds on Monday might time out on Tuesday. Some systems return stale cached data. Others silently drop updates. Any architecture that assumes external systems are consistent and available will fail in production.

**Traditional record-update approaches break down here.** If you update records in place, you lose the trail of what the system believed and when. When a downstream write fails, you have no clean way to know what state you were trying to reach. When two sources disagree, the last write wins by accident, not by policy. Event sourcing with confidence scoring is the architectural answer: every fact is tagged with where it came from, how much to trust it, and what it supersedes. Nothing is overwritten. The full history is always available for replay, audit, or correction.

## Four Invariants

The world model enforces four rules that never change regardless of how the system evolves.

### 1. Events Are the Only Source of Truth

There is no way to modify an entity's state directly. The only way to change what the system believes about a patient, appointment, or any other entity is to insert a new event. The entity's state is then recomputed from all of its events.

This eliminates a class of problems common in healthcare IT: two systems updating the same record concurrently, with the last write silently overwriting the first. In the world model, both writes are preserved as separate events. The projection function determines the current state using confidence ranking, not write order.

### 2. Events Are Append-Only and Immutable

Once an event is written, it is never modified or deleted. If new information contradicts an earlier event, a new event is created that supersedes the old one. The old event remains in the log permanently.

This is what makes dirty data tractable. You do not have to get it right the first time. Record what you learned. Learn more later. Newer events supersede older ones. Both are preserved - the original for audit, the correction for current state. In a domain where information arrives incomplete, out of order, and frequently wrong, immutability means every mistake is recoverable, every correction is traceable, and no update can silently destroy what came before.

The one exception to immutability is data source deletion. When a data source is removed from a workspace, all events originating from that source are permanently deleted. This is a deliberate design choice for data lifecycle management - when an operator decides that a data source's contribution should be fully retracted, the system removes both the source configuration and every event it produced, then recomputes affected entity projections from the remaining events. This supports right-to-erasure requirements and clean workspace teardown.
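The retraction flow can be sketched as follows. This is an assumption-laden illustration: the dictionary event shape, the `project` helper, and the `delete_source` name are invented for the example, and only the behaviour (drop every event from the source, then recompute from what remains) comes from the description above.

```python
def project(events):
    """Pick one winning event per (entity, field): higher confidence first, then recency."""
    winners = {}
    for event in events:
        slot = (event["entity_id"], event["field"])
        best = winners.get(slot)
        rank = (event["confidence"], event["ts"])
        if best is None or rank > (best["confidence"], best["ts"]):
            winners[slot] = event
    state = {}
    for (entity_id, field), event in winners.items():
        state.setdefault(entity_id, {})[field] = event["value"]
    return state


def delete_source(events, source):
    """Remove every event the source produced, then recompute projections."""
    remaining = [e for e in events if e["source"] != source]
    return remaining, project(remaining)


events = [
    {"entity_id": "p1", "field": "phone", "value": "555-0100",
     "confidence": 0.9, "ts": 1, "source": "crm-import"},
    {"entity_id": "p1", "field": "phone", "value": "555-0199",
     "confidence": 0.5, "ts": 2, "source": "voice"},
]

before = project(events)
remaining, after = delete_source(events, "crm-import")
```

Once the hypothetical `crm-import` source is removed, the projection falls back to the best remaining event rather than keeping a value whose evidence no longer exists.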

Immutability matters for healthcare operations because it provides:

* **Audit trails** - You can always answer "why did the system believe X at time Y?"
* **Temporal queries** - You can reconstruct the state of any entity at any point in the past
* **Undo capability** - Reversing a decision means inserting a new event, not deleting the old one
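A temporal query over an append-only log can be sketched like this. The `Event` fields and the `state_as_of` helper are hypothetical, and confidence ranking is left out to keep the focus on time travel.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape for this sketch."""
    field: str
    value: str
    recorded_at: datetime


def state_as_of(events, moment):
    """Rebuild state using only the events recorded at or before `moment`."""
    state = {}
    for event in sorted(events, key=lambda e: e.recorded_at):
        if event.recorded_at <= moment:
            state[event.field] = event.value
    return state


events = [
    Event("pharmacy", "Main Street", datetime(2024, 1, 5)),
    # A later correction is appended; the original stays in the log.
    Event("pharmacy", "Oak Avenue", datetime(2024, 3, 1)),
]

before = state_as_of(events, datetime(2024, 2, 1))
after = state_as_of(events, datetime(2024, 4, 1))
```

Because the correction is a new event rather than an overwrite, both the February view and the April view remain answerable.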

### 3. Entity State Is a Pure Function of Events

Given the same set of events, the system always produces the same entity state. The projection function is deterministic. Multiple processes can trigger recomputation concurrently and they will always arrive at the same result, because the function reads all current events and writes the output atomically.

This determinism is also what makes outbound reliability possible. When writing back to an EHR, the system does not send raw event data. It reconstructs the complete entity state from all events, then translates that projection into the EHR's format. Noisy incoming data does not propagate backward. A patient's phone number might arrive through a voice call at 0.5 confidence, get corroborated by an EHR lookup at 0.8 confidence (the EHR-trusted tier in the table under the fourth invariant), and get projected into a clean, authoritative record. That projected record - not the noisy inputs - is what flows to the downstream system.

#### Multi-Level Projections

Entity state is the first projection, but not the last. Patient memory, scheduling intelligence, and clinical findings are all computed from entity state and events - each level derived from the level below. The chain is deterministic throughout: the same events produce the same entity state, the same entity state produces the same memory and scheduling views, and those views produce the same clinical findings.

```mermaid
flowchart TD
    events["Raw Events\n(confidence-scored, append-only)"] --> entity["Entity State\n(patient, provider, appointment)"]
    events --> memory["Patient Memory\n(6 behavioral dimensions)"]
    entity --> memory
    events --> scheduling["Scheduling Intelligence\n(patterns, availability)"]
    entity --> scheduling
    memory --> clinical["Clinical Intelligence\n(detection pipeline findings)"]
    scheduling --> clinical
    entity --> clinical
```

| Level                     | What It Projects                                                       | Derived From                       |
| ------------------------- | ---------------------------------------------------------------------- | ---------------------------------- |
| **Entity state**          | Current state of each patient, provider, appointment, and other entity | Raw events                         |
| **Memory and scheduling** | Patient memory dimensions, appointment patterns, scheduling state      | Entity state + events              |
| **Clinical intelligence** | Drug interactions, care gaps, coding readiness, encounter findings     | Memory + scheduling + entity state |

A single event - a new medication added through an EHR sync - automatically updates entity state, which updates the patient's memory, which updates clinical findings. Intelligence is recomputed continuously as new information arrives.
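The chain can be pictured as a composition of pure functions. The sketch below is illustrative only: the function names, the event shape, and the single hard-coded interaction pair are assumptions for the example, and the scheduling level is omitted for brevity.

```python
def project_entity(events):
    """Level 1: entity state from raw events (here, the set of medications)."""
    meds = sorted({e["medication"] for e in events if e["type"] == "medication.added"})
    return {"medications": meds}


def project_memory(entity, events):
    """Level 2: a view derived from entity state plus events."""
    return {"medication_count": len(entity["medications"]), "event_count": len(events)}


def project_clinical(entity, memory):
    """Level 3: findings derived from the levels below."""
    known_interactions = {frozenset({"warfarin", "aspirin"})}  # illustrative data only
    meds = set(entity["medications"])
    findings = [sorted(pair) for pair in known_interactions if pair <= meds]
    return {"interactions": findings, "based_on_medications": memory["medication_count"]}


def project_all(events):
    """Recompute every level from the same events, bottom to top."""
    entity = project_entity(events)
    memory = project_memory(entity, events)
    return entity, memory, project_clinical(entity, memory)


events = [{"type": "medication.added", "medication": "warfarin"}]
_, _, clinical_before = project_all(events)

# One new event, for example from an EHR sync, flows through all levels.
events = events + [{"type": "medication.added", "medication": "aspirin"}]
entity, memory, clinical_after = project_all(events)
```

No level stores state of its own in this sketch, so rerunning `project_all` on the same events always yields the same findings.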

### 4. Confidence Resolves Conflicts

When two sources provide conflicting information about the same fact (for example, a voice transcript says a patient's pharmacy is on Main Street, but the EHR record says Oak Avenue), the system does not use timestamps to pick a winner. Instead, it uses a confidence ranking based on the source. This reflects the reality that most information entering a healthcare system has unknown reliability until it has been verified against another source.

| Confidence | Source         | Example                                                              |
| ---------- | -------------- | -------------------------------------------------------------------- |
| **1.0**    | Authoritative  | Manual entry, explicit relationships, authoritative API writes       |
| **0.95**   | Human-approved | Data reviewed and approved by a human operator                       |
| **0.9**    | High           | Operator-verified data, high-quality adapter output                  |
| **0.8**    | EHR-trusted    | Trusted clinical data from EHR systems                               |
| **0.7**    | Verified       | LLM-verified data, browser-scraped portal data, EHR-ingested records |
| **0.5**    | Self-report    | Patient-submitted form data, patient-confirmed information           |
| **0.3**    | Agent raw      | Raw voice agent inference, unverified extraction from conversation   |
| **0.0**    | Rejected       | Contradicted by a higher-confidence source, or human-rejected        |

Within the same confidence class, the most recent event wins. Across confidence classes, higher confidence always wins regardless of recency. A verified EHR record will not be overwritten by something a caller mentioned on a phone call, but two consecutive EHR updates will resolve to the most recent one.
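The resolution rule can be expressed as a comparison on the pair (confidence, recency). The following is a minimal sketch under assumed names (`Event`, `project`, an integer `recorded_at`), not the platform's projection function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape for this sketch."""
    field: str
    value: str
    confidence: float
    recorded_at: int  # larger means more recent


def project(events):
    """Per field: higher confidence wins; recency breaks ties within a class."""
    winners = {}
    for event in events:
        current = winners.get(event.field)
        rank = (event.confidence, event.recorded_at)
        if current is None or rank > (current.confidence, current.recorded_at):
            winners[event.field] = event
    return {field: event.value for field, event in winners.items()}


events = [
    Event("pharmacy", "Oak Avenue", 0.8, 1),   # EHR record
    Event("pharmacy", "Main Street", 0.3, 2),  # later, but only raw agent inference
    Event("pharmacy", "Elm Road", 0.8, 3),     # second EHR update
]

after_call = project(events[:2])
after_second_sync = project(events)
```

Because the winner depends only on the ranking pair and not on arrival order, replaying the same events in any order gives the same state, provided no two events for a field share both confidence and timestamp.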

Agent writes are not a single confidence tier. When the agent captures data during a conversation, it assesses how the information was obtained. A phone number the patient explicitly confirmed ("Yes, my number is 555-1234") enters at higher confidence than a medication name mentioned in passing, which enters higher than something the agent inferred from context. The existing review pipeline promotes low-confidence data upward through automated and human review.

Confidence carries through all projection levels. Each field in an entity's projected state tracks which event produced it and at what confidence level. When higher-level projections (memory, scheduling, clinical findings) read from entity state, confidence propagates - a clinical finding is only as reliable as the weakest data point it depends on. When the agent or a provider views intelligence derived from the world model, every piece of it carries a quality signal reflecting the reliability of the underlying data.
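One simple way to read "as reliable as the weakest data point" is to take the minimum confidence over the fields a finding depends on. The sketch below assumes that reading; the field names and the `finding_confidence` helper are hypothetical.

```python
def finding_confidence(field_confidences, depends_on):
    """A derived finding inherits the lowest confidence among its inputs."""
    return min(field_confidences[name] for name in depends_on)


# Per-field confidence as tracked on a projected entity (illustrative values).
entity_confidence = {"medication_list": 0.8, "allergies": 0.3, "date_of_birth": 1.0}

drug_interaction = finding_confidence(entity_confidence, ["medication_list", "allergies"])
age_based_gap = finding_confidence(entity_confidence, ["date_of_birth"])
```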

## Three Data Channels

{% hint style="info" %}
The boundary between the agent and its data is not a query interface. It is three distinct channels, each reflecting a different relationship between the agent and what it knows.
{% endhint %}

### Ambient

Data that is pushed into the agent's context automatically at the start of each conversation turn. The agent does not request this data; it is always present. Examples: patient name, upcoming appointments, recent encounter history.

This is the channel that removes the boundary between infrastructure and intelligence. The agent does not look up the patient's next appointment - it already knows, the same way a receptionist who has worked at a clinic for ten years knows. When a patient calls and the agent says "I see you have an appointment this Thursday," no query ran. That information was already part of the agent's state.

### Queried

Data that the agent retrieves on demand through tool calls during a conversation. The agent decides it needs specific information and requests it. Examples: searching for available appointment slots, looking up insurance details, checking medication lists.

Queried data covers information that is too large or too specific to include in every conversation turn. The agent pulls it when the conversation requires it. This is the traditional "application queries database" pattern, but scoped narrowly: most of what the agent needs arrives through the ambient channel. Queries handle the long tail.

### Extracted

Data that the agent captures from the conversation and writes back to the world model as a natural consequence of thinking. When a patient provides new information during a call - a new phone number, an insurance change, a medication update - the agent writes that as an event with confidence based on how the information was obtained (confirmed, mentioned, or inferred).

During a live voice call, the system periodically extracts structured patient data from the conversation. Every few turns, the recent transcript is reviewed and key fields are captured: phone numbers, dates of birth, email addresses, postal addresses (free-text or structured with street, city, state, and postal code), gender, preferred languages, and insurance carrier and member IDs. These are written to the world model as events with appropriate confidence levels. The world model accumulates structured data in real time during the call, not just after it ends.

The agent's reasoning generates structured facts as a byproduct of doing its job. There is no separate "data capture" step. Understanding the patient and capturing information are the same act.

Extracted data does not go directly to the EHR. It enters the world model, flows through the [connector system's](/data/connectors-and-ehr.md) confidence gates and review pipeline, and only reaches external systems after verification. A workspace can sync verified data to multiple destinations simultaneously - for example, both an EHR and a CRM - with each destination receiving only the entity types it is configured for. This is covered in detail in the [Connectors and EHR Integration](/data/connectors-and-ehr.md) page.
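Per-destination filtering can be sketched as a lookup from destination to the entity types it accepts. The configuration shape and the `plan_sync` helper below are assumptions for illustration; the real connector configuration is described on the linked page.

```python
def plan_sync(verified_entities, destinations):
    """Return, per destination, the IDs of the entities it is configured to receive."""
    return {
        name: [e["id"] for e in verified_entities if e["type"] in accepted_types]
        for name, accepted_types in destinations.items()
    }


# Hypothetical workspace configuration: destination name -> accepted entity types.
destinations = {"ehr": {"patient", "appointment"}, "crm": {"patient"}}

verified = [
    {"id": "p1", "type": "patient"},
    {"id": "a1", "type": "appointment"},
]

plan = plan_sync(verified, destinations)
```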

## Open Schema

Traditional healthcare systems force data into fixed schemas - FHIR resources, HL7 segments, proprietary EHR tables. If information does not fit a predefined category, it gets shoehorned, truncated, or dropped. The schema is a constraint on what the system can know.

The world model inverts this. Entity types and event types are free-form text, not fixed enums. If the system encounters a new kind of entity or observation that does not fit existing categories, it creates a new type without requiring a database migration or schema change. Data arrives in any form, and the system structures it. The schema is not a constraint - it is an output.

This means agents discover structure rather than being limited by it. A conversation might reveal a relationship between a patient and a caregiver that no predefined schema anticipated. The world model accommodates it immediately. Over time, patterns in these emergent types can be formalized, but they are never blocked from entering the system in the first place.

## Entity Ontology

Entity types use ontological categories rather than domain-specific roles. A `person` entity can be a patient, a practitioner, or both - the projection function detects roles from the underlying event data rather than requiring a type declaration up front.

Person projection is role-aware. It examines the FHIR resource types on an entity's events to detect which roles the person fills and produces output that includes the relevant sections for each role. A person entity with patient events gets demographics and clinical sections. One with practitioner events gets a profile section. A merged entity with both gets all sections plus a roles list. This means a single person entity can represent the same individual across clinical and operational contexts.
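The role detection described above might look roughly like the following. The event shape, the empty placeholder sections, and the `project_person` name are hypothetical; only the mapping from FHIR resource types to output sections follows the text.

```python
def project_person(events):
    """Choose output sections from the FHIR resource types seen on the entity's events."""
    resource_types = {event["resource_type"] for event in events}
    projection = {}
    roles = []
    if "Patient" in resource_types:
        roles.append("patient")
        projection["demographics"] = {}  # placeholder section
        projection["clinical"] = {}      # placeholder section
    if "Practitioner" in resource_types:
        roles.append("practitioner")
        projection["profile"] = {}       # placeholder section
    if len(roles) > 1:
        # Only a merged entity carries an explicit roles list.
        projection["roles"] = roles
    return projection


patient_only = project_person([{"resource_type": "Patient"}])
merged = project_person([{"resource_type": "Patient"}, {"resource_type": "Practitioner"}])
```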

### FHIR-Sourced Entity Attributes

When data arrives through EHR connectors that support FHIR, the projection pipeline surfaces a rich set of fields as directly queryable entity attributes. This means agent tools can filter, search, and reason over these fields without parsing raw FHIR resources.

| Entity Type      | Projected Fields                                                                                                                                                                                                                                                                                              |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Patient**      | Full address (line, city, state, postal code, country), preferred language, marital status, race, ethnicity, emergency contact (name, phone, relationship), facility assignment, primary insurance (payer name, practice payer ID, member ID, policyholder name, policyholder relationship, policyholder DOB) |
| **Appointment**  | Participants (provider, patient, location references and display names), appointment type, modality, duration, timezone, cancellation reason                                                                                                                                                                  |
| **Slot**         | Provider, facility, specialty, visit type IDs and names                                                                                                                                                                                                                                                       |
| **Practitioner** | Member ID, scheduling eligibility                                                                                                                                                                                                                                                                             |
| **Location**     | Full address, timezone, facility identifier, geographic coordinates                                                                                                                                                                                                                                           |

State and territory values are normalized to standard two-letter abbreviations (e.g., "Virginia" becomes "VA"), so agent queries and licensed-state filters work consistently regardless of how the source system formats addresses.
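A minimal sketch of this kind of normalization is shown below. The lookup table is deliberately partial, and the fallback of returning unrecognized input unchanged is an assumption rather than documented behaviour.

```python
# Partial lookup table for illustration; a real one would cover every state and territory.
STATE_ABBREVIATIONS = {
    "virginia": "VA",
    "california": "CA",
    "new york": "NY",
    "puerto rico": "PR",
}


def normalize_state(raw):
    """Map a full name or a mixed-case abbreviation to a two-letter code."""
    cleaned = raw.strip()
    if len(cleaned) == 2 and cleaned.upper() in STATE_ABBREVIATIONS.values():
        return cleaned.upper()
    # Assumption: unknown values pass through unchanged rather than being dropped.
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)
```

With this in place, a licensed-state filter can compare against `"VA"` whether the source system sent `"Virginia"`, `"va"`, or `"VA"`.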

These attributes participate in the same confidence-scored projection as all other entity data. When the same field arrives from multiple sources, the standard resolution rules apply - higher confidence wins, with recency as the tiebreaker within the same confidence class.

Other built-in entity types - place, organization, outbound task, call, and encounter - each have dedicated projection logic. Custom entity types can define their own extraction rules declaratively (which fields to extract, how to resolve multiple values) without requiring code changes.

## Entity Enrichment

Entities rarely arrive with every attribute an operator, connector, or agent would want to track on them. A practice might want to record a patient's preferred language, a risk tier, a preferred communication channel, a consent flag, or a dozen other workspace-specific fields that no FHIR resource or predefined schema covers. Enrichment is how those attributes are attached to an entity without giving up the confidence model that makes the rest of the world model trustworthy.

Enrichment is not a schema extension or a bag of extra columns. Each enrichment value is a per-key event, written through the same path as every other world model event, carrying the same provenance fields (source, source system, confidence, effective time) and participating in the same supersedes chain. A patient's preferred language set by a human operator, set by a CRM sync, and extracted from a call transcript all coexist as separate events - the projection picks the current winner by confidence class, with recency as the tiebreaker, exactly like any other conflicting fact in the world model.

### Registry-Governed Keys

What separates enrichment from a free-for-all custom-fields bag is the registry. Each workspace declares the keys it tracks up front - for a given entity type, what values are allowed, what type they must be, what the minimum write confidence is, whether the value is PII. Supported value types cover scalar cases (string, number, boolean, date) and richer ones (enum, JSON).

Writes against unregistered keys are rejected at the API boundary. Agent-extracted values whose key is not registered are silently dropped - they never enter the event stream or compete with governed values. This gives admins full control over which attributes are tracked without needing to gate capture at each source; uncontrolled agent enthusiasm cannot pollute the schema.

The key identifier and value type are immutable after registration. To change them, admins create a new key and migrate. Unregistering a key stops future writes but preserves all historical `entity.enriched` events - the audit trail is never rewritten.
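The registry check can be sketched as a validation step in front of the event write. Everything named here (`KeySpec`, `REGISTRY`, `validate_write`, the example keys, the `"agent"` source tag) is hypothetical, and rejecting writes below the minimum confidence is an assumed behaviour; the text above only states that a minimum is declared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeySpec:
    """Hypothetical registry entry for one enrichment key."""
    value_type: type
    min_confidence: float
    allowed_values: tuple = ()
    pii: bool = False


REGISTRY = {
    ("person", "risk_tier"): KeySpec(str, 0.7, ("low", "medium", "high")),
    ("person", "consent_sms"): KeySpec(bool, 0.9),
}


class UnregisteredKeyError(Exception):
    pass


def validate_write(entity_type, key, value, confidence, source):
    """Return an event to append, None for a dropped agent value, or raise."""
    spec = REGISTRY.get((entity_type, key))
    if spec is None:
        if source == "agent":
            return None  # agent-extracted values for unregistered keys are dropped silently
        raise UnregisteredKeyError(f"{entity_type}.{key} is not registered")
    if not isinstance(value, spec.value_type):
        raise TypeError(f"{key} expects {spec.value_type.__name__}")
    if spec.allowed_values and value not in spec.allowed_values:
        raise ValueError(f"{value!r} is not an allowed value for {key}")
    if confidence < spec.min_confidence:
        raise ValueError(f"{key} requires confidence of at least {spec.min_confidence}")
    return {"event_type": "entity.enriched", "key": key, "value": value,
            "confidence": confidence, "source": source}


accepted = validate_write("person", "risk_tier", "high", 0.95, "operator")
dropped = validate_write("person", "shoe_size", 42, 0.3, "agent")
```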

### Why This Matters

The same pattern handles manual admin edits, backfills from connectors like a CRM, structured capture from intake forms, and agent-extracted values from conversations. Every source writes through the same endpoint and is differentiated only by its `source` tag and confidence. A verified CRM sync outranks a conversational extraction; a human correction outranks both. The winner is always explainable - any value the system is currently serving can be traced back to the exact event, timestamp, and confidence that produced it.

This turns "custom attributes" from a second-class data path into first-class world model state: source-attributed, confidence-resolved, fully audited, and governed.

## Semantic Entity Search

Entities can be searched by meaning, not just by exact field values. Each entity has a vector embedding that encodes its full projected state - demographics, clinical data, relationships, and other attributes. Searching for "elderly patient with diabetes on insulin" returns relevant matches even when the entity records use different phrasing or structure.

Semantic search composes with traditional filters. You can combine a semantic query with entity type filters, source filters, confidence thresholds, and tag-based filtering. Tags are free-form labels on entities that support array overlap matching - filtering by tags returns any entity that has at least one of the specified tags.
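To illustrate how a semantic ranking composes with filters, here is a toy sketch using two-dimensional vectors and cosine similarity. The `search` signature, the entity dictionaries, and the embeddings are invented for the example; the platform's actual embedding model and query API are not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(entities, query_vec, entity_type=None, tags=None, min_confidence=0.0):
    """Apply filters first, then rank the survivors by similarity to the query."""
    wanted = set(tags or [])
    hits = [
        e for e in entities
        if (entity_type is None or e["type"] == entity_type)
        and (not wanted or wanted & set(e["tags"]))  # array overlap: any shared tag
        and e["confidence"] >= min_confidence
    ]
    return sorted(hits, key=lambda e: cosine(e["embedding"], query_vec), reverse=True)


entities = [
    {"id": "p1", "type": "person", "tags": ["diabetes"], "confidence": 0.8, "embedding": [0.9, 0.1]},
    {"id": "p2", "type": "person", "tags": ["insulin", "cardiology"], "confidence": 0.8, "embedding": [0.1, 0.9]},
    {"id": "p3", "type": "person", "tags": ["pediatrics"], "confidence": 0.8, "embedding": [1.0, 0.0]},
]

results = search(entities, [1.0, 0.0], entity_type="person", tags=["insulin", "diabetes"])
```

Here `p3` is the closest vector to the query but is excluded because it shares none of the requested tags, while `p1` and `p2` each match on a single tag.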

Embeddings are generated in the background after entity state changes. The embedding encodes the full projected state, so it stays current as new events arrive and the projection updates.

## Write Semantics

Every write operation inserts an event atomically. Entity state is recomputed asynchronously by a background projection process that runs on a short interval. This decoupling means event writes are fast (no blocking on full entity recomputation), and projection runs independently at its own cadence. Key capabilities include:

* **Single event writes** with asynchronous entity recomputation
* **Idempotent entity creation** that safely handles duplicates - creating the same entity twice does not produce an error or a duplicate record
* **Source-scoped upserts** that create, update, or skip based on whether the source has already written this event
* **Retroactive entity linking** for events that arrived before entity resolution completed
* **On-demand recomputation** of entity state without writing new events
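Two of these behaviours, idempotent creation and source-scoped upserts, can be sketched with an in-memory store. The class and method names are hypothetical. In this sketch an "update" appends a superseding event instead of mutating the earlier one, which is one way to stay consistent with the append-only invariant; the real mechanism may differ.

```python
class EventStore:
    """Toy in-memory store; not the platform's storage layer."""

    def __init__(self):
        self.entities = {}
        self.log = []        # append-only list of events
        self._latest = {}    # (source, source_key) -> index of the newest event in the log

    def create_entity(self, entity_id, entity_type):
        """Creating the same entity twice returns the existing record."""
        return self.entities.setdefault(entity_id, {"id": entity_id, "type": entity_type})

    def upsert_event(self, source, source_key, payload):
        """Create, update (by superseding), or skip based on what the source already wrote."""
        key = (source, source_key)
        previous = self._latest.get(key)
        if previous is not None and self.log[previous]["payload"] == payload:
            return "skipped"
        self.log.append({"source": source, "source_key": source_key,
                         "payload": payload, "supersedes": previous})
        self._latest[key] = len(self.log) - 1
        return "created" if previous is None else "updated"


store = EventStore()
first = store.create_entity("p1", "person")
second = store.create_entity("p1", "person")

outcomes = [
    store.upsert_event("ehr", "appt-42", {"status": "booked"}),
    store.upsert_event("ehr", "appt-42", {"status": "booked"}),     # same content again
    store.upsert_event("ehr", "appt-42", {"status": "cancelled"}),  # changed content
]
```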

## Write Scope Isolation

Agent writes are constrained by a write scope - a permission boundary that limits what entities and event types the agent can modify during an interaction (voice or text). The write scope is configured per workspace and defines which entity types the agent can write to, which event types it can create, and what confidence level its writes carry.

System services (connector system, Platform API) are trusted and bypass this restriction. They handle verified, confidence-gated data and operate outside the write scope.

This prevents the agent from accidentally overwriting authoritative data with lower-confidence conversation-extracted data. A patient's verified insurance information from a direct integration cannot be replaced by something mentioned during a phone call - the write scope blocks it at the data layer, not at the API level. Every write from an agent context passes through the same permission check regardless of code path or modality.
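A write scope check might be sketched as below. The `WriteScope` fields mirror the three things the scope is said to define, but the names, the trusted-writer set, and the decision to treat the scope's confidence as a ceiling on agent writes are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteScope:
    """Hypothetical per-workspace write scope."""
    entity_types: frozenset
    event_types: frozenset
    confidence: float  # treated here as a ceiling for agent writes (assumption)


TRUSTED_WRITERS = {"connector", "platform_api"}  # system services bypass the scope


def authorize(writer, scope, entity_type, event_type, confidence):
    """Return the confidence the write may carry, or raise if it is out of scope."""
    if writer in TRUSTED_WRITERS:
        return confidence
    if entity_type not in scope.entity_types or event_type not in scope.event_types:
        raise PermissionError(f"{writer} may not write {event_type} on {entity_type}")
    return min(confidence, scope.confidence)


scope = WriteScope(frozenset({"person"}), frozenset({"contact.updated"}), 0.5)

agent_write = authorize("agent", scope, "person", "contact.updated", 0.9)
connector_write = authorize("connector", scope, "person", "insurance.verified", 1.0)
```

Combined with confidence-ranked projection, a capped agent write cannot outrank an authoritative value even when it is allowed through.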

{% hint style="info" %}
Write scope isolation is one of the platform's structural safety controls. For how this fits into the broader safety architecture, see [Runtime Safety](/safety-and-compliance/runtime-safety.md).
{% endhint %}

## Direct Agent Access via Platform Functions

Beyond the three data channels, agents can query world model data directly using [platform functions](/agent/platform-functions.md). These are SQL, AI, Python, and table-valued functions that run on the platform's compute layer and return results mid-conversation. Unlike the ambient channel (pre-loaded context) or the queried channel (built-in tool calls), platform functions can join live entity data with analytical aggregations in a single call.

Built-in platform functions cover common patterns: entity confidence assessment (how trustworthy is the data for this patient?), caller history lookup (what happened in prior calls with this number?), and patient summary briefings. For the long tail of questions no pre-built function anticipated, a workspace registers parameterized data queries (`wsq_<name>`) that run against its own custom tables. Platform functions are read-only; recording new observations as world model events is done through the dedicated write tools, which follow the atomic write path and write scope enforcement described above.

For the full platform functions reference, see [Platform Functions](/agent/platform-functions.md).

{% hint style="info" %}
**Developer Guide** - For API endpoints, SDK examples, and integration details, see the [Data & World Model](https://docs.amigo.ai/developer-guide/platform-api/platform-api/data-and-world-model) section of the developer guide.
{% endhint %}

