Memory
A hierarchical memory system that maintains enough context for critical enterprise decisions
Memory is workspace-scoped with configurable dimensions and extraction modes (static or LLM). Extracted facts are materialized into pre-computed tables for fast query access. The layered architecture (L0-L3) described below applies across all channels.
For Developers: See the REST API reference for endpoint details, request/response schemas, and SDK code examples.
The Memory Problem
Most memory systems either store everything (wasting resources) or use generic importance scoring (missing critical details). Neither approach works for high-stakes decisions.
Amigo's memory system keeps enough user context for decisions readily accessible during conversations. When a patient mentions chest tightness, the system surfaces their heart condition history, anxiety patterns, and medication context without waiting for ad-hoc retrieval. This minimizes latency while maintaining the information needed for real-time reasoning.
The result: healthcare decisions that properly account for how current symptoms connect to medical history, medication interactions, family patterns, and past treatment responses.
Layered Memory Architecture
Memory is organized in four layers. Each layer compresses what came before while preserving what matters for outcomes.
L0 - Raw Transcripts
Complete conversation records
Ground truth for historical review and source material for extraction
L1 - Extracted Memories
Net-new facts from each conversation
What changed since the last session, filtered by what L3 already knows
L2 - Episodic User Models
Patterns synthesized across multiple sessions
Recurring structures that emerge over weeks or months
L3 - Global User Model
Stable understanding merged across all time
Always in memory during live sessions - the agent reasons from this directly
L3 is the layer that matters most at runtime. It stays loaded in the agent's context during every session, so the agent never needs to retrieve patient history on the fly. When a patient mentions jaw pain, L3 already contains their cardiac history - jaw pain can be a non-traditional symptom of a heart attack, and the agent catches that connection immediately because the relevant context is already present.
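The layering can be pictured with a minimal sketch. The class and field names below are assumptions made for illustration, not Amigo's actual schema, and exact-string matching is a simplification of whatever comparison the real extraction uses. The sketch shows only the rule that L1 keeps facts L3 does not already hold.

```python
from dataclasses import dataclass, field


@dataclass
class UserMemory:
    """Hypothetical container for the four layers (illustrative only)."""
    l0_transcripts: list[str] = field(default_factory=list)  # raw conversation records
    l1_memories: list[str] = field(default_factory=list)     # net-new facts per session
    l2_episodes: list[str] = field(default_factory=list)     # patterns across sessions
    l3_global: set[str] = field(default_factory=set)         # stable model, always loaded


def extract_net_new(candidate_facts: list[str], memory: UserMemory) -> list[str]:
    """Keep only the facts that L3 does not already hold."""
    return [fact for fact in candidate_facts if fact not in memory.l3_global]


memory = UserMemory(l3_global={"history of atrial fibrillation", "takes metoprolol"})
new_facts = extract_net_new(
    ["takes metoprolol", "reports jaw pain since Monday"], memory
)
memory.l1_memories.extend(new_facts)
print(new_facts)  # ['reports jaw pain since Monday']
```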
Example: How cross-domain connections surface
A patient complains about jaw pain. A generic system searches for "jaw pain" and returns dental records. Amigo's memory works differently:
L3 already holds the patient's cardiac history, anxiety patterns, and medication list
The agent interprets jaw pain against this full context - recognizing it as a potential cardiac symptom
The agent's clinical identity shapes which connections matter most
Temporal patterns (is this new? recurring? correlated with stress?) are available without retrieval
The result: the agent asks about chest pressure and shortness of breath rather than suggesting a dentist visit.
Example: Dimensional framework configuration
Each dimension defines a category of information with precision requirements. This is what a dimension definition looks like:
Dimensions requiring "perfect" precision are those where errors directly affect clinical outcomes - medication interactions can be life-threatening, so medical history must never be forgotten or miscontextualized. Other dimensions like exercise preferences might only need periodic updates.
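As a rough sketch, a dimension with a precision requirement might be modeled as follows. The field names and the "periodic" precision label are hypothetical. Only the "perfect" level and the two example dimensions come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Precision(Enum):
    PERFECT = "perfect"    # errors directly affect clinical outcomes
    PERIODIC = "periodic"  # hypothetical label: refreshed only occasionally


@dataclass(frozen=True)
class Dimension:
    name: str
    description: str
    precision: Precision


dimensions = [
    Dimension("medical_history", "Conditions, medications, adverse reactions", Precision.PERFECT),
    Dimension("exercise_preferences", "Preferred activities and routines", Precision.PERIODIC),
]

# Dimensions whose facts must never be forgotten or miscontextualized.
must_preserve = [d.name for d in dimensions if d.precision is Precision.PERFECT]
print(must_preserve)  # ['medical_history']
```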
Key Features of Amigo's Memory System
1. Recent Information Guarantee
Amigo guarantees that recent information (the last n sessions, with n chosen according to how quickly information decays in the use case) is always available for:
Full reasoning over the complete context
Perfect recall of all details
Recontextualization based on new understanding
This addresses the information decay problem common in memory systems.
2. On-Demand Retrieval (Rare)
When the agent needs information not in L3 (rare, adds latency):
L3 covers most cases: L3 supplies the full patient context at session start, so on-demand retrieval is uncommon
Targeted queries: The agent retrieves only the specific missing data rather than running broad searches
Recontextualized Understanding: Past L0 conversations are recontextualized against the current L3 understanding, enabling reasoning beyond simple retrieval
Professional Context Filtering: The service provider's background guides what counts as a meaningful gap requiring historical expansion
Temporal Synthesis: L3 bridges live session context with historical L0 context through a dual anchoring mechanism
3. Dimensional Evolution
Amigo adapts its memory dimensions as patient patterns emerge:
Core Capabilities:
Professional identity guiding interpretation at every level of the memory hierarchy
Attention patterns that evolve as patient patterns are discovered
Adaptive Dimensional Optimization: When the system detects drift between user dimension definitions and the optimal interpretation patterns for a patient group, it can modify the dimension definitions and perform a complete temporal backfill
Advanced features:
Replay-based reinterpretation: When dimensions change, the system can reprocess historical data through the updated definitions - regenerating L1 through L3 from the L0 transcripts with the new framework
Cohort reinterpretation: Dimension changes apply across entire patient groups, not just individual patients
Feedback loops: L3 patterns inform how L1 extraction from L0 works, so the system refines what it captures based on what turns out to matter
Concrete Example: Discovering Hidden Patterns
Consider a patient whose blood sugar control seems randomly unstable:
Weeks 1-4 (L1 extraction): The system captures seemingly unrelated mentions: work deadlines on Tuesday, feeling stressed on Thursday, a missed medication dose on Friday. Each seems like noise.
Months 2-3 (L2 accumulation): Patterns emerge from accumulated L1 data. A 2-3 week cycle appears: work stress -> medication timing disruption -> blood sugar instability.
Quarters 1-3 (L3 cross-episode analysis): Comparing multiple quarterly episodes reveals that this is not random; it is a stable functional dimension. The stress-medication-timing interaction becomes part of L3's dimensional blueprint.
Result: What looked like random instability is actually a discoverable pattern. Now the system can proactively intervene when work stress patterns emerge, preventing blood sugar episodes.
This discovery was only possible because:
L1 captured seemingly irrelevant details (extraction not filtered by apparent importance)
L2 aggregated over sufficient time for patterns to emerge (temporal aggregation)
L3 identified the pattern across multiple episodes (cross-episode analysis)
At population scale, only 10-50 such functional dimensions typically explain substantial outcome variance. The sparsity isn't imposed; it emerges as true causal patterns become visible while noise averages out.
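The aggregation idea in the blood sugar example can be illustrated with a toy sketch that counts ordered event sequences recurring across weeks. The event labels, the weekly grouping, and the threshold of two occurrences are all assumptions. Counting is a stand-in chosen for brevity, not a description of how the system actually synthesizes L2 patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical L1 events, one list per week, in the order they were mentioned.
weeks = [
    ["work_stress", "missed_medication", "glucose_unstable"],
    ["exercise"],
    ["work_stress", "missed_medication", "glucose_unstable"],
    ["work_stress", "glucose_unstable"],
]


def ordered_triples(events):
    """All in-order triples of events observed within one week."""
    return set(combinations(events, 3))


counts = Counter(triple for week in weeks for triple in ordered_triples(week))

# A sequence seen in at least two separate weeks is treated as a candidate pattern.
recurring = [triple for triple, seen in counts.items() if seen >= 2]
print(recurring)  # [('work_stress', 'missed_medication', 'glucose_unstable')]
```

Any single week looks like noise. The sequence only stands out once several weeks are counted together, which is the point of aggregating over time.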
4. Enterprise Customizability
Amigo's memory architecture is fully customizable for enterprise-specific needs. Six built-in dimensions ship as defaults - clinical, behavioral, operational, social, engagement, and risk - covering the most common healthcare memory needs. Workspaces can adjust these through the memory settings API:
Add custom dimensions for domain-specific memory categories (e.g., "financial" for billing workflows, "legal" for compliance-sensitive interactions)
Adjust weights to prioritize which dimensions receive more attention during memory extraction
Deactivate built-in dimensions that are not relevant for a particular use case
Choose extraction mode - Built-in dimensions use static extraction (fast, deterministic JSONPath matching). Custom dimensions default to LLM extraction, which uses the dimension's description as a semantic prompt to extract relevant facts from conversation events. LLM extraction handles nuanced, context-dependent information that static patterns would miss.
Memory dimension settings are managed through the Platform API and take effect on the next post-session processing run. No code changes or redeployment required. Memory analytics (coverage, fact counts, per-dimension breakdown) are available through the Developer Console's Agent Memory page.
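The adjustments listed above can be sketched as operations on an in-memory settings table. This is not the Platform API. The class, the field names, and the helper functions are hypothetical and only mirror the options described here: six built-in dimensions using static extraction, custom dimensions defaulting to LLM extraction, adjustable weights, and deactivation.

```python
from dataclasses import dataclass

BUILT_IN = ["clinical", "behavioral", "operational", "social", "engagement", "risk"]


@dataclass
class DimensionSetting:
    name: str
    weight: float = 1.0
    active: bool = True
    extraction_mode: str = "static"  # built-in dimensions use static extraction
    description: str = ""


def default_settings() -> dict[str, DimensionSetting]:
    return {name: DimensionSetting(name) for name in BUILT_IN}


def add_custom(settings, name, description, weight=1.0):
    """Custom dimensions default to LLM extraction, prompted by the description."""
    if name in settings:
        raise ValueError(f"dimension {name!r} already exists")
    settings[name] = DimensionSetting(name, weight, True, "llm", description)


settings = default_settings()
add_custom(settings, "financial", "Billing status, payment plans, insurance questions")
settings["risk"].weight = 2.0      # give risk more attention during extraction
settings["social"].active = False  # deactivate a built-in that is not relevant

active = sorted(name for name, s in settings.items() if s.active)
print(active)
# ['behavioral', 'clinical', 'engagement', 'financial', 'operational', 'risk']
```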
For more complex customization, our Agent Engineers can work with you on a full implementation process:
Critical Function Assessment: Identify functions requiring near-perfect memory and map critical information types and hierarchy based on your use cases.
Memory Design: Configure memory topology and define user dimensions and parameters.
Integration & Deployment: Deploy the memory system, connect it to existing data sources, and initialize user models.
Verification & Optimization: Validate functional performance and optimize dimensional parameters where necessary.
How Recall Works
L3 handles the vast majority of recall needs. Because it is always loaded during a session, the agent has immediate access to the patient's full dimensional profile - conditions, medications, behavioral patterns, risk factors - without any retrieval step.
In rare cases (roughly 5-10% of interactions), the agent encounters something that requires deeper historical context than L3 alone provides. When that happens, the system runs a targeted retrieval against L0 transcripts, guided by what L3 already knows. This is not a broad search - L3 shapes the query so it targets the specific gap. The retrieved history is then interpreted through L3's current understanding, not in isolation.
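The two recall paths can be sketched as a single function. The data shapes are assumptions, and the keyword match and ranking are simple stand-ins for whatever retrieval and interpretation the system actually performs. The sketch only shows the control flow: answer from L3 when possible, otherwise query L0 for the specific gap and let L3 influence the result.

```python
def recall(topic, l3_profile, l0_transcripts):
    """Return (source, results): L3 when it covers the topic, else targeted L0 hits."""
    if topic in l3_profile:
        return "l3", [l3_profile[topic]]  # common path: no retrieval step
    needle = topic.lower()
    hits = [t for t in l0_transcripts if needle in t.lower()]
    # Rank hits by how many L3 topics they also mention, a crude stand-in
    # for L3 shaping the query and the reading of what comes back.
    l3_terms = [key.lower() for key in l3_profile]
    hits.sort(key=lambda t: sum(term in t.lower() for term in l3_terms), reverse=True)
    return "l0", hits


l3 = {"medications": "metoprolol 25 mg daily", "cardiac": "atrial fibrillation"}
l0 = [
    "Patient asked about travel vaccines before a trip.",
    "Discussed cardiac rehab schedule and travel plans.",
    "Reviewed medications.",
]
print(recall("medications", l3, l0))  # ('l3', ['metoprolol 25 mg daily'])
print(recall("travel", l3, l0)[1][0])  # Discussed cardiac rehab schedule and travel plans.
```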
Analytics Feedback Loop
Memory data feeds back into the system itself. Patterns discovered across patient populations - which dimensions predict adherence, which symptom clusters correlate with outcomes - refine the dimensional frameworks used for individual patients. When frameworks improve, the system can backfill historical data through the new lens, reinterpreting past conversations with better understanding. This is how individual interactions accumulate into organizational intelligence over time.
Memory as Safety Foundation
For text channel conversation persistence - including frozen conversation plans and turn history - see Text Sessions.
The memory system is a core part of Amigo's safety framework. Because L3 is always available, safety decisions always have access to complete context.
Crisis prevention - Past crisis indicators and risk factors remain immediately accessible
Medication safety - Complete medication history and adverse reactions guide pharmaceutical discussions
Risk awareness - Safety-relevant dimensions are tracked with the highest precision requirements
Historical context - Past events are understood through current clinical understanding, not in isolation
As detailed in Operational Safety, safety protections are built into the same reasoning process that drives all agent behavior rather than bolted on as separate filters.
Post-Session Processing Pipeline
After a conversation ends, a multi-stage asynchronous pipeline processes the raw interaction data and materializes memory for fast query access:
1. Memory Extraction
An LLM reviews the full conversation transcript and extracts structured facts. Each fact receives an importance score (1-10), message-level references pointing to the specific messages that support it, and a context field explaining why this fact matters. The extracted memories are embedded as vectors and stored for semantic retrieval.
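A record for one extracted fact might look like the sketch below. The class and field names are hypothetical, and requiring at least one message reference is an assumption. The 1-10 importance range, the message-level references, and the context field come from the description above. Embedding is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFact:
    text: str
    importance: int                # 1-10
    message_refs: tuple[int, ...]  # indices of supporting messages (hypothetical form)
    context: str                   # why this fact matters

    def __post_init__(self):
        if not 1 <= self.importance <= 10:
            raise ValueError("importance must be between 1 and 10")
        if not self.message_refs:
            # Assumption: every fact must point at the messages that support it.
            raise ValueError("a fact needs at least one supporting message")


fact = ExtractedFact(
    text="Reports jaw pain since Monday",
    importance=9,
    message_refs=(4, 5),
    context="Possible cardiac symptom given existing heart condition",
)
print(fact.importance, fact.message_refs)  # 9 (4, 5)
```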
2. Memory Materialization
Extracted facts are classified into the workspace's configured dimensions and materialized into pre-computed tables. Each fact gets a human-readable extracted_text summary (derived from clinical event structures like diagnoses, medications, and procedures) so downstream consumers don't need to parse raw event data. Dimension-level aggregates (fact counts, average confidence, source diversity) are also pre-computed. This materialization step is what makes the Platform API's memory query endpoints fast - reads serve pre-computed results rather than running extraction at query time.
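The dimension-level aggregates can be illustrated with a small sketch. The fact dictionaries and their keys are assumptions. In particular, "source" is a hypothetical field used here to give source diversity something to count.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical classified facts; real records would carry more fields.
facts = [
    {"dimension": "clinical", "confidence": 1.0, "source": "voice"},
    {"dimension": "clinical", "confidence": 0.5, "source": "text"},
    {"dimension": "risk", "confidence": 0.75, "source": "voice"},
]


def materialize_aggregates(facts):
    """Pre-compute per-dimension aggregates so reads never re-run extraction."""
    grouped = defaultdict(list)
    for fact in facts:
        grouped[fact["dimension"]].append(fact)
    return {
        dim: {
            "fact_count": len(rows),
            "avg_confidence": mean(r["confidence"] for r in rows),
            "source_diversity": len({r["source"] for r in rows}),
        }
        for dim, rows in grouped.items()
    }


aggregates = materialize_aggregates(facts)
print(aggregates["clinical"])
# {'fact_count': 2, 'avg_confidence': 0.75, 'source_diversity': 2}
```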
3. Metric Evaluation
If the service has configured quality metrics, the system evaluates them in parallel. Metrics are chunked (20 per prompt) and evaluated concurrently. Each evaluation includes a score, justification, and references to the specific messages that informed the assessment.
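Chunked, concurrent evaluation can be sketched with asyncio. The evaluation function is a stub standing in for an LLM call, and the result fields are named after the description above rather than any documented schema. Only the chunk size of 20 comes from the text.

```python
import asyncio

CHUNK_SIZE = 20  # metrics per prompt


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def evaluate_chunk(metrics, transcript):
    """Stub for one LLM call; returns a placeholder result per metric."""
    await asyncio.sleep(0)  # yield control, standing in for network I/O
    return [
        {"metric": m, "score": 0.0, "justification": "stub", "message_refs": []}
        for m in metrics
    ]


async def evaluate_all(metrics, transcript):
    chunks = chunk(metrics, CHUNK_SIZE)
    # gather runs the chunk evaluations concurrently and preserves their order.
    results = await asyncio.gather(*(evaluate_chunk(c, transcript) for c in chunks))
    return [r for chunk_result in results for r in chunk_result]


metrics = [f"metric_{i}" for i in range(45)]
results = asyncio.run(evaluate_all(metrics, transcript=["hello"]))
print(len(chunk(metrics, CHUNK_SIZE)), len(results))  # 3 45
```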
4. User Model Generation
Memories are processed in batches of 100. For each batch, an LLM generates dimensional user model observations tagged with the organization's configured user dimensions. When multiple batches exist, a merge step synthesizes them into a single cohesive user model. Intermediate results are checkpointed to prevent rework on failure.
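The batch, merge, and checkpoint flow can be sketched as follows. The summarize and merge functions are placeholders for LLM calls, and the checkpoint store is a plain dictionary keyed by batch offset. Both are assumptions made for illustration. Only the batch size of 100 comes from the text.

```python
BATCH_SIZE = 100  # memories per batch


def summarize_batch(batch):
    """Stand-in for the LLM call that produces observations for one batch."""
    return {"observations": len(batch)}


def merge(partials):
    """Stand-in for the merge step that synthesizes a single user model."""
    return {"observations": sum(p["observations"] for p in partials)}


def build_user_model(memories, checkpoints):
    partials = []
    for start in range(0, len(memories), BATCH_SIZE):
        if start not in checkpoints:  # skip batches finished before a failure
            checkpoints[start] = summarize_batch(memories[start:start + BATCH_SIZE])
        partials.append(checkpoints[start])
    if not partials:
        return {"observations": 0}
    return partials[0] if len(partials) == 1 else merge(partials)


memories = list(range(250))
checkpoints = {0: {"observations": 100}}  # first batch survived an earlier run
model = build_user_model(memories, checkpoints)
print(model, sorted(checkpoints))  # {'observations': 250} [0, 100, 200]
```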
Each stage has deduplication guards to prevent re-processing if the pipeline is triggered twice for the same conversation.
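A deduplication guard of this kind can be sketched as an idempotency check keyed by conversation and stage. The in-memory set and the function name are hypothetical. A real pipeline would need a durable, shared store.

```python
processed: set[tuple[str, str]] = set()  # (conversation_id, stage); in-memory for illustration


def run_stage_once(conversation_id, stage, work):
    """Run `work` unless this stage already completed for this conversation."""
    key = (conversation_id, stage)
    if key in processed:
        return False
    work()
    processed.add(key)  # recorded only after the work succeeds
    return True


calls = []
first = run_stage_once("conv-1", "memory_extraction", lambda: calls.append("ran"))
second = run_stage_once("conv-1", "memory_extraction", lambda: calls.append("ran"))
print(first, second, calls)  # True False ['ran']
```

Recording the key only after the work finishes means a stage that fails partway through can be retried on the next trigger.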
Summary
In healthcare, memory that works "most of the time" is not good enough. Patient safety requires perfect recall of critical information, complete preservation of context across provider transitions, identification of information gaps before they affect care, and tracking of how patient understanding changes over time. Amigo achieves this by keeping L3 constantly available during sessions - the agent always has the patient's full context without retrieval delays.