How It Works
This page walks through a complete call lifecycle, from the moment a patient dials the number to post-call analysis. Every step maps to a real system component.
Call Lifecycle
Phase by Phase
1. Instant Greeting (During Ring Time)
When a call comes in, the system does not sit idle while the phone rings. During ring time, it:
Creates a conference call with an agent leg already connected
Resolves the caller's identity from their phone number against the world model
Loads patient context (demographics, upcoming appointments, recent encounters) into the agent's system prompt
Loads the context graph (the state machine that defines the call flow)
By the time the patient says "hello," the agent is fully loaded and responds immediately. This conference-first architecture means there is never dead air at the start of a call.
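The pre-ring setup can be sketched as a single prepare step. This is a minimal illustration, not the real implementation: `world_model` and `graph_store` are hypothetical stand-ins for the identity service and the call-flow definitions.

```python
import time

def prepare_agent_leg(caller_number, world_model, graph_store):
    """Pre-warm the agent while the phone is still ringing.

    world_model and graph_store are illustrative stand-ins for the
    real identity-resolution and flow-definition services.
    """
    # Conference is created with the agent leg already connected
    conference = {"legs": ["agent"], "created_at": time.time()}
    # Resolve caller identity from the phone number
    patient = world_model.get(caller_number)
    # Fold patient context into the agent's system prompt
    system_prompt = (
        f"Patient: {patient['name']}. "
        f"Upcoming: {patient['appointments']}. "
        f"Recent encounters: {patient['encounters']}."
    )
    # Load the state machine that defines the call flow
    context_graph = graph_store["inbound_default"]
    return conference, system_prompt, context_graph
```

By the time the caller's leg joins the conference, all three pieces are already in memory.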
2. Parallel Audio Processing
Every audio frame from the caller is processed by two independent systems simultaneously:
Speech-to-Text converts audio to text with sub-300ms latency. Domain-specific vocabulary boosting (medical terms, local provider names, insurance plans) improves recognition accuracy. End-of-turn detection determines when the caller has finished speaking.
Emotion Detection analyzes vocal prosody, burst patterns, and language content through three concurrent models. Results feed into a rolling 30-second window that tracks emotional state across the conversation, weighted toward recent signals.
These two streams never block each other. If emotion detection fails, speech processing continues unaffected.
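The isolation between the two streams can be sketched with a thread pool: both models receive the same frame, but only the STT result is required. The `transcribe` and `detect_emotion` bodies are placeholders for the real engines.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe(frame):
    # Placeholder for the real speech-to-text engine
    return frame.decode("utf-8", errors="ignore")

def detect_emotion(frame):
    # Placeholder for the prosody/burst/language ensemble
    if not frame:
        raise RuntimeError("model unavailable")
    return {"valence": 0.1}

def process_frame(frame):
    """Fan one audio frame out to STT and emotion detection independently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        stt_future = pool.submit(transcribe, frame)
        emo_future = pool.submit(detect_emotion, frame)
        text = stt_future.result()          # STT result is required
        try:
            emotion = emo_future.result()   # emotion is best-effort
        except Exception:
            emotion = None                  # a failure here never blocks speech
    return text, emotion
```

An emotion-model outage degrades the agent's delivery, never its comprehension.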
3. Context Graph Navigation
The context graph is a hierarchical state machine that defines what the agent should accomplish at each point in the call. A navigation LLM evaluates the current transcript, emotional state, and conversation history to select the next action.
This is not a fixed script. The context graph defines goals and constraints. The agent determines how to achieve them based on the live conversation. If a patient brings up insurance while the agent is in a scheduling flow, the state machine can handle the transition.
The navigation step also selects filler phrases ("Let me check that for you") that keep the conversation flowing while the system processes the next response.
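A rule-based stand-in for the navigation LLM shows the shape of the decision: honor off-script topic shifts when the graph defines a transition, escalate on high distress, otherwise follow the node's default. Node and field names here are illustrative.

```python
def navigate(graph, current_node, transcript, emotional_state):
    """Pick the next context-graph node and a filler phrase.

    A deterministic sketch of what the navigation LLM decides; the
    graph structure and node names are illustrative.
    """
    node = graph[current_node]
    # Off-script topic shift: honor it if the graph defines a transition
    for keyword, target in node.get("transitions", {}).items():
        if keyword in transcript.lower():
            return target, node.get("filler", "One moment.")
    # Emotional override: high distress routes toward escalation
    if emotional_state.get("distress", 0) > 0.8:
        return "escalate", "Let me get someone who can help."
    # Otherwise continue along the node's default path
    return node.get("default", current_node), node.get("filler", "Let me check that for you.")
```

With a `scheduling` node that defines an `insurance` transition, a caller asking about insurance mid-scheduling jumps to the insurance flow instead of being forced through the script.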
4. Tool Execution
External healthcare systems have wildly different capabilities. Some have fast FHIR APIs. Some have slow REST endpoints with rate limits. Some have no API at all - just a web portal that requires a browser login. Some crash under load. Some reject writes silently.
The five-tier tool execution system exists to decouple the agent's ability to help the patient from the limitations of the external system it needs to talk to:
T1 Direct (under 2s): patient lookup, slot search. Fast API available - no reason to make the patient wait.
T2 Orchestrated (2-30s): appointment booking, insurance check. Multi-step workflow against a responsive API - filler speech covers the wait.
T3 Autonomous (30s-5min): prior authorization, referral processing. Slow external system or complex multi-system workflow - runs in the background.
T4 Browser (1-10min): portal login, form submission. No API exists - browser automation navigates the EHR portal directly.
T5 Approval-gated (variable): prescription changes, clinical orders. High-stakes action that requires human sign-off before execution.
Higher tiers keep the caller informed with natural status updates rather than silence. The world model absorbs throughput mismatches: if the agent needs to book an appointment but the EHR can't handle the write immediately, the intent is captured as an event and the connector runner delivers it when the external system is ready.
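Tier selection can be sketched as a routing function over a tool descriptor. The thresholds mirror the tiers above; the descriptor fields (`needs_approval`, `has_api`, `estimated_seconds`) are illustrative, not the real schema.

```python
def select_tier(tool):
    """Route a tool call to an execution tier.

    `tool` is an illustrative descriptor; field names are assumptions.
    """
    if tool.get("needs_approval"):
        return "T5"   # human sign-off before execution
    if not tool.get("has_api"):
        return "T4"   # browser automation against the portal
    est = tool["estimated_seconds"]
    if est < 2:
        return "T1"   # fast enough to answer inline
    if est <= 30:
        return "T2"   # filler speech covers the wait
    return "T3"       # autonomous background run with status updates
```

The approval gate is checked first on purpose: a prescription change stays T5 even when a fast API could execute it in milliseconds.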
5. Response Generation and TTS
The response LLM generates the agent's reply using the full context: patient data, conversation history, tool results, emotional state, and the current context graph action. Emotion detection results directly influence the response through micro-behaviors (pacing, word choice, acknowledgment phrases).
Text-to-speech converts the response to audio with emotion-adaptive delivery. If the caller sounds rushed, the agent speeds up. If they sound confused, it slows down and simplifies. Word-level timestamps enable precise barge-in detection so the agent stops speaking when the caller starts.
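Two of these behaviors are easy to sketch: pacing adjustment from the caller's state, and barge-in truncation from word-level timestamps. The multipliers and state labels are illustrative assumptions.

```python
def speaking_rate(base_wpm, caller_state):
    """Adapt delivery rate to the caller. Multipliers are illustrative."""
    if caller_state == "rushed":
        return base_wpm * 1.15   # speed up for a hurried caller
    if caller_state == "confused":
        return base_wpm * 0.85   # slow down and leave room
    return base_wpm

def spoken_before_barge_in(word_timestamps, interrupt_at):
    """Words the agent actually finished before the caller interrupted.

    word_timestamps: list of (word, start_sec, end_sec) from the TTS engine.
    """
    return [w for w, start, end in word_timestamps if end <= interrupt_at]
```

Knowing exactly which words were delivered lets the agent resume naturally ("As I was saying, your appointment...") instead of restarting the sentence.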
6. Operator Escalation
When a situation exceeds the agent's scope (clinical judgment calls, upset callers requesting a human, safety triggers), the system escalates to a human operator.
The architecture is conference-first: the patient, agent, and operator are all in the same conference call. The operator joins the existing call rather than receiving a transfer. This means:
No dropped calls during handoff
The operator hears the agent's context summary before taking over
The agent can remain on the line to assist the operator with lookups
The transition is uninterrupted from the patient's perspective
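The difference from a transfer is that escalation only ever adds a leg. A minimal sketch, with the conference modeled as a dict and the operator briefing as a hypothetical `whisper` field:

```python
def escalate(conference, operator_id, context_summary):
    """Join an operator to the existing conference instead of transferring.

    The patient and agent legs are untouched; `whisper` is an
    illustrative field for the operator-only context briefing.
    """
    conference["legs"].append(f"operator:{operator_id}")
    conference["whisper"] = context_summary
    return conference
```

Because nothing is torn down, there is no window in which the patient can be dropped, and the agent leg stays available for lookups.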
7. Post-Call Processing
This pipeline exists because data captured during a live phone call is inherently uncertain. A patient might misspeak, the STT might mishear, or the agent might misinterpret. The verification pipeline catches these errors before they reach a system of record.
After the call ends:
Clinical events written during the call (at pending confidence) enter the automated review pipeline
Call classifier filters junk calls before clinical review runs
Per-event LLM judge cross-references extracted data against the transcript
Session coherence check validates narrative consistency across all events from the call
Verified events (confidence 0.7+) become eligible for EHR sync
Flagged items route to the operator review queue for human decision
Post-call summary is generated and stored as an event on the call entity
The connector runner handles the final step: syncing verified data back to the EHR through the appropriate adapter.
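The confidence gate at the heart of the pipeline can be sketched as a triage over pending events. The 0.7 threshold comes from the list above; the event shape is an illustrative assumption.

```python
def triage_events(events, threshold=0.7):
    """Split pending call events into EHR-sync candidates and the
    operator review queue.

    Each event carries a `confidence` score from the per-event LLM
    judge; the event structure here is illustrative.
    """
    verified, flagged = [], []
    for event in events:
        if event["confidence"] >= threshold:
            verified.append(event)   # eligible for EHR sync
        else:
            flagged.append(event)    # routed to operator review
    return verified, flagged
```

Only the verified list ever reaches the connector runner; everything else waits for a human decision.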