How It Works

Walk through a complete call lifecycle from phone ring to post-call analysis, mapping each step to real system components.


Call Lifecycle


Phase by Phase

1. Instant Greeting (During Ring Time)

When a call comes in, the system does not wait for the patient to pick up. During the ring time, it:

  • Creates a conference call with an agent leg already connected

  • Resolves the caller's identity from their phone number against the world model

  • Loads patient context (demographics, upcoming appointments, recent encounters) into the agent's system prompt

  • Loads the context graph (the state machine that defines the call flow)

By the time the patient says "hello," the agent is fully loaded and responds immediately. This conference-first architecture means there is never dead air at the start of a call.
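The ring-time preparation above can be sketched as a set of concurrent lookups. This is a minimal sketch, assuming hypothetical service functions (`resolve_caller`, `load_patient_context`, `load_context_graph` are illustrative names, not the actual API):

```python
import asyncio

# Hypothetical stubs standing in for the real identity, context, and
# graph-loading services.
async def resolve_caller(phone: str) -> dict:
    return {"patient_id": "pt_123", "phone": phone}

async def load_patient_context(phone: str) -> dict:
    return {"demographics": "...", "upcoming_appointments": []}

async def load_context_graph(line: str) -> dict:
    return {"entry_node": "greeting"}

async def prepare_agent(phone: str, line: str) -> dict:
    # All three lookups run concurrently during ring time, so the agent
    # is fully loaded before the patient picks up.
    identity, context, graph = await asyncio.gather(
        resolve_caller(phone),
        load_patient_context(phone),
        load_context_graph(line),
    )
    return {"identity": identity, "context": context, "graph": graph}

agent_state = asyncio.run(prepare_agent("+15550100", "main-line"))
```

Running the lookups with `asyncio.gather` rather than sequentially keeps total preparation time bounded by the slowest lookup, which is what makes the greeting feel instant.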

2. Parallel Audio Processing

Every audio frame from the caller is processed by two independent systems simultaneously:

Speech-to-Text converts audio to text with sub-300ms latency. Domain-specific vocabulary boosting (medical terms, local provider names, insurance plans) improves recognition accuracy. End-of-turn detection determines when the caller has finished speaking.

Emotion Detection analyzes vocal prosody, burst patterns, and language content through three concurrent models. Results feed into a rolling 30-second window that tracks emotional state across the conversation, weighted toward recent signals.
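The rolling window can be sketched as follows. The 30-second horizon comes from the description above; the linear recency weighting is an assumption for illustration, not the system's actual scheme:

```python
class EmotionWindow:
    """Rolling window of emotion scores, weighted toward recent signals.

    The linear decay from 1.0 (new) to 0.0 (30 s old) is an assumed
    weighting for illustration.
    """

    def __init__(self, horizon_s: float = 30.0):
        self.horizon_s = horizon_s
        self.samples: list[tuple[float, float]] = []  # (timestamp, score)

    def add(self, score: float, now: float) -> None:
        self.samples.append((now, score))
        # Drop samples that have aged out of the window.
        cutoff = now - self.horizon_s
        self.samples = [(t, s) for t, s in self.samples if t >= cutoff]

    def current(self, now: float) -> float:
        # Each sample's weight decays linearly with age.
        weighted = [(1 - (now - t) / self.horizon_s, s) for t, s in self.samples]
        total = sum(w for w, _ in weighted)
        return sum(w * s for w, s in weighted) / total if total else 0.0

win = EmotionWindow()
win.add(0.2, now=0.0)    # calm signal, 20 s ago
win.add(0.8, now=20.0)   # agitated signal, just now
blended = win.current(now=20.0)  # closer to 0.8 than a plain average
```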

These two streams never block each other. If emotion detection fails, speech processing continues unaffected.
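The failure isolation between the two streams can be sketched with `asyncio.gather(..., return_exceptions=True)`. The function names are illustrative stubs:

```python
import asyncio

async def speech_to_text(frame: bytes) -> str:
    return "transcribed text"

async def detect_emotion(frame: bytes) -> dict:
    # Simulate the emotion pipeline failing on this frame.
    raise RuntimeError("emotion model unavailable")

async def process_frame(frame: bytes):
    # return_exceptions=True keeps the streams independent: a failure in
    # emotion detection arrives as an exception object instead of
    # cancelling the speech-to-text task.
    stt, emotion = await asyncio.gather(
        speech_to_text(frame),
        detect_emotion(frame),
        return_exceptions=True,
    )
    if isinstance(emotion, Exception):
        emotion = None  # degrade gracefully; speech processing continues
    return stt, emotion

text, emotion = asyncio.run(process_frame(b"\x00" * 320))
```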

3. Context Graph Navigation

The context graph is a hierarchical state machine that defines what the agent should accomplish at each point in the call. A navigation LLM evaluates the current transcript, emotional state, and conversation history to select the next action.

This is not a fixed script. The context graph defines goals and constraints. The agent determines how to achieve them based on the live conversation. If a patient brings up insurance while the agent is in a scheduling flow, the state machine can handle the transition.

The navigation step also selects filler phrases ("Let me check that for you") that keep the conversation flowing while the system processes the next response.
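The navigation decision can be sketched as a lookup over graph nodes with allowed transitions. The graph contents, node names, and filler phrases here are illustrative, and a deterministic rule stands in for the navigation LLM:

```python
# Hypothetical context graph: each node has a goal and allowed transitions.
CONTEXT_GRAPH = {
    "scheduling": {"goal": "book an appointment",
                   "transitions": ["insurance", "confirm"]},
    "insurance": {"goal": "verify coverage",
                  "transitions": ["scheduling"]},
}

FILLERS = ["Let me check that for you.", "One moment while I pull that up."]

def navigate(current_node: str, transcript: str) -> tuple[str, str]:
    node = CONTEXT_GRAPH[current_node]
    # Stand-in for the navigation LLM: jump to an allowed transition
    # when the caller raises that topic mid-flow.
    for target in node["transitions"]:
        if target in transcript.lower():
            return target, FILLERS[0]
    return current_node, FILLERS[0]

next_node, filler = navigate(
    "scheduling", "Actually, is this covered by my insurance?"
)
```

Because transitions are declared per node rather than scripted per turn, the agent can follow the patient's topic shift (scheduling to insurance) without abandoning the overall goal.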

4. Tool Execution


External healthcare systems have wildly different capabilities. Some have fast FHIR APIs. Some have slow REST endpoints with rate limits. Some have no API at all - just a web portal that requires a browser login. Some crash under load. Some reject writes silently.

The five-tier tool execution system exists to decouple the agent's ability to help the patient from the limitations of the external system it needs to talk to:

| Tier | Latency | Example | Why This Tier Exists |
| --- | --- | --- | --- |
| T1 Direct | Under 2s | Patient lookup, slot search | Fast API available - no reason to make the patient wait |
| T2 Orchestrated | 2-30s | Appointment booking, insurance check | Multi-step workflow against a responsive API - filler speech covers the wait |
| T3 Autonomous | 30s-5min | Prior authorization, referral processing | Slow external system or complex multi-system workflow - runs in background |
| T4 Browser | 1-10min | Portal login, form submission | No API exists - browser automation navigates the EHR portal directly |
| T5 Approval-gated | Variable | Prescription changes, clinical orders | High-stakes action that requires human sign-off before execution |

Higher tiers keep the caller informed with natural status updates rather than silence. The world model absorbs throughput mismatches: if the agent needs to book an appointment but the EHR can't handle the write immediately, the intent is captured as an event and the connector runner delivers it when the external system is ready.
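The write-behind pattern described above can be sketched as an outbox on the world model that a connector runner drains when the external system is ready. Class and field names are illustrative:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    payload: dict
    status: str = "pending"

class WorldModel:
    def __init__(self):
        self.outbox: deque[Event] = deque()

    def capture_intent(self, kind: str, payload: dict) -> Event:
        # The agent records what should happen, regardless of whether
        # the external system can accept the write right now.
        ev = Event(kind, payload)
        self.outbox.append(ev)
        return ev

def run_connector(world: WorldModel, ehr_ready: bool) -> int:
    """Deliver queued events only when the EHR can take the write."""
    delivered = 0
    while ehr_ready and world.outbox:
        ev = world.outbox.popleft()
        ev.status = "synced"  # stand-in for the adapter call
        delivered += 1
    return delivered

wm = WorldModel()
booking = wm.capture_intent("appointment.book", {"slot": "2024-06-01T09:00"})
run_connector(wm, ehr_ready=False)   # EHR busy: intent stays queued
count = run_connector(wm, ehr_ready=True)
```

Decoupling intent capture from delivery is what lets the agent promise the patient a booking even when the EHR is momentarily unable to accept it.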

5. Response Generation and TTS

The response LLM generates the agent's reply using the full context: patient data, conversation history, tool results, emotional state, and the current context graph action. Emotion detection results directly influence the response through micro-behaviors (pacing, word choice, acknowledgment phrases).

Text-to-speech converts the response to audio with emotion-adaptive delivery. If the caller sounds rushed, the agent speeds up. If they sound confused, it slows down and simplifies. Word-level timestamps enable precise barge-in detection so the agent stops speaking when the caller starts.
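The emotion-adaptive pacing and timestamp-based barge-in can be sketched as below. The rate multipliers, emotion labels, and timestamp format are assumptions for illustration:

```python
def speaking_rate(emotion: str) -> float:
    # Assumed multipliers: the real mapping is not specified here.
    if emotion == "rushed":
        return 1.15  # speed up slightly to match the caller
    if emotion == "confused":
        return 0.85  # slow down and leave room to process
    return 1.0

def barge_in_cutoff(word_timestamps: list[tuple[str, float]],
                    interrupt_time: float) -> list[str]:
    """Given (word, start_seconds) pairs from the TTS engine, return the
    words the agent actually got out before the caller started speaking."""
    return [w for w, start in word_timestamps if start < interrupt_time]

rate = speaking_rate("confused")
spoken = barge_in_cutoff(
    [("Your", 0.0), ("appointment", 0.3), ("is", 0.9), ("confirmed", 1.1)],
    interrupt_time=1.0,
)
```

Word-level timestamps make the cutoff precise: the system knows exactly which words the caller heard, so the transcript reflects what was said, not what was queued.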

6. Operator Escalation

When a situation exceeds the agent's scope (clinical judgment calls, upset callers requesting a human, safety triggers), the system escalates to a human operator.

The architecture is conference-first: the patient, agent, and operator are all in the same conference call. The operator joins the existing call rather than receiving a transfer. This means:

  • No dropped calls during handoff

  • The operator hears the agent's context summary before taking over

  • The agent can remain on the line to assist the operator with lookups

  • The transition is uninterrupted from the patient's perspective
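The conference-first handoff can be sketched as participants joining and leaving a shared call rather than being transferred between calls. The class is illustrative:

```python
class Conference:
    """Minimal sketch of a conference call with named legs."""

    def __init__(self, call_id: str):
        self.call_id = call_id
        self.legs: set[str] = set()

    def join(self, participant: str) -> None:
        self.legs.add(participant)

    def leave(self, participant: str) -> None:
        self.legs.discard(participant)

conf = Conference("call_42")
conf.join("patient")
conf.join("agent")
# Escalation: the operator joins the existing call; the patient's leg
# is never dropped or transferred.
conf.join("operator")
# The agent can stay to assist, or leave once the operator has context.
conf.leave("agent")
```

Because the patient's leg never moves, there is no transfer step that can fail, which is what eliminates dropped calls during handoff.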

7. Post-Call Processing


This pipeline exists because data captured during a live phone call is inherently uncertain. A patient might misspeak, the STT might mishear, or the agent might misinterpret. The verification pipeline catches these errors before they reach a system of record.

After the call ends:

  1. Clinical events written during the call (at pending confidence) enter the automated review pipeline

  2. Call classifier filters junk calls before clinical review runs

  3. Per-event LLM judge cross-references extracted data against the transcript

  4. Session coherence check validates narrative consistency across all events from the call

  5. Verified events (confidence 0.7+) become eligible for EHR sync

  6. Flagged items route to the operator review queue for human decision

  7. Post-call summary is generated and stored as an event on the call entity

The connector runner handles the final step: syncing verified data back to the EHR through the appropriate adapter.
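The routing decision in steps 5 and 6 can be sketched as a threshold check. The 0.7 threshold comes from the pipeline above; the event shape and route names are illustrative:

```python
VERIFY_THRESHOLD = 0.7  # minimum confidence for automatic EHR sync

def route_event(event: dict) -> str:
    # Verified events sync to the EHR; everything else goes to a human.
    if event["confidence"] >= VERIFY_THRESHOLD:
        return "ehr_sync"
    return "operator_review"

events = [
    {"id": "e1", "confidence": 0.92},  # passed the LLM judge cleanly
    {"id": "e2", "confidence": 0.55},  # flagged by the coherence check
]
routes = {e["id"]: route_event(e) for e in events}
```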
