# Call Intelligence and Analytics

Every interaction - voice and text - produces a structured analytical breakdown covering emotion, risk, latency, safety, and outcome quality. All computed automatically. The result is a dataset that covers your entire operation, not just the calls someone happened to listen to.

## Two Layers of Quality Analysis

{% @mermaid/diagram content="flowchart LR
I\[Live Interaction] --> L1\[Layer 1: Real-Time\n7 structured profiles]
I --> L2\[Layer 2: Post-Interaction\n5 quality dimensions]
L1 --> QS\[Composite Quality\nScore 0-100]
L2 --> QS
QS --> D\[Dashboards +\nTrend Analysis]
QS --> A\[Alerts +\nThresholds]" %}

Analysis happens in two passes. The first runs during the interaction. The second runs after it ends.

### Layer 1: Real-Time Intelligence

Seven structured profiles are computed while the interaction is still in progress.

| Profile                   | What It Captures                                                                                                        |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| **Emotion**               | Dominant emotion, valence and arousal averages, peak negative moment, emotional shifts over time, final emotional trend |
| **Risk**                  | Composite risk score with contributing signals identified                                                               |
| **Latency**               | Response time averages and percentiles, time-to-first-response, silence ratio                                           |
| **Conversation dynamics** | Turn count, states visited, loop count, interruption count, completion reason                                           |
| **Tool performance**      | Success and failure counts per tool invoked during the interaction                                                      |
| **Safety**                | Rule matches and escalation triggers fired                                                                              |
| **Operator involvement**  | Whether a human connected, time to connect, and resolution outcome                                                      |

These profiles are available before the interaction ends. Monitoring dashboards and alerting rules can act on them in real time.
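As a rough sketch, the Layer 1 output can be thought of as one structured record per interaction. The field names below are illustrative assumptions inferred from the profile table, not the platform's actual schema:

```typescript
// Illustrative shape for the real-time intelligence summary.
// Field names are assumptions, not the platform's actual schema.
interface RealTimeIntelligence {
  emotion: {
    dominant: string;           // e.g. "frustrated"
    valenceAvg: number;         // -1..1
    arousalAvg: number;         // 0..1
    peakNegativeAtMs: number;   // timestamp of the worst moment
    finalTrend: "improving" | "stable" | "declining";
  };
  risk: { score: number; signals: string[] };
  latency: {
    avgMs: number;
    p95Ms: number;
    timeToFirstResponseMs: number;
    silenceRatio: number;       // 0..1 share of the call that was silence
  };
  dynamics: {
    turns: number;
    statesVisited: string[];
    loops: number;
    interruptions: number;
    completionReason: string;
  };
  tools: Record<string, { successes: number; failures: number }>;
  safety: { ruleMatches: string[]; escalationTriggers: string[] };
  operator: { connected: boolean; timeToConnectMs?: number; resolution?: string };
}
```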

### Layer 2: Post-Interaction Quality Scoring

After the interaction ends, a second pass scores quality across five dimensions on a 1-5 scale.

| Dimension                | What It Measures                                                       |
| ------------------------ | ---------------------------------------------------------------------- |
| **Task completion**      | Did the agent accomplish what the caller needed?                       |
| **Information accuracy** | Were facts correct? Did the agent act on accurate data?                |
| **Conversation flow**    | Natural pacing, no awkward pauses or repetitions                       |
| **Error recovery**       | Did the agent recover gracefully from confusion or unexpected input?   |
| **Caller experience**    | Overall experience based on tone, engagement, and interaction patterns |

Each interaction also receives an **outcome classification**: succeeded, partially succeeded, failed, or abandoned.
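A minimal sketch of what the Layer 2 result might look like as a data structure; the field names are assumptions based on the dimension table and outcome labels above:

```typescript
// Illustrative shape for the post-interaction quality pass.
// Names are assumptions, not the platform's actual schema.
type DimensionScore = 1 | 2 | 3 | 4 | 5;

interface QualityScores {
  taskCompletion: DimensionScore;
  informationAccuracy: DimensionScore;
  conversationFlow: DimensionScore;
  errorRecovery: DimensionScore;
  callerExperience: DimensionScore;
  outcome: "succeeded" | "partially_succeeded" | "failed" | "abandoned";
}
```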

## Composite Quality Score

The five dimension scores feed into a single composite quality score on a 0-100 scale. The score starts at 100 and deducts points for specific quality signals: high latency, excessive silence, interruptions, agent loops, escalations, and tool failures.

Interactions are tiered based on this score:

| Tier          | Meaning                                                        |
| ------------- | -------------------------------------------------------------- |
| **Excellent** | No significant quality issues detected                         |
| **Good**      | Minor issues that did not affect the outcome                   |
| **Fair**      | Noticeable issues that may have affected the caller experience |
| **Poor**      | Significant issues requiring review                            |

The composite score is the primary metric for tracking quality over time and comparing performance across agents, configurations, and time periods. It is designed for dashboard filtering and trend analysis - you can filter calls by tier to focus review time on the interactions that need it.
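The sketch below shows how a penalty-based score and tier mapping of this kind could be computed. The penalty weights and tier boundaries are illustrative assumptions, not the platform's actual values:

```typescript
// Minimal sketch of a penalty-based composite score and tier mapping.
// Penalty weights and tier boundaries are illustrative assumptions.
interface QualitySignals {
  highLatencyEvents: number;
  silenceRatio: number;   // 0..1 share of the call that was silence
  interruptions: number;
  agentLoops: number;
  escalations: number;
  toolFailures: number;
}

function compositeScore(s: QualitySignals): number {
  let score = 100;
  score -= s.highLatencyEvents * 3;                 // hypothetical weights
  score -= Math.max(0, s.silenceRatio - 0.2) * 50;  // penalize silence above 20%
  score -= s.interruptions * 2;
  score -= s.agentLoops * 5;
  score -= s.escalations * 10;
  score -= s.toolFailures * 4;
  return Math.max(0, Math.min(100, score));
}

function tier(score: number): "excellent" | "good" | "fair" | "poor" {
  if (score >= 90) return "excellent"; // boundaries are assumptions
  if (score >= 75) return "good";
  if (score >= 50) return "fair";
  return "poor";
}
```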

## Key Moment Extraction

The system automatically identifies notable events - moments of elevated risk, emotional shifts, escalation triggers, tool failures - and tags them with timestamps.

Reviewers jump directly to what matters instead of listening to entire recordings. When a call scores poorly, the key moments tell you exactly where things went wrong.

## Transcription Accuracy Feedback

Quality analysis feeds corrections back into transcription. When the scoring pass identifies likely transcription errors - a medical term misheard, a name consistently misspelled - it updates the speech recognition configuration.

Transcription accuracy improves over time for your specific vocabulary: medical terminology, provider names, local street names, insurance plan names. No manual tuning required.

## Quality Trends

Individual call scores are useful for reviewing specific interactions. Trends across thousands of calls are where you get operational visibility.

Analytics show quality score distribution, escalation rates, and per-component breakdowns over configurable date ranges. Period-over-period comparison lets you measure whether a configuration change actually improved quality or made things worse.

This is the feedback loop that drives continuous improvement. You make a change to the agent configuration. You wait for enough calls to accumulate. You compare the quality distribution before and after. The data tells you whether the change helped, hurt, or had no measurable effect. Without this, configuration changes are guesswork.

## Analytics

Beyond per-call intelligence, the platform provides workspace-level analytics covering call quality, data quality, pipeline health, and entity resolution. These metrics give operations teams visibility into how data flows through the system and where attention is needed.

All analytics support date range filtering, time bucketing (hourly, daily, weekly), and optional service-level filtering. Results power the developer console dashboards and are available to any user with read access to the workspace.
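A hypothetical query showing how these shared parameters might be passed; the endpoint path and parameter names are assumptions, not the documented API:

```typescript
// Hypothetical analytics query illustrating the shared parameters:
// date range, time bucket, and optional service filter.
async function fetchCallQuality(workspaceId: string) {
  const params = new URLSearchParams({
    start: "2025-01-01",
    end: "2025-01-31",
    bucket: "daily",       // hourly | daily | weekly
    service: "scheduling", // optional service-level filter
  });
  const res = await fetch(`/api/workspaces/${workspaceId}/analytics/call-quality?${params}`);
  return res.json();
}
```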

### Call Quality Trends

These views aggregate call intelligence data across all completed calls in a workspace.

| View                     | What It Shows                                                                                        |
| ------------------------ | ---------------------------------------------------------------------------------------------------- |
| **Call quality**         | Quality score trends (avg, p50, p95), distribution by tier, escalation rate, call volume             |
| **Emotion trends**       | Dominant emotion distribution across calls, valence/arousal trends over time, per-emotion frequency  |
| **Safety trends**        | Escalation frequency over time, risk level distribution, safety rule match counts                    |
| **Latency**              | p50/p95/p99 latency by component (engine response, audio time-to-first-byte, navigation, render)     |
| **Tool performance**     | Per-tool success/failure rates, failure trends, invocation counts and average duration               |
| **Operator performance** | Escalation rate trends, quality comparison (escalated vs non-escalated calls), operator connect time |

Advanced analytics support percentile breakdowns (p50/p95/p99) for duration and quality scores, time series trends with p95 latency, and breakdowns by service and call direction (inbound/outbound).

Period-over-period comparison lets you pick any two date ranges and see absolute and percentage change for each KPI. When you update an agent's context graph, change a prompt, or modify an escalation rule, you can compare the week before and after to see exactly what changed. This is the simplest way to answer "did that change help?" with data instead of intuition.
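The underlying computation is simple: take a KPI snapshot for each period and derive absolute and percentage deltas. A minimal sketch, with illustrative KPI fields:

```typescript
// Period-over-period KPI comparison. The deltas (absolute + percent)
// come from the text above; the KPI fields are illustrative assumptions.
interface KpiSnapshot {
  avgQualityScore: number;
  escalationRate: number; // 0..1
  callVolume: number;
}

interface KpiDelta { absolute: number; percent: number }

function compare(before: KpiSnapshot, after: KpiSnapshot): Record<keyof KpiSnapshot, KpiDelta> {
  const out = {} as Record<keyof KpiSnapshot, KpiDelta>;
  for (const key of Object.keys(before) as (keyof KpiSnapshot)[]) {
    const abs = after[key] - before[key];
    out[key] = {
      absolute: abs,
      percent: before[key] === 0 ? 0 : (abs / before[key]) * 100,
    };
  }
  return out;
}
```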

### Call Intelligence Profiles

Each voice call also produces a structured intelligence summary computed from session state at call end. These summaries capture operational telemetry that async quality scoring (which runs on recordings) cannot see: real-time emotion trajectories, engine response latency, tool invocation counts, and safety rule matches. The composite quality score (0-100) from this summary uses the same penalty-based model described above and is the primary metric for dashboard filtering.

### Call Statistics

For voice deployments, call analytics track volume, duration, and daily patterns. These metrics help operations teams identify capacity trends (peak calling hours, seasonal volume changes) and spot anomalies (sudden drops in call volume that might indicate a routing issue).

* **Call volume** - Total calls over configurable windows (30-90 days), broken down by service and direction (inbound/outbound)
* **Duration distribution** - How long calls last, useful for identifying calls that are too short (abandoned) or too long (stuck in loops)
* **Daily breakdown** - Per-day call counts for trend analysis
* **Service breakdown** - Volume by agent service, so you can compare usage across scheduling, care coordination, and other workflows

### Data Quality Dashboard

The data quality dashboard tracks confidence distribution across all events in the workspace. Every piece of data that enters the world model carries a confidence score, and the dashboard shows how that distribution looks across your entire dataset:

| Bucket             | Confidence Range | What It Means                                                  |
| ------------------ | ---------------- | -------------------------------------------------------------- |
| **Rejected**       | 0.0              | Events that failed review or were explicitly contradicted      |
| **Raw**            | 0.1-0.3          | Unverified data from agent inference or initial extraction     |
| **Uncertain**      | 0.4-0.5          | Voice-extracted data awaiting review                           |
| **Verified**       | 0.6-0.7          | Data that passed automated review                              |
| **Human-approved** | 0.8-0.95         | Data approved by a human reviewer                              |
| **Authoritative**  | 1.0              | Data from authoritative system integrations (direct EHR feeds) |
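A small classifier mapping a confidence score to these buckets might look like the sketch below; how values between the listed ranges (e.g. 0.3-0.4) are handled is an assumption:

```typescript
// Maps a confidence score to the buckets in the table above.
// Boundary handling between listed ranges is an assumption.
type Bucket = "rejected" | "raw" | "uncertain" | "verified" | "human_approved" | "authoritative";

function bucketFor(confidence: number): Bucket {
  if (confidence <= 0) return "rejected";
  if (confidence < 0.4) return "raw";
  if (confidence < 0.6) return "uncertain";
  if (confidence < 0.8) return "verified";
  if (confidence < 1.0) return "human_approved";
  return "authoritative";
}
```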

The dashboard also shows confidence breakdown by data source, so you can see which sources produce the most reliable data and which generate the most review queue items. A daily confidence time series shows low-confidence and high-confidence event counts over time, making it easy to spot trends after configuration changes.

Review pipeline metrics track how the automated and human review stages are performing:

* **Auto-approved** - Events that passed automated review without human involvement
* **Auto-verified** - Events verified by the automated review judge
* **Rejected** - Events that failed review
* **Pending review** - Events waiting in the human review queue
* **Human-approved** - Events approved by an operator
* **Corrected** - Events where an operator provided corrected data
* **Review rate** - Percentage of events that required any form of review

### Pipeline Health

Pipeline health metrics provide real-time visibility into the [connector runner's](https://docs.amigo.ai/data/connectors-and-ehr) operational state. These are the metrics you check when something feels wrong - data is stale, surfaces are not getting filled, or outbound sync is backed up.

* **Overall status** - Healthy, degraded, or starting - with active poll count and total event/entity counts
* **Per-source connection health** - Whether each data source is reachable and polling successfully, with last poll time, duration, and event counts
* **Loop states** - Current state of each background process (entity resolution, review, outbound sync, reconciliation)
* **Outbound sync status** - Per-sink breakdown of synced, failed, and pending events
* **Throughput time series** - Event ingestion over time, bucketed by hour or day, filterable by source

Sources are automatically marked unhealthy after consecutive poll failures and recover when polling succeeds again. The dashboard degrades gracefully - if the connector runner is temporarily unavailable, stored metrics (event counts, sync history, review stats) remain available without live loop status.
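A sketch of that consecutive-failure rule, with the threshold chosen arbitrarily for illustration:

```typescript
// Consecutive-failure health tracking for a data source.
// The failure threshold (3) is an illustrative assumption.
const UNHEALTHY_AFTER = 3;

class SourceHealth {
  private consecutiveFailures = 0;
  healthy = true;
  lastPollAt?: Date;

  recordPoll(succeeded: boolean): void {
    this.lastPollAt = new Date();
    if (succeeded) {
      // A single successful poll restores the source to healthy.
      this.consecutiveFailures = 0;
      this.healthy = true;
    } else if (++this.consecutiveFailures >= UNHEALTHY_AFTER) {
      this.healthy = false;
    }
  }
}
```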

Additional pipeline detail is available per source:

* **Entity resolution metrics** - Total merges, recent merge activity, and resolution loop status
* **Review pipeline metrics** - Queue depth, pending items by priority, approval/rejection counts, and average review time
* **Outbound sync detail** - Per-sink event counts, failure reasons, and retry status

### Command Center Dashboard

The command center provides a single-pane workspace health view that aggregates metrics from across the platform into four sections. This is the "is everything OK?" dashboard - one screen that tells an operations team whether voice calls are flowing, data pipelines are healthy, data quality is acceptable, and identity systems are functioning.

| Section          | Key Metrics                                                                                |
| ---------------- | ------------------------------------------------------------------------------------------ |
| **Voice**        | Active calls, escalated calls, calls today, average quality score, escalation rate         |
| **Pipeline**     | Source health counts (healthy/degraded/failing), events last hour, outbound pending/failed |
| **Data Quality** | Pending reviews, 7-day approval rate, average confidence, total entities, recent merges    |
| **Identity**     | Active API keys, active sessions, failed auth attempts, locked accounts, MFA coverage      |

Each section fails independently - if one data source is unavailable, the other sections still return results with a degraded indicator. The response includes a list of degraded sections so dashboards can show partial data with appropriate warnings.

Alerts are derived from threshold checks on the aggregated metrics: escalation rate above threshold, failing data sources, low approval rate, high outbound failure count. This gives operations teams a single view to answer "is anything broken right now?" without drilling into individual dashboards.
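One way to implement independently failing sections and threshold alerts, sketched with `Promise.allSettled`; the section fetchers and threshold values are hypothetical:

```typescript
// Each section loads independently; one failure never blanks the screen.
async function loadCommandCenter(fetchers: Record<string, () => Promise<unknown>>) {
  const names = Object.keys(fetchers);
  const results = await Promise.allSettled(names.map((n) => fetchers[n]()));

  const sections: Record<string, unknown> = {};
  const degraded: string[] = [];
  results.forEach((r, i) => {
    if (r.status === "fulfilled") sections[names[i]] = r.value;
    else degraded.push(names[i]); // report the section as degraded instead
  });
  return { sections, degraded };
}

// Threshold-based alerting on aggregated metrics (thresholds are assumptions).
function deriveAlerts(m: { escalationRate: number; failingSources: number; outboundFailed: number }) {
  const alerts: string[] = [];
  if (m.escalationRate > 0.15) alerts.push("Escalation rate above threshold");
  if (m.failingSources > 0) alerts.push(`${m.failingSources} data source(s) failing`);
  if (m.outboundFailed > 100) alerts.push("High outbound sync failure count");
  return alerts;
}
```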

## Entity Intelligence

The [world model](https://docs.amigo.ai/data/world-model) stores entities (patients, providers, appointments, medications) and their relationships. Four capabilities provide visibility into this entity data:

| Capability              | What It Shows                                                                                       |
| ----------------------- | --------------------------------------------------------------------------------------------------- |
| **Relationship graph**  | One-level graph of all edges (same\_as, related\_to) from an entity, with connected entity metadata |
| **Data provenance**     | Full lineage for an entity - contributing data sources, confidence history, merge events            |
| **Duplicate detection** | Suspected duplicates sorted by confidence, filterable by entity type                                |
| **Entity search**       | Search entities by name with filters for type, source, and minimum confidence                       |

These tools let operations teams audit how the world model arrived at a particular entity state and catch duplicate records before they cause downstream issues.

### Relationship Graph

The relationship graph shows one level of connections from any entity. Each edge carries a relationship type (same\_as for duplicates, related\_to for associations like patient-to-provider) and the connected entity's metadata. This is useful for understanding how entities relate to each other and for verifying that entity resolution has correctly linked records.

### Data Provenance

Data provenance traces the full lineage of an entity: which data sources contributed, how confidence changed over time, and which merge events combined records. When a patient record has conflicting information (two different phone numbers from two different sources), provenance shows exactly where each value came from and why the current value was chosen.

### Duplicate Detection

Duplicate detection surfaces suspected duplicate entities - records that the entity resolution system has flagged as likely referring to the same real-world person or object. Results are sorted by confidence and filterable by entity type, so operations teams can prioritize high-confidence duplicates for review first.

### Entity Search

Entity search lets you find entities by name with filters for type, source, and minimum confidence. This is the starting point for most investigations - find the entity, then use the relationship graph and provenance tools to understand its state.
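A typical investigation chains these capabilities together. The client helper names below are hypothetical, not the platform's actual API:

```typescript
// Hypothetical client surface - helper names are illustrative.
interface EntityClient {
  searchEntities(q: { name: string; minConfidence?: number }): Promise<{ id: string; name: string }[]>;
  getProvenance(entityId: string): Promise<unknown>;
  getRelationshipGraph(entityId: string): Promise<{ edges: unknown[] }>;
}

async function investigate(client: EntityClient, name: string) {
  // 1. Find the entity, keeping only reasonably confident matches.
  const matches = await client.searchEntities({ name, minConfidence: 0.6 });
  if (matches.length === 0) return;

  // 2. Trace where the top match's current values came from.
  const provenance = await client.getProvenance(matches[0].id);

  // 3. See what it is linked to (same_as duplicates, related_to associations).
  const graph = await client.getRelationshipGraph(matches[0].id);

  return { entity: matches[0], provenance, edges: graph.edges };
}
```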

## Surface Analytics

[Surfaces](https://docs.amigo.ai/channels/surfaces) are agent-generated data collection forms delivered to patients via SMS, WhatsApp, email, or web. Four analytics views provide closed-loop intelligence on how surfaces perform, enabling agents and gap scanners to optimize surface design based on actual outcomes:

| View                      | What It Shows                                                                                                                                                           |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Completion rates**      | Overall completion rate, trend over time, and breakdown by source (mid-call agent, gap scanner, manual). Identifies whether surfaces are actually being filled out.     |
| **Channel effectiveness** | Per-channel (SMS, email, WhatsApp, web, voice) completion rate and average time-to-complete. Shows which delivery method works best for your patient population.        |
| **Field abandonment**     | Which specific fields cause patients to stop filling out a surface. Drop-off rate and save rate per field, so you can identify confusing or unnecessary fields.         |
| **Per-entity history**    | Surface history for a specific patient: completion stats, preferred channel, and recent surfaces. Useful for choosing the right channel and avoiding over-solicitation. |

All surface analytics support date range filtering (default 30 days, max 90). The field abandonment data is particularly actionable - if 40% of patients drop off at a specific question, that question is either confusing, unnecessary, or too sensitive for the delivery channel. Removing or rewording it directly improves completion rates.
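The drop-off computation itself is simple: of everyone who reached a field, what share stopped there? A sketch, assuming per-field reach and completion counts are available:

```typescript
// Per-field drop-off for a surface. Data shapes are assumptions.
interface FieldFunnel { field: string; reached: number; completed: number }

function dropOffRates(funnel: FieldFunnel[]): { field: string; dropOff: number }[] {
  return funnel.map(({ field, reached, completed }) => ({
    field,
    dropOff: reached === 0 ? 0 : (reached - completed) / reached,
  }));
}

// Example: a sensitive question with 40% drop-off stands out immediately.
dropOffRates([
  { field: "date_of_birth", reached: 500, completed: 480 },    // 4% drop-off
  { field: "household_income", reached: 480, completed: 288 }, // 40% drop-off
]);
```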

## Event Distribution

Event analytics show how data enters the system and what kinds of data are flowing. This is useful for two things: verifying that integrations are working (confirming that a connector is producing expected event volumes) and understanding the data mix in the workspace.

Two views are available:

* **By type** - Event counts per entity type (patient, appointment, practitioner, insurance, medication, etc.). Shows what the system knows about and highlights gaps - if you expect appointment data but see zero appointment events, something is misconfigured.
* **By source** - Event counts per data source (EHR sync, voice extraction, manual entry, surface submission, etc.). Shows where data is coming from and how the mix changes over time. A healthy workspace typically has authoritative EHR data as the largest source, with voice-extracted and surface-submitted data filling gaps.

These distributions are useful during initial integration setup (to verify connectors are producing data) and ongoing operations (to spot when a source goes silent).

## Real-Time Event Stream

A Server-Sent Events (SSE) endpoint streams workspace events in real time, powering live dashboards and notifications without polling.

Supported event types:

* `call.started`, `call.ended`, `call.escalated` - Call lifecycle events
* `pipeline.sync_completed`, `pipeline.error` - Connector runner events
* `review.submitted` - Review queue activity
* `alert` - Threshold-based alerts from the command center

The stream sends heartbeat comments every 30 seconds to keep the connection alive and supports `Last-Event-ID` for automatic reconnection with replay of missed events. This means dashboards can recover from network interruptions without losing data.
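A minimal browser-side consumer, assuming a hypothetical endpoint path. The standard `EventSource` API ignores heartbeat comments and resends `Last-Event-ID` on reconnect, which is what enables replay:

```typescript
// Hypothetical stream URL; the real endpoint path may differ.
const stream = new EventSource("/api/workspaces/ws_123/events");

for (const type of ["call.started", "call.ended", "call.escalated", "alert"]) {
  stream.addEventListener(type, (e) => {
    // Named SSE events arrive as MessageEvent with a JSON payload (assumed shape).
    const payload = JSON.parse((e as MessageEvent).data);
    render(type, payload); // your dashboard update logic
  });
}

// EventSource reconnects automatically and resends the Last-Event-ID
// header, so the server can replay events missed during an outage.
stream.onerror = () => console.warn("stream interrupted; reconnecting...");

declare function render(type: string, payload: unknown): void;
```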

## Insights Agent

The Insights Agent is a conversational interface for exploring workspace data. Rather than navigating dashboards and filtering tables, operators ask questions in natural language and receive structured analysis with visualizations.

The agent streams responses in real time - reasoning steps, tool invocations, and data queries are visible as they execute, so operators can follow the analysis as it unfolds. When the agent queries the platform's analytics endpoints or metric store, the results appear inline as formatted tables and charts.

Typical queries:

* "How did call quality change after we updated the scheduling context graph last Tuesday?"
* "Which surface fields have the highest abandonment rates this month?"
* "Show me the calls that scored below 60 this week and what went wrong"
* "Compare escalation rates between our two clinic locations"

The Insights Agent uses the same tool infrastructure as the voice and text agents - it calls the workspace's analytics and metric store endpoints to answer questions, so results always reflect live data rather than cached summaries.

## Metric Store

For workspace-level operational metrics beyond the analytics described above - including built-in metrics, custom AI-powered metrics, per-metric latency tiers, and cross-channel analytics - see the [Metric Store](https://docs.amigo.ai/intelligence-and-analytics/metric-store). The metric store provides a catalog of pre-built and custom metrics that can be evaluated against any interaction, supporting both real-time scoring and batch analysis across your full interaction history.

{% hint style="info" %}
**Developer Guide** - For the full analytics endpoint set and metric store details, see [Analytics](https://docs.amigo.ai/developer-guide/platform-api/analytics) in the developer guide.
{% endhint %}
