Performance Characteristics

Latency, concurrency, and capacity numbers for the voice pipeline, connector runner, and emotion detection.

Measured performance numbers for the major platform subsystems. These are operational characteristics, not theoretical limits.

Voice Pipeline Latency

The voice pipeline has two primary latency numbers:

  • STT latency: Sub-300ms from audio input to transcript text

  • Average response latency: ~900ms from end of caller speech to start of agent audio

The 900ms gap is covered by filler speech - the caller hears "Let me check on that" while the full response is being generated.
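
The filler mechanic can be sketched as a timeout race: start generating the response, and if it is not ready within a short window, speak a filler phrase first. This is a minimal asyncio sketch, not the platform's implementation; `generate_response` and `play_audio` are hypothetical stand-ins for the real pipeline stages, and the 300ms threshold is illustrative.

```python
import asyncio

FILLER_THRESHOLD_S = 0.3  # illustrative: play filler if the reply isn't ready quickly
FILLER_PHRASE = "Let me check on that."

async def respond(generate_response, play_audio):
    """Mask response latency with filler speech.

    generate_response: coroutine producing the agent's reply text.
    play_audio: coroutine that speaks a given phrase.
    Both are hypothetical stand-ins for the real pipeline.
    """
    task = asyncio.ensure_future(generate_response())
    done, _ = await asyncio.wait({task}, timeout=FILLER_THRESHOLD_S)
    if not done:
        # Reply still pending: cover the gap so the caller hears something.
        await play_audio(FILLER_PHRASE)
    reply = await task
    await play_audio(reply)
```

The key property is that filler only plays when the response actually misses the threshold, so fast turns are not padded with unnecessary speech.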

Per-Turn Timing Breakdown

Each conversational turn passes through five layers:

  • STT processing: Audio converted to transcript text

  • Engine: Context graph navigation, dynamic behavior evaluation, memory retrieval

  • Render: LLM generates response text with emotional context

  • TTS generation: Text converted to speech audio with emotion parameters

  • Transport delivery: Audio delivered to the telephony layer
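
Because the layers run sequentially within a turn, end-to-end latency is the sum of the per-layer times. The budget below is a hypothetical illustration of how the layers could add up; only the sub-300ms STT figure and the ~900ms end-to-end figure come from the measurements above, and the other numbers are assumptions.

```python
# Hypothetical per-layer budget summing to the measured ~900ms turn latency.
# Only "stt" (~300ms) and the 900ms total are sourced; the rest is illustrative.
TURN_BUDGET_MS = {
    "stt": 300,        # audio -> transcript text
    "engine": 150,     # context graph, dynamic behavior, memory retrieval
    "render": 250,     # LLM generates the response text
    "tts": 150,        # text -> speech audio with emotion parameters
    "transport": 50,   # delivery to the telephony layer
}

def turn_latency_ms(budget):
    """Sequential layers: total turn latency is the sum of the parts."""
    return sum(budget.values())
```

A breakdown like this is mainly useful for spotting which layer to optimize when the end-to-end number drifts upward.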

Concurrency

The voice agent supports up to 50 concurrent sessions per pod. Pods scale horizontally - add more pods to handle more concurrent calls. There is no shared state between pods that would limit horizontal scaling.
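
Because pods share no state, capacity planning is a straight division. A minimal sketch of the sizing arithmetic, using the 50-sessions-per-pod limit from above:

```python
import math

SESSIONS_PER_POD = 50  # per-pod concurrency limit stated above

def pods_needed(peak_concurrent_calls):
    """With no shared state between pods, capacity scales linearly:
    round up to the next whole pod, keeping at least one running."""
    return max(1, math.ceil(peak_concurrent_calls / SESSIONS_PER_POD))
```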

Connector Runner

  • Poll interval: 10 seconds

  • Max concurrent data sources: 10 per connector runner instance

  • Mutex TTL: 1,800 seconds (30 minutes) - prevents duplicate processing for slow external APIs

  • Reconciliation interval: 300 seconds (5 minutes)

  • Outbound dispatch interval: 30 seconds
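
The mutex TTL is what keeps two poll cycles from processing the same data source at once while still recovering if a holder dies mid-run. This is an in-memory sketch of that TTL-lock behavior under assumed semantics; the real runner would use a shared store so multiple instances coordinate.

```python
import time

MUTEX_TTL_S = 1800  # 30 minutes: long enough to cover slow external APIs

class TtlMutex:
    """In-memory sketch of a per-data-source mutex with expiry.

    acquire() fails while another holder's TTL is live, and succeeds
    again once the TTL lapses, so a crashed worker can't wedge a source.
    """

    def __init__(self, ttl_s=MUTEX_TTL_S, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._held = {}  # source id -> expiry timestamp

    def acquire(self, source_id):
        now = self.clock()
        expiry = self._held.get(source_id)
        if expiry is not None and expiry > now:
            return False  # another cycle is still processing this source
        self._held[source_id] = now + self.ttl_s
        return True

    def release(self, source_id):
        self._held.pop(source_id, None)
```

The 30-minute TTL trades duplicate-work protection against recovery time: a source whose holder crashes stays locked for at most one TTL.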

Emotion Detection

  • Audio segment size: 2 seconds

  • Rolling window: 30 seconds (~15 segments)

  • Circuit breaker recovery: 10 seconds after 2 consecutive failures

  • Audio buffer: 5 segments (non-blocking, drops on overflow)

  • Text buffer: 20 segments (non-blocking, drops on overflow)
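
The "non-blocking, drops on overflow" behavior means a slow emotion-detection consumer can never stall the call path: a full buffer sheds the newest segment instead of blocking the producer. A minimal sketch of that buffer semantics, assuming a bounded queue with a non-blocking put:

```python
import queue

AUDIO_BUFFER_SEGMENTS = 5   # capacities stated above
TEXT_BUFFER_SEGMENTS = 20

def make_buffer(max_segments):
    """Bounded buffer for analysis segments."""
    return queue.Queue(maxsize=max_segments)

def offer(buf, segment):
    """Non-blocking put: drop the segment rather than stall the call path.

    Returns True if buffered, False if dropped on overflow.
    """
    try:
        buf.put_nowait(segment)
        return True
    except queue.Full:
        return False
```

Dropping segments is acceptable here because emotion detection is advisory: losing a 2-second segment degrades the rolling window slightly, while blocking would add latency to the live call.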

End-of-Turn Detection

End-of-turn confidence thresholds are configurable per workspace. The thresholds balance responsiveness (responding quickly when the caller finishes) against interruption risk (cutting the caller off mid-sentence). Tuning depends on your patient population's speaking patterns.
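
The threshold trade-off can be sketched as a single comparison: the detector emits a confidence score for each silence gap, and the workspace-configured threshold decides whether the agent takes the floor. The function name and the 0.7 default below are hypothetical illustrations, not platform values.

```python
DEFAULT_EOT_THRESHOLD = 0.7  # illustrative default, not a platform value

def should_respond(eot_confidence, threshold=DEFAULT_EOT_THRESHOLD):
    """Decide whether the caller has finished their turn.

    Lower thresholds respond faster but risk interrupting the caller;
    higher thresholds avoid interruptions but feel sluggish.
    """
    return eot_confidence >= threshold
```

Tuning per workspace then amounts to moving the threshold: lower it for populations with crisp, fast turn-taking, raise it where callers pause mid-thought.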

Post-Call Processing

After each call ends, the system runs batch re-transcription using a higher-accuracy model. This produces the canonical transcript that is used for data extraction, quality review, and compliance records. The re-transcription runs asynchronously and does not affect call latency.
