
# Performance Characteristics

Measured performance numbers for the major platform subsystems. These are operational characteristics, not theoretical limits.

## Voice Pipeline Latency

The voice pipeline has three primary latency numbers:

* **First-message latency**: Sub-2s from call connect to start of agent greeting audio. Only the reasoning engine and TTS are on the critical path; speech-to-text and monitoring initialize in parallel with the greeting.
* **STT latency**: Sub-300ms from audio input to transcript text.
* **Average response latency**: \~900ms from end of caller speech to start of agent audio.

The \~900ms gap is covered by [filler speech](/channels/voice/audio-pipeline.md): the caller hears "Let me check on that" while the full response is being generated. [Prompt caching](/platform-overview/cost-and-latency.md) helps keep these latencies down by keeping the static portion of the system prompt cached across turns.
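
The filler pattern can be sketched with `asyncio`, assuming filler audio is played only when the full response is not ready within a short threshold. `respond_with_filler`, `generate`, `play`, and the 0.3 s `filler_after` default are hypothetical names and values for illustration, not the platform's API or tuning.

```python
import asyncio
from typing import Awaitable, Callable


async def respond_with_filler(
    generate: Callable[[], Awaitable[str]],
    play: Callable[[str], Awaitable[None]],
    filler_after: float = 0.3,  # hypothetical threshold in seconds
    filler: str = "Let me check on that",
) -> str:
    """Play filler audio if the full response is not ready within filler_after seconds."""
    task = asyncio.create_task(generate())
    try:
        # shield() keeps the generation task running if the wait times out.
        reply = await asyncio.wait_for(asyncio.shield(task), timeout=filler_after)
    except asyncio.TimeoutError:
        await play(filler)  # the caller hears filler while generation continues
        reply = await task
    await play(reply)
    return reply
```

A fast response skips the filler entirely; a slow one is preceded by it, so the caller never sits in silence for the full generation time.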

### Per-Turn Timing Breakdown

Each conversational turn passes through five layers:

| Layer                  | What Happens                                                            |
| ---------------------- | ----------------------------------------------------------------------- |
| **STT processing**     | Audio converted to transcript text                                      |
| **Engine**             | Context graph navigation, dynamic behavior evaluation, memory retrieval |
| **Render**             | LLM generates response text with emotional context                      |
| **TTS generation**     | Text converted to speech audio with emotion parameters                  |
| **Transport delivery** | Audio delivered to the telephony layer                                  |
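
One way to see where a turn's time goes is to time each layer separately. The sketch below is a hypothetical instrumentation helper (`TurnTimer` and the short layer keys are illustrative, not part of the platform). Summing the layers assumes they run one after another; a streaming pipeline that overlaps render and TTS would show less end-to-end latency than the sum.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Short keys for the five per-turn layers (hypothetical naming).
LAYERS = ("stt", "engine", "render", "tts", "transport")


class TurnTimer:
    """Records wall-clock time spent in each layer of one conversational turn."""

    def __init__(self) -> None:
        self.durations_ms: Dict[str, float] = {}

    @contextmanager
    def layer(self, name: str) -> Iterator[None]:
        if name not in LAYERS:
            raise ValueError(f"unknown layer: {name}")
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations_ms[name] = (time.perf_counter() - start) * 1000

    def total_ms(self) -> float:
        # Assumes serial execution; overlapping layers would make this an overestimate.
        return sum(self.durations_ms.values())
```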

## Concurrency

The voice agent scales horizontally to handle large concurrent call volumes. There is no shared state between instances that would limit horizontal scaling.

## Connector Runner

The connector runner ingests external data from configured sources and dispatches verified outbound writes. Reconciliation catches missed changes after transient failures.
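
Reconciliation can be pictured as a diff between what the source currently holds and what was last ingested. This is a minimal sketch assuming each record carries a monotonically increasing version number; `reconcile` and the version representation are hypothetical, and real connectors may compare timestamps or ETags instead.

```python
from typing import Dict, List


def reconcile(source: Dict[str, int], local: Dict[str, int]) -> List[str]:
    """Return IDs whose source version is newer than the local copy, or missing locally.

    Both dicts map a record ID to a version number (a hypothetical representation).
    """
    return sorted(
        record_id
        for record_id, version in source.items()
        if local.get(record_id, -1) < version
    )
```

Records returned here are the changes a transient failure caused the runner to miss; they would be re-ingested on the next pass. Deletions at the source need a separate check, since this sketch only looks at records the source still has.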

## Emotion Detection

| Parameter                        | Value                                                                       |
| -------------------------------- | --------------------------------------------------------------------------- |
| **Prosody models**               | Dual-model (categorical + dimensional) running in parallel per segment      |
| **Audio segment size**           | 2 seconds                                                                   |
| **Rolling window**               | Short window over most recent segments (tuned for real-time mood tracking)  |
| **Speaker normalization warmup** | \~10 seconds (5 segments) before per-caller baselines are meaningful        |
| **Context fusion**               | Applied after each turn; adds no measurable latency to the emotion pipeline |
| **Empathy tier classification**  | Rule-based, <100ms (no LLM call)                                            |
| **Circuit breaker recovery**     | Automatic recovery after consecutive failures                               |
| **Audio buffer**                 | Non-blocking, drops on overflow to maintain real-time processing            |
| **Text buffer**                  | Non-blocking, drops on overflow to maintain real-time processing            |
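
The warmup figure follows from the segment size: 5 segments × 2 seconds ≈ 10 seconds of caller audio before a per-caller baseline exists. The sketch below shows one plausible form of speaker normalization, a z-score of a short rolling mean against everything heard from that caller so far. `SpeakerBaseline`, the single scalar score per segment, and the 3-segment window are assumptions; the platform's actual window length and normalization method are not specified here.

```python
import statistics
from collections import deque
from typing import Deque, List, Optional

SEGMENT_SECONDS = 2.0  # audio segment size
WARMUP_SEGMENTS = 5    # about 10 s of audio before baselines are meaningful


class SpeakerBaseline:
    """Per-caller z-score normalization of one prosody score (hypothetical)."""

    def __init__(self, window: int = 3) -> None:  # window size is an assumption
        self.history: List[float] = []                     # every segment from this caller
        self.recent: Deque[float] = deque(maxlen=window)  # short rolling window

    def add(self, score: float) -> Optional[float]:
        """Add one segment's score; return normalized rolling mood, or None during warmup."""
        self.history.append(score)
        self.recent.append(score)
        if len(self.history) < WARMUP_SEGMENTS:
            return None
        mean = statistics.fmean(self.history)
        stdev = statistics.pstdev(self.history) or 1.0  # flat caller: avoid divide-by-zero
        return (statistics.fmean(self.recent) - mean) / stdev
```

Normalizing per caller means a naturally loud or animated speaker is compared against their own baseline rather than a population average. Keeping the full history is fine for call-length sessions; a running mean and variance would avoid the growing list.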

## API Rate Limits

The Platform API enforces per-route rate limits to prevent abuse while supporting high-throughput automation workflows.

| Operation                    | Limit                   | Scope                  |
| ---------------------------- | ----------------------- | ---------------------- |
| **Outbound call creation**   | 1,000/min               | Per API key            |
| **Conversation creation**    | 60/min                  | Per API key            |
| **General write operations** | 10/min                  | Per route, per API key |
| **Read operations**          | 60/min                  | Per route, per API key |
| **Test caller numbers**      | Up to 100 per workspace | Per workspace          |

Outbound call creation has a higher limit than other write operations because campaign and outreach workflows fan out to hundreds of patients per run from a single API key. The rate limit key includes the route, so the outbound limit does not affect other write endpoints.

Read operations are more permissive than general writes: 60 requests/min versus 10 requests/min, both scoped per route, per API key.
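
Because the rate-limit key includes the route, each (route, API key) pair has its own budget. The sketch below mirrors that keying with a simple fixed one-minute window, for example to pace requests client-side. The route strings are hypothetical placeholders, and the platform's actual algorithm (fixed window, sliding window, or token bucket) is not documented here.

```python
import time
from typing import Callable, Dict, Tuple

# Per-minute limits from the rate-limit table, keyed by hypothetical route strings.
LIMITS_PER_MIN: Dict[str, int] = {
    "POST /calls/outbound": 1000,  # outbound call creation
    "POST /conversations": 60,     # conversation creation
    "POST /patients": 10,          # example general write route
    "GET /patients": 60,           # example read route
}


class FixedWindowLimiter:
    """Counts requests per (route, API key) in fixed one-minute windows."""

    def __init__(self, limits: Dict[str, int], clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self.clock = clock
        # (route, api_key) -> (window index, requests seen in that window)
        self.state: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def allow(self, route: str, api_key: str) -> bool:
        window = int(self.clock() // 60)
        key = (route, api_key)  # route is part of the key, so routes never share a budget
        start, count = self.state.get(key, (window, 0))
        if start != window:  # a new minute has begun: reset the counter
            start, count = window, 0
        if count >= self.limits[route]:
            return False
        self.state[key] = (start, count + 1)
        return True
```

Exhausting the 1,000/min outbound budget leaves other write routes on the same key untouched, which is the behavior the route-scoped key is meant to guarantee.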

## End-of-Turn Detection

End-of-turn confidence thresholds are configurable per workspace. The thresholds balance responsiveness (responding quickly when the caller finishes) against interruption risk (cutting the caller off mid-sentence). Tuning depends on your patient population's speaking patterns.
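
The trade-off can be illustrated with a single confidence threshold. This is a minimal sketch assuming an end-of-turn model emits a confidence in [0, 1]; `EndOfTurnConfig`, `confidence_threshold`, and the example values are hypothetical and do not reflect the platform's actual setting names or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndOfTurnConfig:
    """Hypothetical per-workspace setting; field name and default are illustrative."""
    confidence_threshold: float = 0.7

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in [0, 1]")


def caller_finished(eot_confidence: float, cfg: EndOfTurnConfig) -> bool:
    """Respond once the model is at least this confident the caller is done speaking."""
    return eot_confidence >= cfg.confidence_threshold


# Lower threshold: responds sooner, but more likely to cut off a caller who pauses mid-sentence.
responsive = EndOfTurnConfig(confidence_threshold=0.5)
# Higher threshold: waits for more certainty, which suits callers who speak slowly or pause often.
patient = EndOfTurnConfig(confidence_threshold=0.85)
```

At the same model confidence of 0.6, the responsive workspace would reply while the patient one would keep listening.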

## Post-Call Processing

After each call ends, the system runs batch re-transcription using a higher-accuracy model. This produces the canonical transcript that is used for data extraction, quality review, and compliance records. The re-transcription runs asynchronously and does not affect call latency.
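
Keeping re-transcription off the live call path amounts to enqueueing work at hang-up and processing it in a background worker. The sketch below assumes an in-process `asyncio.Queue`; `post_call_worker`, `on_call_ended`, and `retranscribe` are hypothetical names, and a production system would more likely use a durable job queue with retries.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def post_call_worker(
    queue: "asyncio.Queue[str]",
    retranscribe: Callable[[str], Awaitable[str]],
    canonical: Dict[str, str],
) -> None:
    """Drain ended calls and store the higher-accuracy transcript as canonical."""
    while True:
        call_id = await queue.get()
        try:
            canonical[call_id] = await retranscribe(call_id)
        finally:
            queue.task_done()  # mark done even on failure so queue.join() cannot hang


def on_call_ended(queue: "asyncio.Queue[str]", call_id: str) -> None:
    # Enqueue and return immediately; the call path never waits on re-transcription.
    queue.put_nowait(call_id)
```

If `retranscribe` raises, this worker stops; a real worker would catch the error and retry or dead-letter the call.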
