Performance Characteristics

Latency, concurrency, and capacity numbers for the voice pipeline, connector runner, and emotion detection.

Measured performance numbers for the major platform subsystems. These are operational characteristics, not theoretical limits.

Voice Pipeline Latency

The voice pipeline has two primary latency numbers:

  • STT latency: Sub-300ms from audio input to transcript text

  • Average response latency: ~900ms from end of caller speech to start of agent audio

The 900ms gap is covered by filler speech - the caller hears "Let me check on that" while the full response is being generated.
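
The filler mechanic can be sketched as a timeout race: start generating the response, and if it is not ready within a short window, speak a filler phrase first. This is a minimal asyncio sketch, not the platform's implementation; `generate_response` and `play_audio` are hypothetical stand-ins for the real pipeline stages, and the 300ms threshold is illustrative.

```python
import asyncio

FILLER_THRESHOLD_S = 0.3  # illustrative: play filler if the reply isn't ready quickly
FILLER_PHRASE = "Let me check on that."

async def respond(generate_response, play_audio):
    """Mask response latency with filler speech.

    generate_response: coroutine producing the agent's reply text.
    play_audio: coroutine that speaks a given phrase.
    Both are hypothetical stand-ins for the real pipeline.
    """
    task = asyncio.ensure_future(generate_response())
    done, _ = await asyncio.wait({task}, timeout=FILLER_THRESHOLD_S)
    if not done:
        # Reply still pending: cover the gap so the caller hears something.
        await play_audio(FILLER_PHRASE)
    reply = await task
    await play_audio(reply)
```

The key property is that filler only plays when the response actually misses the threshold, so fast turns are not padded with unnecessary speech.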

Per-Turn Timing Breakdown

Each conversational turn passes through five layers:

  • STT processing: Audio converted to transcript text

  • Engine: Context graph navigation, dynamic behavior evaluation, memory retrieval

  • Render: LLM generates response text with emotional context

  • TTS generation: Text converted to speech audio with emotion parameters

  • Transport delivery: Audio delivered to the telephony layer
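
Because the layers run sequentially within a turn, end-to-end latency is the sum of the per-layer times. The budget below is a hypothetical illustration of how the layers could add up; only the sub-300ms STT figure and the ~900ms end-to-end figure come from the measurements above, and the other numbers are assumptions.

```python
# Hypothetical per-layer budget summing to the measured ~900ms turn latency.
# Only "stt" (~300ms) and the 900ms total are sourced; the rest is illustrative.
TURN_BUDGET_MS = {
    "stt": 300,        # audio -> transcript text
    "engine": 150,     # context graph, dynamic behavior, memory retrieval
    "render": 250,     # LLM generates the response text
    "tts": 150,        # text -> speech audio with emotion parameters
    "transport": 50,   # delivery to the telephony layer
}

def turn_latency_ms(budget):
    """Sequential layers: total turn latency is the sum of the parts."""
    return sum(budget.values())
```

A breakdown like this is mainly useful for spotting which layer to optimize when the end-to-end number drifts upward.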

Concurrency

The voice agent supports up to 50 concurrent sessions per pod. Pods scale horizontally - add more pods to handle more concurrent calls. There is no shared state between pods that would limit horizontal scaling.
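
Because pods share no state, capacity planning is a straight division. A minimal sketch of the sizing arithmetic, using the 50-sessions-per-pod limit from above:

```python
import math

SESSIONS_PER_POD = 50  # per-pod concurrency limit stated above

def pods_needed(peak_concurrent_calls):
    """With no shared state between pods, capacity scales linearly:
    round up to the next whole pod, keeping at least one running."""
    return max(1, math.ceil(peak_concurrent_calls / SESSIONS_PER_POD))
```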

Connector Runner

  • Poll interval: 10 seconds

  • Max concurrent data sources: 10 per connector runner instance

  • Mutex TTL: 1,800 seconds (30 minutes) - prevents duplicate processing for slow external APIs

  • Reconciliation interval: 300 seconds (5 minutes)

  • Outbound dispatch interval: 30 seconds
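
The mutex TTL is what keeps two poll cycles from processing the same data source at once while still recovering if a holder dies mid-run. This is an in-memory sketch of that TTL-lock behavior under assumed semantics; the real runner would use a shared store so multiple instances coordinate.

```python
import time

MUTEX_TTL_S = 1800  # 30 minutes: long enough to cover slow external APIs

class TtlMutex:
    """In-memory sketch of a per-data-source mutex with expiry.

    acquire() fails while another holder's TTL is live, and succeeds
    again once the TTL lapses, so a crashed worker can't wedge a source.
    """

    def __init__(self, ttl_s=MUTEX_TTL_S, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._held = {}  # source id -> expiry timestamp

    def acquire(self, source_id):
        now = self.clock()
        expiry = self._held.get(source_id)
        if expiry is not None and expiry > now:
            return False  # another cycle is still processing this source
        self._held[source_id] = now + self.ttl_s
        return True

    def release(self, source_id):
        self._held.pop(source_id, None)
```

The 30-minute TTL trades duplicate-work protection against recovery time: a source whose holder crashes stays locked for at most one TTL.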

Emotion Detection

  • Audio segment size: 2 seconds

  • Rolling window: 30 seconds (~15 segments)

  • Circuit breaker recovery: 10 seconds after 2 consecutive failures

  • Audio buffer: 5 segments (non-blocking, drops on overflow)

  • Text buffer: 20 segments (non-blocking, drops on overflow)
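
The "non-blocking, drops on overflow" behavior means a slow emotion-detection consumer can never stall the call path: a full buffer sheds the newest segment instead of blocking the producer. A minimal sketch of that buffer semantics, assuming a bounded queue with a non-blocking put:

```python
import queue

AUDIO_BUFFER_SEGMENTS = 5   # capacities stated above
TEXT_BUFFER_SEGMENTS = 20

def make_buffer(max_segments):
    """Bounded buffer for analysis segments."""
    return queue.Queue(maxsize=max_segments)

def offer(buf, segment):
    """Non-blocking put: drop the segment rather than stall the call path.

    Returns True if buffered, False if dropped on overflow.
    """
    try:
        buf.put_nowait(segment)
        return True
    except queue.Full:
        return False
```

Dropping segments is acceptable here because emotion detection is advisory: losing a 2-second segment degrades the rolling window slightly, while blocking would add latency to the live call.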

End-of-Turn Detection

End-of-turn confidence thresholds are configurable per workspace. The thresholds balance responsiveness (responding quickly when the caller finishes) against interruption risk (cutting the caller off mid-sentence). Tuning depends on your patient population's speaking patterns.
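
The threshold trade-off can be sketched as a single comparison: the detector emits a confidence score for each silence gap, and the workspace-configured threshold decides whether the agent takes the floor. The function name and the 0.7 default below are hypothetical illustrations, not platform values.

```python
DEFAULT_EOT_THRESHOLD = 0.7  # illustrative default, not a platform value

def should_respond(eot_confidence, threshold=DEFAULT_EOT_THRESHOLD):
    """Decide whether the caller has finished their turn.

    Lower thresholds respond faster but risk interrupting the caller;
    higher thresholds avoid interruptions but feel sluggish.
    """
    return eot_confidence >= threshold
```

Tuning per workspace then amounts to moving the threshold: lower it for populations with crisp, fast turn-taking, raise it where callers pause mid-thought.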

Post-Call Processing

After each call ends, the system runs batch re-transcription using a higher-accuracy model. This produces the canonical transcript that is used for data extraction, quality review, and compliance records. The re-transcription runs asynchronously and does not affect call latency.
