
# Voice

Amigo supports two voice modes. Choose the one that matches your UX and latency needs:

| Mode                            | Transport     | Best for                                        | Latency       | Notes                                          |
| ------------------------------- | ------------- | ----------------------------------------------- | ------------- | ---------------------------------------------- |
| **Voice Notes (HTTP)**          | HTTP + NDJSON | Asynchronous push-to-talk, in-app voice replies | Low-to-medium | Upload a short clip; receive streamed TTS back |
| **Real-time Voice (WebSocket)** | WebSocket     | Natural, full-duplex conversations              | Very low      | Bidirectional audio with VAD and interruption  |

See [Real-time Voice (WebSocket)](conversations-realtime.md) for real-time details.

{% hint style="warning" %}
**Phone-based voice**: these endpoints are for text-channel voice (push-to-talk notes and WebSocket streaming). For enterprise phone-based inbound and outbound calls with emotion detection and EHR integration, see [Platform API: Voice Agent](/developer-guide/platform-api/platform-api/voice-agent.md).
{% endhint %}

## Voice Mode Comparison

{% @mermaid/diagram content="%%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 30, "rankSpacing": 40}, "theme": "base", "themeVariables": {"primaryColor": "#D4E2E7", "primaryTextColor": "#100F0F", "primaryBorderColor": "#083241", "lineColor": "#575452", "textColor": "#100F0F", "clusterBkg": "#F1EAE7", "clusterBorder": "#D7D2D0"}}}%%
flowchart TB
Start{Choose Voice Mode}

Start -->|Asynchronous<br/>Push-to-talk| HTTP[Voice Notes - HTTP]
Start -->|Real-time<br/>Full-duplex| WS[Real-time Voice - WebSocket]

subgraph HTTP_Flow["HTTP Voice Notes Flow"]
    H1[Record Audio Clip] --> H2[Upload to API]
    H2 --> H3[Process & TTS]
    H3 --> H4[Stream Audio Back]
end

subgraph WS_Flow["WebSocket Real-time Flow"]
    W1[Open Connection] --> W2[Bidirectional Audio Stream]
    W2 --> W3[VAD & Interruption Support]
    W3 --> W2
end

HTTP --> HTTP_Flow
WS --> WS_Flow

style HTTP fill:#D4E2E7,stroke:#083241,color:#100F0F,stroke-width:2px
style WS fill:#F0DDD9,stroke:#AA412A,color:#100F0F,stroke-width:2px
style Start fill:#DDE3DB,stroke:#2c3827,color:#100F0F,stroke-width:2px" %}

## Voice Notes (HTTP)

Treat each `/interact` call as an asynchronous voice-note exchange, not a full-duplex call.

### Request Essentials

1. Encode microphone audio as `WAV` (PCM) or `FLAC`.
2. POST the clip as `recorded_message` with `request_format=voice`.
3. Set `response_format=voice` and choose `Accept`:
   * `audio/mpeg` (MP3): efficient for mobile playback
   * `audio/wav` (PCM): simple decoding, good for short clips
4. Read the NDJSON stream. `new-message` events contain base64 audio chunks.

> TypeScript SDK note: voice over HTTP is not yet supported in the TypeScript SDK, so call the API directly.
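The request steps above can be sketched with the standard `fetch` building blocks (`URL`, `FormData`, `Blob`). This is a minimal sketch, not an official client: it assumes `request_format` and `response_format` are sent as query parameters, that the clip is uploaded as a multipart form field named `recorded_message`, and that authentication is a bearer token. The `buildVoiceNoteRequest` helper, the bearer header, and the filenames are hypothetical; check the API reference for the exact parameter locations and auth headers.

```typescript
type VoiceAccept = "audio/mpeg" | "audio/wav";

interface VoiceNoteRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

// Hypothetical helper: builds (but does not send) a voice-note /interact request.
// Assumptions: request_format/response_format are query parameters, the clip is a
// multipart field named "recorded_message", and auth is a bearer token.
function buildVoiceNoteRequest(
  baseUrl: string,
  organization: string,
  conversationId: string,
  clip: Blob, // WAV (PCM) or FLAC audio
  accept: VoiceAccept,
  token: string,
): VoiceNoteRequest {
  const path =
    `/v1/${encodeURIComponent(organization)}` +
    `/conversation/${encodeURIComponent(conversationId)}/interact`;
  const url = new URL(path, baseUrl);
  url.searchParams.set("request_format", "voice");
  url.searchParams.set("response_format", "voice");

  const form = new FormData();
  const filename = clip.type === "audio/flac" ? "clip.flac" : "clip.wav";
  form.append("recorded_message", clip, filename);

  return {
    url: url.toString(),
    // No Content-Type header: fetch sets the multipart boundary itself.
    init: {
      method: "POST",
      headers: { Accept: accept, Authorization: `Bearer ${token}` },
      body: form,
    },
  };
}
```

To send it, call `const { url, init } = buildVoiceNoteRequest(...)` followed by `await fetch(url, init)`, then read `response.body` as the NDJSON stream.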

### Sequence Diagram: Voice Note Exchange

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"actorBkg": "#083241", "actorTextColor": "#FFFFFF", "actorBorder": "#083241", "signalColor": "#575452", "signalTextColor": "#100F0F", "labelBoxBkgColor": "#F1EAE7", "labelBoxBorderColor": "#D7D2D0", "labelTextColor": "#100F0F", "loopTextColor": "#100F0F", "noteBkgColor": "#F1EAE7", "noteBorderColor": "#D7D2D0", "noteTextColor": "#100F0F", "activationBkgColor": "#E8E2EB", "activationBorderColor": "#083241", "altSectionBkgColor": "#F1EAE7", "altSectionColor": "#100F0F"}}}%%
sequenceDiagram
autonumber
participant C as Customer System
participant A as Amigo REST API

C->>A: POST /v1/{organization}/conversation/<br/>{conversation_id}/interact
Note over C,A: request_format=voice<br/>response_format=voice<br/>Accept: audio/wav | audio/mpeg<br/>Body: recorded_message (audio clip)
A-->>C: 200 OK (NDJSON stream)
loop NDJSON events
A-->>C: new-message (base64 audio chunk)
end
A-->>C: interaction-complete { interaction_id }" %}
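Because the response is NDJSON, a single network chunk can end in the middle of an event line. A minimal sketch of a line buffer that parses only complete lines, assuming each event is one JSON object terminated by `\n`; the `NdjsonBuffer` class is hypothetical and not part of any Amigo SDK.

```typescript
// Hypothetical helper: turns arbitrary text chunks into parsed NDJSON events.
class NdjsonBuffer {
  private pending = "";

  // Feed decoded text; returns every event whose line is now complete.
  push(text: string): unknown[] {
    this.pending += text;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or "" if the text ended with \n).
    this.pending = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line));
  }

  // Call once the stream ends to parse a final line that had no trailing newline.
  flush(): unknown[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [JSON.parse(rest)] : [];
  }
}
```

In a `fetch` read loop, decode each chunk with `new TextDecoder().decode(value, { stream: true })` (the `stream` option keeps multi-byte characters intact across chunk boundaries) and pass the resulting text to `push`.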

### API Reference

{% openapi src="https://api.amigo.ai/v1/openapi.json" path="/v1/{organization}/conversation/{conversation_id}/interact" method="post" %}
[https://api.amigo.ai/v1/openapi.json](https://api.amigo.ai/v1/openapi.json)
{% endopenapi %}

### Minimal Client Handling (browser-friendly)

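```ts
// playAudio is your own audio player implementation.
declare function playAudio(audio: ArrayBufferLike): void;

// Handle one parsed NDJSON event from the /interact stream.
function handleEvent(evt: { type: string; message?: unknown }): void {
  if (evt.type === "new-message" && typeof evt.message === "string" && evt.message) {
    // atob() returns a binary string; map each character code to a byte.
    const bytes = Uint8Array.from(atob(evt.message), (c) => c.charCodeAt(0));
    playAudio(bytes.buffer);
  }
}
```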

### Tips

* Keep uploads short (a few seconds) for responsive turn-taking.
* Accumulate audio chunks from `new-message` into a single buffer for smooth playback.
* Use `interaction-complete` as the boundary between turns.
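These tips can be combined into a small per-turn assembler: decode and concatenate the `new-message` audio chunks, and emit the finished buffer on `interaction-complete`. This is a sketch, assuming events shaped as in the sequence diagram (`message` holds a base64 chunk, `interaction-complete` carries `interaction_id`); the `TurnAssembler` name is hypothetical. It also assumes the chunks are consecutive pieces of one encoded stream, so verify that concatenation yields playable audio for your chosen `Accept` format.

```typescript
// Event fields assumed from the sequence diagram above.
interface StreamEvent {
  type: string;
  message?: unknown;
  interaction_id?: unknown;
}

interface CompletedTurn {
  interactionId: string;
  audio: Uint8Array; // all chunks for the turn, in arrival order
}

// Hypothetical helper: buffers audio chunks until interaction-complete.
class TurnAssembler {
  private chunks: Uint8Array[] = [];

  // Returns the finished turn on interaction-complete, otherwise null.
  handle(evt: StreamEvent): CompletedTurn | null {
    if (evt.type === "new-message" && typeof evt.message === "string" && evt.message) {
      this.chunks.push(Uint8Array.from(atob(evt.message), (c) => c.charCodeAt(0)));
      return null;
    }
    if (evt.type === "interaction-complete") {
      const total = this.chunks.reduce((n, c) => n + c.length, 0);
      const audio = new Uint8Array(total);
      let offset = 0;
      for (const chunk of this.chunks) {
        audio.set(chunk, offset);
        offset += chunk.length;
      }
      this.chunks = []; // start fresh for the next turn
      return { interactionId: String(evt.interaction_id ?? ""), audio };
    }
    return null; // other event types do not affect the audio buffer
  }
}
```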

### Managing Perceived Latency

During voice interactions, the agent handles perceived latency automatically using **audio fillers** when operations take longer than expected.

#### How Audio Fillers Work

When an agent operation (decision-making, tool execution, or analysis) exceeds its configured timeout threshold, the system automatically:

1. Detects that the delay threshold has been exceeded (typically 2-10 seconds).
2. Selects a contextual audio filler phrase from the configured options.
3. Streams the pre-generated audio to maintain conversation flow.
4. Continues processing while the filler plays.
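The four steps above amount to a race between the operation and its threshold. The function below is a conceptual model only, not Amigo's server implementation: given an operation's duration and a filler threshold (both assumed to be in seconds) and ignoring TTS and network time, it reports whether a filler plays and how long the user waits before hearing any audio. The `modelFiller` name is hypothetical.

```typescript
interface FillerTimeline {
  fillerPlays: boolean;
  firstAudioAtSec: number; // silence the user hears before any audio
}

// Conceptual model (hypothetical), ignoring TTS and network time:
// a filler starts when the threshold passes while the operation is still running.
function modelFiller(operationSec: number, thresholdSec: number): FillerTimeline {
  const fillerPlays = operationSec > thresholdSec;
  return {
    fillerPlays,
    // Either the filler (at the threshold) or the response (at completion) comes first.
    firstAudioAtSec: Math.min(operationSec, thresholdSec),
  };
}
```

For example, `modelFiller(3, 2)` reports a filler after 2 seconds of silence, while `modelFiller(0.5, 2)` reports no filler and 0.5 seconds of silence before the response.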

#### Example Flow

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"actorBkg": "#083241", "actorTextColor": "#FFFFFF", "actorBorder": "#083241", "signalColor": "#575452", "signalTextColor": "#100F0F", "labelBoxBkgColor": "#F1EAE7", "labelBoxBorderColor": "#D7D2D0", "labelTextColor": "#100F0F", "loopTextColor": "#100F0F", "noteBkgColor": "#F1EAE7", "noteBorderColor": "#D7D2D0", "noteTextColor": "#100F0F", "activationBkgColor": "#E8E2EB", "activationBorderColor": "#083241", "altSectionBkgColor": "#F1EAE7", "altSectionColor": "#100F0F"}}}%%
sequenceDiagram
autonumber
participant User
participant API
participant Agent

User->>API: Voice request (audio)
API->>Agent: Process request
Note over Agent: Tool execution begins
Note over Agent: 2 seconds pass...
Agent-->>API: ActionTooLongEvent
API-->>User: Audio filler: "Let me look that up..."
Note over Agent: Tool completes
Agent-->>API: Response ready
API-->>User: Agent response (audio)" %}

#### Common Filler Scenarios

| Scenario                         | Example Fillers                                                      |
| -------------------------------- | -------------------------------------------------------------------- |
| **Designated Tool (end-to-end)** | "I'm looking that up for you...", "Searching now..."                 |
| **Decision-Making**              | "Let me think about that...", "Just a moment..."                     |
| **Reflection**                   | "Let me consider this carefully...", "Analyzing that information..." |
| **Helper Tools**                 | "Checking that...", "One moment...", "Let me verify..."              |

#### Benefits

* **Reduces perceived wait time** by providing active feedback.
* **Keeps conversation natural** instead of awkward silence.
* **Improves user experience** with contextual acknowledgments.
* **Automatic and transparent**: no client-side changes needed.

#### Handling in Code

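Audio fillers arrive as `current-agent-action` events whose `action.type` is `action-too-long`:

```typescript
// Declarations for illustration: your parsed event and audio player.
declare const evt: { type: string; action?: { type: string; filler: string } };
declare function playAudio(audio: ArrayBufferLike): void;

// Decode base64 into raw bytes (atob exists in browsers and Node 16+).
const base64ToBytes = (b64: string): Uint8Array =>
  Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));

if (evt.type === "current-agent-action" && evt.action && evt.action.type === "action-too-long") {
  // The filler contains base64 PCM audio (or a text fallback)
  playAudio(base64ToBytes(evt.action.filler).buffer);
}
```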
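In practice the filler branch sits next to the other stream events. A sketch of a single dispatcher, assuming the event shapes shown on this page (`new-message` with a base64 `message`, `current-agent-action` with `action.type` and `action.filler`, and `interaction-complete` with `interaction_id`). The `VoiceHandlers` callbacks and `dispatchVoiceEvent` are hypothetical hooks for your own playback code; because the filler may be a text fallback rather than audio, the raw string is handed to the caller instead of being decoded here.

```typescript
// Hypothetical callbacks supplied by your app.
interface VoiceHandlers {
  onAudioChunk: (base64Audio: string) => void;
  onFiller: (filler: string) => void; // base64 PCM audio, or a text fallback
  onTurnComplete: (interactionId: string) => void;
}

// Event shape assumed from the examples on this page.
interface VoiceStreamEvent {
  type: string;
  message?: unknown;
  action?: { type?: string; filler?: unknown };
  interaction_id?: unknown;
}

function dispatchVoiceEvent(evt: VoiceStreamEvent, h: VoiceHandlers): void {
  switch (evt.type) {
    case "new-message":
      if (typeof evt.message === "string" && evt.message) h.onAudioChunk(evt.message);
      break;
    case "current-agent-action": {
      // Only action-too-long is treated as a filler; other actions are ignored.
      const action = evt.action;
      if (action && action.type === "action-too-long" && typeof action.filler === "string") {
        h.onFiller(action.filler);
      }
      break;
    }
    case "interaction-complete":
      h.onTurnComplete(String(evt.interaction_id ?? ""));
      break;
    default:
      break; // other event types are not needed for playback
  }
}
```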

{% hint style="info" %}
**Configuration**\
Audio fillers are configured in your service's **Context Graph** (API field: `service_hierarchical_state_machine`). Each Context Graph state type and tool can have custom filler phrases and timeout thresholds. See [Conversations: Events](/developer-guide/classic-api/core-api/conversations/conversations-events.md#managing-perceived-latency-with-audio-fillers) for detailed configuration options.
{% endhint %}

{% hint style="warning" %}
**Best Practice: Keep `audio_filler_triggered_after` Close to Zero**

Set the delay to a very small value like `0.0001` (0.1ms). Any delay adds directly to perceived latency. Since most operations complete quickly, delays hurt the majority of interactions. The schema requires `> 0`, but values below 1ms are instantaneous in practice.
{% endhint %}
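If you generate Context Graph configuration programmatically, a small check can enforce this guidance before upload. This is a hypothetical helper, not part of the Amigo API; it assumes `audio_filler_triggered_after` is expressed in seconds (consistent with `0.0001` meaning 0.1 ms), rejects values that violate the `> 0` schema rule, and flags anything at or above 1 ms.

```typescript
interface DelayCheck {
  ok: boolean; // false when the value would violate the schema
  message: string;
}

// Hypothetical pre-upload check for an audio_filler_triggered_after value (seconds).
function checkFillerDelay(seconds: number): DelayCheck {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    return { ok: false, message: "Must be a number greater than 0 (schema requires > 0)." };
  }
  if (seconds >= 0.001) {
    return { ok: true, message: `${seconds}s adds audible delay; consider 0.0001.` };
  }
  return { ok: true, message: "Effectively instantaneous." };
}
```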

## Real-time Voice (WebSocket)

For low-latency, natural conversation with VAD and barge-in, use Real-time Voice (WebSocket). It supports continuous upstream audio, interruption handling, and streaming TTS.

