Real-time Voice (WebSocket)

Build natural, real-time voice conversations with your Amigo agents using WebSocket connections for low-latency, bidirectional audio streaming.

Real-time Capabilities: This API enables sub-second latency voice conversations with automatic speech detection, interruption handling, and streaming responses.

Quick Start

// 1. Connect to WebSocket with authentication
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]
);

// 2. Start a conversation when connected
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'client.start-conversation',
    service_id: 'your-service-id',
    service_version_set_name: 'release'
  }));
};

// 3. Handle incoming messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  if (message.type === 'server.conversation-created') {
    console.log('Ready to chat! Conversation ID:', message.conversation_id);
    // Now you can send audio or text messages
  }
  
  if (message.type === 'server.new-message' && message.message) {
    // Handle audio/text response from agent
    handleAgentResponse(message.message);
  }
};

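// 4. Send a text message. Call this only after 'server.conversation-created'
//    has arrived; calling ws.send() before the socket is open throws.
function sendTextMessage(text) {
  ws.send(JSON.stringify({
    type: 'client.new-text-message',
    text: text,
    message_type: 'user-message'
  }));
}

// Example: sendTextMessage('Hello, how can you help me?');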

What You Can Build

| Use Case | Description |
| --- | --- |
| Voice Assistants | Natural voice conversations with automatic speech detection |
| Call Center Agents | Real-time customer support with interruption handling |
| Interactive Games | Voice-controlled gaming experiences |
| Healthcare Bots | Medical consultation assistants with voice interaction |
| Educational Tutors | Interactive learning with voice feedback |

Key Features

| Feature | Description |
| --- | --- |
| Real-time Streaming | Send and receive audio chunks as they're generated |
| Voice Activity Detection | Automatic detection of speech start/stop |
| Low Latency | Sub-second response times with streaming |
| Interruption Handling | Natural conversation flow management |
| Audio Fillers | Automatic filler phrases during processing delays |

Connection Setup

Endpoint

wss://api.amigo.ai/v1/{org}/conversation/converse_realtime

Regional Endpoints

Choose the endpoint closest to your users for best performance:

| Region | Endpoint |
| --- | --- |
| US (default) | wss://api.amigo.ai/v1/{org}/conversation/converse_realtime |
| CA Central | wss://api-ca-central-1.amigo.ai/v1/{org}/conversation/converse_realtime |
| EU Central | wss://api-eu-central-1.amigo.ai/v1/{org}/conversation/converse_realtime |
| AP Southeast | wss://api-ap-southeast-2.amigo.ai/v1/{org}/conversation/converse_realtime |

Query Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| response_format | text or voice | Required | Agent response format | voice |
| audio_format | mp3 or pcm | If voice | Audio encoding. pcm: lower latency, VAD support. mp3: bandwidth-efficient. | pcm |
| current_agent_action_type | regex | Optional | Filter agent action events | ^tool\..* |
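As a rough illustration of how the regional endpoints and query parameters combine, the sketch below assembles a connection URL. The function name `buildRealtimeUrl` and the short region keys (`us`, `ca`, `eu`, `ap`) are made up for this example; only the hosts, path, and parameter names come from the tables above.

```javascript
// Regional hosts taken from the Regional Endpoints table. The short keys
// ('us', 'ca', 'eu', 'ap') are illustrative names, not API values.
const REGION_HOSTS = {
  us: 'api.amigo.ai',
  ca: 'api-ca-central-1.amigo.ai',
  eu: 'api-eu-central-1.amigo.ai',
  ap: 'api-ap-southeast-2.amigo.ai',
};

// Hypothetical helper: builds the WebSocket URL from the documented parameters.
function buildRealtimeUrl({ org, region = 'us', responseFormat, audioFormat, actionTypeFilter }) {
  const host = REGION_HOSTS[region];
  if (!host) throw new Error(`Unknown region: ${region}`);
  if (responseFormat !== 'text' && responseFormat !== 'voice') {
    throw new Error('response_format must be "text" or "voice"');
  }
  if (responseFormat === 'voice' && audioFormat !== 'pcm' && audioFormat !== 'mp3') {
    throw new Error('audio_format ("pcm" or "mp3") is required when response_format is "voice"');
  }

  const url = new URL(`wss://${host}/v1/${encodeURIComponent(org)}/conversation/converse_realtime`);
  url.searchParams.set('response_format', responseFormat);
  if (responseFormat === 'voice') url.searchParams.set('audio_format', audioFormat);
  // URLSearchParams percent-encodes the regex, so characters such as ^ and \ are safe.
  if (actionTypeFilter) url.searchParams.set('current_agent_action_type', actionTypeFilter);
  return url.toString();
}
```

Validating the parameters before connecting keeps a misconfigured request from surfacing only as a closed socket.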

Authentication

WebSocket authentication passes your bearer token as a WebSocket subprotocol, which the client sends in the Sec-WebSocket-Protocol header.

Token Format: The subprotocol value is bearer.authorization.amigo.ai. followed by your JWT token. The token is not sent in an Authorization header.
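A minimal sketch of the subprotocol-based authentication described above. The helper names `authSubprotocol` and `openAuthenticatedSocket` are hypothetical, and the `WebSocketImpl` parameter exists only so the same code can run with the browser's `WebSocket` or another compatible implementation.

```javascript
const AUTH_PREFIX = 'bearer.authorization.amigo.ai.';

// Build the subprotocol string that carries the JWT.
function authSubprotocol(jwt) {
  // Subprotocol names cannot contain whitespace or separators; a compact JWT
  // (base64url segments joined by dots) satisfies that on its own.
  if (typeof jwt !== 'string' || !/^[A-Za-z0-9._-]+$/.test(jwt)) {
    throw new Error('Expected a compact JWT string');
  }
  return AUTH_PREFIX + jwt;
}

// Open the socket. The WebSocket constructor sends the subprotocol list in
// the Sec-WebSocket-Protocol header during the handshake.
function openAuthenticatedSocket(url, jwt, WebSocketImpl = WebSocket) {
  return new WebSocketImpl(url, [authSubprotocol(jwt)]);
}
```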

Conversation Flow

Step 1: Connect & Authenticate

Step 2: Initialize Conversation

Once connected, you must initialize the conversation:

Option A: Start New Conversation

Option B: Continue Existing Conversation

Step 3: Exchange Messages

Now you can send text or audio messages and receive responses:
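One way to enforce the ordering in these steps is to hold outgoing messages until the server confirms the conversation. The sketch below is an illustration only: `createConversationGate` is a hypothetical helper, and the message shapes are the ones shown in the Quick Start.

```javascript
// Wraps a send function so that text messages are held back until the server
// has confirmed the conversation with 'server.conversation-created'.
function createConversationGate(send) {
  let conversationId = null;
  const pending = [];

  return {
    // Call from ws.onmessage with the parsed server message.
    handleServerMessage(message) {
      if (message.type === 'server.conversation-created') {
        conversationId = message.conversation_id;
        while (pending.length > 0) send(JSON.stringify(pending.shift()));
      }
    },
    // Queue or send a text message, depending on conversation state.
    sendText(text) {
      const payload = { type: 'client.new-text-message', text, message_type: 'user-message' };
      if (conversationId === null) pending.push(payload);
      else send(JSON.stringify(payload));
    },
    get conversationId() { return conversationId; },
  };
}
```

In a browser, `send` would typically be `(data) => ws.send(data)`, with `handleServerMessage` called from `ws.onmessage`.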

Sequence Diagram

Message Reference

Messages You Send (Client → Server)

Send Text

Send Audio

Voice Activity Detection (VAD)

Finish Conversation

Standard Mode

VAD Mode

When in VAD mode, first disable VAD, wait for acknowledgment, then finish:
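The disable, acknowledge, finish ordering can be sketched as a small state machine. This section does not name the message types involved, so the sketch takes the payloads and the acknowledgment check as parameters rather than guessing them; `createVadFinisher` and all of its parameter names are hypothetical.

```javascript
// Sequences "disable VAD -> wait for acknowledgment -> finish".
// The concrete message payloads are supplied by the caller.
function createVadFinisher({ send, disableVadMessage, isDisableAck, finishMessage }) {
  let state = 'idle'; // 'idle' -> 'awaiting-ack' -> 'finished'

  return {
    // Begin the shutdown sequence by asking the server to disable VAD.
    start() {
      if (state !== 'idle') return;
      state = 'awaiting-ack';
      send(JSON.stringify(disableVadMessage));
    },
    // Call for every parsed server message; the finish message is sent only
    // once the acknowledgment has been seen.
    handleServerMessage(message) {
      if (state === 'awaiting-ack' && isDisableAck(message)) {
        state = 'finished';
        send(JSON.stringify(finishMessage));
      }
    },
    get state() { return state; },
  };
}
```

Keeping the sequencing separate from the payloads means the same logic holds whatever the exact message names are.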

Graceful Close

Extend Timeout

Messages You Receive (Server → Client)

Conversation Lifecycle

Agent Responses

Voice Activity Detection (VAD) Mode

VAD mode enables hands-free, natural conversations with automatic speech detection.

How VAD Works

VAD with External Events

External events can interrupt ongoing conversations in VAD mode when marked with start_interaction: true:

External Event Behavior in VAD Mode:

  • When the agent has not detected user speech: an external event with start_interaction: true interrupts any existing interaction and immediately starts a new interaction with the external event.

  • When the user is speaking (speech detected by the agent): the external event is queued. The agent waits until the user finishes speaking (indicated by server.vad-speech-ended), then starts a new interaction with the external event.
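The two cases above can be modeled in a few lines. This is a mental model of the documented behavior, not server code: `createExternalEventModel` is a hypothetical name, the handling of several queued events (first in, first out) is an assumption, and events without `start_interaction: true` are left unmodeled because this section does not describe them.

```javascript
// Models how an external event with start_interaction: true is handled,
// depending on whether user speech has been detected.
function createExternalEventModel(startInteraction) {
  let userSpeaking = false;
  const queued = [];

  return {
    // User speech has been detected.
    speechStarted() { userSpeaking = true; },
    // Corresponds to server.vad-speech-ended: release anything that was queued.
    speechEnded() {
      userSpeaking = false;
      while (queued.length > 0) startInteraction(queued.shift());
    },
    externalEvent(event) {
      if (!event.start_interaction) return 'not-modeled';
      if (userSpeaking) {
        queued.push(event);
        return 'queued';
      }
      startInteraction(event); // interrupts any interaction in progress
      return 'started';
    },
  };
}
```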

VAD Requirements

Important:

  • Audio format: Must use PCM (MP3 not supported in VAD mode)

  • Streaming: Continuous audio streaming required

  • Interruptions: Automatically handled by pausing agent audio

Audio Configuration

PCM Format (Best for real-time & VAD)
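To get a feel for what 20-60 ms PCM chunks mean in bytes, the sketch below computes chunk sizes and splits a buffer accordingly. The sample rate, bit depth, and channel count used in the usage note (16 kHz, 16-bit, mono) are assumptions for illustration; this section does not state the values the API expects. `pcmChunkBytes` and `splitPcm` are hypothetical helper names.

```javascript
// Bytes needed for `ms` milliseconds of raw PCM audio.
function pcmChunkBytes({ sampleRate, channels, bytesPerSample, ms }) {
  const frames = Math.round((sampleRate * ms) / 1000);
  return frames * channels * bytesPerSample;
}

// Split a Uint8Array of PCM bytes into chunks of the given duration.
// The final chunk may be shorter than the others.
function splitPcm(bytes, format, ms) {
  const size = pcmChunkBytes({ ...format, ms });
  if (size <= 0) throw new Error('Chunk duration too small for this format');
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += size) {
    chunks.push(bytes.subarray(offset, offset + size));
  }
  return chunks;
}
```

Under the assumed format, `pcmChunkBytes({ sampleRate: 16000, channels: 1, bytesPerSample: 2, ms: 20 })` gives 640 bytes, and a 60 ms chunk is 1920 bytes.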

MP3 Format (Bandwidth-efficient)

Language Support

Supported Languages

The following languages are supported for both voice transcription and synthesis:

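| Language | Code |
| --- | --- |
| English | en |
| Spanish | es |
| French | fr |
| German | de |
| Italian | it |
| Portuguese | pt |
| Polish | pl |
| Turkish | tr |
| Russian | ru |
| Dutch | nl |
| Czech | cs |
| Arabic | ar |
| Chinese | zh |
| Japanese | ja |
| Hungarian | hu |
| Korean | ko |
| Hindi | hi |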

Language is determined by the user's preferred_language setting, falling back to the agent's default_spoken_language.

Error Handling

WebSocket Close Codes

| Code | Error | Common Cause | Solution |
| --- | --- | --- | --- |
| 3000 | Unauthorized | Invalid/expired token | Refresh auth token |
| 3003 | Forbidden | Missing permissions | Check user permissions |
| 3008 | Timeout | No activity for 30s | Send extend-timeout every 15s |
| 4000 | Bad Request | Invalid message format | Check message structure |
| 4004 | Not Found | Service/conversation doesn't exist | Verify IDs |
| 4009 | Conflict | Conversation locked/finished | Check conversation state |
| 4015 | Unsupported Media | Wrong audio format | Use PCM for VAD, check config |
| 4029 | Rate Limited | Too many messages | Implement backoff, max 60/min |

Error Handling Example
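A minimal sketch of close-code handling built from the table above. The lookup values are copied from the table. The `handleClose` helper, its hook names (`refreshToken`, `reconnect`, `report`), and the choice of which codes trigger which hook are assumptions made for illustration.

```javascript
// Close codes and suggested fixes, copied from the close-code table.
const CLOSE_CODES = {
  3000: { error: 'Unauthorized', solution: 'Refresh auth token' },
  3003: { error: 'Forbidden', solution: 'Check user permissions' },
  3008: { error: 'Timeout', solution: 'Send extend-timeout every 15s' },
  4000: { error: 'Bad Request', solution: 'Check message structure' },
  4004: { error: 'Not Found', solution: 'Verify IDs' },
  4009: { error: 'Conflict', solution: 'Check conversation state' },
  4015: { error: 'Unsupported Media', solution: 'Use PCM for VAD, check config' },
  4029: { error: 'Rate Limited', solution: 'Implement backoff, max 60/min' },
};

function describeClose(code) {
  return CLOSE_CODES[code] ?? { error: 'Unknown', solution: 'Inspect the close reason' };
}

// Attach with: ws.onclose = (event) => handleClose(event, hooks)
// ASSUMPTION: which codes are worth retrying is a judgment call, not documented.
function handleClose(event, { refreshToken, reconnect, report }) {
  const info = describeClose(event.code);
  report(`${event.code} ${info.error}: ${info.solution}`);
  if (event.code === 3000) return refreshToken();
  if (event.code === 3008 || event.code === 4029) return reconnect();
  return undefined;
}
```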

Performance & Limits

Rate Limits

| Limit | Value | Notes |
| --- | --- | --- |
| Messages/minute | 60 | Includes all message types |
| Connection timeout | 30 seconds | Reset by any message |
| Keep-alive interval | 15 seconds | Send extend-timeout |
| Concurrent connections | 1 per user/service | One active connection at a time |
| Audio chunk size | 20-60ms | Optimal for real-time streaming |
| Max message size | 1MB | For audio chunks |

Keep Connection Alive
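A minimal keep-alive sketch based on the 15-second interval listed under Rate Limits. The message type string `client.extend-timeout` is an assumption inferred from the `client.*` naming used elsewhere in this document; confirm the exact type before relying on it. `startKeepAlive` and the injectable `timers` option are hypothetical and exist mainly to make the logic testable.

```javascript
// Sends an extend-timeout message on a fixed interval and returns a stop function.
// ASSUMPTION: 'client.extend-timeout' is a guessed type string, not confirmed.
function startKeepAlive(send, {
  intervalMs = 15000,
  messageType = 'client.extend-timeout',
  // Wrapped in arrow functions so the globals are called with their usual `this`.
  timers = {
    setInterval: (fn, ms) => setInterval(fn, ms),
    clearInterval: (id) => clearInterval(id),
  },
} = {}) {
  const id = timers.setInterval(() => {
    send(JSON.stringify({ type: messageType }));
  }, intervalMs);
  return () => timers.clearInterval(id);
}
```

Call the returned function when the socket closes so the timer does not keep firing against a dead connection.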

Complete Implementation

Production-Ready WebSocket Client

Common Patterns & Troubleshooting

Connection Flow Diagram

Common Issues & Solutions

| Issue | Symptom | Solution |
| --- | --- | --- |
| No audio playback | Audio received but silent | Check audio format matches audio_format param |
| Connection drops | Disconnects after 30s | Implement keep-alive with extend-timeout |
| VAD not working | Speech not detected | Ensure using PCM format, not MP3 |
| Authentication fails | Code 3000 on connect | Check token format: bearer.authorization.amigo.ai.{token} |
| Conversation locked | Code 4009 | Only 1 connection per user/service allowed |
| Empty transcripts | VAD returns empty text | Check microphone permissions and audio levels |
| Choppy audio | Broken playback | Buffer audio chunks before playing |
| High latency | Slow responses | Use regional endpoints, PCM format |
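For the choppy-audio case, buffering before playback can be sketched as a simple pre-buffer: hold the first few chunks, then pass everything through. `createPlaybackBuffer`, the `play` callback, and the default threshold of 3 chunks are all illustrative assumptions; a suitable threshold depends on chunk duration and network jitter.

```javascript
// Holds incoming audio chunks until `minBuffered` have arrived, then hands
// them to `play` in order. Later chunks are passed through immediately.
function createPlaybackBuffer(play, minBuffered = 3) {
  const queue = [];
  let started = false;

  return {
    push(chunk) {
      queue.push(chunk);
      if (!started && queue.length >= minBuffered) started = true;
      if (started) {
        while (queue.length > 0) play(queue.shift());
      }
    },
    // Call at the end of a response: plays whatever is left and resets,
    // so short responses are not stuck below the threshold.
    flush() {
      while (queue.length > 0) play(queue.shift());
      started = false;
    },
  };
}
```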

Best Practices

  1. Connection Management

    • Implement reconnection logic for network interruptions

    • Send periodic extend-timeout messages during long idle periods

    • Properly close connections with client.close-connection

  2. Audio Streaming

    • Use PCM format for lowest latency; it is required in VAD mode

    • Stream audio chunks as they become available (don't buffer entire message)

    • Include audio_config only in the first chunk

  3. Error Recovery

    • Handle WebSocket close events gracefully

    • Implement exponential backoff for reconnections

    • Save conversation ID to continue after disconnection

  4. Performance

    • Reuse WebSocket connections when possible

    • Process audio chunks immediately upon receipt

    • Use appropriate audio buffer sizes (typically 20-60ms chunks)

  5. Security

    • Never expose authentication tokens in client-side code

    • Use secure WebSocket connections (wss://)

    • Implement token refresh before expiration
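The exponential backoff mentioned under Error Recovery can be sketched as follows. The base delay, the cap, and the use of full jitter are illustrative choices rather than documented values, and `backoffDelayMs` is a hypothetical helper name.

```javascript
// Delay before reconnect attempt `attempt` (0-based). The ceiling grows as
// baseMs * 2^attempt, capped at maxMs; "full jitter" then picks a random
// delay between 0 and that ceiling so many clients do not retry in lockstep.
function backoffDelayMs(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A reconnect loop would wait `backoffDelayMs(attempt)` milliseconds before each new attempt and reset `attempt` to 0 after a successful connection.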

SDK & Framework Support

Current Support

| Platform | Status | Notes |
| --- | --- | --- |
| JavaScript/Browser | Full support | Native WebSocket API |
| Node.js | Full support | Use ws package |
| TypeScript SDK | Coming soon | Use WebSocket API directly |
| Python | Supported | Use websockets library |
| React Native | Supported | Built-in WebSocket support |
| Flutter | Supported | Use web_socket_channel |

Framework Examples

Node.js
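A minimal sketch of the Quick Start flow for Node.js, assuming the `ws` package's browser-style `onopen`/`onmessage` properties. The WebSocket class is passed in as a parameter so the function does not depend on how the package is loaded; `connectRealtime` and its option names are hypothetical.

```javascript
// In Node.js: const WebSocket = require('ws');
//             connectRealtime(WebSocket, { url, authToken, serviceId, onMessage });
function connectRealtime(WebSocketImpl, { url, authToken, serviceId, onMessage }) {
  // The bearer token travels as a WebSocket subprotocol.
  const ws = new WebSocketImpl(url, ['bearer.authorization.amigo.ai.' + authToken]);

  // Start a conversation as soon as the socket is open.
  ws.onopen = () => {
    ws.send(JSON.stringify({
      type: 'client.start-conversation',
      service_id: serviceId,
      service_version_set_name: 'release',
    }));
  };

  ws.onmessage = (event) => {
    // Only JSON text frames are parsed here; anything else is skipped.
    if (typeof event.data !== 'string') return;
    onMessage(JSON.parse(event.data));
  };

  return ws;
}
```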

Python
