Real-time Voice (WebSocket)

Build natural, real-time voice conversations with your Amigo agents using WebSocket connections for low-latency, bidirectional audio streaming.

{% hint style="info" %} Real-time Capabilities: this API enables sub-second-latency voice conversations with automatic speech detection, interruption handling, and streaming responses. {% endhint %}

Quick Start

// 1. Connect to WebSocket with authentication
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]
);

// 2. Start a conversation when connected
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'client.start-conversation',
    service_id: 'your-service-id',
    service_version_set_name: 'release'
  }));
};

// 3. Handle incoming messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  if (message.type === 'server.conversation-created') {
    console.log('Ready to chat! Conversation ID:', message.conversation_id);
    // Now you can send audio or text messages
  }
  
  if (message.type === 'server.new-message' && message.message) {
    // Handle audio/text response from agent
    handleAgentResponse(message.message);
  }
};

// 4. Send a text message (after 'server.conversation-created' arrives)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));

What You Can Build

| Use Case | Description |
|---|---|
| Voice Assistants | Natural voice conversations with automatic speech detection |
| Call Center Agents | Real-time customer support with interruption handling |
| Interactive Games | Voice-controlled gaming experiences |
| Healthcare Bots | Medical consultation assistants with voice interaction |
| Educational Tutors | Interactive learning with voice feedback |

Key Features

| Feature | Description |
|---|---|
| Real-time Streaming | Send and receive audio chunks as they're generated |
| Voice Activity Detection | Automatic detection of speech start/stop |
| Low Latency | Sub-second response times with streaming |
| Interruption Handling | Natural conversation flow management |

Connection Setup

Endpoint

wss://api.amigo.ai/v1/{organization}/conversation/converse_realtime

Regional Endpoints

Choose the endpoint closest to your users for best performance:

| Region | Endpoint |
|---|---|
| US (default) | `wss://api.amigo.ai/v1/{org}/conversation/converse_realtime` |
| EU Central | `wss://api-eu-central-1.amigo.ai/v1/{org}/conversation/converse_realtime` |
| AP Southeast | `wss://api-ap-southeast-2.amigo.ai/v1/{org}/conversation/converse_realtime` |
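
To switch regions without touching the rest of your code, you can keep the base URLs in one place. A minimal sketch built from the table above:

// Region → base URL map, taken from the table above.
const ENDPOINTS = {
  'us': 'wss://api.amigo.ai',
  'eu-central-1': 'wss://api-eu-central-1.amigo.ai',
  'ap-southeast-2': 'wss://api-ap-southeast-2.amigo.ai'
};

function realtimeUrl(region, org) {
  return `${ENDPOINTS[region]}/v1/${org}/conversation/converse_realtime`;
}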

Query Parameters

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| `response_format` | `text` \| `voice` | Required | Agent response format | `voice` |
| `audio_format` | `mp3` \| `pcm` | Required if `response_format=voice` | Audio encoding: `pcm` for lower latency and VAD support; `mp3` for bandwidth efficiency | `pcm` |
| `current_agent_action_type` | regex | Optional | Filter agent action events | `^tool\..*` |
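
Putting the parameters together: the current_agent_action_type value is a regular expression and must be URL-encoded, which URLSearchParams handles for you. A minimal sketch using the example values from the table:

// Build the connection URL; URLSearchParams encodes the regex filter.
const params = new URLSearchParams({
  response_format: 'voice',
  audio_format: 'pcm',
  current_agent_action_type: '^tool\\..*'   // optional agent-action filter
});
const url = `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?${params}`;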

Authentication

WebSocket authentication uses the Sec-WebSocket-Protocol header with your bearer token:

// Get your auth token (from login or API key)
const authToken = await getAuthToken();

// Pass token as WebSocket subprotocol
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]  // ← Token goes here
);

{% hint style="info" %} Token Format: the value is bearer.authorization.amigo.ai. followed by your JWT. It is passed as a WebSocket subprotocol, not as a header. {% endhint %}
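
If your token is a standard JWT, you can schedule a refresh before it expires rather than waiting for a 3000 close code. A minimal sketch; getAuthToken is a hypothetical helper from your auth flow:

// Decode the JWT's exp claim (seconds since epoch) and refresh early.
function scheduleTokenRefresh(token, onRefresh, marginMs = 60000) {
  const { exp } = JSON.parse(atob(token.split('.')[1]));
  const delay = exp * 1000 - Date.now() - marginMs;
  setTimeout(onRefresh, Math.max(delay, 0));
}

// Usage (getAuthToken is hypothetical):
// scheduleTokenRefresh(authToken, async () => { authToken = await getAuthToken(); });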

Conversation Flow

Step 1: Connect & Authenticate

const ws = new WebSocket(
  `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?response_format=voice&audio_format=pcm`,
  [`bearer.authorization.amigo.ai.${authToken}`]
);

// Handle connection events
ws.onopen = () => console.log('Connected');
ws.onerror = (error) => console.error('Connection error:', error);
ws.onclose = (event) => console.log('Disconnected:', event.code, event.reason);

Step 2: Initialize Conversation

Once connected, you must initialize the conversation:

Option A: Start New Conversation

ws.send(JSON.stringify({
  type: 'client.start-conversation',
  service_id: 'your-service-id',        // Your agent service ID
  service_version_set_name: 'release'   // 'release', 'edge', or custom
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-created', conversation_id: '...' }

Option B: Continue Existing Conversation

ws.send(JSON.stringify({
  type: 'client.continue-conversation',
  conversation_id: 'existing-conversation-id'
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-retrieved' }
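
Persisting the conversation ID lets you resume a session after a network drop. A minimal sketch using localStorage (the storage key is illustrative):

// Save the ID when the conversation is created...
if (message.type === 'server.conversation-created') {
  localStorage.setItem('amigo.conversation_id', message.conversation_id);
}

// ...and on reconnect, continue it if one was saved.
const saved = localStorage.getItem('amigo.conversation_id');
ws.send(JSON.stringify(
  saved
    ? { type: 'client.continue-conversation', conversation_id: saved }
    : { type: 'client.start-conversation', service_id: serviceId, service_version_set_name: 'release' }
));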

Step 3: Exchange Messages

Now you can send text or audio messages and receive responses. The individual message formats are detailed in the reference below.


Message Reference

Messages You Send (Client → Server)

Send Text

// User message
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));

// System event (e.g., user actions, context)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'User navigated to checkout page',
  message_type: 'external-event'
}));

Send Audio

// First chunk - include config
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: base64AudioChunk,  // Base64 encoded PCM audio
  audio_config: {
    format: 'pcm',
    sample_rate: 16000,     // 16kHz
    sample_width: 2,        // 16-bit
    n_channels: 1,          // Mono
    frame_rate: 16000
  }
}));

// Subsequent chunks - no config needed
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: nextBase64AudioChunk
}));

// Signal end of audio
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: null
}));

Voice Activity Detection (VAD)

// Enable automatic speech detection
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// In VAD mode: continuously stream audio, server detects speech
// Server will automatically determine when user starts/stops speaking

Finish Conversation

Standard Mode

{
  "type": "client.finish-conversation"
}

VAD Mode

When in VAD mode, first disable VAD, wait for acknowledgment, then finish:

// Step 1: Disable VAD mode
{
  "type": "client.switch-vad-mode",
  "vad_mode_on": false
}

// Wait for response (may take up to 10 seconds)
{
  "type": "server.vad-mode-switched",
  "current_vad_mode_on": false
}

// Step 2: Finish conversation
{
  "type": "client.finish-conversation"
}

// Response
{
  "type": "server.conversation-completed"
}

Graceful Close

{
  "type": "client.close-connection"
}

Extend Timeout

{
  "type": "client.extend-timeout"
}

Messages You Receive (Server → Client)

Conversation Lifecycle

// Conversation created
{
  type: 'server.conversation-created',
  conversation_id: '507f1f77bcf86cd799439012'
}

// Conversation retrieved (when continuing)
{
  type: 'server.conversation-retrieved'
}

// Conversation finished
{
  type: 'server.conversation-completed'
}

Agent Responses

// Text response chunk
{
  type: 'server.new-message',
  interaction_id: '...', 
  message: 'Hello! I can help you with...',  // Text chunk
  message_metadata: [],
  transcript_alignment: null,
  stop: false,              // false = more chunks coming
  sequence_number: 1,
  message_id: '...'
}

// Audio response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'base64_audio_chunk',             // Base64 PCM audio
  message_metadata: [],
  transcript_alignment: [                    // Timing for each character (ms)
    [0, 'H'], [100, 'e'], [200, 'l'], [300, 'l'], [400, 'o']
  ],
  stop: false,
  sequence_number: 1,
  message_id: '...'
}

// Response complete
{
  type: 'server.interaction-complete',
  message_id: '...',
  interaction_id: '...',
  full_message: 'Complete text or transcript',
  conversation_completed: false    // true = agent ended conversation
}
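
The transcript_alignment pairs give a millisecond offset for each character of the transcript, which is enough to drive live captions. A minimal sketch; renderChar is a hypothetical display callback:

// Schedule each character relative to when the audio chunk starts playing.
function scheduleCaptions(alignment, playbackStartMs, renderChar) {
  for (const [offsetMs, char] of alignment) {
    const delay = playbackStartMs + offsetMs - performance.now();
    setTimeout(() => renderChar(char), Math.max(delay, 0));
  }
}

// Usage:
// scheduleCaptions(msg.transcript_alignment, performance.now(), c => captionEl.textContent += c);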

Voice Activity Detection Events
// User started speaking
{
  type: 'server.vad-speech-started',
  start: 1.234  // Seconds from last reset
}

// User stopped speaking (with transcript)
{
  type: 'server.vad-speech-ended',
  transcript: 'What the user said',
  start: 1.234,  // When speech started (seconds)
  end: 3.456     // When speech ended (seconds)
}

// Time reference reset
{
  type: 'server.vad-speech-reset-zero',
  timestamp: 0.0  // New zero point for timing
}

// VAD mode changed
{
  type: 'server.vad-mode-switched',
  current_vad_mode_on: true  // Current VAD state
}

Voice Activity Detection (VAD) Mode

VAD mode enables hands-free, natural conversations with automatic speech detection.

{% hint style="warning" %} VAD Requirements

  • Audio format must be PCM (MP3 not supported)

  • Continuous audio streaming required

  • Interruptions automatically handled by pausing agent audio {% endhint %}

How VAD Works

// 1. Enable VAD mode
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// 2. Stream audio continuously (server detects speech)
const streamAudio = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // ... convert to PCM and send chunks continuously
};

// 3. Handle VAD events
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  
  switch(msg.type) {
    case 'server.vad-speech-started':
      console.log('User speaking...');
      pauseAgentAudio();  // Stop agent playback if speaking
      break;
      
    case 'server.vad-speech-ended':
      console.log('User said:', msg.transcript);
      // Agent automatically responds
      break;
  }
};


Audio Configuration

PCM Format (Best for real-time & VAD)

const pcmConfig = {
  format: 'pcm',
  sample_rate: 16000,   // 16 kHz
  sample_width: 2,      // 16-bit
  n_channels: 1,        // Mono
  frame_rate: 16000
};

// Example: Convert Web Audio API to PCM
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (e) => {
  const pcmData = convertToPCM16(e.inputBuffer.getChannelData(0));  // ArrayBuffer of 16-bit samples
  ws.send(JSON.stringify({
    type: 'client.new-audio-message',
    audio: btoa(String.fromCharCode(...new Uint8Array(pcmData)))  // view as bytes before encoding
  }));
};
};

MP3 Format (Bandwidth-efficient)

const mp3Config = {
  format: 'mp3',
  bit_rate: 128000,    // 128 kbps
  sample_rate: 44100,  // 44.1 kHz  
  n_channels: 2        // Stereo
};
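
Browsers generally cannot record MP3 directly, so MP3 is most useful when streaming pre-encoded audio. A hedged Node.js sketch that sends a file in chunks (the file path and chunk size are illustrative):

// Stream a pre-encoded MP3 file as base64 chunks over an open socket.
const fs = require('fs');

function sendMp3File(ws, path, chunkBytes = 32 * 1024) {
  const data = fs.readFileSync(path);
  for (let offset = 0; offset < data.length; offset += chunkBytes) {
    const message = {
      type: 'client.new-audio-message',
      audio: data.subarray(offset, offset + chunkBytes).toString('base64')
    };
    if (offset === 0) message.audio_config = mp3Config;  // config on first chunk only
    ws.send(JSON.stringify(message));
  }
  ws.send(JSON.stringify({ type: 'client.new-audio-message', audio: null }));  // end of audio
}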

Language Support

Supported Languages

The following languages are supported for both voice transcription and synthesis:

| Language | Code |
|---|---|
| English | en |
| Spanish | es |
| French | fr |
| German | de |
| Italian | it |
| Portuguese | pt |
| Polish | pl |
| Turkish | tr |
| Russian | ru |
| Dutch | nl |
| Czech | cs |
| Arabic | ar |
| Chinese | zh |
| Japanese | ja |
| Hungarian | hu |
| Korean | ko |
| Hindi | hi |

Language selection follows the user's preferred_language setting, falling back to the agent's default_spoken_language.

Error Handling

WebSocket Close Codes

| Code | Error | Common Cause | Solution |
|---|---|---|---|
| 3000 | Unauthorized | Invalid or expired token | Refresh the auth token |
| 3003 | Forbidden | Missing permissions | Check user permissions |
| 3008 | Timeout | No activity for 30 seconds | Send `extend-timeout` every 15 seconds |
| 4000 | Bad Request | Invalid message format | Check message structure |
| 4004 | Not Found | Service or conversation doesn't exist | Verify IDs |
| 4009 | Conflict | Conversation locked or finished | Check conversation state |
| 4015 | Unsupported Media | Wrong audio format | Use PCM for VAD; check the audio config |
| 4029 | Rate Limited | Too many messages | Implement backoff; max 60 messages/minute |

Error Handling Example

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = async (event) => {
  switch(event.code) {
    case 3000:
      // Refresh token and reconnect
      await refreshAuthToken();
      reconnect();
      break;
    case 3008:
      // Connection timed out
      console.log('Connection timed out - forgot to send keep-alive?');
      break;
    case 4029:
      // Rate limited - implement exponential backoff
      setTimeout(() => reconnect(), backoffDelay);
      break;
    default:
      console.error(`Connection closed: ${event.code} - ${event.reason}`);
  }
};

Performance & Limits

Rate Limits

| Limit | Value | Notes |
|---|---|---|
| Messages/minute | 60 | Includes all message types |
| Connection timeout | 30 seconds | Reset by any message |
| Keep-alive interval | 15 seconds | Send `extend-timeout` |
| Concurrent connections | 1 per user/service | One active connection at a time |
| Audio chunk size | 20-60 ms | Optimal for real-time streaming |
| Max message size | 1 MB | For audio chunks |
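
A simple rolling-window counter on the client keeps you under the 60 messages/minute cap. A minimal sketch:

// Allow at most 60 sends per rolling minute; throw if the cap is hit.
const sendTimes = [];

function sendLimited(ws, payload) {
  const now = Date.now();
  while (sendTimes.length && now - sendTimes[0] > 60000) sendTimes.shift();
  if (sendTimes.length >= 60) {
    throw new Error('Client-side rate limit reached; back off before sending');
  }
  sendTimes.push(now);
  ws.send(JSON.stringify(payload));
}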

Keep Connection Alive

// Send keep-alive every 15 seconds
const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
  }
}, 15000);

// Clean up on close
ws.onclose = () => clearInterval(keepAlive);

Complete Implementation

Production-Ready WebSocket Client

class RealtimeConversation {
  constructor(orgId, authToken, options = {}) {
    this.orgId = orgId;
    this.authToken = authToken;
    this.ws = null;
    this.keepAliveInterval = null;
    this.audioQueue = [];
    this.isPlaying = false;
    
    // Configuration
    this.options = {
      responseFormat: options.responseFormat || 'voice',
      audioFormat: options.audioFormat || 'pcm',
      vadEnabled: options.vadEnabled || false,
      onMessage: options.onMessage || (() => {}),
      onError: options.onError || console.error,
      onClose: options.onClose || (() => {})
    };
  }

  async connect(serviceId) {
    const url = `wss://api.amigo.ai/v1/${this.orgId}/conversation/converse_realtime` +
                `?response_format=${this.options.responseFormat}` +
                `&audio_format=${this.options.audioFormat}`;
    
    this.ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${this.authToken}`]);
    
    return new Promise((resolve, reject) => {
      this.ws.onopen = () => {
        console.log('WebSocket connected');
        
        // Start keep-alive
        this.startKeepAlive();
        
        // Initialize conversation
        this.ws.send(JSON.stringify({
          type: 'client.start-conversation',
          service_id: serviceId,
          service_version_set_name: 'release'
        }));
      };
      
      this.ws.onmessage = (event) => {
        const message = JSON.parse(event.data);
        this.handleMessage(message);
        
        if (message.type === 'server.conversation-created') {
          resolve(message.conversation_id);
          
          // Enable VAD if requested
          if (this.options.vadEnabled) {
            this.enableVAD();
          }
        }
      };
      
      this.ws.onerror = (error) => {
        this.options.onError(error);
        reject(error);
      };
      
      this.ws.onclose = (event) => {
        this.cleanup();
        this.options.onClose(event);
      };
    });
  }
  
  handleMessage(message) {
    this.options.onMessage(message);
    
    switch(message.type) {
      case 'server.conversation-created':
        console.log('Conversation started:', message.conversation_id);
        break;
        
      case 'server.new-message':
        if (this.options.responseFormat === 'voice' && message.message) {
          this.queueAudio(message.message);
        }
        break;
        
      case 'server.interaction-complete':
        console.log('Response complete');
        break;
        
      case 'server.vad-speech-started':
        console.log('User speaking...');
        this.pauseAudio();
        break;
        
      case 'server.vad-speech-ended':
        console.log('User said:', message.transcript);
        break;
    }
  }
  
  // Keep connection alive
  startKeepAlive() {
    this.keepAliveInterval = setInterval(() => {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
      }
    }, 15000);
  }
  
  // Voice Activity Detection
  async enableVAD() {
    this.ws.send(JSON.stringify({
      type: 'client.switch-vad-mode',
      vad_mode_on: true
    }));
    
    // Start streaming microphone audio
    await this.startAudioStream();
  }
  
  async startAudioStream() {
    const stream = await navigator.mediaDevices.getUserMedia({ 
      audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true,
        noiseSuppression: true
      }
    });
    
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
    
    let isFirstChunk = true;
    processor.onaudioprocess = (e) => {
      const pcmData = this.convertToPCM16(e.inputBuffer.getChannelData(0));
      this.sendAudio(pcmData, isFirstChunk);
      isFirstChunk = false;
    };
    
    source.connect(processor);
    processor.connect(audioContext.destination);
  }
  
  // Send messages
  sendText(text, messageType = 'user-message') {
    if (this.ws?.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket not connected');
    }
    
    this.ws.send(JSON.stringify({
      type: 'client.new-text-message',
      text: text,
      message_type: messageType
    }));
  }
  
  sendAudio(audioData, isFirstChunk = false) {
    const message = {
      type: 'client.new-audio-message',
      audio: this.arrayBufferToBase64(audioData)
    };
    
    if (isFirstChunk) {
      message.audio_config = {
        format: 'pcm',
        sample_rate: 16000,
        sample_width: 2,
        n_channels: 1,
        frame_rate: 16000
      };
    }
    
    this.ws.send(JSON.stringify(message));
  }
  
  completeAudio() {
    this.ws.send(JSON.stringify({
      type: 'client.new-audio-message',
      audio: null
    }));
  }
  
  // Audio utilities
  convertToPCM16(float32Array) {
    const int16Array = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]));
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return int16Array.buffer;
  }
  
  arrayBufferToBase64(buffer) {
    const bytes = new Uint8Array(buffer);
    let binary = '';
    bytes.forEach(b => binary += String.fromCharCode(b));
    return btoa(binary);
  }
  
  base64ToArrayBuffer(base64) {
    const binaryString = atob(base64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes.buffer;
  }
  
  // Audio playback
  queueAudio(base64Audio) {
    const audioBuffer = this.base64ToArrayBuffer(base64Audio);
    this.audioQueue.push(audioBuffer);
    
    if (!this.isPlaying) {
      this.playNextAudio();
    }
  }
  
  async playNextAudio() {
    if (this.audioQueue.length === 0) {
      this.isPlaying = false;
      return;
    }
    
    this.isPlaying = true;
    const arrayBuffer = this.audioQueue.shift();
    
    // Raw PCM cannot go through decodeAudioData (it expects a container
    // format such as WAV or MP3), so build the AudioBuffer by hand.
    // The 16 kHz / 16-bit / mono parameters are an assumption; match them
    // to the format the server actually streams back.
    this.audioContext = this.audioContext || new AudioContext({ sampleRate: 16000 });
    const int16 = new Int16Array(arrayBuffer);
    const audioBuffer = this.audioContext.createBuffer(1, int16.length, 16000);
    const channel = audioBuffer.getChannelData(0);
    for (let i = 0; i < int16.length; i++) {
      channel[i] = int16[i] / 0x8000;
    }
    
    const source = this.audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(this.audioContext.destination);
    source.onended = () => this.playNextAudio();
    source.start();
  }
  
  pauseAudio() {
    // Clear audio queue when interrupted
    this.audioQueue = [];
    this.isPlaying = false;
  }
  
  // Cleanup
  async finish() {
    if (this.options.vadEnabled) {
      // Disable VAD first
      this.ws.send(JSON.stringify({
        type: 'client.switch-vad-mode',
        vad_mode_on: false
      }));
      
      // Wait for confirmation
      await new Promise(resolve => {
        const handler = (event) => {
          const msg = JSON.parse(event.data);
          if (msg.type === 'server.vad-mode-switched') {
            this.ws.removeEventListener('message', handler);
            resolve();
          }
        };
        this.ws.addEventListener('message', handler);
      });
    }
    
    // Now finish conversation
    this.ws.send(JSON.stringify({
      type: 'client.finish-conversation'
    }));
  }
  
  cleanup() {
    if (this.keepAliveInterval) {
      clearInterval(this.keepAliveInterval);
    }
    this.audioQueue = [];
    this.isPlaying = false;
  }
  
  close() {
    this.cleanup();
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'client.close-connection' }));
      this.ws.close();
    }
  }
}

// Usage Example
async function main() {
  const client = new RealtimeConversation('your-org', 'your-auth-token', {
    responseFormat: 'voice',
    audioFormat: 'pcm',
    vadEnabled: true,
    onMessage: (msg) => {
      // Handle all messages
      console.log('Message:', msg.type);
    },
    onError: (error) => {
      console.error('Error:', error);
    },
    onClose: (event) => {
      console.log('Closed:', event.code, event.reason);
    }
  });
  
  try {
    // Connect and start conversation
    const conversationId = await client.connect('service-id');
    console.log('Conversation ID:', conversationId);
    
    // Send a text message
    client.sendText('Hello, how can you help me?');
    
    // Or manually send audio (if not using VAD)
    // client.sendAudio(pcmAudioData, true);
    // client.completeAudio();
    
    // When done
    // await client.finish();
    
  } catch (error) {
    console.error('Failed to connect:', error);
  }
}

main();

Common Patterns & Troubleshooting

Connection Flow Diagram

1. Connect WebSocket

2. Authenticate (via subprotocol)

3. Initialize Conversation
   ├── New: client.start-conversation
   └── Existing: client.continue-conversation

4. Receive Confirmation
   ├── server.conversation-created
   └── server.conversation-retrieved

5. Exchange Messages
   ├── Text: client.new-text-message
   ├── Audio: client.new-audio-message
   └── VAD: Continuous streaming

6. Receive Responses
   ├── server.new-message (chunks)
   └── server.interaction-complete

7. Finish/Close
   ├── client.finish-conversation
   └── client.close-connection

Common Issues & Solutions

| Issue | Symptom | Solution |
|---|---|---|
| No audio playback | Audio received but silent | Check that playback matches the `audio_format` parameter |
| Connection drops | Disconnects after 30 seconds | Implement keep-alive with `extend-timeout` |
| VAD not working | Speech not detected | Ensure you are sending PCM, not MP3 |
| Authentication fails | Code 3000 on connect | Check token format: `bearer.authorization.amigo.ai.{token}` |
| Conversation locked | Code 4009 | Only one connection per user/service is allowed |
| Empty transcripts | VAD returns empty text | Check microphone permissions and audio levels |
| Choppy audio | Broken playback | Buffer audio chunks before playing |
| High latency | Slow responses | Use regional endpoints and PCM format |

Best Practices

  1. Connection Management

    • Implement reconnection logic for network interruptions

    • Send periodic extend-timeout messages during long idle periods

    • Properly close connections with client.close-connection

  2. Audio Streaming

    • Use PCM format for lowest latency in VAD mode

    • Stream audio chunks as they become available (don't buffer entire message)

    • Include audio_config only in the first chunk

  3. Error Recovery

    • Handle WebSocket close events gracefully

    • Implement exponential backoff for reconnections (see the sketch after this list)

    • Save conversation ID to continue after disconnection

  4. Performance

    • Reuse WebSocket connections when possible

    • Process audio chunks immediately upon receipt

    • Use appropriate audio buffer sizes (typically 20-60ms chunks)

  5. Security

    • Never expose authentication tokens in client-side code

    • Use secure WebSocket connections (wss://)

    • Implement token refresh before expiration
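
A reconnect loop with exponential backoff combines several of the points above. A sketch assuming the RealtimeConversation class from this page and a hypothetical getAuthToken() helper:

// Exponential backoff with a 30-second cap; reset after a good connection.
let attempt = 0;

function reconnect(orgId, serviceId, options) {
  const delay = Math.min(1000 * 2 ** attempt, 30000);
  attempt += 1;
  setTimeout(async () => {
    try {
      const client = new RealtimeConversation(orgId, await getAuthToken(), options);
      await client.connect(serviceId);  // or continue a saved conversation ID
      attempt = 0;  // connection succeeded
    } catch {
      reconnect(orgId, serviceId, options);
    }
  }, delay);
}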

SDK & Framework Support

Current Support

| Platform | Status | Notes |
|---|---|---|
| JavaScript/Browser | Full support | Native WebSocket API |
| Node.js | Full support | Use the `ws` package |
| TypeScript SDK | Coming soon | Use the WebSocket API directly |
| Python | Supported | Use the `websockets` library |
| React Native | Supported | Built-in WebSocket support |
| Flutter | Supported | Use `web_socket_channel` |

Framework Examples

Node.js

const WebSocket = require('ws');

// Pass the token as a subprotocol (second argument), same as in the browser;
// it is not sent as an ordinary header.
const ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${token}`]);

Python

import json

import websockets

async def connect():
    url = f"wss://api.amigo.ai/v1/{org}/conversation/converse_realtime?response_format=voice&audio_format=pcm"
    subprotocol = f"bearer.authorization.amigo.ai.{token}"
    async with websockets.connect(url, subprotocols=[subprotocol]) as ws:
        await ws.send(json.dumps({
            "type": "client.start-conversation",
            "service_id": service_id,
            "service_version_set_name": "release",
        }))
        async for raw in ws:
            print(json.loads(raw))
