# Real-time Voice (WebSocket)
Build natural, real-time voice conversations with your Amigo agents using WebSocket connections for low-latency, bidirectional audio streaming.
## Quick Start

```javascript
// 1. Connect to WebSocket with authentication
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]
);

// 2. Start a conversation when connected
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'client.start-conversation',
    service_id: 'your-service-id',
    service_version_set_name: 'release'
  }));
};

// 3. Handle incoming messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'server.conversation-created') {
    console.log('Ready to chat! Conversation ID:', message.conversation_id);
    // Now you can send audio or text messages
  }
  if (message.type === 'server.new-message' && message.message) {
    // Handle audio/text response from agent
    handleAgentResponse(message.message);
  }
};

// 4. Send a text message (after server.conversation-created arrives)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));
```
## What You Can Build

* **Voice Assistants**: Natural voice conversations with automatic speech detection
* **Call Center Agents**: Real-time customer support with interruption handling
* **Interactive Games**: Voice-controlled gaming experiences
* **Healthcare Bots**: Medical consultation assistants with voice interaction
* **Educational Tutors**: Interactive learning with voice feedback
## Key Features

* **Real-time Streaming**: Send and receive audio chunks as they're generated
* **Voice Activity Detection**: Automatic detection of speech start/stop
* **Low Latency**: Sub-second response times with streaming
* **Interruption Handling**: Natural conversation flow management
## Connection Setup

### Endpoint

```
wss://api.amigo.ai/v1/{organization}/conversation/converse_realtime
```
### Regional Endpoints

Choose the endpoint closest to your users for the best performance:

| Region | Endpoint |
| --- | --- |
| US (default) | `wss://api.amigo.ai/v1/{org}/conversation/converse_realtime` |
| EU Central | `wss://api-eu-central-1.amigo.ai/v1/{org}/conversation/converse_realtime` |
| AP Southeast | `wss://api-ap-southeast-2.amigo.ai/v1/{org}/conversation/converse_realtime` |
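If you serve users in several regions, a small helper can pick the base URL. This is a minimal sketch; the region keys are illustrative, not an official SDK API:

```javascript
// Map of regions to base URLs (keys are illustrative, not an official API).
const ENDPOINTS = {
  'us': 'wss://api.amigo.ai',
  'eu-central-1': 'wss://api-eu-central-1.amigo.ai',
  'ap-southeast-2': 'wss://api-ap-southeast-2.amigo.ai'
};

// Build the realtime endpoint for a region, defaulting to US.
function realtimeUrl(region, org) {
  const base = ENDPOINTS[region] ?? ENDPOINTS['us'];
  return `${base}/v1/${org}/conversation/converse_realtime`;
}
```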
### Query Parameters

| Parameter | Values | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `response_format` | `text` \| `voice` | Required | Agent response format | `voice` |
| `audio_format` | `mp3` \| `pcm` | If `voice` | Audio encoding: `pcm` offers lower latency and VAD support; `mp3` is bandwidth-efficient | `pcm` |
| `current_agent_action_type` | regex | Optional | Filter agent action events | `^tool\..*` |
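As a convenience, the query string can be assembled with `URLSearchParams`; a minimal sketch using the parameters above (`orgId` stands in for your organization ID):

```javascript
// Build the connection URL from the documented query parameters.
const params = new URLSearchParams({
  response_format: 'voice',
  audio_format: 'pcm'
});
const url = `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?${params}`;
```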
## Authentication

WebSocket authentication uses the `Sec-WebSocket-Protocol` header with your bearer token:

```javascript
// Get your auth token (from login or API key)
const authToken = await getAuthToken();

// Pass the token as a WebSocket subprotocol
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken] // ← Token goes here
);
```
## Conversation Flow

### Step 1: Connect & Authenticate

```javascript
const ws = new WebSocket(
  `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?response_format=voice&audio_format=pcm`,
  [`bearer.authorization.amigo.ai.${authToken}`]
);

// Handle connection events
ws.onopen = () => console.log('Connected');
ws.onerror = (error) => console.error('Connection error:', error);
ws.onclose = (event) => console.log('Disconnected:', event.code, event.reason);
```
### Step 2: Initialize Conversation

Once connected, you must initialize the conversation.

**Option A: Start a New Conversation**

```javascript
ws.send(JSON.stringify({
  type: 'client.start-conversation',
  service_id: 'your-service-id',        // Your agent service ID
  service_version_set_name: 'release'   // 'release', 'edge', or custom
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-created', conversation_id: '...' }
```

**Option B: Continue an Existing Conversation**

```javascript
ws.send(JSON.stringify({
  type: 'client.continue-conversation',
  conversation_id: 'existing-conversation-id'
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-retrieved' }
```
### Step 3: Exchange Messages

Now you can send text or audio messages and receive responses.

*(Sequence diagram: see the Connection Flow Diagram under Common Patterns & Troubleshooting for the full message sequence.)*
## Message Reference

### Messages You Send (Client → Server)
#### Send Text

```javascript
// User message
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));

// System event (e.g., user actions, context)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'User navigated to checkout page',
  message_type: 'external-event'
}));
```
#### Send Audio

```javascript
// First chunk - include config
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: base64AudioChunk, // Base64-encoded PCM audio
  audio_config: {
    format: 'pcm',
    sample_rate: 16000, // 16 kHz
    sample_width: 2,    // 16-bit
    n_channels: 1,      // Mono
    frame_rate: 16000
  }
}));

// Subsequent chunks - no config needed
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: nextBase64AudioChunk
}));

// Signal end of audio
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: null
}));
```
#### Voice Activity Detection (VAD)

```javascript
// Enable automatic speech detection
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// In VAD mode: continuously stream audio; the server detects speech
// and automatically determines when the user starts/stops speaking
```
#### Finish Conversation

**Standard mode**

```json
{
  "type": "client.finish-conversation"
}
```

**VAD mode**

When in VAD mode, first disable VAD, wait for acknowledgment, then finish:

```javascript
// Step 1: Disable VAD mode
{
  "type": "client.switch-vad-mode",
  "vad_mode_on": false
}

// Wait for the response (may take up to 10 seconds)
{
  "type": "server.vad-mode-switched",
  "current_vad_mode_on": false
}

// Step 2: Finish the conversation
{
  "type": "client.finish-conversation"
}

// Response
{
  "type": "server.conversation-completed"
}
```

#### Graceful Close

```json
{
  "type": "client.close-connection"
}
```

#### Extend Timeout

```json
{
  "type": "client.extend-timeout"
}
```
### Messages You Receive (Server → Client)

#### Conversation Lifecycle
```javascript
// Conversation created
{
  type: 'server.conversation-created',
  conversation_id: '507f1f77bcf86cd799439012'
}

// Conversation retrieved (when continuing)
{
  type: 'server.conversation-retrieved'
}

// Conversation finished
{
  type: 'server.conversation-completed'
}
```
#### Agent Responses

```javascript
// Text response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'Hello! I can help you with...', // Text chunk
  message_metadata: [],
  transcript_alignment: null,
  stop: false,        // false = more chunks coming
  sequence_number: 1,
  message_id: '...'
}

// Audio response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'base64_audio_chunk', // Base64 PCM audio
  message_metadata: [],
  transcript_alignment: [ // Timing for each character (ms)
    [0, 'H'], [100, 'e'], [200, 'l'], [300, 'l'], [400, 'o']
  ],
  stop: false,
  sequence_number: 1,
  message_id: '...'
}

// Response complete
{
  type: 'server.interaction-complete',
  message_id: '...',
  interaction_id: '...',
  full_message: 'Complete text or transcript',
  conversation_completed: false // true = agent ended conversation
}
```
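The `transcript_alignment` pairs make it possible to display captions in sync with audio playback. A minimal sketch, assuming playback of the chunk starts roughly when it is scheduled; `onCaption` is a hypothetical callback:

```javascript
// Reveal the transcript character by character using the alignment offsets (ms).
function scheduleCaptions(message, onCaption) {
  if (!message.transcript_alignment) return;
  let shown = '';
  for (const [offsetMs, char] of message.transcript_alignment) {
    setTimeout(() => {
      shown += char;
      onCaption(shown); // e.g. update a subtitle element
    }, offsetMs);
  }
}
```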
#### Voice Activity Detection Events
```javascript
// User started speaking
{
  type: 'server.vad-speech-started',
  start: 1.234 // Seconds from last reset
}

// User stopped speaking (with transcript)
{
  type: 'server.vad-speech-ended',
  transcript: 'What the user said',
  start: 1.234, // When speech started (seconds)
  end: 3.456    // When speech ended (seconds)
}

// Time reference reset
{
  type: 'server.vad-speech-reset-zero',
  timestamp: 0.0 // New zero point for timing
}

// VAD mode changed
{
  type: 'server.vad-mode-switched',
  current_vad_mode_on: true // Current VAD state
}
```
## Voice Activity Detection (VAD) Mode

VAD mode enables hands-free, natural conversations with automatic speech detection.

{% hint style="warning" %}
**VAD Requirements**

* Audio format must be PCM (MP3 is not supported)
* Continuous audio streaming is required
* Interruptions are handled automatically by pausing agent audio
{% endhint %}
### How VAD Works

```javascript
// 1. Enable VAD mode
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// 2. Stream audio continuously (the server detects speech)
const streamAudio = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // ... convert to PCM and send chunks continuously
};

// 3. Handle VAD events
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case 'server.vad-speech-started':
      console.log('User speaking...');
      pauseAgentAudio(); // Stop agent playback if it is speaking
      break;
    case 'server.vad-speech-ended':
      console.log('User said:', msg.transcript);
      // The agent responds automatically
      break;
  }
};
```
## Audio Configuration

### PCM Format (best for real-time & VAD)

```javascript
const pcmConfig = {
  format: 'pcm',
  sample_rate: 16000, // 16 kHz
  sample_width: 2,    // 16-bit
  n_channels: 1,      // Mono
  frame_rate: 16000
};

// Example: Convert Web Audio API input to PCM
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (e) => {
  // convertToPCM16 returns an ArrayBuffer of 16-bit samples
  // (see the complete implementation below)
  const bytes = new Uint8Array(convertToPCM16(e.inputBuffer.getChannelData(0)));
  // Base64-encode byte by byte; spreading 16-bit values into
  // String.fromCharCode would corrupt the encoding
  let binary = '';
  bytes.forEach((b) => (binary += String.fromCharCode(b)));
  ws.send(JSON.stringify({
    type: 'client.new-audio-message',
    audio: btoa(binary)
  }));
};
```
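At these settings (16 kHz, 16-bit, mono), raw audio is 16,000 × 2 = 32,000 bytes per second, so the 20-60 ms chunks recommended under Performance & Limits come to roughly 640-1,920 bytes before base64 encoding. A quick check:

```javascript
// Bytes per chunk for 16 kHz, 16-bit, mono PCM.
const bytesPerSecond = 16000 * 2 * 1; // sample_rate * sample_width * n_channels
const chunkBytes = (ms) => (bytesPerSecond * ms) / 1000;
console.log(chunkBytes(20), chunkBytes(60)); // 640 1920
```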
### MP3 Format (bandwidth-efficient)

```javascript
const mp3Config = {
  format: 'mp3',
  bit_rate: 128000,   // 128 kbps
  sample_rate: 44100, // 44.1 kHz
  n_channels: 2       // Stereo
};
```
## Language Support

### Supported Languages

The following languages are supported for both voice transcription and synthesis:

| Language | Code |
| --- | --- |
| English | `en` |
| Spanish | `es` |
| French | `fr` |
| German | `de` |
| Italian | `it` |
| Portuguese | `pt` |
| Polish | `pl` |
| Turkish | `tr` |
| Russian | `ru` |
| Dutch | `nl` |
| Czech | `cs` |
| Arabic | `ar` |
| Chinese | `zh` |
| Japanese | `ja` |
| Hungarian | `hu` |
| Korean | `ko` |
| Hindi | `hi` |
Language is determined by (1) the user's `preferred_language` setting, with (2) the agent's `default_spoken_language` as the fallback.
## Error Handling

### WebSocket Close Codes

| Code | Meaning | Cause | Resolution |
| --- | --- | --- | --- |
| 3000 | Unauthorized | Invalid/expired token | Refresh the auth token |
| 3003 | Forbidden | Missing permissions | Check user permissions |
| 3008 | Timeout | No activity for 30s | Send `client.extend-timeout` every 15s |
| 4000 | Bad Request | Invalid message format | Check the message structure |
| 4004 | Not Found | Service/conversation doesn't exist | Verify IDs |
| 4009 | Conflict | Conversation locked/finished | Check the conversation state |
| 4015 | Unsupported Media | Wrong audio format | Use PCM for VAD; check the config |
| 4029 | Rate Limited | Too many messages | Implement backoff; max 60/min |
### Error Handling Example

```javascript
ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

// Async handler so the token refresh can be awaited
ws.onclose = async (event) => {
  switch (event.code) {
    case 3000:
      // Refresh the token and reconnect
      await refreshAuthToken();
      reconnect();
      break;
    case 3008:
      // Connection timed out
      console.log('Connection timed out - forgot to send keep-alive?');
      break;
    case 4029:
      // Rate limited - implement exponential backoff
      setTimeout(() => reconnect(), backoffDelay);
      break;
    default:
      console.error(`Connection closed: ${event.code} - ${event.reason}`);
  }
};
```
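The `backoffDelay` above can come from a standard exponential backoff with jitter. A minimal sketch; `reconnect()` is whatever re-establishes your connection:

```javascript
// Exponential backoff with jitter for reconnection attempts.
let attempt = 0;

function nextBackoffDelay(baseMs = 1000, maxMs = 30000) {
  const delay = Math.min(maxMs, baseMs * 2 ** attempt);
  attempt += 1;
  return delay * (0.5 + Math.random() / 2); // add jitter to avoid thundering herds
}

// Reset the counter once a connection succeeds:
// ws.onopen = () => { attempt = 0; };
```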
## Performance & Limits

### Rate Limits

| Limit | Value | Notes |
| --- | --- | --- |
| Messages per minute | 60 | Includes all message types |
| Connection timeout | 30 seconds | Reset by any message |
| Keep-alive interval | 15 seconds | Send `client.extend-timeout` |
| Concurrent connections | 1 per user/service | One active connection at a time |
| Audio chunk size | 20-60 ms | Optimal for real-time streaming |
| Max message size | 1 MB | For audio chunks |
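To stay under the message limit for text and control messages, a simple client-side token bucket works; a minimal sketch (the names here are illustrative):

```javascript
// Token bucket: refills at 60 tokens per minute, one token per outgoing message.
const bucket = { tokens: 60, lastRefill: Date.now() };

function trySend(ws, payload) {
  const now = Date.now();
  bucket.tokens = Math.min(60, bucket.tokens + ((now - bucket.lastRefill) * 60) / 60000);
  bucket.lastRefill = now;
  if (bucket.tokens < 1) return false; // caller should queue or retry later
  bucket.tokens -= 1;
  ws.send(JSON.stringify(payload));
  return true;
}
```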
### Keep Connection Alive

```javascript
// Send a keep-alive every 15 seconds
const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
  }
}, 15000);

// Clean up on close
ws.onclose = () => clearInterval(keepAlive);
```
## Complete Implementation

### Production-Ready WebSocket Client

```javascript
class RealtimeConversation {
  constructor(orgId, authToken, options = {}) {
    this.orgId = orgId;
    this.authToken = authToken;
    this.ws = null;
    this.keepAliveInterval = null;
    this.audioQueue = [];
    this.isPlaying = false;

    // Configuration
    this.options = {
      responseFormat: options.responseFormat || 'voice',
      audioFormat: options.audioFormat || 'pcm',
      vadEnabled: options.vadEnabled || false,
      onMessage: options.onMessage || (() => {}),
      onError: options.onError || console.error,
      onClose: options.onClose || (() => {})
    };
  }
  async connect(serviceId) {
    const url =
      `wss://api.amigo.ai/v1/${this.orgId}/conversation/converse_realtime` +
      `?response_format=${this.options.responseFormat}` +
      `&audio_format=${this.options.audioFormat}`;

    this.ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${this.authToken}`]);

    return new Promise((resolve, reject) => {
      this.ws.onopen = () => {
        console.log('WebSocket connected');
        // Start keep-alive
        this.startKeepAlive();
        // Initialize conversation
        this.ws.send(JSON.stringify({
          type: 'client.start-conversation',
          service_id: serviceId,
          service_version_set_name: 'release'
        }));
      };

      this.ws.onmessage = (event) => {
        const message = JSON.parse(event.data);
        this.handleMessage(message);
        if (message.type === 'server.conversation-created') {
          resolve(message.conversation_id);
          // Enable VAD if requested
          if (this.options.vadEnabled) {
            this.enableVAD();
          }
        }
      };

      this.ws.onerror = (error) => {
        this.options.onError(error);
        reject(error);
      };

      this.ws.onclose = (event) => {
        this.cleanup();
        this.options.onClose(event);
      };
    });
  }
  handleMessage(message) {
    this.options.onMessage(message);

    switch (message.type) {
      case 'server.conversation-created':
        console.log('Conversation started:', message.conversation_id);
        break;
      case 'server.new-message':
        if (this.options.responseFormat === 'voice' && message.message) {
          this.queueAudio(message.message);
        }
        break;
      case 'server.interaction-complete':
        console.log('Response complete');
        break;
      case 'server.vad-speech-started':
        console.log('User speaking...');
        this.pauseAudio();
        break;
      case 'server.vad-speech-ended':
        console.log('User said:', message.transcript);
        break;
    }
  }
  // Keep connection alive
  startKeepAlive() {
    this.keepAliveInterval = setInterval(() => {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
      }
    }, 15000);
  }

  // Voice Activity Detection
  async enableVAD() {
    this.ws.send(JSON.stringify({
      type: 'client.switch-vad-mode',
      vad_mode_on: true
    }));
    // Start streaming microphone audio
    await this.startAudioStream();
  }

  async startAudioStream() {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true,
        noiseSuppression: true
      }
    });

    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);

    let isFirstChunk = true;
    processor.onaudioprocess = (e) => {
      const pcmData = this.convertToPCM16(e.inputBuffer.getChannelData(0));
      this.sendAudio(pcmData, isFirstChunk);
      isFirstChunk = false;
    };

    source.connect(processor);
    processor.connect(audioContext.destination);
  }
  // Send messages
  sendText(text, messageType = 'user-message') {
    if (this.ws?.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket not connected');
    }
    this.ws.send(JSON.stringify({
      type: 'client.new-text-message',
      text: text,
      message_type: messageType
    }));
  }

  sendAudio(audioData, isFirstChunk = false) {
    const message = {
      type: 'client.new-audio-message',
      audio: this.arrayBufferToBase64(audioData)
    };
    if (isFirstChunk) {
      message.audio_config = {
        format: 'pcm',
        sample_rate: 16000,
        sample_width: 2,
        n_channels: 1,
        frame_rate: 16000
      };
    }
    this.ws.send(JSON.stringify(message));
  }

  completeAudio() {
    this.ws.send(JSON.stringify({
      type: 'client.new-audio-message',
      audio: null
    }));
  }
  // Audio utilities
  convertToPCM16(float32Array) {
    const int16Array = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]));
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return int16Array.buffer;
  }

  arrayBufferToBase64(buffer) {
    const bytes = new Uint8Array(buffer);
    let binary = '';
    bytes.forEach((b) => (binary += String.fromCharCode(b)));
    return btoa(binary);
  }

  base64ToArrayBuffer(base64) {
    const binaryString = atob(base64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes.buffer;
  }
  // Audio playback
  queueAudio(base64Audio) {
    const audioBuffer = this.base64ToArrayBuffer(base64Audio);
    this.audioQueue.push(audioBuffer);
    if (!this.isPlaying) {
      this.playNextAudio();
    }
  }

  async playNextAudio() {
    if (this.audioQueue.length === 0) {
      this.isPlaying = false;
      return;
    }
    this.isPlaying = true;
    const pcmBuffer = this.audioQueue.shift();

    // Raw PCM has no container header, so decodeAudioData() cannot parse it.
    // Build an AudioBuffer from the 16-bit samples directly.
    // (Assumes agent audio matches the 16 kHz mono config used for input.)
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const int16 = new Int16Array(pcmBuffer);
    const float32 = new Float32Array(int16.length);
    for (let i = 0; i < int16.length; i++) {
      float32[i] = int16[i] / 0x8000;
    }
    const audioData = audioContext.createBuffer(1, float32.length, 16000);
    audioData.copyToChannel(float32, 0);

    const source = audioContext.createBufferSource();
    source.buffer = audioData;
    source.connect(audioContext.destination);
    source.onended = () => this.playNextAudio();
    source.start();
  }

  pauseAudio() {
    // Clear the audio queue when interrupted
    this.audioQueue = [];
    this.isPlaying = false;
  }
  // Cleanup
  async finish() {
    if (this.options.vadEnabled) {
      // Disable VAD first
      this.ws.send(JSON.stringify({
        type: 'client.switch-vad-mode',
        vad_mode_on: false
      }));
      // Wait for confirmation
      await new Promise((resolve) => {
        const handler = (event) => {
          const msg = JSON.parse(event.data);
          if (msg.type === 'server.vad-mode-switched') {
            this.ws.removeEventListener('message', handler);
            resolve();
          }
        };
        this.ws.addEventListener('message', handler);
      });
    }
    // Now finish the conversation
    this.ws.send(JSON.stringify({
      type: 'client.finish-conversation'
    }));
  }

  cleanup() {
    if (this.keepAliveInterval) {
      clearInterval(this.keepAliveInterval);
    }
    this.audioQueue = [];
    this.isPlaying = false;
  }

  close() {
    this.cleanup();
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'client.close-connection' }));
      this.ws.close();
    }
  }
}
// Usage Example
async function main() {
  const client = new RealtimeConversation('your-org', 'your-auth-token', {
    responseFormat: 'voice',
    audioFormat: 'pcm',
    vadEnabled: true,
    onMessage: (msg) => {
      // Handle all messages
      console.log('Message:', msg.type);
    },
    onError: (error) => {
      console.error('Error:', error);
    },
    onClose: (event) => {
      console.log('Closed:', event.code, event.reason);
    }
  });

  try {
    // Connect and start a conversation
    const conversationId = await client.connect('service-id');
    console.log('Conversation ID:', conversationId);

    // Send a text message
    client.sendText('Hello, how can you help me?');

    // Or manually send audio (if not using VAD)
    // client.sendAudio(pcmAudioData, true);
    // client.completeAudio();

    // When done
    // await client.finish();
  } catch (error) {
    console.error('Failed to connect:', error);
  }
}

main();
```
## Common Patterns & Troubleshooting

### Connection Flow Diagram

```
1. Connect WebSocket
        ↓
2. Authenticate (via subprotocol)
        ↓
3. Initialize Conversation
   ├── New: client.start-conversation
   └── Existing: client.continue-conversation
        ↓
4. Receive Confirmation
   ├── server.conversation-created
   └── server.conversation-retrieved
        ↓
5. Exchange Messages
   ├── Text: client.new-text-message
   ├── Audio: client.new-audio-message
   └── VAD: Continuous streaming
        ↓
6. Receive Responses
   ├── server.new-message (chunks)
   └── server.interaction-complete
        ↓
7. Finish/Close
   ├── client.finish-conversation
   └── client.close-connection
```
### Common Issues & Solutions

| Issue | Symptom | Solution |
| --- | --- | --- |
| No audio playback | Audio received but silent | Check that the audio format matches the `audio_format` param |
| Connection drops | Disconnects after 30s | Implement keep-alive with `client.extend-timeout` |
| VAD not working | Speech not detected | Ensure you use PCM format, not MP3 |
| Authentication fails | Code 3000 on connect | Check the token format: `bearer.authorization.amigo.ai.{token}` |
| Conversation locked | Code 4009 | Only 1 connection per user/service is allowed |
| Empty transcripts | VAD returns empty text | Check microphone permissions and audio levels |
| Choppy audio | Broken playback | Buffer audio chunks before playing |
| High latency | Slow responses | Use regional endpoints and PCM format |
## Best Practices

### Connection Management

* Implement reconnection logic for network interruptions
* Send periodic `client.extend-timeout` messages during long idle periods
* Properly close connections with `client.close-connection`
### Audio Streaming

* Use PCM format for the lowest latency in VAD mode
* Stream audio chunks as they become available (don't buffer the entire message)
* Include `audio_config` only in the first chunk
### Error Recovery

* Handle WebSocket close events gracefully
* Implement exponential backoff for reconnections
* Save the conversation ID to continue after disconnection, as in the sketch below
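For example, saving the conversation ID lets you resume after a drop with `client.continue-conversation`; a minimal sketch:

```javascript
// Reconnect and resume an existing conversation after a dropped connection.
function resumeConversation(orgId, authToken, conversationId) {
  const ws = new WebSocket(
    `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?response_format=voice&audio_format=pcm`,
    [`bearer.authorization.amigo.ai.${authToken}`]
  );
  ws.onopen = () => {
    ws.send(JSON.stringify({
      type: 'client.continue-conversation',
      conversation_id: conversationId
    }));
  };
  // Wait for server.conversation-retrieved before sending new messages.
  return ws;
}
```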
### Performance

* Reuse WebSocket connections when possible
* Process audio chunks immediately upon receipt
* Use appropriate audio buffer sizes (typically 20-60 ms chunks)
### Security

* Never expose authentication tokens in client-side code
* Use secure WebSocket connections (wss://)
* Implement token refresh before expiration, as in the sketch below
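One way to refresh before expiry is to close the current connection and reopen with a fresh token (the API allows only one connection per user/service). A minimal sketch, where `getAuthToken()` and `tokenTtlMs` are hypothetical stand-ins for your own auth flow:

```javascript
// Periodically reconnect with a fresh token before the old one expires.
// getAuthToken() and tokenTtlMs are hypothetical stand-ins for your auth flow.
async function keepAuthenticated(openConnection, tokenTtlMs) {
  let ws = openConnection(await getAuthToken());
  setInterval(async () => {
    ws.close(); // only one connection per user/service is allowed
    ws = openConnection(await getAuthToken());
  }, tokenTtlMs * 0.8); // refresh at ~80% of the token lifetime
}
```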
## SDK & Framework Support

### Current Support

| Platform | Status | Notes |
| --- | --- | --- |
| JavaScript/Browser | Full support | Native WebSocket API |
| Node.js | Full support | Use the `ws` package |
| TypeScript SDK | Coming soon | Use the WebSocket API directly |
| Python | Supported | Use the `websockets` library |
| React Native | Supported | Built-in WebSocket support |
| Flutter | Supported | Use `web_socket_channel` |
### Framework Examples

**Node.js**

```javascript
// With the ws package, pass the bearer token as a subprotocol, just as in
// the browser (ws sends it in the Sec-WebSocket-Protocol header for you).
const WebSocket = require('ws');

const ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${token}`]);
```
**Python**

```python
import json

import websockets

async def connect():
    url = f"wss://api.amigo.ai/v1/{org}/conversation/converse_realtime?response_format=voice&audio_format=pcm"
    # Pass the bearer token as the WebSocket subprotocol
    subprotocol = f"bearer.authorization.amigo.ai.{token}"
    async with websockets.connect(url, subprotocols=[subprotocol]) as ws:
        await ws.send(json.dumps({
            "type": "client.start-conversation",
            "service_id": service_id,
            "service_version_set_name": "release"
        }))
```
## Related Documentation

* Authentication Guide - Set up auth tokens
* Voice Conversations (HTTP) - Alternative HTTP approach
* Conversation Events - Event streaming details
* Regional Endpoints - Optimize latency