Real-time Voice (WebSocket)
Build natural, real-time voice conversations with your Amigo agents using WebSocket connections for low-latency, bidirectional audio streaming.
Quick Start
```javascript
// 1. Connect to WebSocket with authentication
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]
);

// 2. Start a conversation when connected
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'client.start-conversation',
    service_id: 'your-service-id',
    service_version_set_name: 'release'
  }));
};

// 3. Handle incoming messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'server.conversation-created') {
    console.log('Ready to chat! Conversation ID:', message.conversation_id);
    // Now you can send audio or text messages
  }
  if (message.type === 'server.new-message' && message.message) {
    // Handle audio/text response from agent
    handleAgentResponse(message.message);
  }
};

// 4. Send a text message
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));
```

What You Can Build
- Voice Assistants: natural voice conversations with automatic speech detection
- Call Center Agents: real-time customer support with interruption handling
- Interactive Games: voice-controlled gaming experiences
- Healthcare Bots: medical consultation assistants with voice interaction
- Educational Tutors: interactive learning with voice feedback
Automatic Latency Management
Real-time voice conversations include automatic audio fillers that play during processing delays (e.g., "Let me look that up..."). These enhance user experience by reducing perceived latency without requiring client-side implementation. See Managing Perceived Latency for details.
Key Features
- Real-time Streaming: send and receive audio chunks as they're generated
- Voice Activity Detection: automatic detection of speech start/stop
- Low Latency: sub-second response times with streaming
- Interruption Handling: natural conversation flow management
- Audio Fillers: automatic filler phrases during processing delays
- Flexible Audio Formats: PCM (lowest latency) or MP3 (bandwidth-efficient)
- External Events: inject context during conversations
- Multi-stream Support: handle multiple audio streams
- Session Management: continue existing conversations
Connection Setup
Endpoint
Regional Endpoints
Choose the endpoint closest to your users for best performance:
- US (default): wss://api.amigo.ai/v1/{org}/conversation/converse_realtime
- CA Central: wss://api-ca-central-1.amigo.ai/v1/{org}/conversation/converse_realtime
- EU Central: wss://api-eu-central-1.amigo.ai/v1/{org}/conversation/converse_realtime
- AP Southeast: wss://api-ap-southeast-2.amigo.ai/v1/{org}/conversation/converse_realtime
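A small helper can assemble the endpoint URL from the regional hosts above. The hosts are taken verbatim from the list; `buildRealtimeUrl` itself is an illustrative sketch, not an SDK function.

```javascript
// Documented regional hosts (US is the default).
const REGION_HOSTS = {
  'us': 'api.amigo.ai',
  'ca-central-1': 'api-ca-central-1.amigo.ai',
  'eu-central-1': 'api-eu-central-1.amigo.ai',
  'ap-southeast-2': 'api-ap-southeast-2.amigo.ai',
};

// Build the converse_realtime URL for an org and region.
// `params` holds query parameters such as response_format and audio_format.
function buildRealtimeUrl(org, region = 'us', params = {}) {
  const host = REGION_HOSTS[region] || REGION_HOSTS['us'];
  const query = new URLSearchParams(params).toString();
  return `wss://${host}/v1/${org}/conversation/converse_realtime` +
    (query ? `?${query}` : '');
}
```

For example, `buildRealtimeUrl('your-org', 'eu-central-1', { response_format: 'voice', audio_format: 'pcm' })` targets the EU Central host with the voice defaults from the Quick Start.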
Query Parameters
| Parameter | Values | Required | Description | Default / Example |
|---|---|---|---|---|
| response_format | text \| voice | Required | Agent response format | voice |
| audio_format | mp3 \| pcm | Required if response_format=voice | Audio encoding: pcm (lower latency, VAD support) or mp3 (bandwidth-efficient) | pcm |
| current_agent_action_type | regex | Optional | Filter agent action events | ^tool\..* |
Authentication
WebSocket authentication uses the Sec-WebSocket-Protocol header with your bearer token:
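In the browser, the token rides in the WebSocket subprotocol list, which the browser sends as the Sec-WebSocket-Protocol header. The token format below is the documented one; the `connect` helper is illustrative.

```javascript
// The bearer token is sent via the Sec-WebSocket-Protocol header,
// encoded as "bearer.authorization.amigo.ai.{token}".
function buildAuthProtocol(authToken) {
  return 'bearer.authorization.amigo.ai.' + authToken;
}

// Illustrative connect helper (not invoked here): pass the encoded
// token as the subprotocol when opening the socket.
function connect(url, authToken) {
  return new WebSocket(url, [buildAuthProtocol(authToken)]);
}
```

If the token is malformed or expired, the server closes the connection with code 3000 (see Error Handling below).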
Conversation Flow
Step 1: Connect & Authenticate
Step 2: Initialize Conversation
Once connected, you must initialize the conversation:
Option A: Start New Conversation
Option B: Continue Existing Conversation
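A sketch of the two initialization payloads. The start message fields come from the Quick Start; the continue message's type name and `conversation_id` field are assumptions for illustration, so check the message reference for the exact shape.

```javascript
// Start a new conversation (fields as shown in the Quick Start).
function startConversationMessage(serviceId, versionSetName = 'release') {
  return JSON.stringify({
    type: 'client.start-conversation',
    service_id: serviceId,
    service_version_set_name: versionSetName,
  });
}

// Continue an existing conversation. The type name and field below
// are assumptions; consult the message reference for the real shape.
function continueConversationMessage(conversationId) {
  return JSON.stringify({
    type: 'client.continue-conversation', // assumed type name
    conversation_id: conversationId,      // assumed field name
  });
}
```

Saving the `conversation_id` from `server.conversation-created` lets you resume after a disconnect, as recommended under Best Practices.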
Step 3: Exchange Messages
Now you can send text or audio messages and receive responses:
Sequence Diagram
Message Reference
Messages You Send (Client → Server)
Send Text
Send Audio
Voice Activity Detection (VAD)
Finish Conversation
Standard Mode
VAD Mode
When in VAD mode, first disable VAD, wait for acknowledgment, then finish:
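The three-step shutdown might look like the sketch below. The exact message type names (`client.deactivate-vad`, `server.vad-deactivated`, `client.finish-conversation`) are assumptions here; see the message reference for the real identifiers.

```javascript
// Finish a conversation while in VAD mode: disable VAD, wait for the
// server's acknowledgment, then send the finish message.
// All three type names below are illustrative assumptions.
function finishInVadMode(ws) {
  ws.send(JSON.stringify({ type: 'client.deactivate-vad' }));
  ws.addEventListener('message', function onMessage(event) {
    const msg = JSON.parse(event.data);
    if (msg.type === 'server.vad-deactivated') { // assumed ack type
      ws.removeEventListener('message', onMessage);
      ws.send(JSON.stringify({ type: 'client.finish-conversation' }));
    }
  });
}
```

Waiting for the acknowledgment before finishing avoids racing the server's VAD pipeline, which may still be processing buffered audio.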
Graceful Close
Extend Timeout
Messages You Receive (Server → Client)
Conversation Lifecycle
Agent Responses
Voice Activity Detection (VAD) Mode
VAD mode enables hands-free, natural conversations with automatic speech detection.
VAD Requirements
- Audio format must be PCM (MP3 is not supported)
- Continuous audio streaming is required
- Interruptions are handled automatically by pausing agent audio
How VAD Works
VAD with External Events
External events can interrupt ongoing conversations in VAD mode when marked with start_interaction: true:
External Event Behavior in VAD Mode:
- When the agent has not detected user speech: an external event with `start_interaction: true` interrupts any existing interaction and immediately starts a new interaction with the external event.
- When the user is speaking (the agent has detected speech): the external event is queued; the agent waits until the user finishes speaking (indicated by `server.vad-speech-ended`), then triggers a new interaction with the external event.
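A sketch of injecting an interrupting external event. Only the `start_interaction` flag is documented above; the message type name and event payload field are assumptions.

```javascript
// Send an external event that may interrupt (or queue behind) the
// current interaction in VAD mode. The message type and the `event`
// field name are illustrative assumptions; start_interaction is the
// documented flag that triggers a new interaction.
function externalEventMessage(eventText) {
  return JSON.stringify({
    type: 'client.new-external-event', // assumed type name
    event: eventText,                  // assumed field name
    start_interaction: true,
  });
}
```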
Audio Configuration
PCM Format (Best for real-time & VAD)
MP3 Format (Bandwidth-efficient)
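A first-chunk PCM payload might look like the following. Only the rule that `audio_config` accompanies the first chunk is stated in this guide; the message type and the `audio_config` field names and values shown here are assumptions.

```javascript
// Build an audio-chunk message. Per the best practices below,
// audio_config is included only with the first chunk.
// The type name and audio_config fields are illustrative assumptions.
function audioChunkMessage(base64Audio, isFirstChunk) {
  const msg = {
    type: 'client.new-audio-chunk', // assumed type name
    audio: base64Audio,             // assumed field name
  };
  if (isFirstChunk) {
    msg.audio_config = {            // assumed field names/values
      encoding: 'pcm',
      sample_rate_hz: 16000,
      channels: 1,
    };
  }
  return JSON.stringify(msg);
}
```

Subsequent chunks omit `audio_config`, keeping per-chunk overhead low for the 20-60 ms chunks recommended under Performance & Limits.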
Language Support
Supported Languages
The following languages are supported for both voice transcription and synthesis:
| Language | Code |
|---|---|
| English | en |
| Spanish | es |
| French | fr |
| German | de |
| Italian | it |
| Portuguese | pt |
| Polish | pl |
| Turkish | tr |
| Russian | ru |
| Dutch | nl |
| Czech | cs |
| Arabic | ar |
| Chinese | zh |
| Japanese | ja |
| Hungarian | hu |
| Korean | ko |
| Hindi | hi |
Language is determined by: 1) the user's `preferred_language` setting, falling back to 2) the agent's `default_spoken_language`.
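That two-step fallback can be expressed directly; `pickSpokenLanguage` is an illustrative helper, not an SDK function.

```javascript
// Resolve the conversation language: the user's preferred_language
// wins; otherwise fall back to the agent's default_spoken_language.
function pickSpokenLanguage(userPreferredLanguage, agentDefaultSpokenLanguage) {
  return userPreferredLanguage || agentDefaultSpokenLanguage;
}
```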
Error Handling
WebSocket Close Codes
| Code | Meaning | Cause | Resolution |
|---|---|---|---|
| 3000 | Unauthorized | Invalid/expired token | Refresh the auth token |
| 3003 | Forbidden | Missing permissions | Check user permissions |
| 3008 | Timeout | No activity for 30s | Send extend-timeout every 15s |
| 4000 | Bad Request | Invalid message format | Check the message structure |
| 4004 | Not Found | Service/conversation doesn't exist | Verify IDs |
| 4009 | Conflict | Conversation locked/finished | Check the conversation state |
| 4015 | Unsupported Media | Wrong audio format | Use PCM for VAD; check the config |
| 4029 | Rate Limited | Too many messages | Implement backoff; max 60/min |
Error Handling Example
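One way to act on the close codes above. The codes and recovery actions mirror the table; `recoveryFor` and the action strings are illustrative.

```javascript
// Map documented close codes to a recovery strategy.
function recoveryFor(code) {
  switch (code) {
    case 3000: return 'refresh-token';     // Unauthorized: invalid/expired token
    case 3003: return 'check-permissions'; // Forbidden: missing permissions
    case 3008: return 'reconnect';         // Timeout: keep-alive lapsed
    case 4000: return 'fix-message';       // Bad Request: invalid message format
    case 4004: return 'verify-ids';        // Not Found: bad service/conversation ID
    case 4009: return 'check-state';       // Conflict: conversation locked/finished
    case 4015: return 'fix-audio-format';  // Unsupported Media: use PCM for VAD
    case 4029: return 'backoff';           // Rate Limited: max 60 messages/min
    default:   return 'reconnect';         // Unknown: treat as transient
  }
}

// Illustrative usage on a live socket:
// ws.onclose = (event) => console.warn(event.code, recoveryFor(event.code));
```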
Performance & Limits
Rate Limits
| Limit | Value | Notes |
|---|---|---|
| Messages per minute | 60 | Includes all message types |
| Connection timeout | 30 seconds | Reset by any message |
| Keep-alive interval | 15 seconds | Send extend-timeout |
| Concurrent connections | 1 per user/service | One active connection at a time |
| Audio chunk size | 20-60 ms | Optimal for real-time streaming |
| Max message size | 1 MB | For audio chunks |
Keep Connection Alive
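A keep-alive that sends an extend-timeout message every 15 seconds, well inside the 30-second idle limit. The `client.extend-timeout` type name is an assumption based on the table above; verify it against the message reference.

```javascript
// The keep-alive payload; the type name is an assumption.
function keepAliveMessage() {
  return JSON.stringify({ type: 'client.extend-timeout' });
}

// Send the keep-alive every 15s (half the 30s idle timeout).
// Returns a stop function to call when closing the connection.
function startKeepAlive(ws, intervalMs = 15000) {
  const timer = setInterval(() => {
    if (ws.readyState === 1 /* OPEN */) ws.send(keepAliveMessage());
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Since any message resets the idle timer, the keep-alive only matters during genuinely quiet periods; stopping it on close prevents a leaked interval.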
Complete Implementation
Production-Ready WebSocket Client
Common Patterns & Troubleshooting
Connection Flow Diagram
Common Issues & Solutions
| Issue | Symptom | Solution |
|---|---|---|
| No audio playback | Audio received but silent | Check that playback matches the audio_format param |
| Connection drops | Disconnects after 30s | Implement keep-alive with extend-timeout |
| VAD not working | Speech not detected | Ensure you are using PCM format, not MP3 |
| Authentication fails | Code 3000 on connect | Check the token format: bearer.authorization.amigo.ai.{token} |
| Conversation locked | Code 4009 | Only one connection per user/service is allowed |
| Empty transcripts | VAD returns empty text | Check microphone permissions and audio levels |
| Choppy audio | Broken playback | Buffer audio chunks before playing |
| High latency | Slow responses | Use regional endpoints and the PCM format |
Best Practices
Connection Management
- Implement reconnection logic for network interruptions
- Send periodic `extend-timeout` messages during long idle periods
- Properly close connections with `client.close-connection`
Audio Streaming
- Use PCM format for lowest latency in VAD mode
- Stream audio chunks as they become available (don't buffer the entire message)
- Include `audio_config` only in the first chunk
Error Recovery
- Handle WebSocket close events gracefully
- Implement exponential backoff for reconnections
- Save the conversation ID to continue after a disconnection
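Exponential backoff for reconnection can be as simple as the sketch below (illustrative; the 1 s base and 30 s cap are arbitrary choices).

```javascript
// Delay before the nth reconnection attempt: 1s, 2s, 4s, ... capped at 30s.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

A reconnect loop would wait `backoffDelayMs(attempt)` before each retry and reset `attempt` to 0 once a connection succeeds.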
Performance
- Reuse WebSocket connections when possible
- Process audio chunks immediately upon receipt
- Use appropriate audio buffer sizes (typically 20-60 ms chunks)
Security
- Never expose authentication tokens in client-side code
- Use secure WebSocket connections (wss://)
- Implement token refresh before expiration
SDK & Framework Support
Current Support
| Platform | Status | Notes |
|---|---|---|
| JavaScript/Browser | Full support | Native WebSocket API |
| Node.js | Full support | Use the ws package |
| TypeScript SDK | Coming soon | Use the WebSocket API directly |
| Python | Supported | Use the websockets library |
| React Native | Supported | Built-in WebSocket support |
| Flutter | Supported | Use web_socket_channel |
Framework Examples
Node.js
Python
Related Documentation
- Authentication Guide: set up auth tokens
- Voice Conversations (HTTP): alternative HTTP approach
- Conversation Events: event streaming details
- Regional Endpoints: optimize latency