# Real-time Voice (WebSocket)

Build natural, real-time voice conversations with your Amigo agents using WebSocket connections for low-latency, bidirectional audio streaming.

{% hint style="info" %}
**Real-time Capabilities**\
This API supports sub-second latency voice conversations with automatic speech detection, interruption handling, and streaming responses.
{% endhint %}

{% hint style="info" %}
**Looking for phone-based voice?** This page covers WebSocket streaming for in-app voice experiences. For enterprise phone calls, see [Platform API: Voice Agent](https://docs.amigo.ai/developer-guide/platform-api/platform-api/voice-agent).
{% endhint %}

## Quick Start

{% tabs %}
{% tab title="JavaScript" %}

```javascript
// 1. Connect to WebSocket with authentication
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]
);

// 2. Start a conversation when connected
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'client.start-conversation',
    service_id: 'your-service-id',
    service_version_set_name: 'release'
  }));
};

// 3. Handle incoming messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  if (message.type === 'server.conversation-created') {
    console.log('Ready to chat! Conversation ID:', message.conversation_id);
    // Now you can send audio or text messages
  }
  
  if (message.type === 'server.new-message' && message.message) {
    // Handle audio/text response from agent
    handleAgentResponse(message.message);
  }
};

// 4. Send a text message (only after 'server.conversation-created' has arrived)
function sendText(text) {
  ws.send(JSON.stringify({
    type: 'client.new-text-message',
    text: text,
    message_type: 'user-message'
  }));
}
```

{% endtab %}

{% tab title="Python" %}

```python
import websocket  # pip install websocket-client
import json

# 1. Define callbacks (they must exist before the app references them)
def on_open(ws):
    # 2. Start conversation
    ws.send(json.dumps({
        "type": "client.start-conversation",
        "service_id": "your-service-id",
        "service_version_set_name": "release"
    }))

def on_message(ws, message):
    # 3. Handle responses
    msg = json.loads(message)

    if msg["type"] == "server.conversation-created":
        print(f"Ready! ID: {msg['conversation_id']}")

        # 4. Send a message
        ws.send(json.dumps({
            "type": "client.new-text-message",
            "text": "Hello!",
            "message_type": "user-message"
        }))

def on_error(ws, error):
    print(f"Error: {error}")

def on_close(ws, close_status_code, close_msg):
    print(f"Closed: {close_status_code} {close_msg}")

# 5. Create connection with authentication (token passed as WebSocket subprotocol)
ws_url = f"wss://api.amigo.ai/v1/{org_id}/conversation/converse_realtime?response_format=voice&audio_format=pcm"

ws = websocket.WebSocketApp(
    ws_url,
    subprotocols=[f"bearer.authorization.amigo.ai.{auth_token}"],
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close,
)

ws.run_forever()
```

{% endtab %}
{% endtabs %}

## What You Can Build

| Use Case               | Description                                                 |
| ---------------------- | ----------------------------------------------------------- |
| **Voice Assistants**   | Natural voice conversations with automatic speech detection |
| **Call Center Agents** | Real-time customer support with interruption handling       |
| **Interactive Games**  | Voice-controlled gaming experiences                         |
| **Healthcare Bots**    | Medical consultation assistants with voice interaction      |
| **Educational Tutors** | Interactive learning with voice feedback                    |

{% hint style="success" %}
**Automatic Latency Management**

Real-time voice conversations include automatic audio fillers that play during processing delays (for example, "Let me look that up..."). These improve user experience by reducing perceived latency without any client-side work. See [Managing Perceived Latency](https://docs.amigo.ai/developer-guide/classic-api/core-api/conversations-voice#managing-perceived-latency) for details.
{% endhint %}

## Key Features

{% tabs %}
{% tab title="Core Features" %}

| Feature                      | Description                                         |
| ---------------------------- | --------------------------------------------------- |
| **Real-time Streaming**      | Send and receive audio chunks as they are generated |
| **Voice Activity Detection** | Automatic detection of speech start and stop        |
| **Low Latency**              | Sub-second response times with streaming            |
| **Interruption Handling**    | Natural conversation flow management                |
| **Audio Fillers**            | Automatic filler phrases during processing delays   |

{% endtab %}

{% tab title="Advanced Features" %}

| Feature                    | Description                                       |
| -------------------------- | ------------------------------------------------- |
| **Flexible Audio Formats** | PCM (lowest latency) or MP3 (bandwidth-efficient) |
| **External Events**        | Inject context during conversations               |
| **Multi-stream Support**   | Handle multiple audio streams                     |
| **Session Management**     | Continue existing conversations                   |

{% endtab %}
{% endtabs %}

## Connection Setup

### Endpoint

```
wss://api.amigo.ai/v1/{organization}/conversation/converse_realtime
```

### Regional Endpoints

Choose the endpoint closest to your users for best performance:

| Region           | Endpoint                                                                    |
| ---------------- | --------------------------------------------------------------------------- |
| **US** (default) | `wss://api.amigo.ai/v1/{org}/conversation/converse_realtime`                |
| **CA Central**   | `wss://api-ca-central-1.amigo.ai/v1/{org}/conversation/converse_realtime`   |
| **EU Central**   | `wss://api-eu-central-1.amigo.ai/v1/{org}/conversation/converse_realtime`   |
| **AP Southeast** | `wss://api-ap-southeast-2.amigo.ai/v1/{org}/conversation/converse_realtime` |
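If your client serves users in several regions, the hosts from the table above can be centralized in a small helper. This is a convenience sketch, not part of the API; the `REGION_HOSTS` map and `realtimeUrl` names are illustrative:

```javascript
// Map region keys to the hosts from the table above (illustrative helper)
const REGION_HOSTS = {
  'us': 'api.amigo.ai',
  'ca-central-1': 'api-ca-central-1.amigo.ai',
  'eu-central-1': 'api-eu-central-1.amigo.ai',
  'ap-southeast-2': 'api-ap-southeast-2.amigo.ai'
};

// Build the WebSocket URL for a region, falling back to the US default
function realtimeUrl(region, orgId, params = {}) {
  const host = REGION_HOSTS[region] || REGION_HOSTS['us'];
  const query = new URLSearchParams(params).toString();
  return `wss://${host}/v1/${orgId}/conversation/converse_realtime` +
         (query ? `?${query}` : '');
}

realtimeUrl('eu-central-1', 'your-org', { response_format: 'voice', audio_format: 'pcm' });
```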

### Query Parameters

| Parameter                   | Type              | Required | Description                                                                                                         | Example     |
| --------------------------- | ----------------- | -------- | ------------------------------------------------------------------------------------------------------------------- | ----------- |
| `response_format`           | `text` \| `voice` | Required | Agent response format                                                                                               | `voice`     |
| `audio_format`              | `mp3` \| `pcm`    | If voice | <p>Audio encoding:<br>• <code>pcm</code>: lower latency, VAD support<br>• <code>mp3</code>: bandwidth-efficient</p> | `pcm`       |
| `current_agent_action_type` | regex             | Optional | Filter agent action events                                                                                          | `^tool\..*` |

## Authentication

WebSocket authentication uses the `Sec-WebSocket-Protocol` header with your bearer token:

```javascript
// Get your auth token (from login or API key)
const authToken = await getAuthToken();

// Pass token as WebSocket subprotocol
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]  // ← Token goes here
);
```

{% hint style="info" %}
**Token Format**\
The token format is `bearer.authorization.amigo.ai.` + your JWT token. This is passed as a WebSocket subprotocol, not a header.
{% endhint %}

## Conversation Flow

### Step 1: Connect & Authenticate

```javascript
const ws = new WebSocket(
  `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?response_format=voice&audio_format=pcm`,
  [`bearer.authorization.amigo.ai.${authToken}`]
);

// Handle connection events
ws.onopen = () => console.log('Connected');
ws.onerror = (error) => console.error('Connection error:', error);
ws.onclose = (event) => console.log('Disconnected:', event.code, event.reason);
```

### Step 2: Initialize Conversation

Once connected, you must initialize the conversation:

#### Option A: Start New Conversation

```javascript
ws.send(JSON.stringify({
  type: 'client.start-conversation',
  service_id: 'your-service-id',        // Your agent service ID
  service_version_set_name: 'release'   // 'release', 'edge', or custom
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-created', conversation_id: '...' }
```

#### Option B: Continue Existing Conversation

```javascript
ws.send(JSON.stringify({
  type: 'client.continue-conversation',
  conversation_id: 'existing-conversation-id'
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-retrieved' }
```

### Step 3: Exchange Messages

Now you can send text or audio messages and receive responses:

### Sequence Diagram

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"actorBkg": "#083241", "actorTextColor": "#FFFFFF", "actorBorder": "#083241", "signalColor": "#575452", "signalTextColor": "#100F0F", "labelBoxBkgColor": "#F1EAE7", "labelBoxBorderColor": "#D7D2D0", "labelTextColor": "#100F0F", "loopTextColor": "#100F0F", "noteBkgColor": "#F1EAE7", "noteBorderColor": "#D7D2D0", "noteTextColor": "#100F0F", "activationBkgColor": "#E8E2EB", "activationBorderColor": "#083241", "altSectionBkgColor": "#F1EAE7", "altSectionColor": "#100F0F"}}}%%
sequenceDiagram
autonumber
participant C as Client (Browser/App)
participant S as Server (WebSocket Endpoint)

Note over C,S: WebSocket connection established (wss://...).

C->>S: StartConversation request
S-->>C: Ack { request:"StartConversation" }

C->>S: TriggerFirstMessage request
loop Agent TTS audio stream
    S-->>C: AgentAudio { pcm, timestamps }
  end
  S-->>C: InteractionComplete { interaction_id, message, ... }

C->>S: SwitchVADMode { vad_mode_on: true }
S-->>C: Ack { request:"SetVADMode" }

rect rgba(0,0,0,0.03)
  loop Continuous upstream audio
    C-->>S: AudioChunk { pcm16 }
  end
end

Note over S: Server-side VAD monitors incoming audio

S-->>C: Event:SpeechStarted { ts_start }
Note over C: Pause any local/agent audio<br/>playback immediately

%% (Optional) If client keeps sending audio, server may continue consuming; VAD tracks speech state.

S-->>C: Event:SpeechEnded { ts_end }

par Server generates agent response
  Note over S: ASR/LLM/TTS pipeline runs
and Downlink agent audio
  loop Agent TTS audio stream
    S-->>C: AgentAudio { pcm, timestamps }
  end
  S-->>C: InteractionComplete { interaction_id, message, ... }
end

%% External event interruption in VAD mode
Note over C,S: External event interruption (VAD mode)
C->>S: client.new-text-message<br/>{ message_type: 'external-event',<br/>start_interaction: true }
alt Agent hasn't detected user speaking
    Note over S: Interrupts existing<br/>interaction immediately
    S-->>C: server.new-message (response to event)
    S-->>C: server.interaction-complete
else User is speaking (agent has detected)
    Note over S: Waits until user<br/>finishes speaking
    S-->>C: server.vad-speech-ended
    S-->>C: server.new-message (response to event)
    S-->>C: server.interaction-complete
end" %}

## Message Reference

### Messages You Send (Client → Server)

#### Send Text

```javascript
// User message
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));

// System event (e.g., user actions, context)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'User navigated to checkout page',
  message_type: 'external-event',
  start_interaction: false  // Optional: default is false
}));

// Urgent external event that starts new interaction (VAD mode)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Critical system alert',
  message_type: 'external-event',
  start_interaction: true  // Interrupts current interaction in VAD mode
}));
```

#### Send Audio

```javascript
// First chunk - include config
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: base64AudioChunk,  // Base64 encoded PCM audio
  audio_config: {
    format: 'pcm',
    sample_rate: 16000,     // 16kHz
    sample_width: 2,        // 16-bit
    n_channels: 1,          // Mono
    frame_rate: 16000
  }
}));

// Subsequent chunks - no config needed
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: nextBase64AudioChunk
}));

// Signal end of audio
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: null
}));
```

#### Voice Activity Detection (VAD)

```javascript
// Enable automatic speech detection
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// In VAD mode: continuously stream audio, server detects speech
// Server will automatically determine when user starts/stops speaking
```

#### Finish Conversation

**Standard Mode**

```json
{
  "type": "client.finish-conversation"
}
```

**VAD Mode**

When in VAD mode, first disable VAD, wait for acknowledgment, then finish:

```json
// Step 1: Disable VAD mode
{
  "type": "client.switch-vad-mode",
  "vad_mode_on": false
}

// Wait for response (may take up to 10 seconds)
{
  "type": "server.vad-mode-switched",
  "current_vad_mode_on": false
}

// Step 2: Finish conversation
{
  "type": "client.finish-conversation"
}

// Response
{
  "type": "server.conversation-completed"
}
```
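The two-step shutdown above can be wrapped in a helper that waits for the acknowledgment before finishing. This is a sketch under assumptions: `finishConversationFromVAD` is an illustrative name, `ws` is any WebSocket-like object, and the 15-second default timeout simply adds margin over the documented 10-second worst case:

```javascript
// Disable VAD, wait for server.vad-mode-switched, then finish the conversation.
// `ws` needs send/addEventListener/removeEventListener (a browser WebSocket qualifies).
function finishConversationFromVAD(ws, timeoutMs = 15000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      ws.removeEventListener('message', handler);
      reject(new Error('Timed out waiting for server.vad-mode-switched'));
    }, timeoutMs);

    const handler = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === 'server.vad-mode-switched' && msg.current_vad_mode_on === false) {
        clearTimeout(timer);
        ws.removeEventListener('message', handler);
        // Step 2: now it is safe to finish the conversation
        ws.send(JSON.stringify({ type: 'client.finish-conversation' }));
        resolve();
      }
    };

    ws.addEventListener('message', handler);
    // Step 1: disable VAD and wait for the acknowledgment
    ws.send(JSON.stringify({ type: 'client.switch-vad-mode', vad_mode_on: false }));
  });
}
```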

#### Graceful Close

```json
{
  "type": "client.close-connection"
}
```

#### Extend Timeout

```json
{
  "type": "client.extend-timeout"
}
```

### Messages You Receive (Server → Client)

#### Conversation Lifecycle

```javascript
// Conversation created
{
  type: 'server.conversation-created',
  conversation_id: '507f1f77bcf86cd799439012'
}

// Conversation retrieved (when continuing)
{
  type: 'server.conversation-retrieved'
}

// Conversation finished
{
  type: 'server.conversation-completed'
}
```

#### Agent Responses

```javascript
// Text response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'Hello! I can help you with...',  // Text chunk
  message_metadata: [],
  transcript_alignment: null,
  stop: false,              // false = more chunks coming
  sequence_number: 1,
  message_id: '...'
}

// Audio response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'base64_audio_chunk',             // Base64 PCM audio
  message_metadata: [],
  transcript_alignment: [                    // Timing for each character (ms)
    [0, 'H'], [100, 'e'], [200, 'l'], [300, 'l'], [400, 'o']
  ],
  stop: false,
  sequence_number: 1,
  message_id: '...'
}

// Response complete
{
  type: 'server.interaction-complete',
  message_id: '...',
  interaction_id: '...',
  full_message: 'Complete text or transcript',
  conversation_completed: false    // true = agent ended conversation
}
```

#### Voice Activity Detection Events

```javascript
// User started speaking
{
  type: 'server.vad-speech-started',
  start: 1.234  // Seconds from last reset
}

// User stopped speaking (with transcript)
{
  type: 'server.vad-speech-ended',
  transcript: 'What the user said',
  start: 1.234,  // When speech started (seconds)
  end: 3.456     // When speech ended (seconds)
}

// Time reference reset
{
  type: 'server.vad-speech-reset-zero',
  timestamp: 0.0  // New zero point for timing
}

// VAD mode changed
{
  type: 'server.vad-mode-switched',
  current_vad_mode_on: true  // Current VAD state
}
```
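The per-character `transcript_alignment` timings can drive synchronized captions during audio playback. A minimal illustrative helper (the `captionAt` name is my own, not part of the API):

```javascript
// Given a transcript_alignment array and elapsed playback time in ms,
// return the caption text that should be visible so far.
function captionAt(alignment, elapsedMs) {
  return alignment
    .filter(([ms]) => ms <= elapsedMs)   // keep characters that have started
    .map(([, ch]) => ch)
    .join('');
}

const alignment = [[0, 'H'], [100, 'e'], [200, 'l'], [300, 'l'], [400, 'o']];
captionAt(alignment, 250);  // 'Hel'
```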

## Voice Activity Detection (VAD) Mode

VAD mode enables hands-free, natural conversations with automatic speech detection.

{% hint style="warning" %}
**VAD Requirements**

* Audio format must be PCM (MP3 is not supported).
* Continuous audio streaming is required.
* Interruptions are handled automatically by pausing agent audio.

{% endhint %}

### How VAD Works

```javascript
// 1. Enable VAD mode
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// 2. Stream audio continuously (server detects speech)
const streamAudio = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // ... convert to PCM and send chunks continuously
};

// 3. Handle VAD events
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch(msg.type) {
    case 'server.vad-speech-started':
      console.log('User speaking...');
      pauseAgentAudio();  // Stop agent playback if speaking
      break;

    case 'server.vad-speech-ended':
      console.log('User said:', msg.transcript);
      // Agent automatically responds
      break;
  }
};
```

### VAD with External Events

External events can interrupt ongoing conversations in VAD mode when marked with `start_interaction: true`:

```javascript
// Enable VAD mode first
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// Send external event that can interrupt
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({
    event: 'payment.failed',
    amount: 100.00,
    error: 'Card declined'
  }),
  start_interaction: true  // Triggers interruption behavior
}));

// Handle the response
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch(msg.type) {
    case 'server.new-message':
      // This will be the agent's response to the external event
      console.log('Agent response to event:', msg.message);
      break;

    case 'server.interaction-complete':
      console.log('External event interaction completed');
      break;
  }
};
```

**External Event Behavior in VAD Mode:**

* **When the agent hasn't detected the user speaking**: an external event with `start_interaction: true` interrupts any existing interaction and starts a new interaction immediately with the external event.
* **When the user is speaking (agent has detected)**: the external event is queued. The agent waits until the user finishes speaking (indicated by `server.vad-speech-ended`), then triggers a new interaction with the external event.

## Audio Configuration

### PCM Format (Best for real-time & VAD)

```javascript
const pcmConfig = {
  format: 'pcm',
  sample_rate: 16000,   // 16 kHz
  sample_width: 2,      // 16-bit
  n_channels: 1,        // Mono
  frame_rate: 16000
};

// Example: capture microphone audio and stream 16-bit PCM chunks
// (ScriptProcessorNode is deprecated; prefer AudioWorklet in new code)
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (e) => {
  const floats = e.inputBuffer.getChannelData(0);
  const pcm = new Int16Array(floats.length);      // float [-1, 1] → int16
  for (let i = 0; i < floats.length; i++) {
    const s = Math.max(-1, Math.min(1, floats[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  const bytes = new Uint8Array(pcm.buffer);
  let binary = '';
  bytes.forEach(b => binary += String.fromCharCode(b));
  ws.send(JSON.stringify({ type: 'client.new-audio-message', audio: btoa(binary) }));
};

source.connect(processor);
processor.connect(audioContext.destination);
```

### MP3 Format (Bandwidth-efficient)

```javascript
const mp3Config = {
  format: 'mp3',
  bit_rate: 128000,    // 128 kbps
  sample_rate: 44100,  // 44.1 kHz  
  n_channels: 2        // Stereo
};
```

## Language Support

### Supported Languages

The following languages are supported for both voice transcription and synthesis:

| Language   | Code |
| ---------- | ---- |
| English    | `en` |
| Spanish    | `es` |
| French     | `fr` |
| German     | `de` |
| Italian    | `it` |
| Portuguese | `pt` |
| Polish     | `pl` |
| Turkish    | `tr` |
| Russian    | `ru` |
| Dutch      | `nl` |
| Czech      | `cs` |
| Arabic     | `ar` |
| Chinese    | `zh` |
| Japanese   | `ja` |
| Hungarian  | `hu` |
| Korean     | `ko` |
| Hindi      | `hi` |

> Language is determined by: 1) User's `preferred_language` setting, 2) Agent's `default_spoken_language` fallback

## Error Handling

### WebSocket Close Codes

| Code     | Error             | Common Cause                       | Solution                        |
| -------- | ----------------- | ---------------------------------- | ------------------------------- |
| **3000** | Unauthorized      | Invalid/expired token              | Refresh auth token              |
| **3003** | Forbidden         | Missing permissions                | Check user permissions          |
| **3008** | Timeout           | No activity for 30s                | Send `extend-timeout` every 15s |
| **4000** | Bad Request       | Invalid message format             | Check message structure         |
| **4004** | Not Found         | Service/conversation doesn't exist | Verify IDs                      |
| **4009** | Conflict          | Conversation locked/finished       | Check conversation state        |
| **4015** | Unsupported Media | Wrong audio format                 | Use PCM for VAD, check config   |
| **4029** | Rate Limited      | Too many messages                  | Implement backoff, max 60/min   |

### Error Handling Example

```javascript
ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = async (event) => {
  switch(event.code) {
    case 3000:
      // Refresh token and reconnect
      await refreshAuthToken();
      reconnect();
      break;
    case 3008:
      // Connection timed out
      console.log('Connection timed out - forgot to send keep-alive?');
      break;
    case 4029:
      // Rate limited - implement exponential backoff
      setTimeout(() => reconnect(), backoffDelay);
      break;
    default:
      console.error(`Connection closed: ${event.code} - ${event.reason}`);
  }
};
```

## Performance & Limits

### Rate Limits

| Limit                      | Value              | Notes                           |
| -------------------------- | ------------------ | ------------------------------- |
| **Messages/minute**        | 60                 | Includes all message types      |
| **Connection timeout**     | 30 seconds         | Reset by any message            |
| **Keep-alive interval**    | 15 seconds         | Send `extend-timeout`           |
| **Concurrent connections** | 1 per user/service | One active connection at a time |
| **Audio chunk size**       | 20-60ms            | Optimal for real-time streaming |
| **Max message size**       | 1MB                | For audio chunks                |
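For the recommended PCM configuration (16 kHz, 16-bit, mono), the 20-60 ms chunk guidance works out to a few hundred bytes of raw audio per message:

```javascript
// 16,000 samples/s × 2 bytes/sample × 1 channel = 32,000 bytes per second
const BYTES_PER_SECOND = 16000 * 2 * 1;

// Bytes of raw PCM in a chunk of the given duration
const chunkBytes = (ms) => (BYTES_PER_SECOND * ms) / 1000;

chunkBytes(20);  // 640 bytes  (lower bound)
chunkBytes(60);  // 1920 bytes (upper bound)
```

Base64 encoding inflates these payloads by roughly one third, so even generous chunks stay far below the 1 MB message limit.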

### Keep Connection Alive

```javascript
// Send keep-alive every 15 seconds
const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
  }
}, 15000);

// Clean up on close
ws.onclose = () => clearInterval(keepAlive);
```

## Complete Implementation

### Production-Ready WebSocket Client

```javascript
class RealtimeConversation {
  constructor(orgId, authToken, options = {}) {
    this.orgId = orgId;
    this.authToken = authToken;
    this.ws = null;
    this.keepAliveInterval = null;
    this.audioQueue = [];
    this.isPlaying = false;
    
    // Configuration
    this.options = {
      responseFormat: options.responseFormat || 'voice',
      audioFormat: options.audioFormat || 'pcm',
      vadEnabled: options.vadEnabled || false,
      onMessage: options.onMessage || (() => {}),
      onError: options.onError || console.error,
      onClose: options.onClose || (() => {})
    };
  }

  async connect(serviceId) {
    const url = `wss://api.amigo.ai/v1/${this.orgId}/conversation/converse_realtime` +
                `?response_format=${this.options.responseFormat}` +
                `&audio_format=${this.options.audioFormat}`;
    
    this.ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${this.authToken}`]);
    
    return new Promise((resolve, reject) => {
      this.ws.onopen = () => {
        console.log('WebSocket connected');
        
        // Start keep-alive
        this.startKeepAlive();
        
        // Initialize conversation
        this.ws.send(JSON.stringify({
          type: 'client.start-conversation',
          service_id: serviceId,
          service_version_set_name: 'release'
        }));
      };
      
      this.ws.onmessage = (event) => {
        const message = JSON.parse(event.data);
        this.handleMessage(message);
        
        if (message.type === 'server.conversation-created') {
          resolve(message.conversation_id);
          
          // Enable VAD if requested
          if (this.options.vadEnabled) {
            this.enableVAD();
          }
        }
      };
      
      this.ws.onerror = (error) => {
        this.options.onError(error);
        reject(error);
      };
      
      this.ws.onclose = (event) => {
        this.cleanup();
        this.options.onClose(event);
      };
    });
  }
  
  handleMessage(message) {
    this.options.onMessage(message);
    
    switch(message.type) {
      case 'server.conversation-created':
        console.log('Conversation started:', message.conversation_id);
        break;
        
      case 'server.new-message':
        if (this.options.responseFormat === 'voice' && message.message) {
          this.queueAudio(message.message);
        }
        break;
        
      case 'server.interaction-complete':
        console.log('Response complete');
        break;
        
      case 'server.vad-speech-started':
        console.log('User speaking...');
        this.pauseAudio();
        break;
        
      case 'server.vad-speech-ended':
        console.log('User said:', message.transcript);
        break;
    }
  }
  
  // Keep connection alive
  startKeepAlive() {
    this.keepAliveInterval = setInterval(() => {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
      }
    }, 15000);
  }
  
  // Voice Activity Detection
  async enableVAD() {
    this.ws.send(JSON.stringify({
      type: 'client.switch-vad-mode',
      vad_mode_on: true
    }));
    
    // Start streaming microphone audio
    await this.startAudioStream();
  }
  
  async startAudioStream() {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true,
        noiseSuppression: true
      }
    });
    
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
    
    let isFirstChunk = true;
    processor.onaudioprocess = (e) => {
      const pcmData = this.convertToPCM16(e.inputBuffer.getChannelData(0));
      this.sendAudio(pcmData, isFirstChunk);
      isFirstChunk = false;
    };
    
    source.connect(processor);
    processor.connect(audioContext.destination);
  }
  
  // Send messages
  sendText(text, messageType = 'user-message') {
    if (this.ws?.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket not connected');
    }
    
    this.ws.send(JSON.stringify({
      type: 'client.new-text-message',
      text: text,
      message_type: messageType
    }));
  }
  
  sendAudio(audioData, isFirstChunk = false) {
    const message = {
      type: 'client.new-audio-message',
      audio: this.arrayBufferToBase64(audioData)
    };
    
    if (isFirstChunk) {
      message.audio_config = {
        format: 'pcm',
        sample_rate: 16000,
        sample_width: 2,
        n_channels: 1,
        frame_rate: 16000
      };
    }
    
    this.ws.send(JSON.stringify(message));
  }
  
  completeAudio() {
    this.ws.send(JSON.stringify({
      type: 'client.new-audio-message',
      audio: null
    }));
  }
  
  // Audio utilities
  convertToPCM16(float32Array) {
    const int16Array = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]));
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return int16Array.buffer;
  }
  
  arrayBufferToBase64(buffer) {
    const bytes = new Uint8Array(buffer);
    let binary = '';
    bytes.forEach(b => binary += String.fromCharCode(b));
    return btoa(binary);
  }
  
  base64ToArrayBuffer(base64) {
    const binaryString = atob(base64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes.buffer;
  }
  
  // Audio playback
  queueAudio(base64Audio) {
    const audioBuffer = this.base64ToArrayBuffer(base64Audio);
    this.audioQueue.push(audioBuffer);
    
    if (!this.isPlaying) {
      this.playNextAudio();
    }
  }
  
  async playNextAudio() {
    if (this.audioQueue.length === 0) {
      this.isPlaying = false;
      return;
    }
    
    this.isPlaying = true;
    const arrayBuffer = this.audioQueue.shift();
    
    // Raw PCM has no container header, so decodeAudioData() would reject it.
    // Build an AudioBuffer manually from the 16-bit samples instead
    // (assumes 16 kHz output, matching the input audio_config above).
    if (!this.playbackContext) {
      this.playbackContext = new AudioContext({ sampleRate: 16000 });
    }
    const int16 = new Int16Array(arrayBuffer);
    const audioData = this.playbackContext.createBuffer(1, int16.length, 16000);
    const channel = audioData.getChannelData(0);
    for (let i = 0; i < int16.length; i++) {
      channel[i] = int16[i] / 0x8000;  // int16 → float [-1, 1)
    }
    
    const source = this.playbackContext.createBufferSource();
    source.buffer = audioData;
    source.connect(this.playbackContext.destination);
    source.onended = () => this.playNextAudio();
    source.start();
  }
  
  pauseAudio() {
    // Clear audio queue when interrupted
    this.audioQueue = [];
    this.isPlaying = false;
  }
  
  // Cleanup
  async finish() {
    if (this.options.vadEnabled) {
      // Disable VAD first
      this.ws.send(JSON.stringify({
        type: 'client.switch-vad-mode',
        vad_mode_on: false
      }));
      
      // Wait for confirmation
      await new Promise(resolve => {
        const handler = (event) => {
          const msg = JSON.parse(event.data);
          if (msg.type === 'server.vad-mode-switched') {
            this.ws.removeEventListener('message', handler);
            resolve();
          }
        };
        this.ws.addEventListener('message', handler);
      });
    }
    
    // Now finish conversation
    this.ws.send(JSON.stringify({
      type: 'client.finish-conversation'
    }));
  }
  
  cleanup() {
    if (this.keepAliveInterval) {
      clearInterval(this.keepAliveInterval);
    }
    this.audioQueue = [];
    this.isPlaying = false;
  }
  
  close() {
    this.cleanup();
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'client.close-connection' }));
      this.ws.close();
    }
  }
}

// Usage Example
async function main() {
  const client = new RealtimeConversation('your-org', 'your-auth-token', {
    responseFormat: 'voice',
    audioFormat: 'pcm',
    vadEnabled: true,
    onMessage: (msg) => {
      // Handle all messages
      console.log('Message:', msg.type);
    },
    onError: (error) => {
      console.error('Error:', error);
    },
    onClose: (event) => {
      console.log('Closed:', event.code, event.reason);
    }
  });
  
  try {
    // Connect and start conversation
    const conversationId = await client.connect('service-id');
    console.log('Conversation ID:', conversationId);
    
    // Send a text message
    client.sendText('Hello, how can you help me?');
    
    // Or manually send audio (if not using VAD)
    // client.sendAudio(pcmAudioData, true);
    // client.completeAudio();
    
    // When done
    // await client.finish();
    
  } catch (error) {
    console.error('Failed to connect:', error);
  }
}

main();
```

## Common Patterns & Troubleshooting

### Connection Flow Diagram

```
1. Connect WebSocket
   ↓
2. Authenticate (via subprotocol)
   ↓
3. Initialize Conversation
   ├── New: client.start-conversation
   └── Existing: client.continue-conversation
   ↓
4. Receive Confirmation
   ├── server.conversation-created
   └── server.conversation-retrieved
   ↓
5. Exchange Messages
   ├── Text: client.new-text-message
   ├── Audio: client.new-audio-message
   └── VAD: Continuous streaming
   ↓
6. Receive Responses
   ├── server.new-message (chunks)
   └── server.interaction-complete
   ↓
7. Finish/Close
   ├── client.finish-conversation
   └── client.close-connection
```
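The "Existing" branch at step 3 can be sketched as a small helper. Note the `conversation_id` field name here is an assumption based on the `server.conversation-created` payload; verify it against your API reference:

```javascript
// Resuming picks the "Existing" branch at step 3 of the flow above.
// The conversation_id field name is an assumption -- confirm it
// against the client.continue-conversation schema in your API docs.
function buildContinueMessage(conversationId) {
  return {
    type: 'client.continue-conversation',
    conversation_id: conversationId,
  };
}

// On an open socket, send it as JSON:
// ws.send(JSON.stringify(buildContinueMessage(savedConversationId)));
```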

### Common Issues and Solutions

| Issue                    | Symptom                   | Solution                                                    |
| ------------------------ | ------------------------- | ----------------------------------------------------------- |
| **No audio playback**    | Audio received but silent | Check audio format matches `audio_format` param             |
| **Connection drops**     | Disconnects after 30s     | Implement keep-alive with `extend-timeout`                  |
| **VAD not working**      | Speech not detected       | Make sure you are using PCM format, not MP3                 |
| **Authentication fails** | Code 3000 on connect      | Check token format: `bearer.authorization.amigo.ai.{token}` |
| **Conversation locked**  | Code 4009                 | Only one connection per user/service is allowed             |
| **Empty transcripts**    | VAD returns empty text    | Check microphone permissions and audio levels               |
| **Choppy audio**         | Broken playback           | Buffer audio chunks before playing                          |
| **High latency**         | Slow responses            | Use regional endpoints and PCM format                       |
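For the 30-second disconnect case, a minimal keep-alive sketch. The exact message type string is an assumption here (shown as `client.extend-timeout`); check it against your API reference:

```javascript
// Build the timeout-extension message. The type string
// 'client.extend-timeout' is an assumption -- verify it.
function makeKeepAlivePing() {
  return JSON.stringify({ type: 'client.extend-timeout' });
}

// Periodically extend the server-side idle timeout so the socket
// survives long pauses. Interval sits comfortably under ~30s.
function startKeepAlive(ws, intervalMs = 20000) {
  const id = setInterval(() => {
    if (ws.readyState === 1 /* OPEN */) {
      ws.send(makeKeepAlivePing());
    }
  }, intervalMs);
  return () => clearInterval(id); // call to stop, e.g. in cleanup()
}
```

Call the returned function from your `cleanup()` so the timer does not outlive the connection.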

## Best Practices

1. **Connection Management**
   * Implement reconnection logic for network interruptions.
   * Send periodic `extend-timeout` messages during long idle periods.
   * Close connections properly with `client.close-connection`.
2. **Audio Streaming**
   * Use PCM format for the lowest latency in VAD mode.
   * Stream audio chunks as they become available rather than buffering the whole message.
   * Include `audio_config` only in the first chunk.
3. **Error Recovery**
   * Handle WebSocket close events gracefully.
   * Implement exponential backoff for reconnections.
   * Save the conversation ID so you can continue after disconnection.
4. **Performance**
   * Reuse WebSocket connections when possible.
   * Process audio chunks immediately on receipt.
   * Use appropriate audio buffer sizes (typically 20-60ms chunks).
5. **Security**
   * Never expose authentication tokens in client-side code.
   * Use secure WebSocket connections (wss\://).
   * Refresh tokens before they expire.
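The reconnection advice above (exponential backoff, saved conversation ID) can be sketched as follows; the delay schedule is illustrative, and `connect` is a placeholder for your own connection routine:

```javascript
// Capped exponential backoff: 1s, 2s, 4s, ... up to maxMs.
// Kept as a pure helper so the schedule is easy to test and tune.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry loop around a user-supplied connect() routine, which should
// re-open the socket and resume via the saved conversation ID.
async function reconnectWithBackoff(connect, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  throw new Error(`reconnect failed after ${maxAttempts} attempts`);
}
```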

## SDK & Framework Support

### Current Support

| Platform               | Status            | Notes                      |
| ---------------------- | ----------------- | -------------------------- |
| **JavaScript/Browser** | Full support      | Native WebSocket API       |
| **Node.js**            | Full support      | Use `ws` package           |
| **TypeScript SDK**     | Not yet available | Use WebSocket API directly |
| **Python**             | Supported         | Use `websockets` library   |
| **React Native**       | Supported         | Built-in WebSocket support |
| **Flutter**            | Supported         | Use `web_socket_channel`   |

### Framework Examples

#### Node.js

```javascript
const WebSocket = require('ws');

// The `ws` constructor takes subprotocols as its second argument,
// mirroring the browser example above.
const ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${token}`]);
```

#### Python

```python
import json

import websockets

async def connect():
    url = f"wss://api.amigo.ai/v1/{org}/conversation/converse_realtime"
    # Pass the bearer token as a WebSocket subprotocol, as in the browser example
    async with websockets.connect(
        url,
        subprotocols=[f"bearer.authorization.amigo.ai.{token}"],
    ) as ws:
        await ws.send(json.dumps({
            "type": "client.start-conversation",
            "service_id": service_id,
            "service_version_set_name": "release",
        }))
```

## Related Documentation

* [Authentication Guide](https://docs.amigo.ai/developer-guide/getting-started/authentication): set up auth tokens
* [Voice Conversations (HTTP)](https://docs.amigo.ai/developer-guide/classic-api/core-api/conversations/conversations-voice): alternative HTTP approach
* [Conversation Events](https://docs.amigo.ai/developer-guide/classic-api/core-api/conversations/conversations-events): event streaming details
* [Regional Endpoints](https://docs.amigo.ai/developer-guide/getting-started/regions-and-endpoints): optimize latency
