# Real-time Voice (WebSocket)

Build natural, real-time voice conversations with your Amigo agents using WebSocket connections for low-latency, bidirectional audio streaming.

{% hint style="info" %}
**Real-time Capabilities**\
This API supports sub-second latency voice conversations with automatic speech detection, interruption handling, and streaming responses.
{% endhint %}

{% hint style="info" %}
**Looking for phone-based voice?** This page covers WebSocket streaming for in-app voice experiences. For enterprise phone calls, see [Platform API: Voice Agent](https://docs.amigo.ai/developer-guide/platform-api/platform-api/voice-agent).
{% endhint %}

## Quick Start

{% tabs %}
{% tab title="JavaScript" %}

```javascript
// 1. Connect to WebSocket with authentication
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]
);

// 2. Start a conversation when connected
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'client.start-conversation',
    service_id: 'your-service-id',
    service_version_set_name: 'release'
  }));
};

// 3. Handle incoming messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  if (message.type === 'server.conversation-created') {
    console.log('Ready to chat! Conversation ID:', message.conversation_id);
    // Now you can send audio or text messages
  }
  
  if (message.type === 'server.new-message' && message.message) {
    // Handle audio/text response from agent
    handleAgentResponse(message.message);
  }
};

// 4. Send a text message (only after 'server.conversation-created' has arrived)
function sendText(text) {
  ws.send(JSON.stringify({
    type: 'client.new-text-message',
    text: text,
    message_type: 'user-message'
  }));
}
```

{% endtab %}

{% tab title="Python" %}

```python
import websocket  # pip install websocket-client
import json

# 1. Define callbacks (they must exist before the app references them)
def on_open(ws):
    # 2. Start conversation
    ws.send(json.dumps({
        "type": "client.start-conversation",
        "service_id": "your-service-id",
        "service_version_set_name": "release"
    }))

def on_message(ws, message):
    # 3. Handle responses
    msg = json.loads(message)

    if msg["type"] == "server.conversation-created":
        print(f"Ready! ID: {msg['conversation_id']}")

        # 4. Send a message
        ws.send(json.dumps({
            "type": "client.new-text-message",
            "text": "Hello!",
            "message_type": "user-message"
        }))

def on_error(ws, error):
    print(f"Error: {error}")

def on_close(ws, close_status_code, close_msg):
    print(f"Closed: {close_status_code} {close_msg}")

# 5. Create connection with authentication (token passed as WebSocket subprotocol)
ws_url = f"wss://api.amigo.ai/v1/{org_id}/conversation/converse_realtime?response_format=voice&audio_format=pcm"

ws = websocket.WebSocketApp(
    ws_url,
    subprotocols=[f"bearer.authorization.amigo.ai.{auth_token}"],
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close,
)

ws.run_forever()
```

{% endtab %}
{% endtabs %}

## What You Can Build

| Use Case               | Description                                                 |
| ---------------------- | ----------------------------------------------------------- |
| **Voice Assistants**   | Natural voice conversations with automatic speech detection |
| **Call Center Agents** | Real-time customer support with interruption handling       |
| **Interactive Games**  | Voice-controlled gaming experiences                         |
| **Healthcare Bots**    | Medical consultation assistants with voice interaction      |
| **Educational Tutors** | Interactive learning with voice feedback                    |

{% hint style="success" %}
**Automatic Latency Management**

Real-time voice conversations include automatic audio fillers that play during processing delays (for example, "Let me look that up..."). These improve user experience by reducing perceived latency without any client-side work. See [Managing Perceived Latency](https://docs.amigo.ai/developer-guide/classic-api/core-api/conversations-voice#managing-perceived-latency) for details.
{% endhint %}

## Key Features

{% tabs %}
{% tab title="Core Features" %}

| Feature                      | Description                                         |
| ---------------------------- | --------------------------------------------------- |
| **Real-time Streaming**      | Send and receive audio chunks as they are generated |
| **Voice Activity Detection** | Automatic detection of speech start and stop        |
| **Low Latency**              | Sub-second response times with streaming            |
| **Interruption Handling**    | Natural conversation flow management                |
| **Audio Fillers**            | Automatic filler phrases during processing delays   |

{% endtab %}

{% tab title="Advanced Features" %}

| Feature                    | Description                                       |
| -------------------------- | ------------------------------------------------- |
| **Flexible Audio Formats** | PCM (lowest latency) or MP3 (bandwidth-efficient) |
| **External Events**        | Inject context during conversations               |
| **Multi-stream Support**   | Handle multiple audio streams                     |
| **Session Management**     | Continue existing conversations                   |

{% endtab %}
{% endtabs %}

## Connection Setup

### Endpoint

```
wss://api.amigo.ai/v1/{organization}/conversation/converse_realtime
```

### Regional Endpoints

Choose the endpoint closest to your users for best performance:

| Region           | Endpoint                                                                    |
| ---------------- | --------------------------------------------------------------------------- |
| **US** (default) | `wss://api.amigo.ai/v1/{org}/conversation/converse_realtime`                |
| **CA Central**   | `wss://api-ca-central-1.amigo.ai/v1/{org}/conversation/converse_realtime`   |
| **EU Central**   | `wss://api-eu-central-1.amigo.ai/v1/{org}/conversation/converse_realtime`   |
| **AP Southeast** | `wss://api-ap-southeast-2.amigo.ai/v1/{org}/conversation/converse_realtime` |
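If your client serves users in several regions, the hosts from the table above can be centralized in a small helper. This is a convenience sketch, not part of the API; the `REGION_HOSTS` map and `realtimeUrl` names are illustrative:

```javascript
// Map region keys to the hosts from the table above (illustrative helper)
const REGION_HOSTS = {
  'us': 'api.amigo.ai',
  'ca-central-1': 'api-ca-central-1.amigo.ai',
  'eu-central-1': 'api-eu-central-1.amigo.ai',
  'ap-southeast-2': 'api-ap-southeast-2.amigo.ai'
};

// Build the WebSocket URL for a region, falling back to the US default
function realtimeUrl(region, orgId, params = {}) {
  const host = REGION_HOSTS[region] || REGION_HOSTS['us'];
  const query = new URLSearchParams(params).toString();
  return `wss://${host}/v1/${orgId}/conversation/converse_realtime` +
         (query ? `?${query}` : '');
}

realtimeUrl('eu-central-1', 'your-org', { response_format: 'voice', audio_format: 'pcm' });
```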

### Query Parameters

| Parameter                   | Type              | Required | Description                                                                                                         | Example     |
| --------------------------- | ----------------- | -------- | ------------------------------------------------------------------------------------------------------------------- | ----------- |
| `response_format`           | `text` \| `voice` | Required | Agent response format                                                                                               | `voice`     |
| `audio_format`              | `mp3` \| `pcm`    | If voice | <p>Audio encoding:<br>• <code>pcm</code>: lower latency, VAD support<br>• <code>mp3</code>: bandwidth-efficient</p> | `pcm`       |
| `current_agent_action_type` | regex             | Optional | Filter agent action events                                                                                          | `^tool\..*` |

## Authentication

WebSocket authentication uses the `Sec-WebSocket-Protocol` header with your bearer token:

```javascript
// Get your auth token (from login or API key)
const authToken = await getAuthToken();

// Pass token as WebSocket subprotocol
const ws = new WebSocket(
  'wss://api.amigo.ai/v1/your-org/conversation/converse_realtime?response_format=voice&audio_format=pcm',
  ['bearer.authorization.amigo.ai.' + authToken]  // ← Token goes here
);
```

{% hint style="info" %}
**Token Format**\
The token format is `bearer.authorization.amigo.ai.` + your JWT token. This is passed as a WebSocket subprotocol, not a header.
{% endhint %}

## Conversation Flow

### Step 1: Connect & Authenticate

```javascript
const ws = new WebSocket(
  `wss://api.amigo.ai/v1/${orgId}/conversation/converse_realtime?response_format=voice&audio_format=pcm`,
  [`bearer.authorization.amigo.ai.${authToken}`]
);

// Handle connection events
ws.onopen = () => console.log('Connected');
ws.onerror = (error) => console.error('Connection error:', error);
ws.onclose = (event) => console.log('Disconnected:', event.code, event.reason);
```

### Step 2: Initialize Conversation

Once connected, you must initialize the conversation:

#### Option A: Start New Conversation

```javascript
ws.send(JSON.stringify({
  type: 'client.start-conversation',
  service_id: 'your-service-id',        // Your agent service ID
  service_version_set_name: 'release'   // 'release', 'edge', or custom
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-created', conversation_id: '...' }
```

#### Option B: Continue Existing Conversation

```javascript
ws.send(JSON.stringify({
  type: 'client.continue-conversation',
  conversation_id: 'existing-conversation-id'
}));

// Wait for confirmation
// → Receive: { type: 'server.conversation-retrieved' }
```

### Step 3: Exchange Messages

Now you can send text or audio messages and receive responses:

### Sequence Diagram

{% @mermaid/diagram content="%%{init: {"theme": "base", "themeVariables": {"actorBkg": "#083241", "actorTextColor": "#FFFFFF", "actorBorder": "#083241", "signalColor": "#575452", "signalTextColor": "#100F0F", "labelBoxBkgColor": "#F1EAE7", "labelBoxBorderColor": "#D7D2D0", "labelTextColor": "#100F0F", "loopTextColor": "#100F0F", "noteBkgColor": "#F1EAE7", "noteBorderColor": "#D7D2D0", "noteTextColor": "#100F0F", "activationBkgColor": "#E8E2EB", "activationBorderColor": "#083241", "altSectionBkgColor": "#F1EAE7", "altSectionColor": "#100F0F"}}}%%
sequenceDiagram
autonumber
participant C as Client (Browser/App)
participant S as Server (WebSocket Endpoint)

Note over C,S: WebSocket connection established (wss://...).

C->>S: StartConversation request
S-->>C: Ack { request:"StartConversation" }

C->>S: TriggerFirstMessage request
loop Agent TTS audio stream
    S-->>C: AgentAudio { pcm, timestamps }
  end
  S-->>C: InteractionComplete { interaction_id, message, ... }

C->>S: SwitchVADMode { vad_mode_on: true }
S-->>C: Ack { request:"SetVADMode" }

rect rgba(0,0,0,0.03)
  loop Continuous upstream audio
    C-->>S: AudioChunk { pcm16 }
  end
end

Note over S: Server-side VAD monitors incoming audio

S-->>C: Event:SpeechStarted { ts_start }
Note over C: Pause any local/agent audio<br/>playback immediately

%% (Optional) If client keeps sending audio, server may continue consuming; VAD tracks speech state.

S-->>C: Event:SpeechEnded { ts_end }

par Server generates agent response
  Note over S: ASR/LLM/TTS pipeline runs
and Downlink agent audio
  loop Agent TTS audio stream
    S-->>C: AgentAudio { pcm, timestamps }
  end
  S-->>C: InteractionComplete { interaction_id, message, ... }
end

%% External event interruption in VAD mode
Note over C,S: External event interruption (VAD mode)
C->>S: client.new-text-message<br/>{ message_type: 'external-event',<br/>start_interaction: true }
alt Agent hasn't detected user speaking
    Note over S: Interrupts existing<br/>interaction immediately
    S-->>C: server.new-message (response to event)
    S-->>C: server.interaction-complete
else User is speaking (agent has detected)
    Note over S: Waits until user<br/>finishes speaking
    S-->>C: server.vad-speech-ended
    S-->>C: server.new-message (response to event)
    S-->>C: server.interaction-complete
end" %}

## Message Reference

### Messages You Send (Client → Server)

#### Send Text

```javascript
// User message
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Hello, how can you help me?',
  message_type: 'user-message'
}));

// System event (e.g., user actions, context)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'User navigated to checkout page',
  message_type: 'external-event',
  start_interaction: false  // Optional: default is false
}));

// Urgent external event that starts new interaction (VAD mode)
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  text: 'Critical system alert',
  message_type: 'external-event',
  start_interaction: true  // Interrupts current interaction in VAD mode
}));
```

#### Send Audio

```javascript
// First chunk - include config
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: base64AudioChunk,  // Base64 encoded PCM audio
  audio_config: {
    format: 'pcm',
    sample_rate: 16000,     // 16kHz
    sample_width: 2,        // 16-bit
    n_channels: 1,          // Mono
    frame_rate: 16000
  }
}));

// Subsequent chunks - no config needed
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: nextBase64AudioChunk
}));

// Signal end of audio
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: null
}));
```

#### Voice Activity Detection (VAD)

```javascript
// Enable automatic speech detection
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// In VAD mode: continuously stream audio, server detects speech
// Server will automatically determine when user starts/stops speaking
```

#### Finish Conversation

**Standard Mode**

```json
{
  "type": "client.finish-conversation"
}
```

**VAD Mode**

When in VAD mode, first disable VAD, wait for acknowledgment, then finish:

```json
// Step 1: Disable VAD mode
{
  "type": "client.switch-vad-mode",
  "vad_mode_on": false
}

// Wait for response (may take up to 10 seconds)
{
  "type": "server.vad-mode-switched",
  "current_vad_mode_on": false
}

// Step 2: Finish conversation
{
  "type": "client.finish-conversation"
}

// Response
{
  "type": "server.conversation-completed"
}
```
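The two-step shutdown above can be wrapped in a helper that waits for the acknowledgment before finishing. This is a sketch under assumptions: `finishConversationFromVAD` is an illustrative name, `ws` is any WebSocket-like object, and the 15-second default timeout simply adds margin over the documented 10-second worst case:

```javascript
// Disable VAD, wait for server.vad-mode-switched, then finish the conversation.
// `ws` needs send/addEventListener/removeEventListener (a browser WebSocket qualifies).
function finishConversationFromVAD(ws, timeoutMs = 15000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      ws.removeEventListener('message', handler);
      reject(new Error('Timed out waiting for server.vad-mode-switched'));
    }, timeoutMs);

    const handler = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === 'server.vad-mode-switched' && msg.current_vad_mode_on === false) {
        clearTimeout(timer);
        ws.removeEventListener('message', handler);
        // Step 2: now it is safe to finish the conversation
        ws.send(JSON.stringify({ type: 'client.finish-conversation' }));
        resolve();
      }
    };

    ws.addEventListener('message', handler);
    // Step 1: disable VAD and wait for the acknowledgment
    ws.send(JSON.stringify({ type: 'client.switch-vad-mode', vad_mode_on: false }));
  });
}
```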

#### Graceful Close

```json
{
  "type": "client.close-connection"
}
```

#### Extend Timeout

```json
{
  "type": "client.extend-timeout"
}
```

### Messages You Receive (Server → Client)

#### Conversation Lifecycle

```javascript
// Conversation created
{
  type: 'server.conversation-created',
  conversation_id: '507f1f77bcf86cd799439012'
}

// Conversation retrieved (when continuing)
{
  type: 'server.conversation-retrieved'
}

// Conversation finished
{
  type: 'server.conversation-completed'
}
```

#### Agent Responses

```javascript
// Text response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'Hello! I can help you with...',  // Text chunk
  message_metadata: [],
  transcript_alignment: null,
  stop: false,              // false = more chunks coming
  sequence_number: 1,
  message_id: '...'
}

// Audio response chunk
{
  type: 'server.new-message',
  interaction_id: '...',
  message: 'base64_audio_chunk',             // Base64 PCM audio
  message_metadata: [],
  transcript_alignment: [                    // Timing for each character (ms)
    [0, 'H'], [100, 'e'], [200, 'l'], [300, 'l'], [400, 'o']
  ],
  stop: false,
  sequence_number: 1,
  message_id: '...'
}

// Response complete
{
  type: 'server.interaction-complete',
  message_id: '...',
  interaction_id: '...',
  full_message: 'Complete text or transcript',
  conversation_completed: false    // true = agent ended conversation
}
```

#### Voice Activity Detection Events

```javascript
// User started speaking
{
  type: 'server.vad-speech-started',
  start: 1.234  // Seconds from last reset
}

// User stopped speaking (with transcript)
{
  type: 'server.vad-speech-ended',
  transcript: 'What the user said',
  start: 1.234,  // When speech started (seconds)
  end: 3.456     // When speech ended (seconds)
}

// Time reference reset
{
  type: 'server.vad-speech-reset-zero',
  timestamp: 0.0  // New zero point for timing
}

// VAD mode changed
{
  type: 'server.vad-mode-switched',
  current_vad_mode_on: true  // Current VAD state
}
```
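The per-character `transcript_alignment` timings can drive synchronized captions during audio playback. A minimal illustrative helper (the `captionAt` name is my own, not part of the API):

```javascript
// Given a transcript_alignment array and elapsed playback time in ms,
// return the caption text that should be visible so far.
function captionAt(alignment, elapsedMs) {
  return alignment
    .filter(([ms]) => ms <= elapsedMs)   // keep characters that have started
    .map(([, ch]) => ch)
    .join('');
}

const alignment = [[0, 'H'], [100, 'e'], [200, 'l'], [300, 'l'], [400, 'o']];
captionAt(alignment, 250);  // 'Hel'
```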

## Voice Activity Detection (VAD) Mode

VAD mode enables hands-free, natural conversations with automatic speech detection.

{% hint style="warning" %}
**VAD Requirements**

* Audio format must be PCM (MP3 is not supported).
* Continuous audio streaming is required.
* Interruptions are handled automatically by pausing agent audio.

{% endhint %}

### How VAD Works

```javascript
// 1. Enable VAD mode
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// 2. Stream audio continuously (server detects speech)
const streamAudio = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // ... convert to PCM and send chunks continuously
};

// 3. Handle VAD events
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch(msg.type) {
    case 'server.vad-speech-started':
      console.log('User speaking...');
      pauseAgentAudio();  // Stop agent playback if speaking
      break;

    case 'server.vad-speech-ended':
      console.log('User said:', msg.transcript);
      // Agent automatically responds
      break;
  }
};
```

### VAD with External Events

External events can interrupt ongoing conversations in VAD mode when marked with `start_interaction: true`:

```javascript
// Enable VAD mode first
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// Send external event that can interrupt
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({
    event: 'payment.failed',
    amount: 100.00,
    error: 'Card declined'
  }),
  start_interaction: true  // Triggers interruption behavior
}));

// Handle the response
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch(msg.type) {
    case 'server.new-message':
      // This will be the agent's response to the external event
      console.log('Agent response to event:', msg.message);
      break;

    case 'server.interaction-complete':
      console.log('External event interaction completed');
      break;
  }
};
```

**External Event Behavior in VAD Mode:**

* **When the agent hasn't detected the user speaking**: an external event with `start_interaction: true` interrupts any existing interaction and starts a new interaction immediately with the external event.
* **When the user is speaking (agent has detected)**: the external event is queued. The agent waits until the user finishes speaking (indicated by `server.vad-speech-ended`), then triggers a new interaction with the external event.

## Audio Configuration

### PCM Format (Best for real-time & VAD)

```javascript
const pcmConfig = {
  format: 'pcm',
  sample_rate: 16000,   // 16 kHz
  sample_width: 2,      // 16-bit
  n_channels: 1,        // Mono
  frame_rate: 16000
};

// Example: capture microphone audio and stream 16-bit PCM chunks
// (ScriptProcessorNode is deprecated; prefer AudioWorklet in new code)
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (e) => {
  const floats = e.inputBuffer.getChannelData(0);
  const pcm = new Int16Array(floats.length);      // float [-1, 1] → int16
  for (let i = 0; i < floats.length; i++) {
    const s = Math.max(-1, Math.min(1, floats[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  const bytes = new Uint8Array(pcm.buffer);
  let binary = '';
  bytes.forEach(b => binary += String.fromCharCode(b));
  ws.send(JSON.stringify({ type: 'client.new-audio-message', audio: btoa(binary) }));
};

source.connect(processor);
processor.connect(audioContext.destination);
```

### MP3 Format (Bandwidth-efficient)

```javascript
const mp3Config = {
  format: 'mp3',
  bit_rate: 128000,    // 128 kbps
  sample_rate: 44100,  // 44.1 kHz  
  n_channels: 2        // Stereo
};
```

## Language Support

### Supported Languages

The following languages are supported for both voice transcription and synthesis:

| Language   | Code |
| ---------- | ---- |
| English    | `en` |
| Spanish    | `es` |
| French     | `fr` |
| German     | `de` |
| Italian    | `it` |
| Portuguese | `pt` |
| Polish     | `pl` |
| Turkish    | `tr` |
| Russian    | `ru` |
| Dutch      | `nl` |
| Czech      | `cs` |
| Arabic     | `ar` |
| Chinese    | `zh` |
| Japanese   | `ja` |
| Hungarian  | `hu` |
| Korean     | `ko` |
| Hindi      | `hi` |

> Language is determined by: 1) User's `preferred_language` setting, 2) Agent's `default_spoken_language` fallback

## Error Handling

### WebSocket Close Codes

| Code     | Error             | Common Cause                       | Solution                        |
| -------- | ----------------- | ---------------------------------- | ------------------------------- |
| **3000** | Unauthorized      | Invalid/expired token              | Refresh auth token              |
| **3003** | Forbidden         | Missing permissions                | Check user permissions          |
| **3008** | Timeout           | No activity for 30s                | Send `extend-timeout` every 15s |
| **4000** | Bad Request       | Invalid message format             | Check message structure         |
| **4004** | Not Found         | Service/conversation doesn't exist | Verify IDs                      |
| **4009** | Conflict          | Conversation locked/finished       | Check conversation state        |
| **4015** | Unsupported Media | Wrong audio format                 | Use PCM for VAD, check config   |
| **4029** | Rate Limited      | Too many messages                  | Implement backoff, max 60/min   |

### Error Handling Example

```javascript
ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = async (event) => {
  switch(event.code) {
    case 3000:
      // Refresh token and reconnect
      await refreshAuthToken();
      reconnect();
      break;
    case 3008:
      // Connection timed out
      console.log('Connection timed out - forgot to send keep-alive?');
      break;
    case 4029:
      // Rate limited - implement exponential backoff
      setTimeout(() => reconnect(), backoffDelay);
      break;
    default:
      console.error(`Connection closed: ${event.code} - ${event.reason}`);
  }
};
```

## Performance & Limits

### Rate Limits

| Limit                      | Value              | Notes                           |
| -------------------------- | ------------------ | ------------------------------- |
| **Messages/minute**        | 60                 | Includes all message types      |
| **Connection timeout**     | 30 seconds         | Reset by any message            |
| **Keep-alive interval**    | 15 seconds         | Send `extend-timeout`           |
| **Concurrent connections** | 1 per user/service | One active connection at a time |
| **Audio chunk size**       | 20-60ms            | Optimal for real-time streaming |
| **Max message size**       | 1MB                | For audio chunks                |
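For the recommended PCM configuration (16 kHz, 16-bit, mono), the 20-60 ms chunk guidance works out to a few hundred bytes of raw audio per message:

```javascript
// 16,000 samples/s × 2 bytes/sample × 1 channel = 32,000 bytes per second
const BYTES_PER_SECOND = 16000 * 2 * 1;

// Bytes of raw PCM in a chunk of the given duration
const chunkBytes = (ms) => (BYTES_PER_SECOND * ms) / 1000;

chunkBytes(20);  // 640 bytes  (lower bound)
chunkBytes(60);  // 1920 bytes (upper bound)
```

Base64 encoding inflates these payloads by roughly one third, so even generous chunks stay far below the 1 MB message limit.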

### Keep Connection Alive

```javascript
// Send keep-alive every 15 seconds
const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
  }
}, 15000);

// Clean up on close
ws.onclose = () => clearInterval(keepAlive);
```

## Complete Implementation

### Production-Ready WebSocket Client

```javascript
class RealtimeConversation {
  constructor(orgId, authToken, options = {}) {
    this.orgId = orgId;
    this.authToken = authToken;
    this.ws = null;
    this.keepAliveInterval = null;
    this.audioQueue = [];
    this.isPlaying = false;
    
    // Configuration
    this.options = {
      responseFormat: options.responseFormat || 'voice',
      audioFormat: options.audioFormat || 'pcm',
      vadEnabled: options.vadEnabled || false,
      onMessage: options.onMessage || (() => {}),
      onError: options.onError || console.error,
      onClose: options.onClose || (() => {})
    };
  }

  async connect(serviceId) {
    const url = `wss://api.amigo.ai/v1/${this.orgId}/conversation/converse_realtime` +
                `?response_format=${this.options.responseFormat}` +
                `&audio_format=${this.options.audioFormat}`;
    
    this.ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${this.authToken}`]);
    
    return new Promise((resolve, reject) => {
      this.ws.onopen = () => {
        console.log('WebSocket connected');
        
        // Start keep-alive
        this.startKeepAlive();
        
        // Initialize conversation
        this.ws.send(JSON.stringify({
          type: 'client.start-conversation',
          service_id: serviceId,
          service_version_set_name: 'release'
        }));
      };
      
      this.ws.onmessage = (event) => {
        const message = JSON.parse(event.data);
        this.handleMessage(message);
        
        if (message.type === 'server.conversation-created') {
          resolve(message.conversation_id);
          
          // Enable VAD if requested
          if (this.options.vadEnabled) {
            this.enableVAD();
          }
        }
      };
      
      this.ws.onerror = (error) => {
        this.options.onError(error);
        reject(error);
      };
      
      this.ws.onclose = (event) => {
        this.cleanup();
        this.options.onClose(event);
      };
    });
  }
  
  handleMessage(message) {
    this.options.onMessage(message);
    
    switch(message.type) {
      case 'server.conversation-created':
        console.log('Conversation started:', message.conversation_id);
        break;
        
      case 'server.new-message':
        if (this.options.responseFormat === 'voice' && message.message) {
          this.queueAudio(message.message);
        }
        break;
        
      case 'server.interaction-complete':
        console.log('Response complete');
        break;
        
      case 'server.vad-speech-started':
        console.log('User speaking...');
        this.pauseAudio();
        break;
        
      case 'server.vad-speech-ended':
        console.log('User said:', message.transcript);
        break;
    }
  }
  
  // Keep connection alive
  startKeepAlive() {
    this.keepAliveInterval = setInterval(() => {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
      }
    }, 15000);
  }
  
  // Voice Activity Detection
  async enableVAD() {
    this.ws.send(JSON.stringify({
      type: 'client.switch-vad-mode',
      vad_mode_on: true
    }));
    
    // Start streaming microphone audio
    await this.startAudioStream();
  }
  
  async startAudioStream() {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true,
        noiseSuppression: true
      }
    });
    
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
    
    let isFirstChunk = true;
    processor.onaudioprocess = (e) => {
      const pcmData = this.convertToPCM16(e.inputBuffer.getChannelData(0));
      this.sendAudio(pcmData, isFirstChunk);
      isFirstChunk = false;
    };
    
    source.connect(processor);
    processor.connect(audioContext.destination);
  }
  
  // Send messages
  sendText(text, messageType = 'user-message') {
    if (this.ws?.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket not connected');
    }
    
    this.ws.send(JSON.stringify({
      type: 'client.new-text-message',
      text: text,
      message_type: messageType
    }));
  }
  
  sendAudio(audioData, isFirstChunk = false) {
    const message = {
      type: 'client.new-audio-message',
      audio: this.arrayBufferToBase64(audioData)
    };
    
    if (isFirstChunk) {
      message.audio_config = {
        format: 'pcm',
        sample_rate: 16000,
        sample_width: 2,
        n_channels: 1,
        frame_rate: 16000
      };
    }
    
    this.ws.send(JSON.stringify(message));
  }
  
  completeAudio() {
    this.ws.send(JSON.stringify({
      type: 'client.new-audio-message',
      audio: null
    }));
  }
  
  // Audio utilities
  convertToPCM16(float32Array) {
    const int16Array = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]));
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return int16Array.buffer;
  }
  
  arrayBufferToBase64(buffer) {
    const bytes = new Uint8Array(buffer);
    let binary = '';
    bytes.forEach(b => binary += String.fromCharCode(b));
    return btoa(binary);
  }
  
  base64ToArrayBuffer(base64) {
    const binaryString = atob(base64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes.buffer;
  }
  
  // Audio playback
  queueAudio(base64Audio) {
    const audioBuffer = this.base64ToArrayBuffer(base64Audio);
    this.audioQueue.push(audioBuffer);
    
    if (!this.isPlaying) {
      this.playNextAudio();
    }
  }
  
  async playNextAudio() {
    if (this.audioQueue.length === 0) {
      this.isPlaying = false;
      return;
    }
    
    this.isPlaying = true;
    const arrayBuffer = this.audioQueue.shift();
    
    // Raw PCM has no container header, so decodeAudioData() would reject it.
    // Build an AudioBuffer manually from the 16-bit samples instead
    // (assumes 16 kHz output, matching the input audio_config above).
    if (!this.playbackContext) {
      this.playbackContext = new AudioContext({ sampleRate: 16000 });
    }
    const int16 = new Int16Array(arrayBuffer);
    const audioData = this.playbackContext.createBuffer(1, int16.length, 16000);
    const channel = audioData.getChannelData(0);
    for (let i = 0; i < int16.length; i++) {
      channel[i] = int16[i] / 0x8000;  // int16 → float [-1, 1)
    }
    
    const source = this.playbackContext.createBufferSource();
    source.buffer = audioData;
    source.connect(this.playbackContext.destination);
    source.onended = () => this.playNextAudio();
    source.start();
  }
  
  pauseAudio() {
    // Clear audio queue when interrupted
    this.audioQueue = [];
    this.isPlaying = false;
  }
  
  // Cleanup
  async finish() {
    if (this.options.vadEnabled) {
      // Disable VAD first
      this.ws.send(JSON.stringify({
        type: 'client.switch-vad-mode',
        vad_mode_on: false
      }));
      
      // Wait for confirmation
      await new Promise(resolve => {
        const handler = (event) => {
          const msg = JSON.parse(event.data);
          if (msg.type === 'server.vad-mode-switched') {
            this.ws.removeEventListener('message', handler);
            resolve();
          }
        };
        this.ws.addEventListener('message', handler);
      });
    }
    
    // Now finish conversation
    this.ws.send(JSON.stringify({
      type: 'client.finish-conversation'
    }));
  }
  
  cleanup() {
    if (this.keepAliveInterval) {
      clearInterval(this.keepAliveInterval);
    }
    this.audioQueue = [];
    this.isPlaying = false;
  }
  
  close() {
    this.cleanup();
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'client.close-connection' }));
      this.ws.close();
    }
  }
}

// Usage Example
async function main() {
  const client = new RealtimeConversation('your-org', 'your-auth-token', {
    responseFormat: 'voice',
    audioFormat: 'pcm',
    vadEnabled: true,
    onMessage: (msg) => {
      // Handle all messages
      console.log('Message:', msg.type);
    },
    onError: (error) => {
      console.error('Error:', error);
    },
    onClose: (event) => {
      console.log('Closed:', event.code, event.reason);
    }
  });
  
  try {
    // Connect and start conversation
    const conversationId = await client.connect('service-id');
    console.log('Conversation ID:', conversationId);
    
    // Send a text message
    client.sendText('Hello, how can you help me?');
    
    // Or manually send audio (if not using VAD)
    // client.sendAudio(pcmAudioData, true);
    // client.completeAudio();
    
    // When done
    // await client.finish();
    
  } catch (error) {
    console.error('Failed to connect:', error);
  }
}

main();
```

## Common Patterns & Troubleshooting

### Connection Flow Diagram

```
1. Connect WebSocket
   ↓
2. Authenticate (via subprotocol)
   ↓
3. Initialize Conversation
   ├── New: client.start-conversation
   └── Existing: client.continue-conversation
   ↓
4. Receive Confirmation
   ├── server.conversation-created
   └── server.conversation-retrieved
   ↓
5. Exchange Messages
   ├── Text: client.new-text-message
   ├── Audio: client.new-audio-message
   └── VAD: Continuous streaming
   ↓
6. Receive Responses
   ├── server.new-message (chunks)
   └── server.interaction-complete
   ↓
7. Finish/Close
   ├── client.finish-conversation
   └── client.close-connection
```
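The "Existing" branch at step 3 can be sketched as a small helper. Note the `conversation_id` field name here is an assumption based on the `server.conversation-created` payload; verify it against your API reference:

```javascript
// Resuming picks the "Existing" branch at step 3 of the flow above.
// The conversation_id field name is an assumption -- confirm it
// against the client.continue-conversation schema in your API docs.
function buildContinueMessage(conversationId) {
  return {
    type: 'client.continue-conversation',
    conversation_id: conversationId,
  };
}

// On an open socket, send it as JSON:
// ws.send(JSON.stringify(buildContinueMessage(savedConversationId)));
```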

### Common Issues and Solutions

| Issue                    | Symptom                   | Solution                                                    |
| ------------------------ | ------------------------- | ----------------------------------------------------------- |
| **No audio playback**    | Audio received but silent | Check audio format matches `audio_format` param             |
| **Connection drops**     | Disconnects after 30s     | Implement keep-alive with `extend-timeout`                  |
| **VAD not working**      | Speech not detected       | Make sure you are using PCM format, not MP3                 |
| **Authentication fails** | Code 3000 on connect      | Check token format: `bearer.authorization.amigo.ai.{token}` |
| **Conversation locked**  | Code 4009                 | Only one connection per user/service is allowed             |
| **Empty transcripts**    | VAD returns empty text    | Check microphone permissions and audio levels               |
| **Choppy audio**         | Broken playback           | Buffer audio chunks before playing                          |
| **High latency**         | Slow responses            | Use regional endpoints and PCM format                       |
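For the 30-second disconnect case, a minimal keep-alive sketch. The exact message type string is an assumption here (shown as `client.extend-timeout`); check it against your API reference:

```javascript
// Build the timeout-extension message. The type string
// 'client.extend-timeout' is an assumption -- verify it.
function makeKeepAlivePing() {
  return JSON.stringify({ type: 'client.extend-timeout' });
}

// Periodically extend the server-side idle timeout so the socket
// survives long pauses. Interval sits comfortably under ~30s.
function startKeepAlive(ws, intervalMs = 20000) {
  const id = setInterval(() => {
    if (ws.readyState === 1 /* OPEN */) {
      ws.send(makeKeepAlivePing());
    }
  }, intervalMs);
  return () => clearInterval(id); // call to stop, e.g. in cleanup()
}
```

Call the returned function from your `cleanup()` so the timer does not outlive the connection.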

## Best Practices

1. **Connection Management**
   * Implement reconnection logic for network interruptions.
   * Send periodic `extend-timeout` messages during long idle periods.
   * Close connections properly with `client.close-connection`.
2. **Audio Streaming**
   * Use PCM format for the lowest latency in VAD mode.
   * Stream audio chunks as they become available rather than buffering the whole message.
   * Include `audio_config` only in the first chunk.
3. **Error Recovery**
   * Handle WebSocket close events gracefully.
   * Implement exponential backoff for reconnections.
   * Save the conversation ID so you can continue after disconnection.
4. **Performance**
   * Reuse WebSocket connections when possible.
   * Process audio chunks immediately on receipt.
   * Use appropriate audio buffer sizes (typically 20-60ms chunks).
5. **Security**
   * Never expose authentication tokens in client-side code.
   * Use secure WebSocket connections (wss\://).
   * Refresh tokens before they expire.
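The reconnection advice above (exponential backoff, saved conversation ID) can be sketched as follows; the delay schedule is illustrative, and `connect` is a placeholder for your own connection routine:

```javascript
// Capped exponential backoff: 1s, 2s, 4s, ... up to maxMs.
// Kept as a pure helper so the schedule is easy to test and tune.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry loop around a user-supplied connect() routine, which should
// re-open the socket and resume via the saved conversation ID.
async function reconnectWithBackoff(connect, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  throw new Error(`reconnect failed after ${maxAttempts} attempts`);
}
```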

## SDK & Framework Support

### Current Support

| Platform               | Status            | Notes                      |
| ---------------------- | ----------------- | -------------------------- |
| **JavaScript/Browser** | Full support      | Native WebSocket API       |
| **Node.js**            | Full support      | Use `ws` package           |
| **TypeScript SDK**     | Not yet available | Use WebSocket API directly |
| **Python**             | Supported         | Use `websockets` library   |
| **React Native**       | Supported         | Built-in WebSocket support |
| **Flutter**            | Supported         | Use `web_socket_channel`   |

### Framework Examples

#### Node.js

```javascript
const WebSocket = require('ws');

// The `ws` constructor takes subprotocols as its second argument,
// mirroring the browser example above.
const ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${token}`]);
```

#### Python

```python
import json

import websockets

async def connect():
    url = f"wss://api.amigo.ai/v1/{org}/conversation/converse_realtime"
    # Pass the bearer token as a WebSocket subprotocol, as in the browser example
    async with websockets.connect(
        url,
        subprotocols=[f"bearer.authorization.amigo.ai.{token}"],
    ) as ws:
        await ws.send(json.dumps({
            "type": "client.start-conversation",
            "service_id": service_id,
            "service_version_set_name": "release",
        }))
```

## Related Documentation

* [Authentication Guide](https://docs.amigo.ai/developer-guide/getting-started/authentication): set up auth tokens
* [Voice Conversations (HTTP)](https://docs.amigo.ai/developer-guide/classic-api/core-api/conversations/conversations-voice): alternative HTTP approach
* [Conversation Events](https://docs.amigo.ai/developer-guide/classic-api/core-api/conversations/conversations-events): event streaming details
* [Regional Endpoints](https://docs.amigo.ai/developer-guide/getting-started/regions-and-endpoints): optimize latency
