External Events & Multi-Stream (WebSocket)

Amigo WebSockets support multiple information streams on a single connection so you can build rich, interactive apps. In addition to user text or audio, you can send external events (e.g., device telemetry, UI actions, page/navigation changes) that the agent incorporates into its next response.

This page focuses on how to publish external events alongside user input, and how they are associated with agent interactions.

Why Multi‑Stream?

  • Device information streams: battery/network/orientation updates for mobile assistants

  • UI interaction streams: button clicks, navigation, selection changes

  • Context updates: screen state, cart changes, presence/location signals

  • Real‑time apps: send events while a user is speaking (VAD or manual audio streaming)

Streams on the Connection

On a single WebSocket you can interleave these client → server messages:

  • client.new-text-message with message_type: 'user-message' (user text)

  • client.new-audio-message (user audio streaming) and client.new-audio-message with audio: null (end of audio)

  • client.new-text-message with message_type: 'external-event' (external events)

  • Control messages: client.switch-vad-mode, client.extend-timeout, client.finish-conversation, client.close-connection

And you will receive these server → client streams:

  • server.new-message (text chunks or base64 audio chunks)

  • server.interaction-complete (marks the end of a turn)

  • server.current-agent-action (optional; controlled by query filter)

  • VAD events: server.vad-speech-started, server.vad-speech-ended, server.vad-speech-reset-zero, server.vad-mode-switched
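A small client-side dispatcher can route these server → client types to handlers. This is a sketch; the handler names (`onChunk`, `onComplete`, etc.) are our own convention, not part of the API:

```javascript
// Route incoming server messages to handlers by type.
// Handler names are illustrative; only the msg.type values are API-defined.
function createDispatcher(handlers) {
  return (event) => {
    const msg = JSON.parse(event.data);
    switch (msg.type) {
      case 'server.new-message':
        if (handlers.onChunk) handlers.onChunk(msg);        // text or base64 audio chunk
        break;
      case 'server.interaction-complete':
        if (handlers.onComplete) handlers.onComplete(msg);  // end of the agent's turn
        break;
      case 'server.current-agent-action':
        if (handlers.onAgentAction) handlers.onAgentAction(msg); // only if enabled via query filter
        break;
      default:
        if (msg.type && msg.type.startsWith('server.vad-')) {
          if (handlers.onVad) handlers.onVad(msg);          // VAD lifecycle events
        }
    }
  };
}

// Usage: ws.onmessage = createDispatcher({ onChunk, onComplete, onVad });
```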

Association & Ordering

  • External events are timestamped and attached to the next interaction.

  • Any external events sent before or during a user’s input (text or audio) become context for that interaction.

  • After an interaction completes, the external‑event buffer is cleared.

  • You can also start an interaction using only an external event (no user text/audio).
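The ordering rules above mean a sequence like the following attaches the buffered event to the user's turn. The function name and payloads are illustrative; only the `type`/`message_type` fields are API-defined:

```javascript
// Sketch: buffer context, then start the user's turn on one connection.
function askAboutCart(ws) {
  // External events sent first become context for the next interaction.
  ws.send(JSON.stringify({
    type: 'client.new-text-message',
    message_type: 'external-event',
    text: JSON.stringify({ event: 'cart.add', sku: 'SKU-123' })
  }));

  // The user message that follows starts the interaction; the buffered
  // event above is attached to it. After server.interaction-complete,
  // the buffer is cleared and later events belong to the next turn.
  ws.send(JSON.stringify({
    type: 'client.new-text-message',
    message_type: 'user-message',
    text: 'What is in my cart?'
  }));
}
```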

VAD Mode Sequence

Minimal VAD Mode Example (messages sent)

// 1) Enable VAD mode
ws.send(JSON.stringify({
  type: 'client.switch-vad-mode',
  vad_mode_on: true
}));

// 2) Continuously stream PCM audio chunks (16kHz mono)
function onPcmFrame(pcmBase64) {
  ws.send(JSON.stringify({
    type: 'client.new-audio-message',
    audio: pcmBase64
  }));
}

// 3) Send external events during speech; they attach to the same interaction
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({ event: 'ui.navigate', page: '/checkout' })
}));

ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({ event: 'cart.add', sku: 'SKU-123' })
}));

// 4) Receive server.new-message chunks and server.interaction-complete as usual
ws.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'server.new-message') {
    // handle text/audio chunk
  } else if (msg.type === 'server.interaction-complete') {
    // end of turn; external events above are associated with this interaction
  }
};

Send External Events

Send external context as structured text alongside the conversation. We recommend JSON‑string payloads so your agent can parse event types and data.

// Example: send a device telemetry update
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({
    event: 'device.update',
    battery: 72,
    network: 'wifi',
    orientation: 'landscape'
  })
}));

// Example: send a UI action
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({
    event: 'ui.click',
    target: 'checkout_button',
    page: '/cart'
  })
}));

Notes:

  • text is a string; using JSON.stringify(...) keeps a consistent, parseable structure.

  • The server records these as external-event messages with timestamps for the interaction.
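A small wrapper keeps the JSON-string convention consistent across your app. `sendExternalEvent` is a local helper we define here, not an Amigo SDK function:

```javascript
// Wrap the external-event frame so every event shares one envelope shape:
// { event: <name>, ...payload } serialized into the text field.
function sendExternalEvent(ws, event, payload = {}) {
  ws.send(JSON.stringify({
    type: 'client.new-text-message',
    message_type: 'external-event',
    text: JSON.stringify({ event, ...payload })
  }));
}

// Usage:
// sendExternalEvent(ws, 'device.update', { battery: 72, network: 'wifi' });
```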

Use With Audio Streaming

You can interleave external events while streaming audio. Events sent between the start and end of a user’s audio are attached to that same interaction.

// 1) Start audio stream (first chunk includes config)
ws.send(JSON.stringify({
  type: 'client.new-audio-message',
  audio: firstBase64PcmChunk,
  audio_config: { format: 'pcm', sample_rate: 16000, sample_width: 2, n_channels: 1, frame_rate: 16000 }
}));

// 2) Interleave external events while the user is speaking
ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: JSON.stringify({ event: 'ui.navigate', page: '/checkout' })
}));

// 3) Continue sending audio chunks
ws.send(JSON.stringify({ type: 'client.new-audio-message', audio: nextBase64PcmChunk }));

// 4) Signal end of audio
ws.send(JSON.stringify({ type: 'client.new-audio-message', audio: null }));

VAD mode is also supported. When client.switch-vad-mode enables VAD, you continuously stream PCM audio; the server detects speech boundaries, and any external events sent during speech are still attached to that interaction.

Start With Only an External Event

Kick off an interaction without user text/audio by sending an external event as the first message:

ws.send(JSON.stringify({
  type: 'client.new-text-message',
  message_type: 'external-event',
  text: 'ALERT: Payment gateway timeout rate spiking above threshold'
}));

Receive Agent Actions (Optional)

For interactive UIs, you can subscribe to agent action telemetry and filter what you receive. Use the current_agent_action_type query parameter when connecting.

// Only receive tool-related actions
const filter = encodeURIComponent('^tool\\..*');
const url = `wss://api.amigo.ai/v1/${org}/conversation/converse_realtime?response_format=text&current_agent_action_type=${filter}`;
const ws = new WebSocket(url, [`bearer.authorization.amigo.ai.${token}`]);

ws.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'server.current-agent-action') {
    // e.g., show progress, update UI state
    renderAgentAction(msg.current_agent_action);
  }
};

Limits & Reliability

  • Messages/minute: 60 across all message types

  • Connection inactivity timeout: 30s (send client.extend-timeout every ~15s when idle)

  • Max message size: 1 MB (e.g., for audio chunks)

  • One active connection per user/service

  • VAD requires PCM audio (MP3 not supported in VAD mode)
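Given the 30s inactivity timeout, a simple keepalive timer avoids 3008 closes during idle periods. The 15s default below is our choice, comfortably inside the limit:

```javascript
// Send client.extend-timeout periodically while the socket is idle.
// Returns a stop function to call on close or when traffic resumes.
function startKeepalive(ws, intervalMs = 15000) {
  const timer = setInterval(() => {
    ws.send(JSON.stringify({ type: 'client.extend-timeout' }));
  }, intervalMs);
  return () => clearInterval(timer);
}
```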

Best Practices

  • Structure external events as JSON strings with an event name and minimal payload

  • Debounce high‑frequency updates (e.g., device telemetry every 3–5s or on meaningful change)

  • Send events before or during the user’s turn so they’re applied to the next response

  • Use regional endpoints and PCM for lowest latency in voice flows
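For the debounce recommendation, a generic trailing-edge debounce (plain utility code, not Amigo-specific) collapses bursts of telemetry into one event:

```javascript
// Trailing-edge debounce: only the last call within waitMs actually fires,
// so rapid device updates produce a single external event.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// e.g. collapse rapid orientation changes into one external event:
// const sendTelemetry = debounce((payload) => { /* ws.send(...) */ }, 3000);
```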

Troubleshooting

Common close codes:

Code | Meaning | Typical fix
3000 | Unauthorized | Refresh credentials / check token format (bearer.authorization.amigo.ai.{JWT})
3003 | Forbidden | Check user permissions
3008 | Timeout | Send client.extend-timeout periodically when idle
4000 | Bad Request | Validate message structure and types
4004 | Not Found | Verify organization/service/conversation IDs
4009 | Conflict | Only one active connection per user/service is allowed
4015 | Unsupported Media | Use PCM for VAD; check audio config
4029 | Rate Limited | Back off and reduce message frequency
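A close handler can branch on these codes to decide whether and when to reconnect. This is a sketch; the delay schedule is our choice, not prescribed by the API:

```javascript
// Map a close code to a reconnect delay in ms, or null for "don't auto-retry".
function retryDelayForClose(code, attempt) {
  switch (code) {
    case 3000: // Unauthorized: refresh the token first, then reconnect
    case 3008: // Timeout: reconnect promptly and resume keepalives
      return 1000;
    case 4029: // Rate limited: exponential backoff, capped at 30s
      return Math.min(1000 * 2 ** attempt, 30000);
    case 3003: // Forbidden: fix permissions first
    case 4009: // Conflict: another connection is already active
      return null;
    default:   // 4000/4004/4015 etc.: fix the request, retry conservatively
      return 5000;
  }
}
```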
