External Events & Multi-Stream (WebSocket)

Amigo WebSockets support multiple information streams on a single connection so you can build rich, interactive apps. In addition to user text or audio, you can send external events (e.g., device telemetry, UI actions, page/navigation changes) that the agent incorporates in its next response.

This page focuses on how to publish external events alongside user input, and how they are associated with agent interactions.

Why Multi‑Stream?

  • Device information streams: battery/network/orientation updates for mobile assistants

  • UI interaction streams: button clicks, navigation, selection changes

  • Context updates: screen state, cart changes, presence/location signals

  • Real‑time apps: send events while a user is speaking (VAD or manual audio streaming)

Streams on the Connection

On a single WebSocket you can interleave these client → server messages:

  • client.new-text-message with message_type: 'user-message' (user text)

  • client.new-audio-message (user audio streaming) and client.new-audio-message with audio: null (end of audio)

  • client.new-text-message with message_type: 'external-event' (external events)

  • Control messages: client.switch-vad-mode, client.extend-timeout, client.finish-conversation, client.close-connection

And you will receive these server → client streams:

  • server.new-message (text chunks or base64 audio chunks)

  • server.interaction-complete (marks the end of a turn)

  • server.current-agent-action (optional; controlled by the current_agent_action_type query parameter)

  • VAD events: server.vad-speech-started, server.vad-speech-ended, server.vad-speech-reset-zero, server.vad-mode-switched

Association & Ordering

  • External events are timestamped and attached to the next interaction.

  • Any external events sent before or during a user’s input (text or audio) become context for that interaction.

  • After an interaction completes, the external‑event buffer is cleared.

  • You can also start an interaction using only an external event (no user text/audio).
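The association rules above can be modeled with a small buffer. This is a client-side illustration only, not Amigo's server implementation; the class and field names (ExternalEventBuffer, addEvent, completeInteraction) are hypothetical.

```typescript
// Illustrative model of the rules above. All names are hypothetical.
interface BufferedEvent {
  text: string;       // JSON-string payload of the external event
  receivedAt: number; // timestamp in milliseconds
}

interface Interaction {
  userInput: string | null; // null when started by an external event only
  events: BufferedEvent[];  // external events attached to this interaction
}

class ExternalEventBuffer {
  private pending: BufferedEvent[] = [];

  // External events are timestamped and held for the next interaction.
  addEvent(text: string, receivedAt: number): void {
    this.pending.push({ text, receivedAt });
  }

  // Completing an interaction attaches everything buffered so far,
  // then clears the buffer for the next turn.
  completeInteraction(userInput: string | null): Interaction {
    const interaction: Interaction = { userInput, events: this.pending };
    this.pending = [];
    return interaction;
  }

  get size(): number {
    return this.pending.length;
  }
}

const eventBuffer = new ExternalEventBuffer();
eventBuffer.addEvent(JSON.stringify({ event: "cart_updated", items: 2 }), 1000);
const firstTurn = eventBuffer.completeInteraction("What's in my cart?");
const secondTurn = eventBuffer.completeInteraction("Thanks");
```

Here firstTurn carries the cart event, while secondTurn carries none because the buffer was cleared when the first interaction completed.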

Sequence Diagram

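A typical turn with an external event, using the messages listed above:

  • Client → server: client.new-text-message with message_type: 'external-event' (timestamped and buffered)

  • Client → server: client.new-text-message with message_type: 'user-message' (user input for the turn)

  • Server → client: server.new-message (one or more chunks)

  • Server → client: server.interaction-complete (turn ends; the external-event buffer is cleared)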

VAD Mode Sequence

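A typical VAD-mode turn, using the messages listed above:

  • Client → server: client.switch-vad-mode (enable VAD)

  • Server → client: server.vad-mode-switched

  • Client → server: client.new-audio-message (continuous PCM chunks)

  • Server → client: server.vad-speech-started

  • Client → server: client.new-text-message with message_type: 'external-event' (sent during speech; attached to this interaction)

  • Server → client: server.vad-speech-ended

  • Server → client: server.new-message chunks, then server.interaction-complete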

Minimal VAD Mode Example (messages sent)
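A minimal sketch of the client messages for one VAD-mode exchange. The message names, message_type, text, and audio come from this page. The envelope discriminator `type`, the toggle field `vad_mode_on`, and base64 encoding of client audio are assumptions made for illustration; check the API reference for the exact field names.

```typescript
// Assumed (not confirmed by this page): `type`, `vad_mode_on`, base64 audio.
type ClientMessage =
  | { type: "client.switch-vad-mode"; vad_mode_on: boolean }
  | { type: "client.new-audio-message"; audio: string | null }
  | {
      type: "client.new-text-message";
      text: string;
      message_type: "user-message" | "external-event";
    };

function vadSessionMessages(pcmChunks: string[], eventText: string): ClientMessage[] {
  const messages: ClientMessage[] = [
    { type: "client.switch-vad-mode", vad_mode_on: true },
  ];
  pcmChunks.forEach((chunk, index) => {
    messages.push({ type: "client.new-audio-message", audio: chunk });
    // Send the external event after the first chunk, while audio is flowing.
    if (index === 0) {
      messages.push({
        type: "client.new-text-message",
        text: eventText,
        message_type: "external-event",
      });
    }
  });
  return messages;
}

const vadMessages = vadSessionMessages(
  ["AAAA", "AAAA"], // placeholder chunks, not real PCM data
  JSON.stringify({ event: "screen_changed", screen: "checkout" })
);
// Each entry would be serialized before being passed to ws.send(...).
const vadFrames = vadMessages.map((m) => JSON.stringify(m));
```

No end-of-audio marker is sent here because, in VAD mode, the server detects speech boundaries itself.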

Send External Events

Send external context as structured text alongside the conversation. We recommend JSON‑string payloads so your agent can parse event types and data.

Notes:

  • text is a string; using JSON.stringify(...) keeps a consistent, parseable structure.

  • The server records these as external-event messages with timestamps for the interaction.
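One way to build such a payload, as a sketch: message_type and text are the fields named on this page, while the `type` discriminator, the helper name externalEvent, and the { event, data } payload layout are assumptions chosen for illustration.

```typescript
// Assumed envelope: `type` as the discriminator field.
interface ExternalEventMessage {
  type: "client.new-text-message";
  message_type: "external-event";
  text: string; // always a string; here a JSON-encoded object
}

function externalEvent(
  eventName: string,
  data: Record<string, unknown>
): ExternalEventMessage {
  return {
    type: "client.new-text-message",
    message_type: "external-event",
    // JSON.stringify keeps the payload consistent and parseable.
    text: JSON.stringify({ event: eventName, data }),
  };
}

const batteryEvent = externalEvent("battery_low", { level: 0.12 });
// The serialized frame is what you would pass to ws.send(...).
const batteryFrame = JSON.stringify(batteryEvent);
```

Note the double encoding: the event payload is a JSON string inside text, and the whole message is serialized again when it is sent.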

Use With Audio Streaming

You can interleave external events while streaming audio. Events sent between the start and end of a user’s audio are attached to that same interaction.

VAD mode is also supported. When client.switch-vad-mode enables VAD, you continuously stream PCM audio; the server detects speech boundaries, and any external events sent during speech are still attached to that interaction.
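For manual (non-VAD) audio streaming, the ordering can be sketched as follows. The helper manualAudioTurn is hypothetical, and the `type` discriminator and base64 chunk strings are assumptions; audio: null as the end-of-audio marker comes from this page.

```typescript
type TurnFrame =
  | { type: "client.new-audio-message"; audio: string | null }
  | { type: "client.new-text-message"; message_type: "external-event"; text: string };

// Build the frames for one manually streamed user turn. Events are placed
// after audio starts and before the end-of-audio marker, so they fall
// inside the same interaction.
function manualAudioTurn(chunks: string[], eventTexts: string[]): TurnFrame[] {
  const frames: TurnFrame[] = [];
  if (chunks.length > 0) {
    frames.push({ type: "client.new-audio-message", audio: chunks[0] });
  }
  eventTexts.forEach((text) => {
    frames.push({ type: "client.new-text-message", message_type: "external-event", text });
  });
  chunks.slice(1).forEach((chunk) => {
    frames.push({ type: "client.new-audio-message", audio: chunk });
  });
  // audio: null marks the end of the user's audio.
  frames.push({ type: "client.new-audio-message", audio: null });
  return frames;
}

const turnFrames = manualAudioTurn(
  ["AAAA", "BBBB"], // placeholder chunks, not real audio
  [JSON.stringify({ event: "orientation_changed", orientation: "landscape" })]
);
```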

Start With Only an External Event

Kick off an interaction without user text/audio by sending an external event as the first message.
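A sketch of such an opening message. It assumes that start_interaction: true (described in the VAD notes on this page) is the flag that triggers a turn without user input, and that the envelope uses a `type` discriminator; the event name and payload are examples only.

```typescript
// Assumptions: `type` discriminator; start_interaction triggers the turn.
const openingEvent = {
  type: "client.new-text-message",
  message_type: "external-event",
  start_interaction: true,
  text: JSON.stringify({ event: "page_loaded", path: "/checkout" }),
};

// Serialized frame to pass to ws.send(...) once the socket is open.
const openingFrame: string = JSON.stringify(openingEvent);
```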

Interrupting Interactions in VAD Mode

In VAD mode, an external event sent with start_interaction: true can interrupt an in-progress agent response and start a new interaction.

Important Notes:

  • Without start_interaction: true, external events are buffered and attached to the next natural interaction

  • With start_interaction: true in VAD mode:

    • Interrupts an in-progress agent response when the user is not speaking

    • Respects user speech: waits until the user finishes speaking before triggering the new interaction

  • This allows critical events to take precedence while maintaining natural conversation flow
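A minimal sketch of reserving start_interaction for critical events only. The helper eventMessage, the critical flag, and the `type` discriminator are assumptions for illustration; start_interaction, message_type, and text come from this page.

```typescript
interface EventMessage {
  type: "client.new-text-message";
  message_type: "external-event";
  text: string;
  start_interaction?: boolean; // omitted for routine events
}

function eventMessage(
  eventName: string,
  data: Record<string, unknown>,
  critical: boolean
): EventMessage {
  const message: EventMessage = {
    type: "client.new-text-message",
    message_type: "external-event",
    text: JSON.stringify({ event: eventName, data }),
  };
  // Only critical events request a new interaction; routine events are
  // buffered and attached to the next natural interaction.
  if (critical) {
    message.start_interaction = true;
  }
  return message;
}

const routineUpdate = eventMessage("battery_level", { level: 0.8 }, false);
const urgentAlert = eventMessage("payment_failed", { orderId: "demo-1" }, true);
```

Keeping the flag off by default avoids cutting off agent responses for low-value updates.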

Receive Agent Actions (Optional)

For interactive UIs, you can subscribe to agent action telemetry and filter what you receive. Use the current_agent_action_type query parameter when connecting.
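As a sketch, the parameter can be appended to the connection URL like this. Only the parameter name comes from this page; the base URL and the filter value below are placeholders, and the accepted filter syntax should be taken from the API reference.

```typescript
// Append the agent-action filter to a WebSocket URL.
// The base URL and filter value used below are placeholders.
function withAgentActionFilter(baseUrl: string, filter: string): string {
  const separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
  return (
    baseUrl + separator + "current_agent_action_type=" + encodeURIComponent(filter)
  );
}

const connectUrl = withAgentActionFilter(
  "wss://example.invalid/converse",
  "example-filter"
);
```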

Limits & Reliability

  • Messages/minute: 60 across all message types

  • Connection inactivity timeout: 30s (send client.extend-timeout every ~15s when idle)

  • Max message size: 1 MB (e.g., for audio chunks)

  • One active connection per user/service

  • VAD requires PCM audio (MP3 not supported in VAD mode)
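A client can enforce these limits locally before sending. The sketch below assumes a sliding 60-second window (the server's actual accounting is not described on this page) and reads "1 MB" conservatively as 1,000,000 bytes; SendBudget and needsExtendTimeout are hypothetical names.

```typescript
const MAX_MESSAGES_PER_MINUTE = 60;
const MAX_MESSAGE_BYTES = 1000000; // conservative reading of "1 MB"
const KEEPALIVE_INTERVAL_MS = 15000; // half of the 30s inactivity timeout

class SendBudget {
  private sentAt: number[] = [];

  // Returns true and records the send if the message fits both limits.
  tryConsume(nowMs: number, messageBytes: number): boolean {
    if (messageBytes > MAX_MESSAGE_BYTES) {
      return false;
    }
    // Keep only sends from the last 60 seconds (sliding window).
    this.sentAt = this.sentAt.filter((t) => nowMs - t < 60000);
    if (this.sentAt.length >= MAX_MESSAGES_PER_MINUTE) {
      return false;
    }
    this.sentAt.push(nowMs);
    return true;
  }
}

// True when the connection has been idle long enough that a
// client.extend-timeout message should be sent.
function needsExtendTimeout(lastSentMs: number, nowMs: number): boolean {
  return nowMs - lastSentMs >= KEEPALIVE_INTERVAL_MS;
}

const sendBudget = new SendBudget();
const canSendNow = sendBudget.tryConsume(0, 512);
```

Since the limit applies across all message types, keepalive messages should go through the same budget as events and audio.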

Best Practices

  • Structure external events as JSON strings with an event name and minimal payload

  • Debounce high‑frequency updates (e.g., device telemetry every 3–5s or on meaningful change)

  • Send events before or during the user’s turn so they’re applied to the next response

  • Use regional endpoints and PCM for lowest latency in voice flows
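The debouncing advice can be sketched as a small gate that decides whether a telemetry reading is worth sending. TelemetryGate and its thresholds are hypothetical; the policy shown (send immediately on a large change, otherwise at most once per interval and only if the value changed) is one reasonable interpretation, not a prescribed algorithm.

```typescript
class TelemetryGate {
  private lastSentMs: number | null = null;
  private lastValue: number | null = null;

  constructor(
    private readonly minIntervalMs: number, // e.g. 3000 to 5000
    private readonly minDelta: number       // what counts as a meaningful change
  ) {}

  shouldSend(value: number, nowMs: number): boolean {
    const lastSent = this.lastSentMs;
    const lastVal = this.lastValue;
    let send: boolean;
    if (lastSent === null || lastVal === null) {
      send = true; // always send the first reading
    } else {
      const bigChange = Math.abs(value - lastVal) >= this.minDelta;
      const intervalElapsed = nowMs - lastSent >= this.minIntervalMs;
      send = bigChange || (intervalElapsed && value !== lastVal);
    }
    if (send) {
      this.lastSentMs = nowMs;
      this.lastValue = value;
    }
    return send;
  }
}

// Battery percentage readings: 4s interval, 10-point change threshold.
const batteryGate = new TelemetryGate(4000, 10);
const gateDecisions = [
  batteryGate.shouldSend(80, 0),    // first reading
  batteryGate.shouldSend(79, 1000), // small change, too soon
  batteryGate.shouldSend(60, 1500), // large change
  batteryGate.shouldSend(59, 5500), // interval elapsed, value changed
  batteryGate.shouldSend(59, 9500), // interval elapsed, value unchanged
];
```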

Troubleshooting

Common close codes:

| Code | Meaning | Typical fix |
|------|---------|-------------|
| 3000 | Unauthorized | Refresh credentials / token format (bearer.authorization.amigo.ai.{JWT}) |
| 3003 | Forbidden | Check user permissions |
| 3008 | Timeout | Send client.extend-timeout periodically when idle |
| 4000 | Bad Request | Validate message structure and types |
| 4004 | Not Found | Verify organization/service/conversation IDs |
| 4009 | Conflict | Only 1 connection per user/service |
| 4015 | Unsupported Media | Use PCM for VAD, check audio config |
| 4029 | Rate Limited | Back off and reduce message frequency |
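In a WebSocket close handler, these codes can be mapped to a next step. The sketch below only encodes the codes and fixes listed above; the action names and the actionForCloseCode helper are hypothetical.

```typescript
type CloseAction =
  | "refresh-credentials"
  | "check-permissions"
  | "send-keepalives"
  | "fix-message-format"
  | "verify-ids"
  | "close-other-connection"
  | "fix-audio-config"
  | "back-off"
  | "unknown";

const CLOSE_CODE_ACTIONS: { [code: number]: CloseAction | undefined } = {
  3000: "refresh-credentials",
  3003: "check-permissions",
  3008: "send-keepalives",
  4000: "fix-message-format",
  4004: "verify-ids",
  4009: "close-other-connection",
  4015: "fix-audio-config",
  4029: "back-off",
};

// Look up the suggested action for a close code; codes not listed in the
// table above fall through to "unknown".
function actionForCloseCode(code: number): CloseAction {
  const action = CLOSE_CODE_ACTIONS[code];
  return action === undefined ? "unknown" : action;
}

const timeoutAction = actionForCloseCode(3008);
```

A handler would typically call this from the socket's close event with the event's code and log or act on the result.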
