Voice
Voice Mode Comparison
Voice Notes (HTTP)
Request Essentials
Sequence Diagram: Voice Note Exchange
API Reference
Send a new user message to the conversation. The endpoint will perform analysis and generate an agent message in response.
A UserMessageAvailableEvent is the first event in the response. It contains the user message if it was sent as text, or the transcribed message if it was sent as voice.
A series of CurrentAgentActionEvents follows, each indicating a step in the agent's thinking process. The agent message is then generated sequentially in pieces, and each piece is sent as a NewMessageEvent. After all pieces are sent, an InteractionCompleteEvent is sent. Depending on its conversation_completed property, the conversation either awaits a new message from the user or ends automatically (for instance, because the user message indicates the user wants to end the session). If it ends, the conversation is marked as finished and post-conversation analysis is initiated asynchronously. The connection then terminates.
Any further action on the conversation is only allowed after the connection is terminated.
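The event order above can be consumed with a small fold over the parsed events. This is a minimal Python sketch, not an official client: it assumes each event is a JSON object with a "type" field naming the event, and the payload keys user_message, action, and message are hypothetical placeholders. Only conversation_completed is taken from this reference.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class InteractionResult:
    """Summary of one user-message / agent-reply exchange."""
    user_message: Optional[str] = None       # text, or transcript for voice input
    agent_actions: list = field(default_factory=list)
    agent_message: str = ""                  # concatenated NewMessageEvent pieces
    conversation_completed: Optional[bool] = None


def consume_events(events: Iterable[dict]) -> InteractionResult:
    """Fold already-parsed stream events into an InteractionResult.

    ASSUMPTION: each event has a "type" key naming the event class; the
    payload keys "user_message", "action" and "message" are hypothetical.
    """
    result = InteractionResult()
    for event in events:
        kind = event.get("type")
        if kind == "UserMessageAvailableEvent":
            result.user_message = event.get("user_message")
        elif kind == "CurrentAgentActionEvent":
            result.agent_actions.append(event.get("action"))
        elif kind == "NewMessageEvent":
            # Pieces arrive in order, so appending rebuilds the full reply.
            result.agent_message += event.get("message", "")
        elif kind == "InteractionCompleteEvent":
            result.conversation_completed = bool(event.get("conversation_completed"))
        # Other event types (e.g. ErrorEvent) are ignored here.
    return result
```

If conversation_completed is false, the client can send the next user message once the connection has closed; if true, the conversation is finished.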
A 200 status code does not indicate that this endpoint completed successfully, because the status code is transmitted before the stream starts. At any point during the stream,
an ErrorEvent may be sent to indicate that an error has occurred. The connection is closed immediately afterward.
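Because the 200 status arrives before any event, a client has to judge success from the stream itself. The sketch below wraps an event iterator and raises if an ErrorEvent appears or if the stream ends without an InteractionCompleteEvent. The "type" discriminator and the "message" key on ErrorEvent are assumptions, and StreamError and IncompleteStreamError are hypothetical names.

```python
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when the server reports an ErrorEvent mid-stream."""


class IncompleteStreamError(RuntimeError):
    """Raised when the stream closes without an InteractionCompleteEvent."""


def checked_events(events: Iterable[dict]) -> Iterator[dict]:
    """Re-yield events, turning in-stream failures into exceptions.

    ASSUMPTION: events are dicts with a "type" key; the "message" key on
    ErrorEvent is hypothetical.
    """
    completed = False
    for event in events:
        kind = event.get("type")
        if kind == "ErrorEvent":
            # The server closes the connection right after an ErrorEvent.
            raise StreamError(event.get("message", "unknown error"))
        if kind == "InteractionCompleteEvent":
            completed = True
        yield event
    if not completed:
        # A 200 followed by a truncated stream is still a failure.
        raise IncompleteStreamError("stream ended before InteractionCompleteEvent")
```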
This endpoint can only be called on a conversation that has started but not finished.
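Since a conversation cannot accept further actions while a stream is open, a client that may issue overlapping calls can serialize them per conversation. ConversationGate is a hypothetical client-side helper, not part of any Amigo SDK; it refuses a second concurrent interaction on the same conversation ID rather than letting the server reject it.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class ConversationGate:
    """Hypothetical guard: at most one in-flight interaction per conversation."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, conversation_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(conversation_id, threading.Lock())

    @contextmanager
    def interaction(self, conversation_id: str) -> Iterator[None]:
        lock = self._lock_for(conversation_id)
        # Fail fast instead of blocking when a stream is already open.
        if not lock.acquire(blocking=False):
            raise RuntimeError(f"conversation {conversation_id} already has an open stream")
        try:
            yield  # send the request and drain the stream inside this block
        finally:
            lock.release()
```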
Permissions
This endpoint requires the following permissions:
- User:UpdateUserInfo on the user who started the conversation.
- Conversation:InteractWithConversation on the conversation.
This endpoint may be impacted by the following permissions:
- CurrentAgentActionEvents are only emitted if the authenticated user has the Conversation:GetInteractionInsights permission.
The username should be set to {org_id}_{user_id}, and the password should be the Amigo-issued JWT that identifies the user.
An Amigo-issued JWT that identifies a user. It is issued either after logging in through the frontend, or manually through the SignInWithAPIKey endpoint.
An optional organization identifier that indicates which organization issued the token. This is used in rare cases where the authenticated user is requesting resources in another organization.
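Username/password credentials like these are sent as HTTP Basic authentication (RFC 7617): the header value is "Basic " followed by base64("username:password"). A minimal sketch, assuming org_id, user_id, and the JWT are already known; basic_auth_header is a hypothetical helper name.

```python
import base64


def basic_auth_header(org_id: str, user_id: str, jwt: str) -> str:
    """Build an HTTP Basic Authorization header value.

    Username is {org_id}_{user_id}; password is the Amigo-issued JWT.
    """
    username = f"{org_id}_{user_id}"
    raw = f"{username}:{jwt}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Usage: headers = {"Authorization": basic_auth_header("my-org", "user-123", token)}.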
The identifier of the conversation to send a message to. Pattern: ^[a-f0-9]{24}$
The format in which the user message is delivered to the server.
The format of the response that will be sent to the user.
A regex for filtering the type of the current agent action to return. Default: ^.*$ (all are returned). If you don't want to receive any of these events, set this to a regex that matches nothing, for instance ^$.
Configuration for the user message audio. This is only required if request_format is set to voice.
The content type of the request body, which must be multipart/form-data followed by a boundary. Pattern: ^multipart\/form-data; boundary=.+$
The Mongo cluster name to perform this request in. This is usually not needed unless the organization does not yet exist in the Amigo organization infra config database.
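The documented patterns can be checked on the client before a request is sent. This sketch reuses the patterns quoted above; validate_request and action_filter_matches are hypothetical helpers, and it assumes the server applies the agent action filter to each action type name, which this reference does not spell out.

```python
import re

# Patterns copied from the parameter reference above.
CONVERSATION_ID_PATTERN = re.compile(r"^[a-f0-9]{24}$")
CONTENT_TYPE_PATTERN = re.compile(r"^multipart\/form-data; boundary=.+$")


def validate_request(conversation_id: str, content_type: str) -> list:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    # fullmatch rejects a trailing newline that a bare `$` would tolerate.
    if not CONVERSATION_ID_PATTERN.fullmatch(conversation_id):
        problems.append("conversation ID must be 24 lowercase hex characters")
    if not CONTENT_TYPE_PATTERN.fullmatch(content_type):
        problems.append("content type must be multipart/form-data with a boundary")
    return problems


def action_filter_matches(pattern: str, action_type: str) -> bool:
    """Hypothetical: whether an agent action type passes the filter regex.

    ASSUMPTION: the server applies the regex to each action type name.
    """
    return re.search(pattern, action_type) is not None
```

With the default ^.*$ every non-empty action type matches; with ^$ none do, which is why ^$ suppresses CurrentAgentActionEvents.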
Succeeded. The response is a stream of JSON events separated by newlines. The server transmits each event as soon as it is available, so the client should handle events as they arrive and keep listening until the server closes the connection.
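Network reads rarely line up with event boundaries, so the client has to reassemble newline-delimited JSON from whatever chunks arrive and hand each event on immediately. A minimal sketch; the input can be any iterable of bytes, for example the chunks from the requests library's Response.iter_content() when the request is made with stream=True.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Decode newline-delimited JSON from arbitrary byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # A final event may arrive without a trailing newline before close.
    if buffer.strip():
        yield json.loads(buffer)
```

Splitting on the raw newline byte is safe for UTF-8, because that byte never occurs inside a multi-byte character.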
The request may be rejected for the following reasons:
- The user message is empty.
- The preferred language does not support voice transcription or response.
- The response_audio_format field is not set when voice output is requested.
- The timestamps for external event messages are not in the past.
- The timestamps for external event messages are inconsistent with the conversation.
- The agent does not have a voice config specified.
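Two of these causes can be caught before the request is sent. A sketch under stated assumptions: response_format is a hypothetical name for the response format parameter, voice output is assumed to be requested with the value "voice", and message_text is None for voice-note input. The remaining causes (language support, agent voice config, timestamp checks) can only be detected by the server.

```python
from typing import List, Optional


def precheck(message_text: Optional[str], response_format: str,
             response_audio_format: Optional[str]) -> List[str]:
    """Return client-detectable rejection causes; empty means none found.

    ASSUMPTIONS: "response_format" and its "voice" value are hypothetical;
    message_text is None when the user message is sent as audio.
    """
    problems = []
    if message_text is not None and not message_text.strip():
        problems.append("the user message is empty")
    if response_format == "voice" and not response_audio_format:
        problems.append("response_audio_format is required for voice output")
    return problems
```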
Invalid authorization credentials.
Missing required permissions.
Specified organization or conversation is not found.
The request body stream timed out.
The specified conversation is already finished, or a related operation is in process.
The format of the supplied audio file is not supported.
Invalid request path parameter or request body failed validation.
The user has exceeded the rate limit of 15 requests per minute for this endpoint.
The service is going through temporary maintenance.
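The limit of 15 requests per minute can be respected on the client instead of handled after rejection. The server's exact windowing is not documented, so this sketch uses a sliding window, which is conservative: keeping every 60-second span at or below 15 requests also keeps any fixed minute within the limit. SlidingWindowLimiter is a hypothetical helper; the clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle for a 15 requests/minute limit."""

    def __init__(self, limit: int = 15, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request stays within the limit."""
        now = self.clock()
        # Drop send times that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was sent just now."""
        self.sent.append(self.clock())
```

Before each call, sleep for wait_time() and then call record().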
Minimal Client Handling (browser-friendly)
Tips
Managing Perceived Latency
How Audio Fillers Work
Example Flow
Common Filler Scenarios
Benefits
Handling in Code
Real-time Voice (WebSocket)

