# Text Sessions

## Playground Testing

The Developer Console provides a unified playground for testing agents across multiple interaction modes from a single interface. Available modes depend on the service's channel type:

* **Text Chat** - Turn-by-turn simulation that auto-starts when the service is ready. Useful for iterating on agent logic without external dependencies.
* **REST API** - Sends messages through the production REST conversations endpoint, creating a durable conversation and submitting each message as a turn. An inspector panel shows the full request and response for every API exchange, including timing, so you can verify exactly what your integrations will see.
* **WebSocket** - Real-time text streaming over a persistent WebSocket connection. Messages and agent responses appear as they are generated. The connection automatically reconnects (up to five attempts with exponential backoff) if interrupted.
* **Voice Call** - Browser-based test call via WebSocket audio (available for voice-enabled services).

All modes share the Context Graph visualization and event timeline panels, and each mode supports an optional identifier (Caller ID, Entity ID, or test profile phone number) that is persisted across sessions.

{% hint style="info" %}
Text conversations and voice calls are now accessible through a single unified conversations resource in the API. See the [Developer Guide](https://docs.amigo.ai/developer-guide/platform-api/conversations) for endpoint details.
{% endhint %}

Text sessions let patients interact with the agent over SMS, WhatsApp, or WebSocket. The agent runs the same context graphs, calls the same tools, and follows the same safety rules regardless of channel. What changes is how messages get in and out - not how the agent reasons.

Everything you configure for voice - persona, context graphs, dynamic behaviors, memory, clinical tools - works identically over text. No separate bot, no reduced feature set.

## Conversation Persistence

Every text conversation is stored as a single value with three parts:

* **Plan** - A natural-language summary of where the conversation stands, written by the agent during compression. Plans are plain language, not structured data, so they stay readable across platform versions without migration.
* **Turns** - Up to 200 verbatim messages between patient and agent, each carrying a role, text, and timestamp.
* **Cursor** - A channel-specific position marker for resuming exactly where the conversation left off.

A conversation can be active, frozen, or closed. These are not separate operations - they are states derived from the conversation value itself.

### Freezing and Resumption

When a conversation goes quiet, the platform freezes it. An LLM compresses the turn history into a plan - a paragraph that captures the participants, what has been promised, where things stand, what should happen next, and what questions are still open. The plan and recent verbatim turns are saved together.

When a new message arrives on a frozen conversation, the platform thaws it: loads the plan and last five turns, feeds them into the reasoning engine, and picks up where things left off. The agent honors every prior commitment. It does not greet the patient again - it simply responds to whatever they said. From the patient's side, the conversation never stopped.
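A thaw amounts to assembling reasoning-engine input from the stored value. A minimal sketch, assuming a chat-style message list (the real engine input is richer):

```python
THAW_TURN_COUNT = 5  # last five verbatim turns, per the thaw behavior above

def build_thaw_context(plan: str, turns: list[dict], new_message: str) -> list[dict]:
    """Assemble input for resuming a frozen conversation (sketch)."""
    recent = turns[-THAW_TURN_COUNT:]
    context = [{"role": "system", "content": f"Conversation plan: {plan}"}]
    context += recent  # verbatim turns, e.g. {"role": "patient", "content": ...}
    context.append({"role": "patient", "content": new_message})
    return context
```

The plan carries everything older than the retained turns, which is why the agent can honor commitments made long before the freeze.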

<figure><img src="/files/qNSHvzbflAXHbBc7YzGa" alt="Text session pipeline: input channels (inbound SMS, inbound WhatsApp, WebSocket, gap scanner, outbound API) through signal queue and reasoning engine to text response, WebSocket response, and surface delivery"><figcaption></figcaption></figure>

## Channels

### SMS

Patients text the agent's phone number. The platform creates a session automatically, loads the patient's context from the world model, and the agent responds. Outbound SMS checks consent before sending and respects TCPA-compliant quiet hours.

### WhatsApp

Patients message the agent's WhatsApp number. The conversation lives in the same WhatsApp thread, and session continuity is keyed on the patient's phone number with E.164 normalization. WhatsApp also supports voice notes - see [Audio Input](#audio-input) below.
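E.164 normalization can be sketched roughly as below. This is a simplified illustration assuming a single default country code; production systems typically rely on a full parsing library (such as libphonenumber) rather than this heuristic:

```python
import re

def normalize_e164(raw: str, default_country_code: str = "1") -> str:
    """Best-effort E.164 normalization (sketch only).

    Assumes a bare ten-digit national number belongs to the
    default country when no country code is present.
    """
    digits = re.sub(r"\D", "", raw)          # drop spaces, dashes, parentheses
    if raw.strip().startswith("+"):
        return "+" + digits                  # country code already explicit
    if len(digits) == 10:                    # national format, e.g. (415) 555-0100
        return "+" + default_country_code + digits
    return "+" + digits                      # assume country code included
```

Keying sessions on the normalized form is what lets "(415) 555-0100" and "+14155550100" resolve to the same thread.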

### WebSocket

A persistent bidirectional connection for web and mobile apps. A client opens a WebSocket, sends JSON messages, and receives agent responses with typing indicators while the agent composes. When the WebSocket disconnects, the conversation freezes. When the client reconnects with the same conversation ID, it thaws.

<figure><img src="/files/qbuJqlqKuB02IgLuqaKP" alt="WebSocket text-stream lifecycle: connect and auth, greeting, conversation turns with typing indicators and tool calls, freeze on disconnect, thaw on reconnect"><figcaption></figcaption></figure>

The WebSocket runs the same actor as SMS and WhatsApp. Client messages push signals to the actor's queue, the actor runs them through the reasoning engine, and responses stream back as JSON frames. The agent does not know which transport delivered the message.

The public WebSocket endpoint is workspace-scoped and authenticates via a `Sec-WebSocket-Protocol` subprotocol header so credentials never appear in URLs, browser history, or proxy logs. Resuming a session from a different device or transport is a matter of reconnecting with the same conversation ID.
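The auth and reconnect policies can be sketched as below. The subprotocol names and backoff parameters are illustrative assumptions, not the platform's actual wire values; see the Conversations API reference for the real protocol:

```python
def auth_subprotocols(token: str) -> list[str]:
    # Credentials travel in the Sec-WebSocket-Protocol header rather
    # than the URL, so they never land in logs or browser history.
    # "amigo.v1" and "bearer." are placeholder names.
    return ["amigo.v1", f"bearer.{token}"]

def backoff_delays(attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> list[float]:
    # Exponential backoff for the reconnect attempts mentioned above:
    # 1s, 2s, 4s, ... capped so a long outage doesn't overshoot.
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

A client would pass the subprotocol list to its WebSocket library's `subprotocols` option and walk the delay list between reconnect attempts.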

### REST

A synchronous REST endpoint runs the same reasoning pipeline as the WebSocket and returns the agent's response in one HTTP call. Use REST when the calling system prefers a standard request-response model, when the network path between client and platform does not allow long-lived connections, or when the integration is server-to-server.

The same endpoint also supports token-by-token streaming via Server-Sent Events: pass `Accept: text/event-stream` and the server emits a typed event sequence (`token`, `tool_call_started`, `tool_call_completed`, `thinking`, `message`, `done`) as the turn unfolds. SSE streaming gives consumer-facing UIs the same incremental feel as the WebSocket without a persistent connection, while keeping the simpler request-scoped lifecycle and authentication of REST. Concurrency, lock semantics, and persistence are identical to the synchronous JSON path - if the client disconnects mid-stream, the partial response is still saved and the next turn is unblocked.
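Consuming the SSE stream amounts to reading `event:`/`data:` blocks until `done`. A minimal parser sketch (the payload shapes shown in the test are placeholders; the actual schema is in the API reference):

```python
import json

def parse_sse(stream_text: str) -> list[tuple[str, dict]]:
    """Parse a Server-Sent Events body into (event, data) pairs."""
    events = []
    for block in stream_text.strip().split("\n\n"):  # events are blank-line separated
        event, data = "message", ""                  # "message" is the SSE default
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data += line[len("data:"):].strip()
        events.append((event, json.loads(data) if data else {}))
    return events
```

In a browser, `EventSource` or a streaming `fetch` does this parsing for you; the sketch shows what those layers are doing with the typed events.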

{% hint style="info" %}
See the [Conversations API reference](https://docs.amigo.ai/developer-guide/platform-api/conversations) for the full WebSocket wire protocol, REST endpoint reference, the SSE event schema, connection parameters, error codes, and code examples.
{% endhint %}

### Channel Timing

| Channel   | Session Key          | Idle Timeout | Max Duration |
| --------- | -------------------- | ------------ | ------------ |
| SMS       | Phone number pair    | 30 minutes   | 2 hours      |
| WhatsApp  | Phone number (E.164) | 1 hour       | 24 hours     |
| WebSocket | API-authenticated    | 5 minutes    | 1 hour       |

SMS and WhatsApp use phone-number-keyed sessions with residency windows tuned for asynchronous messaging. WebSocket uses API-authenticated sessions with shorter timeouts because web chat is interactive - if the patient walks away, the session should freeze quickly rather than holding resources.

## How Sessions Start

**Inbound SMS.** A patient texts the agent's number. A session is created automatically with full patient context from the world model.

**Inbound WhatsApp.** A patient messages the agent's WhatsApp number. The platform validates the message, resolves the phone number to a workspace and service, and creates or resumes a session.

**WebSocket connection.** A client opens a WebSocket with workspace credentials, a service ID, and optionally a conversation ID to resume a prior session.

**Gap scanner.** When the platform detects missing patient data - an incomplete intake form, a lapsed screening - it can start a text conversation to collect it. If the data fits a structured format, the agent delivers a [surface](/channels/surfaces.md) (web form) inline and the patient fills it out without leaving the conversation.

**Outbound API.** Your systems trigger text sessions directly - appointment reminders, care follow-ups, post-visit check-ins. See [Outbound Patterns](/channels/outbound.md).

## How It Works

The agent greets the patient, then runs a multi-turn conversation. Under the hood, the [reasoning engine](/agent/reasoning-engine.md) navigates context graph states and transitions exactly as it would on a phone call. Tools execute identically. Data writes to the world model at conversation confidence. The post-session review pipeline runs the same way.

Every external event - an inbound message, a delivery status update, a surface submission, a timeout - becomes a signal pushed to the session actor's queue. The actor processes signals one at a time through three steps:

1. **Cut** - Is this signal a turn boundary? Inbound messages and surface submissions are. Delivery status updates are not.
2. **Navigate** - The reasoning engine determines what to do: respond, call a tool, escalate, or complete.
3. **Engage** - The chosen action executes: send a text reply, wait for input, or end the session.

Processing signals sequentially eliminates race conditions. When a patient sends two messages in quick succession or a surface submission arrives while the agent is mid-response, everything is serialized and handled in order.

### Message Coalescing

On SMS and WhatsApp, patients often send multiple messages in quick succession - splitting a thought across two texts, or adding a correction right after hitting send. Without coalescing, each message would trigger a separate agent response, creating a stuttering back-and-forth.

The platform handles this by draining accumulated messages after each turn. When the actor finishes responding to one message, it checks whether additional messages arrived during processing. If they did, it joins them into a single turn before running the next reasoning cycle. A patient who sends "I need an appointment" and then "for next Tuesday" a few seconds later gets one coherent response instead of two. Up to 10 messages can be coalesced into a single turn.

Coalescing is enabled for SMS and WhatsApp, where input cannot be gated. WebSocket sessions do not coalesce because the client controls send timing. Non-message signals (surface submissions, delivery status updates) are never coalesced - they are processed individually in their original order.

**WebSocket.** Each message is processed as a separate turn. The agent sees prior turns as conversation context, so responses remain coherent even without coalescing.

**REST API.** Only one turn can be in flight per conversation at a time. If a second turn request arrives while the first is still processing, the second request returns 409 Conflict. Wait for the current turn to complete before sending the next one. If a client timeout fires before the server finishes, the turn still runs to completion - the state is saved and the conversation freezes normally. The next GET shows the completed turn.

**Cross-transport exclusion.** A conversation can only have one active session at a time, regardless of transport. If a WebSocket session is active for a conversation, a REST turn request for the same conversation returns 409. Read operations (listing, detail) are never blocked - only message sends contend. Different conversations can process turns simultaneously.
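A client-side retry loop for the 409 case might look like the sketch below, with the HTTP call injected as a function so the example stays self-contained (`post` is a stand-in for your real HTTP client, not a platform SDK call):

```python
import time

def send_turn(post, message: str, retries: int = 3, delay: float = 0.0) -> dict:
    """Send a turn, waiting out 409 Conflict from an in-flight turn.

    `post` is a caller-supplied callable returning (status, body).
    """
    for _ in range(retries + 1):
        status, body = post(message)
        if status != 409:
            return {"status": status, "body": body}
        time.sleep(delay)  # back off while the prior turn finishes
    raise RuntimeError("conversation still busy after retries")
```

Because a timed-out turn still runs to completion server-side, the retry should send the *next* message, not blindly resend the last one.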

## Audio Input

Text sessions can accept audio recordings instead of typed text. Audio is transcribed server-side and fed through the same reasoning pipeline.

### WhatsApp Voice Notes

When a patient sends a voice note on WhatsApp, the platform transcribes it, runs the full reasoning engine pipeline, synthesizes the agent's reply as spoken audio, and returns it as a voice note in the same thread. Patients hear a natural spoken response without making a phone call.

Switching between text and voice notes in the same thread preserves full conversation context. Concurrent voice notes from the same patient are serialized to prevent race conditions. Phone numbers are E.164-normalized so international numbers work correctly regardless of how the messaging provider formats them.

### Web Audio

Web chat clients can also send audio recordings. Unlike WhatsApp voice notes, the agent responds with text rather than audio - audio input is a convenience for web chat, not a full voice channel.

Both audio modes are distinct from [phone calls](/channels/voice.md), which handle real-time bidirectional audio with barge-in, emotion detection, and filler speech.

## What Is Different from Voice

Text sessions share the full reasoning stack with [phone](/channels/voice.md). A few things work differently.

**No audio pipeline.** No real-time speech-to-text, text-to-speech, filler speech, or barge-in. Messages arrive as text (or transcribed audio) and leave as text. WhatsApp voice notes are the exception - they go through transcription and synthesis but still operate turn-by-turn.

**Text-only emotion detection.** Voice calls analyze both language and vocal tone. Text sessions use language analysis only. The agent still adapts its tone based on detected sentiment, but has fewer signals to work with.

**No operator join.** Voice calls support conference-style operator takeover. Text sessions do not - operators can monitor in real time but do not join as participants. Escalation routes to a separate channel.

**Asynchronous pacing.** Patients respond when they can. Timeout windows range from minutes to days depending on the use case. An appointment confirmation might expire after two hours. A care gap outreach might stay open for a week.

**Longer responses.** Reading is faster than listening. The agent can include more detail per message - medication instructions, prep steps, follow-up lists - without the awkwardness of reading a paragraph aloud.

**Session compression.** Long conversations automatically compress older history to stay within model context limits. Clinical facts and conversation state are preserved while token count shrinks.

## SMS Consent

All outbound SMS checks consent before sending. If a patient has opted out (by texting STOP or through another opt-out mechanism), outbound messages to that number are blocked. Consent is tracked per patient per workspace. Inbound messages from opted-out patients still create sessions, since the patient initiated contact.

## Quiet Hours

Outbound text sessions respect TCPA-compliant quiet hours. No messages go out during restricted windows. Quiet hours are configurable per workspace to match local regulations and patient expectations.
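A quiet-hours gate can be sketched as a time-window check. The 9pm-8am default below reflects the common TCPA window; as noted above, the actual window is workspace-configured and may differ:

```python
from datetime import time

def in_quiet_hours(local_time: time,
                   window_start: time = time(21, 0),
                   window_end: time = time(8, 0)) -> bool:
    """True if outbound sends are blocked at the recipient's local time.

    The restricted window wraps midnight: 21:00 -> 08:00.
    """
    return local_time >= window_start or local_time < window_end
```

Note the check runs against the *recipient's* local time, which is why correct phone-number region data matters for outbound scheduling.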

## Surface Delivery

When the gap scanner starts a text session to collect missing data, it can deliver a [surface](/channels/surfaces.md) inline. Instead of dropping a bare link, the agent explains what data is needed and why, answers questions, and then shares the form link within the conversation thread.

## When to Use Which Channel

| Scenario                                         | Recommended Channel  |
| ------------------------------------------------ | -------------------- |
| Appointment reminders and confirmations          | SMS or WhatsApp      |
| Data collection (insurance, intake forms)        | Text with surface    |
| Complex scheduling with multiple options         | Voice                |
| Sensitive clinical conversations                 | Voice                |
| Post-visit follow-up and care instructions       | SMS or WhatsApp      |
| Urgent outreach requiring immediate response     | Voice                |
| Asynchronous communication                       | SMS or WhatsApp      |
| Speaking without a live call                     | WhatsApp voice notes |
| Regions where WhatsApp is the dominant messenger | WhatsApp             |
| Patient portal or mobile app chat                | WebSocket            |
| Internal tools with agent interaction            | WebSocket            |
| Backend integrations                             | REST or WebSocket    |

Most deployments use multiple channels. The choice comes down to urgency, complexity, patient preference, and regional messaging norms.

