# Compound Emotions

## Why Basic Labels Fall Short

Standard emotion classifiers give you labels like "angry" or "sad." That sounds useful until you try to act on it. A caller who is angry because your tool just errored out needs a different response than a caller who has been angry since the start of the call. A caller who sounds sad but keeps making sarcastic remarks is not sad - they are hostile in a way that basic classification completely misses.

Single-label emotions are also unstable. Acoustic models flip between "angry" and "neutral" turn by turn because real human emotion is not a clean category. It is a blend. Someone can be frustrated and resigned at the same time. They can sound calm while saying something devastating. The raw labels create noise, not signal.

Compound emotions solve this by fusing multiple evidence streams into a single actionable description of what the caller is actually experiencing, updated every turn.

## Turn-Gated Architecture

The compound emotion resolver fires once per caller turn, not on a fixed timer. This matters because emotions shift in response to what just happened in the conversation, not on a clock.

Each time the caller finishes speaking, the resolver:

1. Collects the latest acoustic emotion scores from the emotion engine
2. Pulls the transcript and any behavioral signals (barge-ins, response length, silence duration)
3. Checks the current context graph state and recent tool outcomes
4. Runs all five fusion layers in order
5. Emits a `TurnEmotionSnapshot` with the compound label, confidence, and evidence

The resolver maintains a 5-turn sliding window. This is enough to detect trends (escalation, recovery) without carrying stale signal from minutes ago. The window advances with each caller turn, dropping the oldest entry.
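The turn-gated loop and sliding window described above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the class name, method names, and evidence dict shape are assumptions, and the fusion step is stubbed with a placeholder that just picks the strongest acoustic label.

```python
from collections import deque

WINDOW_SIZE = 5  # enough to detect trends without carrying stale signal


class CompoundEmotionResolver:
    def __init__(self):
        # 5-turn sliding window; deque drops the oldest entry automatically
        # as each new caller turn arrives.
        self.window = deque(maxlen=WINDOW_SIZE)

    def on_caller_turn(self, turn_index, acoustic_scores, transcript, behavior, context):
        evidence = {
            "acoustic": acoustic_scores,  # e.g. {"anger": 0.52, "sadness": 0.45}
            "transcript": transcript,     # latest caller utterance
            "behavior": behavior,         # barge-ins, response length, silence
            "context": context,           # graph state, recent tool outcomes
        }
        self.window.append(evidence)
        # Placeholder fusion: the real resolver runs the five layers in order.
        label = max(acoustic_scores, key=acoustic_scores.get)
        return {"turn_index": turn_index, "compound_label": label,
                "evidence": evidence}
```

Because the deque is bounded, the window advances with each caller turn exactly as described: after seven turns, only the last five remain.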

{% @mermaid/diagram content="flowchart TD
trigger["Turn Boundary\n(caller finishes speaking)"] --> L1
L1["Layer 1: Plutchik Dyad Algebra\nCo-activation of basic emotions"] --> L2
L2["Layer 2: Temporal Trajectory\nTrend across 5-turn window"] --> L3
L3["Layer 3: Behavioral Amplifiers\nBarge-ins, silences, short responses"] --> L4
L4["Layer 4: Contextual Modulation\nTool failures, stuck state, resolution"] --> L5
L5["Layer 5: Linguistic Override\nToxicity vs. tone contradiction"] --> output
output["Compound Label + Confidence"]
style trigger fill:#f5f5f5,stroke:#999
style output fill:#f5f5f5,stroke:#999
L1 -. "each layer can override previous" .-> L3
L2 -. "each layer can override previous" .-> L4
L3 -. "each layer can override previous" .-> L5" %}

## Five Fusion Layers

Each layer can override or refine the output of the previous one. They run in order, and later layers take priority when they fire.

### 1. Plutchik Dyad Algebra

Robert Plutchik's wheel of emotions defines compound emotions as co-activations of basic emotions. When two basic emotions fire above threshold simultaneously, the resolver maps them to a dyad:

| Co-activation        | Compound        |
| -------------------- | --------------- |
| Sadness + Anger      | Bitterness      |
| Fear + Anger         | Aggressiveness  |
| Joy + Fear           | Guilt           |
| Sadness + Fear       | Despair         |
| Joy + Sadness        | Bittersweetness |
| Anger + Disgust      | Contempt        |
| Fear + Surprise      | Alarm           |
| Joy + Trust          | Love            |
| Sadness + Disgust    | Remorse         |
| Anger + Anticipation | Hostility       |

The algebra uses the top two emotion scores from the acoustic model. Both must exceed 0.3 (normalized) for a dyad to fire. If only one emotion is strong, the resolver falls through to the raw label.

**Example:** A caller discussing a denied insurance claim shows Sadness at 0.45 and Anger at 0.52. The resolver outputs "Bitterness" rather than flipping between "sad" and "angry" turn by turn.

### 2. Temporal Trajectory

The 5-turn sliding window reveals how the caller's emotional state is moving. Four trajectory patterns:

* **Escalating** - Valence trending down or arousal trending up across the window. The caller is getting more agitated. This is the early warning signal that something in the conversation is going wrong.
* **Recovering** - Valence trending up after a dip. The caller was upset but the agent's response is working. Important for the empathy system to know so it does not over-correct.
* **Resignation** - Arousal dropping steadily while valence stays low. The caller has stopped fighting. This is worse than anger because it means they have given up on the interaction.
* **Ambivalence** - Valence oscillating (alternating positive and negative deltas). The caller is conflicted. Common when discussing difficult medical decisions or scheduling tradeoffs.

Trajectory compounds override dyad compounds when the trend is strong (3+ turns in the same direction).

**Example:** A caller starts frustrated (turn 1-2), gets angrier when put on hold (turn 3-4), then goes quiet with flat affect (turn 5). The trajectory shifts from "Escalating" to "Resignation" - a much more useful signal than the raw "neutral" the acoustic model reports on turn 5.

### 3. Behavioral Amplifiers

Voice-level behaviors that the acoustic model does not capture but the pipeline observes directly:

* **Impatience** - High barge-in rate (2+ barge-ins in the window). The caller keeps interrupting the agent, which means they feel the conversation is moving too slowly or repeating itself.
* **Withdrawal** - Consistently short responses (under 3 words per turn for 3+ turns). The caller has stopped engaging meaningfully. Often co-occurs with Resignation from the trajectory layer.
* **Disengagement** - Long silences before responding (3+ seconds for 2+ turns). The caller is mentally checking out. Different from Withdrawal in that they pause before speaking rather than giving terse answers.

Behavioral amplifiers modify the compound label rather than replacing it. "Bitterness" becomes "Bitter Impatience" when barge-in rate is high. "Resignation" becomes "Resigned Withdrawal" when response lengths drop.

**Example:** The acoustic model says "neutral" for the last 3 turns, but the caller has only said "okay," "sure," and "fine." The behavioral layer flags Withdrawal. Combined with low valence from the trajectory, the output is "Resigned Withdrawal" - the caller is done with this conversation.
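The three behavioral amplifiers map directly onto counts over the window. A sketch, assuming hypothetical per-turn fields (`barged_in`, `word_count`, `silence_before_s`) that the pipeline would record:

```python
def behavioral_flags(turns: list[dict]) -> list[str]:
    """Return the active behavioral amplifiers for a window of caller turns."""
    flags = []
    # Impatience: 2+ barge-ins in the window.
    if sum(t.get("barged_in", False) for t in turns) >= 2:
        flags.append("Impatience")
    # Withdrawal: under 3 words per turn for 3+ turns.
    if sum(t.get("word_count", 99) < 3 for t in turns) >= 3:
        flags.append("Withdrawal")
    # Disengagement: 3+ second pauses before responding, 2+ turns.
    if sum(t.get("silence_before_s", 0) >= 3 for t in turns) >= 2:
        flags.append("Disengagement")
    return flags
```

Because amplifiers modify rather than replace the label, a consumer would prepend or combine them, e.g. turning "Bitterness" plus an Impatience flag into "Bitter Impatience".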

### 4. Contextual Modulation

What is happening in the conversation changes how emotions should be interpreted:

* **Process Frustration** - Anger or frustration co-occurring with a tool failure, long hold, or repeated tool calls. The caller is not angry at the agent as a person. They are frustrated with the system. This distinction matters because empathetic language about the process ("I know this is taking longer than it should") works better than emotional acknowledgment ("I hear that you're upset").
* **Helplessness** - Fear or sadness when the context graph is in a state where the agent has limited actions available, or when the agent has said something like "I'm not able to do that." The caller feels stuck.
* **Relief** - Joy or positive valence immediately after a tool success or problem resolution. Confirms that the issue was actually resolved from the caller's perspective, not just the system's.

The contextual layer reads from the engine session: current state, recent tool call results, time spent in current state.

**Example:** A caller's anger spikes right after `check_insurance_status` returns an error. Without context, this looks like generic anger. With context, the resolver outputs "Process Frustration" - the caller is reacting to the tool failure, not the conversation itself.

### 5. Linguistic Override

When what the caller says contradicts how they sound, the words win. This layer runs basic pattern matching on the transcript:

* **Cold Hostility** - Calm, measured acoustic signal but hostile or threatening language. The caller is not yelling, which makes the acoustic model report "neutral," but they are saying things like "I want to speak to your supervisor" or "this is unacceptable."
* **Masked Distress** - Positive or neutral acoustic signal but distress language ("I don't know what to do," "I'm at my wit's end," "nobody will help me"). Common in healthcare contexts where callers try to hold it together.
* **Sarcasm** - Positive acoustic markers (laughter, upward inflection) paired with negative semantic content ("Oh great, another transfer" or "Sure, that's really helpful"). The acoustic model sees joy, but the caller is expressing frustration.

Linguistic overrides have the highest priority. When they fire, they replace the compound label entirely.

**Example:** A caller laughs and says "Oh wonderful, so I just need to call back a fourth time." The acoustic model reports Joy with high confidence. The linguistic layer catches the sarcasm pattern and outputs "Sarcasm" instead. This prevents the empathy system from treating the caller as happy.

## Where Compounds Are Used

Compound emotions flow through three channels:

**Observer events** - Every `emotion_classified` observer event includes the compound label alongside the raw acoustic scores. The developer console event log displays compounds in the emotion category, giving operators and developers visibility into what the resolver is detecting in real time.

**Call intelligence** - At call end, the compound emotion timeline is persisted as part of the call's analytics record. The emotion summary includes the dominant compound across the call, the highest-severity compound detected, and any trajectory shifts. This powers the emotion trends analytics endpoint.

**Developer console** - The call detail page shows compound emotions in the emotion tab. The timeline visualization uses compound labels rather than raw acoustic labels, making it much easier to understand what actually happened during a call. The playground session panel also shows live compounds during test calls.

## TurnEmotionSnapshot

Each resolver invocation produces a `TurnEmotionSnapshot` with the following fields:

| Field                 | Type          | Description                                                                      |
| --------------------- | ------------- | -------------------------------------------------------------------------------- |
| `turn_index`          | `int`         | Which caller turn triggered this snapshot                                        |
| `compound_label`      | `str`         | The resolved compound emotion (e.g., "Bitter Impatience", "Process Frustration") |
| `confidence`          | `float`       | 0.0 to 1.0, based on evidence strength across layers                             |
| `primary_emotion`     | `str`         | Strongest raw acoustic emotion                                                   |
| `secondary_emotion`   | `str or None` | Second acoustic emotion if above threshold                                       |
| `valence`             | `float`       | -1.0 to 1.0 from the acoustic model                                              |
| `arousal`             | `float`       | 0.0 to 1.0 from the acoustic model                                               |
| `trajectory`          | `str or None` | One of: Escalating, Recovering, Resignation, Ambivalence                         |
| `behavioral_flags`    | `list[str]`   | Active behavioral amplifiers (Impatience, Withdrawal, Disengagement)             |
| `context_modifier`    | `str or None` | Active contextual modulation (Process Frustration, Helplessness, Relief)         |
| `linguistic_override` | `str or None` | Active linguistic override (Cold Hostility, Masked Distress, Sarcasm)            |
| `evidence`            | `dict`        | Raw scores, barge-in count, response lengths, tool outcomes that contributed     |

The `confidence` field reflects how many layers contributed to the compound. A dyad-only compound with no trajectory or behavioral confirmation scores around 0.4. A compound with corroborating evidence from 3+ layers scores above 0.8. This lets consumers decide how much to trust the label - the empathy system uses a 0.5 threshold before adjusting its tier based on compounds.
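One plausible way to derive `confidence` from layer participation, consistent with the anchor points above (dyad-only around 0.4, three or more corroborating layers above 0.8); the exact weighting is an assumption:

```python
def snapshot_confidence(dyad, trajectory, behavioral_flags,
                        context_modifier, linguistic_override) -> float:
    """Score confidence by counting how many fusion layers contributed."""
    layers = sum([
        dyad is not None,
        trajectory is not None,
        bool(behavioral_flags),
        context_modifier is not None,
        linguistic_override is not None,
    ])
    # 1 layer -> 0.4, 2 -> 0.65, 3 -> 0.9, capped at 1.0.
    return min(1.0, 0.15 + 0.25 * layers)
```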
