Compound Emotions
Turn-gated compound emotions derived from acoustic, linguistic, behavioral, and contextual evidence using Plutchik's dyad algebra and temporal trajectory analysis.
Compound emotion detection uses cross-channel agreement to reduce false positives on telephony audio. Signals like Resignation and Disengagement require corroboration from the text/language channel before firing, and behavioral thresholds (silence duration, arousal floor) are tuned to avoid triggering on routine call patterns.
Why Basic Labels Fall Short
Standard emotion classifiers give you labels like "angry" or "sad." That sounds useful until you try to act on it. A caller who is angry because your tool just errored out needs a different response than a caller who has been angry since the start of the call. A caller who sounds sad but keeps making sarcastic remarks is not sad - they are hostile in a way that basic classification completely misses.
Single-label emotions are also unstable. Acoustic models flip between "angry" and "neutral" turn by turn because real human emotion is not a clean category. It is a blend. Someone can be frustrated and resigned at the same time. They can sound calm while saying something devastating. The raw labels create noise, not signal.
Compound emotions solve this by fusing multiple evidence streams into a single actionable description of what the caller is actually experiencing, updated every turn.
Turn-Gated Architecture
The compound emotion resolver fires once per caller turn, not on a fixed timer. This matters because emotions shift in response to what just happened in the conversation, not on a clock.
Each time the caller finishes speaking, the resolver:
Collects the latest acoustic emotion scores from the emotion engine
Pulls the transcript and any behavioral signals (barge-ins, response length, silence duration)
Checks the current context graph state and recent tool outcomes
Runs all five fusion layers in order
Emits a TurnEmotionSnapshot with the compound label, confidence, and evidence
The resolver maintains a 5-turn sliding window. This is enough to detect trends (escalation, recovery) without carrying stale signal from minutes ago. The window advances with each caller turn, dropping the oldest entry.
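A minimal sketch of the sliding window, assuming each turn is stored as a plain dict. The TurnWindow class and its method names are hypothetical, chosen for illustration only; the window size of 5 comes from the text.

```python
from collections import deque

WINDOW_SIZE = 5  # from the text: a 5-turn sliding window


class TurnWindow:
    """Hypothetical helper: keeps only the most recent caller turns."""

    def __init__(self, size=WINDOW_SIZE):
        # A deque with maxlen drops the oldest entry when a new one is appended.
        self._turns = deque(maxlen=size)

    def push(self, turn):
        self._turns.append(turn)

    def values(self, key):
        """Return one field across the window, oldest turn first."""
        return [turn[key] for turn in self._turns]


window = TurnWindow()
for i in range(7):
    window.push({"turn_index": i, "valence": -0.1 * i})

print(window.values("turn_index"))  # [2, 3, 4, 5, 6]
```

After seven turns only the last five remain, so trend calculations never see signal older than the window.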
Five Fusion Layers
Each layer can override or refine the output of the previous one. They run in order, and later layers take priority when they fire.
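The ordering rule can be sketched as a simple loop. This is an illustration under assumptions, not the resolver's actual code: the layer signature (current label and turn data in, new label or None out) and the two toy layers are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical layer signature: return a new label, or None to leave it alone.
Layer = Callable[[str, dict], Optional[str]]


def run_layers(raw_label: str, turn: dict, layers: list) -> str:
    """Apply layers in order; a later layer that fires wins over earlier ones."""
    label = raw_label
    for layer in layers:
        result = layer(label, turn)
        if result is not None:
            label = result
    return label


# Toy layers for illustration only.
def toy_dyad_layer(label: str, turn: dict) -> Optional[str]:
    return "Bitterness" if turn.get("dyad_fired") else None


def toy_linguistic_layer(label: str, turn: dict) -> Optional[str]:
    return "Sarcasm" if turn.get("sarcasm_detected") else None


layers = [toy_dyad_layer, toy_linguistic_layer]
print(run_layers("sad", {"dyad_fired": True}, layers))
print(run_layers("joy", {"dyad_fired": True, "sarcasm_detected": True}, layers))
```

The first call prints Bitterness. The second prints Sarcasm, because the later layer fired and took priority.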
1. Plutchik Dyad Algebra
Robert Plutchik's wheel of emotions defines compound emotions as co-activations of basic emotions. When two basic emotions fire above threshold simultaneously, the resolver maps them to a dyad:
Sadness + Anger: Bitterness
Fear + Anger: Aggressiveness
Joy + Fear: Guilt
Sadness + Fear: Despair
Joy + Sadness: Bittersweetness
Anger + Disgust: Contempt
Fear + Surprise: Alarm
Joy + Trust: Love
Sadness + Disgust: Remorse
Anger + Anticipation: Hostility
The algebra uses the top two emotion scores from the acoustic model. Both must exceed 0.3 (normalized) for a dyad to fire. If only one emotion is strong, the resolver falls through to the raw label.
Example: A caller discussing a denied insurance claim shows Sadness at 0.45 and Anger at 0.52. The resolver outputs "Bitterness" rather than flipping between "sad" and "angry" turn by turn.
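The dyad lookup can be sketched as follows. The mapping and the 0.3 threshold come from this page; the function name, the lowercase score keys, and the choice to fall back to the raw label when the top pair has no table entry are assumptions.

```python
DYAD_THRESHOLD = 0.3  # from the text: both scores must exceed 0.3

# Pairs are unordered, so frozenset keys make the lookup order-independent.
DYADS = {
    frozenset({"sadness", "anger"}): "Bitterness",
    frozenset({"fear", "anger"}): "Aggressiveness",
    frozenset({"joy", "fear"}): "Guilt",
    frozenset({"sadness", "fear"}): "Despair",
    frozenset({"joy", "sadness"}): "Bittersweetness",
    frozenset({"anger", "disgust"}): "Contempt",
    frozenset({"fear", "surprise"}): "Alarm",
    frozenset({"joy", "trust"}): "Love",
    frozenset({"sadness", "disgust"}): "Remorse",
    frozenset({"anger", "anticipation"}): "Hostility",
}


def resolve_dyad(scores: dict) -> str:
    """Map the top two acoustic scores to a dyad, else return the raw label."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_label, top_score = ranked[0]
    if len(ranked) > 1:
        second_label, second_score = ranked[1]
        if top_score > DYAD_THRESHOLD and second_score > DYAD_THRESHOLD:
            dyad = DYADS.get(frozenset((top_label, second_label)))
            if dyad is not None:
                return dyad
    # Assumed fallback: only one strong emotion, or the pair is not in the table.
    return top_label


print(resolve_dyad({"sadness": 0.45, "anger": 0.52, "joy": 0.03}))  # Bitterness
print(resolve_dyad({"anger": 0.70, "sadness": 0.10}))               # anger
```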
2. Temporal Trajectory
The 5-turn sliding window reveals how the caller's emotional state is moving. Four trajectory patterns:
Escalating - Valence trending down or arousal trending up across the window. The caller is getting more agitated. This is the early warning signal that something in the conversation is going wrong.
Recovering - Valence trending up after a dip. The caller was upset but the agent's response is working. Important for the empathy system to know so it does not over-correct.
Resignation - Arousal dropping steadily while valence stays low. The caller has stopped fighting. This is worse than anger because it means they have given up on the interaction.
Ambivalence - Valence oscillating (alternating positive and negative deltas). The caller is conflicted. Common when discussing difficult medical decisions or scheduling tradeoffs.
Trajectory compounds override dyad compounds when the trend is strong (3+ turns in the same direction).
Example: A caller starts frustrated (turns 1-2), gets angrier when put on hold (turns 3-4), then goes quiet with flat affect (turn 5). The trajectory shifts from "Escalating" to "Resignation" - a much more useful signal than the raw "neutral" the acoustic model reports on turn 5.
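One way to classify the four trajectory patterns from the window is sketched below. The 3-turn trend length is from the text. Everything else is an assumption made for illustration: the -0.2 cutoff for "low" valence, the strict-monotonic definition of a trend, and the order in which the patterns are checked.

```python
LOW_VALENCE = -0.2   # assumed cutoff for "valence stays low"
MIN_TREND_TURNS = 3  # from the text: 3+ turns in the same direction


def _deltas(values):
    return [b - a for a, b in zip(values, values[1:])]


def _trending(values, sign, turns=MIN_TREND_TURNS):
    """True if the last `turns` values move strictly in one direction."""
    recent = values[-turns:]
    if len(recent) < turns:
        return False
    return all(d * sign > 0 for d in _deltas(recent))


def _oscillating(values):
    """True if consecutive deltas alternate in sign across the window."""
    d = _deltas(values)
    return len(d) >= 3 and all(a * b < 0 for a, b in zip(d, d[1:]))


def classify_trajectory(valence, arousal):
    """Return a trajectory label for the window, or None if no trend is strong."""
    recent_valence = valence[-MIN_TREND_TURNS:]
    if _trending(arousal, -1) and all(v < LOW_VALENCE for v in recent_valence):
        return "Resignation"
    if _oscillating(valence):
        return "Ambivalence"
    if _trending(valence, +1) and min(valence) < LOW_VALENCE:
        return "Recovering"  # rising valence after a dip
    if _trending(valence, -1) or _trending(arousal, +1):
        return "Escalating"
    return None


# Arousal falls for three turns while valence stays low.
print(classify_trajectory([-0.4, -0.5, -0.5, -0.5, -0.5],
                          [0.6, 0.8, 0.6, 0.4, 0.2]))  # Resignation
```

Resignation is checked first here so that a quiet, low-valence caller is not mislabeled by a weaker pattern; a production resolver may order or weight these differently.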
3. Behavioral Amplifiers
Voice-level behaviors that the acoustic model does not capture but the pipeline observes directly:
Impatience - High barge-in rate (2+ barge-ins in the window). The caller keeps interrupting the agent, which means they feel the conversation is moving too slowly or repeating itself.
Withdrawal - Consistently short responses (under 3 words per turn for 3+ turns). The caller has stopped engaging meaningfully. Often co-occurs with Resignation from the trajectory layer.
Disengagement - Long silences before responding (3+ seconds for 2+ turns). The caller is mentally checking out. Different from Withdrawal in that they pause before speaking rather than giving terse answers.
Behavioral amplifiers modify the compound label rather than replacing it. "Bitterness" becomes "Bitter Impatience" when barge-in rate is high. "Resignation" becomes "Resigned Withdrawal" when response lengths drop.
Example: The acoustic model says "neutral" for the last 3 turns, but the caller has only said "okay," "sure," and "fine." The behavioral layer flags Withdrawal. Combined with low valence from the trajectory, the output is "Resigned Withdrawal" - the caller is done with this conversation.
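The three amplifiers reduce to counting over the window. In this sketch the thresholds come from the text, while the per-turn field names (barge_ins, word_count, silence_s), the adjective table, and the choice to apply only the first flag are hypothetical. It also counts qualifying turns anywhere in the window rather than requiring them to be consecutive, which is an assumption.

```python
# Adjective forms for the two examples given in the text; other labels are
# left unchanged in this sketch.
ADJECTIVE_FORMS = {"Bitterness": "Bitter", "Resignation": "Resigned"}


def behavioral_flags(window):
    """Return the active amplifiers for a list of per-turn dicts."""
    flags = []
    if sum(turn["barge_ins"] for turn in window) >= 2:
        flags.append("Impatience")
    if sum(1 for turn in window if turn["word_count"] < 3) >= 3:
        flags.append("Withdrawal")
    if sum(1 for turn in window if turn["silence_s"] >= 3.0) >= 2:
        flags.append("Disengagement")
    return flags


def amplify(label, flags):
    """Modify the compound label with the first active amplifier, if any."""
    adjective = ADJECTIVE_FORMS.get(label)
    if adjective is None or not flags:
        return label
    return f"{adjective} {flags[0]}"


# The caller has only said "okay", "sure", and "fine".
window = [
    {"barge_ins": 0, "word_count": 1, "silence_s": 0.6},
    {"barge_ins": 0, "word_count": 1, "silence_s": 0.8},
    {"barge_ins": 0, "word_count": 1, "silence_s": 0.5},
]
flags = behavioral_flags(window)
print(flags)                          # ['Withdrawal']
print(amplify("Resignation", flags))  # Resigned Withdrawal
```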
4. Contextual Modulation
What is happening in the conversation changes how emotions should be interpreted:
Process Frustration - Anger or frustration co-occurring with a tool failure, long hold, or repeated tool calls. The caller is not angry at the agent as a person. They are frustrated with the system. This distinction matters because empathetic language about the process ("I know this is taking longer than it should") works better than emotional acknowledgment ("I hear that you're upset").
Helplessness - Fear or sadness when the context graph is in a state where the agent has limited actions available, or when the agent has said something like "I'm not able to do that." The caller feels stuck.
Relief - Joy or positive valence immediately after a tool success or problem resolution. Confirms that the issue was actually resolved from the caller's perspective, not just the system's.
The contextual layer reads from the engine session: current state, recent tool call results, time spent in current state.
Example: A caller's anger spikes right after check_insurance_status returns an error. Without context, this looks like generic anger. With context, the resolver outputs "Process Frustration" - the caller is reacting to the tool failure, not the conversation itself.
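A sketch of the contextual rules, assuming the session context arrives as a dict. The key names (last_tool_status, long_hold, repeated_tool_calls, limited_actions, agent_declined, issue_resolved) are hypothetical stand-ins for whatever the engine session actually exposes.

```python
def context_modifier(primary_emotion, valence, context):
    """Return a contextual modifier for this turn, or None if nothing applies."""
    tool_failed = context.get("last_tool_status") == "error"
    resolved = (context.get("last_tool_status") == "success"
                or context.get("issue_resolved", False))
    friction = (tool_failed
                or context.get("long_hold", False)
                or context.get("repeated_tool_calls", False))
    stuck = (context.get("limited_actions", False)
             or context.get("agent_declined", False))

    if primary_emotion in {"anger", "frustration"} and friction:
        return "Process Frustration"
    if primary_emotion in {"fear", "sadness"} and stuck:
        return "Helplessness"
    if (primary_emotion == "joy" or valence > 0) and resolved:
        return "Relief"
    return None


# Anger spikes right after check_insurance_status returns an error.
context = {"last_tool": "check_insurance_status", "last_tool_status": "error"}
print(context_modifier("anger", -0.6, context))  # Process Frustration
```

The same anger score with no tool failure, hold, or repeated call in context returns None, so the earlier layers' label stands.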
5. Linguistic Override
When what the caller says contradicts how they sound, the words win. This layer runs basic pattern matching on the transcript:
Cold Hostility - Calm, measured acoustic signal but hostile or threatening language. The caller is not yelling, which makes the acoustic model report "neutral," but they are saying things like "I want to speak to your supervisor" or "this is unacceptable."
Masked Distress - Positive or neutral acoustic signal but distress language ("I don't know what to do," "I'm at my wit's end," "nobody will help me"). Common in healthcare contexts where callers try to hold it together.
Sarcasm - Positive acoustic markers (laughter, upward inflection) paired with negative semantic content ("Oh great, another transfer" or "Sure, that's really helpful"). The acoustic model sees joy, but the caller is expressing frustration.
Linguistic overrides have the highest priority. When they fire, they replace the compound label entirely.
Example: A caller laughs and says "Oh wonderful, so I just need to call back a fourth time." The acoustic model reports Joy with high confidence. The linguistic layer catches the sarcasm pattern and outputs "Sarcasm" instead. This prevents the empathy system from treating the caller as happy.
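Basic pattern matching for this layer could look like the sketch below. The phrases are taken from the examples on this page; the regular expressions, the function name, and the lowercase acoustic labels are assumptions, and a real pattern set would need to be far broader.

```python
import re

# Patterns built from the example phrases in the text (illustrative only).
HOSTILE = re.compile(r"speak to your supervisor|this is unacceptable")
DISTRESS = re.compile(
    r"i don't know what to do|at my wit's end|nobody will help me"
)
SARCASM = re.compile(r"\boh (great|wonderful)\b|that's really helpful")


def linguistic_override(transcript, acoustic_label):
    """Return an override label when the words contradict the acoustic label."""
    text = transcript.lower()
    if acoustic_label == "joy" and SARCASM.search(text):
        return "Sarcasm"
    if acoustic_label == "neutral" and HOSTILE.search(text):
        return "Cold Hostility"
    if acoustic_label in {"joy", "neutral"} and DISTRESS.search(text):
        return "Masked Distress"
    return None


print(linguistic_override(
    "Oh wonderful, so I just need to call back a fourth time.", "joy"
))  # Sarcasm
```

When the function returns None, no override fires and the label from the earlier layers is kept.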
Where Compounds Are Used
Compound emotions flow through three channels:
Observer events - Every emotion_classified observer event includes the compound label alongside the raw acoustic scores. The developer console event log displays compounds in the emotion category, giving operators and developers visibility into what the resolver is detecting in real time.
Call intelligence - At call end, the compound emotion timeline is persisted as part of the call's analytics record. The emotion summary includes the dominant compound across the call, the highest-severity compound detected, and any trajectory shifts. This powers the emotion trends analytics endpoint.
Developer console - The call detail page shows compound emotions in the emotion tab. The timeline visualization uses compound labels rather than raw acoustic labels, making it much easier to understand what actually happened during a call. The playground session panel also shows live compounds during test calls.
TurnEmotionSnapshot
Each resolver invocation produces a TurnEmotionSnapshot with the following fields:
turn_index (int) - Which caller turn triggered this snapshot
compound_label (str) - The resolved compound emotion (e.g., "Bitter Impatience", "Process Frustration")
confidence (float) - 0.0 to 1.0, based on evidence strength across layers
primary_emotion (str) - Strongest raw acoustic emotion
secondary_emotion (str or None) - Second acoustic emotion if above threshold
valence (float) - -1.0 to 1.0 from the acoustic model
arousal (float) - 0.0 to 1.0 from the acoustic model
trajectory (str or None) - One of: Escalating, Recovering, Resignation, Ambivalence
behavioral_flags (list[str]) - Active behavioral amplifiers (Impatience, Withdrawal, Disengagement)
context_modifier (str or None) - Active contextual modulation (Process Frustration, Helplessness, Relief)
linguistic_override (str or None) - Active linguistic override (Cold Hostility, Masked Distress, Sarcasm)
evidence (dict) - Raw scores, barge-in count, response lengths, tool outcomes that contributed
The confidence field reflects how many layers contributed to the compound. A dyad-only compound with no trajectory or behavioral confirmation scores around 0.4. A compound with corroborating evidence from 3+ layers scores above 0.8. This lets consumers decide how much to trust the label - the empathy system uses a 0.5 threshold before adjusting its tier based on compounds.
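For consumers, the snapshot can be modeled as a dataclass with the fields listed above. This is a sketch of the shape, not the actual class definition: the default values, the should_adjust_tier helper, and treating the 0.5 threshold as inclusive are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

EMPATHY_CONFIDENCE_THRESHOLD = 0.5  # from the text; inclusive comparison assumed


@dataclass
class TurnEmotionSnapshot:
    turn_index: int
    compound_label: str
    confidence: float          # 0.0 to 1.0
    primary_emotion: str
    valence: float             # -1.0 to 1.0
    arousal: float             # 0.0 to 1.0
    secondary_emotion: Optional[str] = None
    trajectory: Optional[str] = None
    behavioral_flags: list = field(default_factory=list)
    context_modifier: Optional[str] = None
    linguistic_override: Optional[str] = None
    evidence: dict = field(default_factory=dict)


def should_adjust_tier(snapshot: TurnEmotionSnapshot) -> bool:
    """Hypothetical gate: act on the compound only when confidence is high enough."""
    return snapshot.confidence >= EMPATHY_CONFIDENCE_THRESHOLD


dyad_only = TurnEmotionSnapshot(
    turn_index=2, compound_label="Bitterness", confidence=0.4,
    primary_emotion="anger", valence=-0.5, arousal=0.6,
    secondary_emotion="sadness",
)
corroborated = TurnEmotionSnapshot(
    turn_index=5, compound_label="Resigned Withdrawal", confidence=0.85,
    primary_emotion="neutral", valence=-0.4, arousal=0.2,
    trajectory="Resignation", behavioral_flags=["Withdrawal"],
)

print(should_adjust_tier(dyad_only))     # False
print(should_adjust_tier(corroborated))  # True
```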

