Deployment Safety

Simulation-based pre-production testing with synthetic personas, version set promotion gates, and staged rollout controls.

Deployment safety ensures that changes to agent behavior are validated before they reach live patient calls. No configuration change - whether it is a new context graph, updated safety rules, modified voice settings, or a new action - should go to production without passing through structured testing and staged rollout.

Simulation-Based Testing

Before a new agent configuration handles real calls, it is tested against synthetic scenarios using simulated personas. Each simulation runs a complete conversation through the agent's pipeline, including context graph navigation, tool execution, world model queries, and response generation.

Personas

A persona defines a simulated caller with specific characteristics:

  • Demographics - Age, language, communication style

  • Scenario - What the caller is trying to accomplish (reschedule an appointment, ask about a prescription, report symptoms)

  • Behavior patterns - How the caller responds to the agent (cooperative, confused, frustrated, in a hurry)

  • Edge cases - Unusual requests, ambiguous phrasing, topic changes mid-conversation

Personas are designed to cover the range of real interactions your deployment handles. A scheduling deployment might have personas for straightforward rescheduling, insurance questions that come up mid-call, callers who cannot remember their date of birth, and callers who ask for medical advice that the agent should not provide.
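A persona like the ones above can be sketched as a simple data structure. This is an illustrative schema only; the field names (`demographics`, `goal`, `behavior`, `edge_cases`) are assumptions for the example, not a documented format:

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """A simulated caller used in pre-production testing (illustrative schema)."""
    name: str
    demographics: dict                       # age, language, communication style
    goal: str                                # what the caller is trying to accomplish
    behavior: str                            # cooperative, confused, frustrated, in a hurry
    edge_cases: list = field(default_factory=list)

# One of the scheduling-deployment personas described above:
forgetful_caller = Persona(
    name="forgetful-dob",
    demographics={"age": 72, "language": "en", "style": "slow, hesitant"},
    goal="reschedule an upcoming appointment",
    behavior="cooperative but confused",
    edge_cases=["cannot remember date of birth"],
)
```

A full persona library would cover the remaining cases too (mid-call insurance questions, requests for medical advice the agent must decline), each as its own instance.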

Scenarios

A scenario defines the conversation flow and expected outcomes:

  • Setup - What patient data exists in the world model before the call starts

  • Conversation script - The sequence of caller utterances the persona will produce

  • Expected behaviors - What the agent should do at each stage (navigate to the correct state, call the right tool, escalate when appropriate)

  • Success criteria - Measurable outcomes that determine whether the simulation passed (correct appointment booked, escalation triggered at the right moment, no safety monitor violations)

Simulations can run in bulk. A typical pre-deployment validation might run hundreds of scenarios across dozens of personas, producing a pass/fail report with detailed logs for any failures.
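The bulk-run step can be sketched as follows. The `Scenario` fields mirror the list above, and `run_agent` is a hypothetical stand-in for the real simulation pipeline; none of these names come from a documented API:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One simulated conversation with expected outcomes (illustrative schema)."""
    setup: dict              # patient data seeded into the world model before the call
    script: list             # sequence of caller utterances the persona will produce
    success_criteria: list   # callables that inspect the simulation result

def run_simulations(scenarios, run_agent):
    """Run scenarios in bulk and produce a pass/fail report.

    `run_agent` takes a scenario and returns a result object that each
    success criterion inspects; failures are logged with the criteria
    that did not hold.
    """
    report = {"passed": 0, "failed": []}
    for scenario in scenarios:
        result = run_agent(scenario)
        unmet = [c.__name__ for c in scenario.success_criteria if not c(result)]
        if unmet:
            report["failed"].append({"scenario": scenario, "failed_criteria": unmet})
        else:
            report["passed"] += 1
    return report
```

A pre-deployment validation would feed hundreds of such scenarios through this loop and attach the detailed transcripts for any entry in `report["failed"]`.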

Version Sets

Version sets control how agent configurations move from development to production. The full promotion workflow (personal branch, test, preview, release) is described in the Deployment Model. From a safety perspective, each promotion step requires passing quality gates:

  • All simulation scenarios pass

  • No new safety monitor violations compared to the current release

  • Key performance metrics (response time, escalation rate, task completion) meet or exceed thresholds

If a problem is discovered in production after promotion, the previous release version set is still available for immediate rollback. Rolling back is a configuration change, not a code deployment.
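The three promotion gates can be expressed as a single check. The dictionary shape and the 0.95 task-completion threshold are assumptions made for this sketch, not documented values:

```python
def passes_quality_gates(candidate, current_release,
                         thresholds={"task_completion": 0.95}):
    """Return True only if all three promotion gates hold (illustrative)."""
    # Gate 1: every simulation scenario passes.
    if candidate["failed_scenarios"]:
        return False
    # Gate 2: no new safety monitor violations versus the current release.
    if candidate["safety_violations"] > current_release["safety_violations"]:
        return False
    # Gate 3: key performance metrics meet or exceed their thresholds.
    for metric, minimum in thresholds.items():
        if candidate["metrics"][metric] < minimum:
            return False
    return True

# Rollback is a configuration change, not a code deployment: production
# is pointed back at the previous release version set.
def rollback(active_versions, previous_release):
    active_versions["production"] = previous_release
```

Because the previous release version set is retained, `rollback` only rewrites a pointer; no artifacts are rebuilt or redeployed.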

Validating Agent Behavior

Beyond pass/fail simulation results, deployment safety includes qualitative review of agent behavior:

  • Conversation quality - Do the agent's responses sound natural and appropriate? Are they too verbose, too terse, or off-tone?

  • Escalation judgment - Does the agent escalate at the right moments? Does it escalate too often (disrupting operations) or too rarely (missing situations that need human review)?

  • Edge case handling - When the conversation goes off-script, does the agent recover gracefully or get stuck?

  • Safety boundary compliance - Does the agent stay within its configured scope? When asked to do something outside its capabilities, does it decline appropriately?

These assessments are part of the staging review process. The simulation framework produces transcripts and recordings that reviewers can examine before approving a promotion to release.
