Simulations

Automated agent testing with personas, scenarios, unit tests, and configurable success criteria.

Amigo's simulation system is an evaluation and testing framework for validating agent behavior before deploying to production. It enables you to define simulated users (personas), test scenarios, and success criteria, then run automated conversations to measure how your agent performs.

How Simulations Work

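The simulation system is built from five building blocks that compose together: personas and scenarios feed into unit tests, unit tests are grouped into unit test sets, and sets are executed as runs.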


Building Blocks

Personas -- Simulated user profiles with a background, role, and preferred language. Versioned so you can iterate on persona definitions without breaking existing tests.

Scenarios -- Conversation scripts that define the objective, instructions for the simulated user, and how the conversation starts. Also versioned.

Unit Tests -- Combine a persona, a scenario, a service (with a version set), and success criteria (metrics with thresholds) into a single test case.

Unit Test Sets -- Group multiple unit tests together, each with a configurable run count, to form a test suite.

Unit Test Set Runs -- Execute a unit test set. The platform runs all unit tests, evaluates metrics, and produces downloadable artifacts with the results.
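To make the relationships between the building blocks concrete, here is a minimal Python sketch of them as plain data classes. The field names, the per-metric `higher_is_better` flag, and the rule that a unit test passes only when every criterion is met are assumptions for illustration, not Amigo's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Persona:
    # Hypothetical field names for the documented attributes.
    persona_id: str
    role: str
    background: str
    preferred_language: str


@dataclass(frozen=True)
class Scenario:
    scenario_id: str
    objective: str
    instructions: str     # guidance given to the simulated user
    opening_message: str  # how the conversation starts


@dataclass(frozen=True)
class SuccessCriterion:
    metric: str
    threshold: float
    higher_is_better: bool = True  # assumption: direction is set per metric

    def met_by(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


@dataclass(frozen=True)
class UnitTest:
    unit_test_id: str
    persona_id: str   # references a persona by ID
    scenario_id: str  # references a scenario by ID
    service_id: str
    version_set: str
    criteria: tuple[SuccessCriterion, ...]

    def passes(self, metrics: dict[str, float]) -> bool:
        # Every criterion must be met; a metric absent from the results fails.
        return all(
            c.metric in metrics and c.met_by(metrics[c.metric])
            for c in self.criteria
        )


@dataclass
class UnitTestSet:
    set_id: str
    # Unit test ID -> number of times to run it.
    run_counts: dict[str, int] = field(default_factory=dict)


if __name__ == "__main__":
    test = UnitTest(
        unit_test_id="ut-password-reset",
        persona_id="confused-new-user",
        scenario_id="reset-password",
        service_id="support-agent",
        version_set="release",
        criteria=(SuccessCriterion("task_completion", 0.8),),
    )
    suite = UnitTestSet("onboarding", {test.unit_test_id: 5})
    print(test.passes({"task_completion": 0.9}))  # True
```

Keeping personas and scenarios as ID references inside `UnitTest`, rather than embedding copies, mirrors the way unit tests pick up new persona and scenario versions without being recreated.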

Typical Workflow

  1. Define personas that represent different user archetypes (e.g., "confused new user", "expert power user", "frustrated customer").

  2. Define scenarios that describe what the simulated user is trying to accomplish and how the conversation should start.

  3. Create unit tests that pair a persona with a scenario, target a specific service and version set, and set success criteria based on conversation metrics.

  4. Group unit tests into sets with run counts (e.g., run each test 5 times so that a single lucky or unlucky conversation does not decide the result).

  5. Execute runs and review artifacts to see whether your agent meets the defined success criteria.
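Steps 4 and 5 can be sketched as a loop that repeats each unit test its configured number of times and reports a pass rate. This is a hypothetical local illustration: `simulate` stands in for a real simulated conversation, and the pass rule (every metric at or above its threshold) is an assumption, not the platform's documented scoring.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical: one simulated conversation for a unit test returns its metric values.
Simulate = Callable[[str], dict[str, float]]


@dataclass
class UnitTestResult:
    unit_test_id: str
    runs: int
    passes: int

    @property
    def pass_rate(self) -> float:
        return self.passes / self.runs if self.runs else 0.0


def run_test_set(
    run_counts: dict[str, int],
    thresholds: dict[str, dict[str, float]],
    simulate: Simulate,
) -> list[UnitTestResult]:
    """Run each unit test run_counts[id] times and count the runs that pass."""
    results = []
    for test_id, count in run_counts.items():
        passes = 0
        for _ in range(count):
            metrics = simulate(test_id)
            # A run passes only if every metric reaches its threshold;
            # a missing metric is treated as a failure.
            if all(
                metric in metrics and metrics[metric] >= threshold
                for metric, threshold in thresholds[test_id].items()
            ):
                passes += 1
        results.append(UnitTestResult(test_id, count, passes))
    return results


if __name__ == "__main__":
    rng = random.Random(0)  # seeded stand-in for a real simulated conversation
    results = run_test_set(
        run_counts={"ut-password-reset": 5},
        thresholds={"ut-password-reset": {"task_completion": 0.5}},
        simulate=lambda test_id: {"task_completion": rng.random()},
    )
    for r in results:
        print(f"{r.unit_test_id}: {r.passes}/{r.runs} passed ({r.pass_rate:.0%})")
```

Reporting a pass rate rather than a single pass/fail makes flaky behavior visible: a test that passes 3 of 5 runs tells you more than one that happened to pass once.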


Versioning

Personas and scenarios are versioned independently. When you update a persona's background or a scenario's instructions, you create a new version. Unit tests reference the persona and scenario by ID and always use the latest version at run time. This lets you iterate on test definitions without recreating unit tests.
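The "latest version at run time" behavior can be illustrated with a small in-memory store. `PersonaStore`, `UnitTestRef`, and `resolve_persona` are hypothetical names used only for this sketch; the same idea would apply to scenarios.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PersonaVersion:
    version: int
    role: str
    background: str
    preferred_language: str


class PersonaStore:
    """In-memory stand-in for a versioned persona registry (hypothetical)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[PersonaVersion]] = {}

    def create(
        self, persona_id: str, role: str, background: str, language: str
    ) -> PersonaVersion:
        if persona_id in self._versions:
            raise ValueError(f"persona {persona_id!r} already exists")
        first = PersonaVersion(1, role, background, language)
        self._versions[persona_id] = [first]
        return first

    def update(self, persona_id: str, **changes: str) -> PersonaVersion:
        # Old versions are never mutated; an update appends a new version.
        latest = self.latest(persona_id)
        new = replace(latest, version=latest.version + 1, **changes)
        self._versions[persona_id].append(new)
        return new

    def latest(self, persona_id: str) -> PersonaVersion:
        return self._versions[persona_id][-1]

    def get(self, persona_id: str, version: int) -> PersonaVersion:
        return self._versions[persona_id][version - 1]


@dataclass(frozen=True)
class UnitTestRef:
    unit_test_id: str
    persona_id: str  # an ID only; no version is pinned


def resolve_persona(test: UnitTestRef, store: PersonaStore) -> PersonaVersion:
    # Resolution happens at run time, so later persona edits are picked up
    # without recreating the unit test.
    return store.latest(test.persona_id)


if __name__ == "__main__":
    store = PersonaStore()
    store.create("frustrated-customer", "customer", "Waited two weeks for a refund", "en")
    test = UnitTestRef("ut-refund", "frustrated-customer")
    store.update("frustrated-customer", background="Waited a month for a refund")
    print(resolve_persona(test, store).version)  # 2
```

Because older versions are kept rather than overwritten, you can still inspect what a persona looked like when an earlier run executed.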


API Categories

Personas

Simulation Personas -- Create, list, search, update, delete, and version simulated user profiles.

Scenarios

Simulation Scenarios -- Create, list, search, update, delete, and version conversation test scenarios.

Unit Tests

Simulation Unit Tests -- Create, list, search, update, and delete individual test cases.

Unit Test Sets

Simulation Unit Test Sets -- Create, list, search, update, and delete grouped test suites.

Unit Test Set Runs

Simulation Unit Test Set Runs -- Execute test suites, monitor progress, cancel runs, and download result artifacts.
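Since the runs API exposes progress monitoring and cancellation, a client would typically poll a run until it finishes. The sketch below assumes a hypothetical `RunClient` with `get_run_status` and `cancel_run` methods and hypothetical status strings; the real endpoints, method names, and states are defined in the Simulation Unit Test Set Runs reference.

```python
import time
from typing import Callable, Protocol

# Hypothetical status values; the real API may use different names.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


class RunClient(Protocol):
    """Hypothetical client interface, not the actual Amigo SDK."""

    def get_run_status(self, run_id: str) -> str: ...

    def cancel_run(self, run_id: str) -> None: ...


def wait_for_run(
    client: RunClient,
    run_id: str,
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a run until it reaches a terminal state; cancel it on timeout."""
    deadline = clock() + timeout_s
    while True:
        status = client.get_run_status(run_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            # Stop waiting on a run that is taking too long.
            client.cancel_run(run_id)
            return "cancelled"
        sleep(poll_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. Once a run reports completion, its result artifacts would be downloaded through the runs API (not shown here).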
