# Agent Forge CLI

Agent Forge is the CLI tool for managing agent configurations on the Amigo platform. It lets you create, update, version, and promote agent components programmatically rather than through the web interface.

## What Agent Forge Does

Agent Forge treats agent configurations as code. You sync configurations to local JSON files, make changes, and push them back to the platform. This gives you version control, reproducibility, and the ability to script deployment workflows.

Agent Forge manages the following entity types:

* **Agents**: Persona, background, directives, and communication style
* **Context graphs**: Problem structure, states, transitions, and safety boundaries
* **Dynamic behaviors**: Runtime behaviors with triggers and response logic
* **Metrics**: Evaluation criteria and scoring rubrics
* **Personas**: Synthetic user profiles for simulation testing
* **Scenarios**: Test situations for simulation testing
* **Unit test sets**: Groups of tests with success criteria

## Core Operations

### Sync to Local

Pull configurations from the platform to your local file system:

```bash
# Pull all active agents
forge sync-to-local --entity-type agent --active-only

# Pull context graphs with a specific tag
forge sync-to-local --entity-type context_graph --tag emergency

# Pull evaluation framework components
forge sync-to-local --entity-type metric --tag accuracy
forge sync-to-local --entity-type persona --tag emergency_patient
forge sync-to-local --entity-type scenario --tag complex_symptoms
```

### Sync to Remote

Push local changes back to the platform:

```bash
# Push all changes to staging
forge sync-to-remote --all --apply --env staging

# Push all changes to production
forge sync-to-remote --all --apply --env production
```

Before applying changes, Agent Forge previews exactly what will be modified so you can review and confirm.

### Environment Support

Agent Forge supports separate staging and production environments. Changes are deployed to staging first, validated through testing, and then promoted to production.

```
agent-forge/
  local/
    staging/
      entity_data/
        agent/
        context_graph/
        dynamic_behavior_set/
        metric/
        persona/
        scenario/
        unit_test_set/
    production/
      entity_data/
        (same structure)
```

## Typical Workflow

1. **Pull current configurations** from the platform to your local environment.
2. **Make changes** to the JSON configuration files.
3. **Push to staging** and run your test sets to validate.
4. **Review results** and iterate if tests fail.
5. **Promote to production** after validation passes.

This workflow supports both manual changes and automated optimization. Teams can run Agent Forge directly for planned configuration updates, or build automated pipelines around it that deploy and test changes as part of a continuous improvement process.

## Analytics

The `forge analyze` command group provides SQL-based exploration of workspace data directly from the CLI, replacing the need for external analytics tools.

### Query Commands

| Command                  | Description                                                                                                      |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------- |
| `forge analyze query`    | Execute ad-hoc SQL SELECT queries (inline or from file). Results limited to 1,000 rows with a 30-second timeout. |
| `forge analyze describe` | Preview a query's output schema without executing it - useful for validating JOINs and checking column types.    |
| `forge analyze tables`   | List available tables in the workspace schema. Supports SQL LIKE patterns for filtering.                         |
| `forge analyze schema`   | Describe a table's columns: names, data types, and comments.                                                     |
| `forge analyze sample`   | Preview sample rows from a table (default 5, max 20).                                                            |
| `forge analyze detail`   | Rich table metadata: row count, size, partitioning, column nullability, data freshness.                          |
| `forge analyze profile`  | Profile a column's data distribution: cardinality, null rate, min/max values.                                    |
| `forge analyze catalog`  | Display the full data catalog reference offline without a database connection.                                   |

### Query Templates

Pre-built analytics query templates for common patterns like conversation volume, tool performance, and metric trends:

```bash
# List templates
forge analyze template list

# Run a template with parameters
forge analyze template run conversation-volume -P days=7
```

All commands support `--json` for structured output, enabling integration with scripts and CI/CD pipelines.
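As an illustration of that integration, a CI script might gate a deploy on query results. The sketch below assumes the `--json` output of `forge analyze query` is a JSON array of row objects (the exact shape may differ); in CI you would capture the command's stdout, while here a literal payload stands in for it:

```python
import json

# Stand-in for captured stdout of `forge analyze query ... --json`.
# The row shape (keys "day" and "conversations") is an assumption
# for illustration, not the documented output format.
payload = '[{"day": "2024-06-01", "conversations": 412}, {"day": "2024-06-02", "conversations": 389}]'

rows = json.loads(payload)
total = sum(row["conversations"] for row in rows)
print(f"{len(rows)} days, {total} conversations")
```

A script like this can fail the pipeline when, say, conversation volume drops below a threshold after a configuration change.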

## Voice Simulation

The `forge platform sim` command group provides CLI access to VoiceSim for exploring voice configuration space:

| Command                       | Description                                                     |
| ----------------------------- | --------------------------------------------------------------- |
| `forge platform sim create`   | Create a new simulation run                                     |
| `forge platform sim list`     | List simulation runs for the workspace                          |
| `forge platform sim sample`   | Sample and evaluate N configuration points                      |
| `forge platform sim evaluate` | Evaluate a specific configuration point against a scenario      |
| `forge platform sim get`      | Get run status and best results                                 |
| `forge platform sim summary`  | Aggregated summary with best-per-scenario and penalty frequency |
| `forge platform sim points`   | List scored points (by score or chronologically)                |
| `forge platform sim complete` | Mark a run as finished                                          |

See [Voice Simulation](https://docs.amigo.ai/testing/testing/voice-simulation) for conceptual background.

## Conversation Quality Check

The `forge quality check` command scans workspace conversations for agent behavioral issues - stuck loops, degenerate output, repetition, and other quality problems. It queries production conversation data directly and runs pattern-based detectors to surface problematic interactions.

```bash
# Scan last 24 hours
forge quality check <workspace-name>

# Wider window with message snippets
forge quality check <workspace-name> --days 7 --verbose

# Structured output for scripting
forge quality check <workspace-name> --json
```

### Detectors

| Detector                   | What It Finds                                                               |
| -------------------------- | --------------------------------------------------------------------------- |
| **Character degeneration** | Repeated characters, low entropy output, stuttering patterns                |
| **Stuck agent loops**      | Agent repeats the same response while the caller changes topics             |
| **Repetitive patterns**    | High similarity across sliding message windows                              |
| **Word salad**             | Incoherent output patterns like or-chains and excessive word repetition     |
| **Phantom success**        | Agent claims a tool call succeeded when the tool actually returned an error |

Results include conversation IDs, timestamps, detector names, and severity. Use `--verbose` to see the actual message excerpts that triggered each finding.
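To make the pattern-based approach concrete, here is a toy detector in the spirit of the character degeneration check. It is illustrative only (the production detectors and their thresholds are internal): it flags long repeated-character runs and low-entropy output, two signals named in the table above.

```python
import math
import re
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Bits per character; degenerate, repetitive output scores low."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_degenerate(message: str, min_entropy: float = 2.0, max_run: int = 6) -> bool:
    """Flag messages with long repeated-character runs or low entropy.

    Thresholds are illustrative, not the production values.
    """
    # Any single character repeated more than max_run times in a row.
    if re.search(r"(.)\1{%d,}" % max_run, message):
        return True
    # Low-entropy check only on messages long enough to be meaningful.
    return len(message) > 20 and shannon_entropy(message) < min_entropy

print(looks_degenerate("Sure, I can help you reschedule that appointment."))  # healthy
print(looks_degenerate("okayyyyyyyyyyyy okayyyyyyyyyyyy"))                     # degenerate
```

Real detectors run over whole conversations rather than single messages, which is why findings carry conversation IDs and timestamps.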

See [Voice Simulation](https://docs.amigo.ai/testing/testing/voice-simulation) and [Drift Detection](https://docs.amigo.ai/testing/testing/drift-detection) for related quality monitoring capabilities.

## Tool Testing

The `forge platform tool-test` commands let you test context graph tools without making phone calls:

| Command                            | Description                                                     |
| ---------------------------------- | --------------------------------------------------------------- |
| `forge platform tool-test resolve` | List available tools for a service with input schemas           |
| `forge platform tool-test execute` | Execute a tool with custom parameters and optional dry run mode |

## CLI Updates

Agent Forge includes a built-in update mechanism:

```bash
# Check for and apply updates manually
forge update
```

The CLI also checks for updates automatically in the background (every 30 minutes). When updates are available, the CLI prompts before applying. Uncommitted local changes are stashed during the update and restored afterward.

## Metric Versioning

Metrics support version tracking. Each metric can have multiple versions, with the `latest_version` field tracking the current iteration. This enables teams to evolve evaluation criteria over time while maintaining a history of how metrics were defined at each point. Older metric configurations are automatically migrated to the versioned schema.
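A versioned metric file might look like the following sketch. Only the `latest_version` field is named above; every other field and value here is illustrative, not the actual schema:

```json
{
  "name": "response_accuracy",
  "latest_version": 3,
  "versions": [
    { "version": 3, "rubric": "..." }
  ]
}
```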

## Platform API Commands

Agent Forge provides full CLI coverage for the Platform API, enabling end-to-end agent building and workspace management without the web interface. The `forge platform` command group covers 73 endpoints across all platform resources.

### Setup

Generate an API key from the Developer Console (**Settings > API Keys**), then add these variables to your environment file:

```bash
PLATFORM_API_URL=https://api.platform.amigo.ai
PLATFORM_WORKSPACE_ID=<workspace-uuid>
PLATFORM_API_KEY=<bearer-token>
```

### E2E Agent Building

Build a complete agent from the CLI in four steps:

1. **Create agent** and agent version with identity, background, and behaviors
2. **Create context graph** and version with states, transitions, and exit conditions
3. **Create service** linking the agent and context graph together
4. **Add skills** (optional) for LLM-backed micro-agent capabilities

```bash
forge platform agent create --name "My Agent" --env myorg
forge platform agent create-version <agent-uuid> --file agent-version.json --env myorg
forge platform context-graph create --name "My Context Graph" --env myorg
forge platform context-graph create-version <context-graph-uuid> --file context-graph-version.json --env myorg
forge platform service create --name "My Service" --agent-id <agent-uuid> --context-graph-id <context-graph-uuid> --env myorg
```

### Resource Management

Full CRUD operations for all platform resources:

| Resource Group   | Commands                                                                                      |
| ---------------- | --------------------------------------------------------------------------------------------- |
| **Core**         | `agent`, `context-graph`, `service`, `skill`, `integration`, `persona`                        |
| **Voice & Text** | `call`, `recording`, `operator`, `phone-number`, `session` (voice + text), `outbound-trigger` |
| **Data**         | `data-source`, `world`, `fhir`, `crm`, `unification-rule`, `pipeline`, `function`             |
| **Operations**   | `audit`, `compliance`, `safety`, `monitor-concept`, `review-queue`                            |
| **Settings**     | `workspace`, `settings`, `api-key`, `task`, `billing`, `network`                              |
| **Surfaces**     | `surface` (create, deliver, list, e2e)                                                        |
| **Analytics**    | `analytics`, `command-center`                                                                 |
| **Testing**      | `sim` (voice simulation), `coverage` (B\&B exploration), `tool-test`                          |

All commands support `--json` for structured output and `--env` for environment selection.

### Bulk Push

Push local entity configurations to the platform in a single operation:

```bash
forge platform push --all --env myorg --apply
```

Supports selective push by entity type (`-e agent`, `-e context-graph`, `-e service`).

## Platform Functions

The `forge platform function` command group manages platform functions - declarative SQL, Python, and AI functions that agents can call mid-conversation.

| Command                            | Description                                                        |
| ---------------------------------- | ------------------------------------------------------------------ |
| `forge platform function register` | Register a new platform function with its definition and metadata  |
| `forge platform function list`     | List all registered functions in the workspace                     |
| `forge platform function test`     | Execute a function with test parameters and inspect the result     |
| `forge platform function delete`   | Remove a function registration                                     |
| `forge platform function query`    | Run an open-scope SQL query against workspace data                 |
| `forge platform function catalog`  | Display the full function catalog with signatures and descriptions |
| `forge platform function sync`     | Sync function definitions between local files and the platform     |

See [Platform Functions](https://docs.amigo.ai/agent/platform-functions) for conceptual background.

## Simulation Coverage

The `forge platform coverage` command group manages branch-and-bound simulation coverage runs that systematically explore context graph state space.

| Command                            | Description                                                                       |
| ---------------------------------- | --------------------------------------------------------------------------------- |
| `forge platform coverage create`   | Create a new coverage run for a service                                           |
| `forge platform coverage session`  | Create a session within a coverage run                                            |
| `forge platform coverage step`     | Step a session forward with a simulated user message                              |
| `forge platform coverage fork`     | Fork a session into N children at a decision point, each with a different message |
| `forge platform coverage score`    | Score a session against configured metrics                                        |
| `forge platform coverage graph`    | Retrieve the coverage knowledge graph with topology overlay and ghost nodes       |
| `forge platform coverage complete` | Complete a run and clean up ephemeral database branches                           |

See [Simulation Coverage](https://docs.amigo.ai/testing/testing/simulations#simulation-coverage) for conceptual background.

## Surface E2E Testing

The `forge platform surface e2e` command tests the full surface lifecycle from the CLI - surface creation, spec retrieval, form rendering, and data submission - in a single command. It validates that branding settings, field rendering, and data flow are working correctly end-to-end.

```bash
forge platform surface e2e --entity-id <uuid> --env myorg
```

This is useful for verifying surface configuration changes (branding, field types, sections) before deploying to production. The command exercises the same API paths and rendering pipeline that patients use.

## Simulation Testing

The `forge simulation` command group provides coverage-optimized simulation testing against context graphs. It automatically steers simulated conversations toward unvisited states, behaviors, and tools to maximize test coverage.

### How It Works

Each simulation turn follows a scoring loop:

1. The platform generates recommended user responses (graph-unaware)
2. An LLM classifier predicts which state each response would transition to
3. A scorer ranks responses by expected coverage value using graph structure
4. The highest-scoring response is sent as the simulated user message
5. Coverage state is updated based on the agent's response

### Commands

| Command                     | Description                                                                                                                                                          |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `forge simulation run`      | Execute a simulation with configurable sessions, turn budgets, and coverage targets                                                                                  |
| `forge simulation plan`     | Generate a target spec from a natural-language objective (e.g., "test the cancellation flow end-to-end")                                                             |
| `forge simulation bridge`   | Generate scenario variations from a natural-language objective, run multi-turn conversations with LLM-driven personas, and track coverage using interaction insights |
| `forge simulation evaluate` | Compare metric scores across simulation runs, including before/after diff mode                                                                                       |
| `forge simulation cleanup`  | Delete ephemeral test users created by simulation runs                                                                                                               |

### Configuration

Simulations are highly configurable:

| Setting         | Default    | Description                                                                                              |
| --------------- | ---------- | -------------------------------------------------------------------------------------------------------- |
| **Sessions**    | 3          | Number of parallel conversations                                                                         |
| **Max turns**   | 20         | Maximum turns per session                                                                                |
| **Budget**      | 100        | Total turn budget across all sessions                                                                    |
| **Algorithm**   | `frontier` | Scoring algorithm: `frontier`, `heatmap`, or `random`                                                    |
| **Temperament** | `random`   | Simulated user personality: `cooperative`, `neutral`, `frustrated`, `confused`, `skeptical`, or `random` |

Target specs can be generated from natural-language objectives using `forge simulation plan`, which translates goals into structured coverage targets based on the context graph structure.

### Simulation Bridge

The `forge simulation bridge` command combines scenario generation with multi-turn conversation execution. You describe what you want to test in natural language, and the bridge generates diverse scenario variations, runs each as a full conversation with an LLM-driven persona, and collects interaction insights after every turn for coverage tracking.

```bash
# Generate and run 5 scenarios testing cancellation handling
forge simulation bridge --service "Scheduling" --objective "test cancellation edge cases" --scenarios 5 --env staging
```

Each scenario includes a persona background, temperament (cooperative, frustrated, confused, skeptical, or neutral), and instructions that guide the simulated caller's behavior throughout the conversation. The bridge tracks which context graph states, tools, and dynamic behaviors were exercised across all scenarios, giving you coverage visibility without manually designing each test case.

The bridge also pulls interaction insights after each agent turn - the same detailed reasoning audit available for production calls - so you can see which memories were active, what state transitions occurred, and which tools were considered at every step of every scenario.

#### Result Persistence and Reports

Simulation bridge results are persisted locally across runs, enabling trend analysis and regression detection. After a run completes, you can generate summary reports with pass/fail counts, score distributions, and failure breakdowns. Comparing current results against previous runs shows whether a configuration change improved or degraded coverage.
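At its simplest, regression detection between runs reduces to comparing score distributions. A toy sketch, assuming each run is summarized as a list of per-scenario metric scores (the persisted result format is internal):

```python
def compare_runs(baseline, current, pass_threshold=0.8):
    """Compare per-scenario scores from two simulation runs.

    The 0.8 pass threshold and list-of-floats shape are assumptions
    for illustration, not the actual persisted format.
    """
    def pass_rate(scores):
        return sum(s >= pass_threshold for s in scores) / len(scores)

    delta = pass_rate(current) - pass_rate(baseline)
    verdict = "improved" if delta > 0 else "regressed" if delta < 0 else "unchanged"
    return delta, verdict

baseline = [0.9, 0.7, 0.85, 0.6]   # 2 of 4 scenarios pass
current = [0.9, 0.82, 0.85, 0.65]  # 3 of 4 scenarios pass
delta, verdict = compare_runs(baseline, current)
print(f"pass rate {delta:+.2f} ({verdict})")
```

The built-in reports do this comparison for you, including the failure breakdowns; a script like this is only useful if you want custom gating logic in CI.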

Tag simulation scenarios for selective execution - for example, `forge simulation bridge --tag scheduling` runs only scheduling-related scenarios. Tags let you build a reusable test library that grows over time as you discover edge cases worth preserving.

## Changelog Command

The `forge changelog` command provides cross-entity change traceability - tracking what changed across agents, context graphs, behaviors, and metrics over time. This gives teams visibility into configuration drift without relying on external version control tooling.

## When to Use Agent Forge

* **Managing configurations across environments**: Keep staging and production in sync with a controlled promotion process.
* **Bulk updates**: Modify multiple agents, behaviors, or evaluation criteria in a single operation.
* **Scripted deployments**: Integrate Agent Forge into CI/CD pipelines for automated testing and deployment.
* **Audit and rollback**: Maintain a complete history of configuration changes with the ability to revert.
* **Data exploration**: Query workspace data, profile tables, and run analytics templates without leaving the CLI.
* **Building agents from scratch**: Use Platform API commands to create agents, context graphs, and services entirely from the CLI.
* **Coverage testing**: Run simulation tests that automatically explore unvisited states and edge cases.
* **Change traceability**: Track configuration changes across all entity types with the changelog command.

{% hint style="info" %}
For setup instructions and detailed CLI documentation, see the Agent Forge repository at <https://github.com/amigo-ai/agent-forge>.
{% endhint %}
