Agent Forge CLI

CLI tool for syncing, versioning, and promoting agent configurations across staging and production environments.

Agent Forge is the CLI tool for managing agent configurations on the Amigo platform. It lets you create, update, version, and promote agent components programmatically rather than through the web interface.

Installation

Agent Forge ships as a single binary with no runtime dependencies. The installer detects your OS and architecture, downloads the correct binary, verifies the SHA256 checksum, and places it on your PATH.

# macOS / Linux / WSL
curl -fsSL https://forge.platform.amigo.ai/install.sh | sh

# Windows PowerShell
irm https://forge.platform.amigo.ai/install.ps1 | iex

Pre-built binaries are available for macOS (Intel and Apple Silicon), Linux (amd64 and arm64), and Windows (amd64). No Python, no package manager, no dependency resolution required.

After installation, configure credentials for your workspace:

# Create environment file
cp .env.platform.example .env.platform.<your-env>
# Edit with your Platform API URL, workspace ID, and API key or identity URL

# Verify
forge auth status --platform --env <your-env>

What Agent Forge Does

Agent Forge treats agent configurations as code. You sync configurations to local JSON files, make changes, and push them back to the platform. This gives you version control, reproducibility, and the ability to script deployment workflows.

Authentication

Agent Forge supports two authentication surfaces that correspond to the two API backends:

  • Legacy backend API - Uses the legacy identity device flow when configured, or static API key credentials.

  • Platform API - Uses platform identity device code authentication. Activate this path with the --platform flag: forge auth login --platform.

Both flows follow the same user experience. Forge displays a short user code and opens your browser to an approval page. Confirm that the code shown in the browser matches the code in your terminal, then approve the request. Forge then receives an access token and refresh token automatically, with no manual token management required. These flows work in headless environments, SSH sessions, and CI pipelines where a browser cannot be opened inline.

The approval page only accepts a browser session scoped to the same workspace the device code targets. A session scoped to the wrong workspace, or a session without any workspace selected, cannot approve the code:

  • If your session is scoped to a different workspace, the page redirects you to workspace selection. After you choose the correct workspace, you are returned to the approval page automatically.

  • If you are not signed in at all, the sign-in flow keeps the approval page as the return destination through authentication and workspace selection, so you land back on it without needing to re-open the CLI link.

The platform identity device code flow is workspace-scoped. When you initiate a login, Forge sends the configured workspace ID along with the device code request. The identity service binds the code to that workspace - the approver in the browser must hold a session scoped to the same workspace, and the resulting CLI token is scoped to it. If the approver's session targets a different workspace, approval is rejected with a prompt to select the correct workspace. This workspace enforcement applies at both the approval step and the token exchange step, ensuring that credentials are always tied to the intended workspace and preventing cross-workspace token misuse.

Tokens for each surface are cached independently in the system keyring. An expired access token is silently refreshed using the stored refresh token without requiring re-authentication.
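The refresh-on-expiry behavior can be sketched in a few lines. This is an illustration under stated assumptions, not Forge's implementation. An in-memory dict stands in for the system keyring. `refresh_fn` is a hypothetical callback that exchanges a refresh token for new tokens. The 30-second early-refresh skew is an arbitrary choice for the sketch. Each surface (for example `"legacy"` and `"platform"`) gets its own independent cache entry.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedToken:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp after which the access token is invalid


class TokenCache:
    """Per-surface token cache; a plain dict stands in for the system keyring."""

    def __init__(
        self,
        refresh_fn: Callable[[str], Tuple[str, str, float]],
        clock: Callable[[], float] = time.time,
        skew_seconds: float = 30.0,
    ) -> None:
        # Hypothetical: refresh_fn(refresh_token) -> (access_token, refresh_token, lifetime_seconds)
        self._refresh_fn = refresh_fn
        self._clock = clock
        self._skew = skew_seconds  # refresh slightly early so a token never expires mid-request
        self._entries: Dict[str, CachedToken] = {}

    def store(self, surface: str, token: CachedToken) -> None:
        self._entries[surface] = token

    def access_token(self, surface: str) -> str:
        entry = self._entries.get(surface)
        if entry is None:
            raise LookupError(f"not logged in to {surface!r}; run the login flow first")
        if self._clock() >= entry.expires_at - self._skew:
            # Silent refresh: a refresh-token exchange, no user interaction.
            access, refresh, lifetime = self._refresh_fn(entry.refresh_token)
            entry = CachedToken(access, refresh, self._clock() + lifetime)
            self._entries[surface] = entry
        return entry.access_token
```

Keying the cache by surface name means a refresh on the Platform API side never touches legacy backend credentials.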

Environment Configuration

Platform API authentication reads from .env.platform.<env> (preferred) or falls back to .env.<env>. The following variables control the platform auth path:

  • PLATFORM_API_URL (required) - Platform API URL for the target environment

  • PLATFORM_WORKSPACE_ID (required) - Workspace to authenticate against

  • PLATFORM_API_KEY (set this or IDENTITY_URL) - Static API key (no login required)

  • IDENTITY_URL (set this or PLATFORM_API_KEY) - Platform identity service URL (enables device code login)

If PLATFORM_API_KEY is set, Forge uses it as a static bearer token. If IDENTITY_URL is set instead, Forge uses the device code flow via forge auth login --platform.
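The file lookup and credential selection described above can be sketched as follows. This is an illustration, not Forge's loader, and it rests on two assumptions. First, the parser handles only simple `KEY=VALUE` lines. Second, `PLATFORM_API_KEY` wins when both credential variables are set. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_platform_env(env: str, directory: Path = Path(".")) -> Dict[str, str]:
    """Prefer .env.platform.<env>; fall back to .env.<env>."""
    for name in (f".env.platform.{env}", f".env.{env}"):
        candidate = directory / name
        if candidate.is_file():
            return parse_env_file(candidate)
    raise FileNotFoundError(f"no .env.platform.{env} or .env.{env} in {directory}")


def select_auth_mode(config: Dict[str, str]) -> str:
    """Return 'api_key' or 'device_code' depending on which variable is set."""
    for required in ("PLATFORM_API_URL", "PLATFORM_WORKSPACE_ID"):
        if not config.get(required):
            raise ValueError(f"{required} is required")
    if config.get("PLATFORM_API_KEY"):
        return "api_key"  # static bearer token, no login needed
    if config.get("IDENTITY_URL"):
        return "device_code"  # log in with: forge auth login --platform
    raise ValueError("set PLATFORM_API_KEY or IDENTITY_URL")
```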

Legacy backend authentication continues to read from .env.<env> using API key or identity configuration values.

Forge-native configuration fields are automatically translated to platform-native equivalents at deployment time. For example, audio filler phrases defined in Forge tool specs are converted to the platform's progress hint format, so agents configured through Forge work without manual migration.

Auth Commands

The --platform flag is available on login, logout, and status subcommands. Without it, all auth commands operate on the legacy backend credentials.

Managed Entity Types

Agent Forge manages the following entity types:

  • Agents: Persona, background, directives, and communication style

  • Context graphs: Problem structure, states, transitions, and safety boundaries

  • Dynamic behaviors: Runtime behaviors with triggers and response logic

  • Metrics: Evaluation criteria, scoring rubrics, and custom metric definitions

  • Personas: Synthetic user profiles for simulation testing (the primary way to manage personas)

  • Scenarios: Test situations for simulation testing

  • Services: Link an agent and context graph into a deployable unit

  • Tools: Versioned code packages (called Actions in the conceptual docs)

  • Unit test sets: Groups of tests with success criteria

  • Unit tests: Individual test cases

  • User dimensions: Attributes that segment users for evaluation and analysis

Pre-Sync Validation

Agent Forge validates context graphs before syncing to the platform and surfaces warnings for common authoring mistakes. Validation runs automatically during sync-to-remote with no additional configuration.

Canonical Value Lint

The canonical value lint detects phone numbers, email addresses, and URLs hardcoded into context graph state prose. Inline canonical values cause silent data drift - when graphs are cloned or updated, hardcoded digits can be accidentally mutated, and the agent reads incorrect information to callers.

The validator scans prose fields in every state (descriptions, instructions, boundary constraints, exit conditions, and action descriptions) and emits a warning for each match, identifying the state, field, and value. It catches phone numbers in digit form (e.g., 555-010-1234), phone numbers in spelled-out TTS form (e.g., "five five five zero one zero..."), email addresses, and URLs.

To fix a warning, move the canonical value into structured context - such as a location entity in the world model or a workspace setting - and reference it abstractly in the state prose.
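A rough sketch of this kind of lint, using simple regular expressions. The patterns, the `lint_state` helper, and the prose field names are illustrative, not the validator's actual rules. The spelled-out check flags runs of seven or more consecutive digit words, which is a hypothetical threshold.

```python
import re
from typing import Dict, List, Tuple

PHONE_DIGITS = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL = re.compile(r"\bhttps?://[^\s)]+|\bwww\.[^\s)]+")
DIGIT_WORDS = r"(?:zero|oh|one|two|three|four|five|six|seven|eight|nine)"
# Seven or more spelled-out digits in a row, as in TTS-friendly phone numbers.
PHONE_SPOKEN = re.compile(rf"\b{DIGIT_WORDS}(?:[\s,-]+{DIGIT_WORDS}){{6,}}\b", re.IGNORECASE)

PATTERNS = {"phone": PHONE_DIGITS, "phone (spoken)": PHONE_SPOKEN, "email": EMAIL, "url": URL}
# Hypothetical field names for the prose fields of a state.
PROSE_FIELDS = ("description", "instructions", "boundary_constraints", "exit_conditions", "action_descriptions")


def lint_state(state_name: str, state: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (state, field, kind, value) for every canonical value found in prose."""
    findings = []
    for field in PROSE_FIELDS:
        text = state.get(field, "")
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                findings.append((state_name, field, kind, match.group(0)))
    return findings
```

Each finding names the state, field, and matched value, so the warning points straight at the prose that needs to move into structured context.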

Core Operations

Sync to Local

Pull configurations from the platform to your local file system:

Platform Insights

Query workspace data, explore schema metadata, and get health digests through the platform insights service:

Call Trace Analysis

Access deep call understanding from the intelligence pipeline, including emotional arcs, key decision moments, coaching recommendations, and signal-response alignment:

Trace analysis provides:

  • Emotional arc - How caller sentiment evolved across the conversation

  • Key decision moments - Critical points with quality assessment and causal attribution

  • Coaching recommendations - Actionable improvements tied to specific call moments

  • Counterfactuals - Alternative actions that could have changed the outcome

  • Signal-response alignment - Whether the agent responded appropriately to caller signals

  • Interaction dynamics - Turn-taking quality, rapport trajectory, and repair effectiveness

Simulation Caller and Entity Context

The session-create, smoke-test, and bridge simulation commands accept caller context flags. Use --caller-id to set a simulated caller phone number in E.164 format (e.g., +16479718862). Use --entity-id to bind the session directly to a known world entity; this is useful for regression tests that should always run against the same known patient or account fixture. Omit both flags to simulate an unknown caller.
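If you generate --caller-id values in a script, check them against the E.164 shape before the command runs. That shape is a leading +, a non-zero first digit, and at most 15 digits in total. The check catches formatting mistakes early. This helper is a hypothetical convenience, not part of Forge. It validates only the shape, not whether the number is actually assigned.

```python
import re

# E.164: "+" followed by at most 15 digits; the first (country code) digit is never 0.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def normalize_caller_id(raw: str) -> str:
    """Strip common separators, then validate the result as E.164."""
    candidate = re.sub(r"[\s().-]", "", raw)
    if not E164.fullmatch(candidate):
        raise ValueError(f"{raw!r} is not an E.164 number (expected e.g. +16479718862)")
    return candidate
```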

Sync to Remote

Push local changes back to the platform:

Before applying changes, Agent Forge shows exactly what will be modified so you can review before confirming.
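A review-before-apply step like this can be approximated with a unified diff of canonically serialized JSON. This is a sketch only. Forge's actual change summary may be structured differently, and `preview_changes` and `confirm_push` are hypothetical helpers.

```python
import difflib
import json
from typing import Any, List


def canonical_lines(config: Any) -> List[str]:
    """Serialize with sorted keys so key order never shows up as a change."""
    return json.dumps(config, indent=2, sort_keys=True).splitlines(keepends=True)


def preview_changes(remote: Any, local: Any, name: str) -> str:
    """Unified diff from the remote (current) config to the local (proposed) one."""
    diff = difflib.unified_diff(
        canonical_lines(remote),
        canonical_lines(local),
        fromfile=f"remote/{name}",
        tofile=f"local/{name}",
    )
    return "".join(diff)


def confirm_push(remote: Any, local: Any, name: str) -> bool:
    """Print the diff and ask before applying; returns False when nothing changed."""
    diff = preview_changes(remote, local, name)
    if not diff:
        print(f"{name}: no changes")
        return False
    print(diff, end="")
    return input("Apply these changes? [y/N] ").strip().lower() == "y"
```

Sorting keys before diffing keeps the preview focused on real edits rather than serialization noise.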

Environment Support

Agent Forge supports separate staging and production environments. Changes are deployed to staging first, validated through testing, and then promoted to production.

Typical Workflow

  1. Pull current configurations from the platform to your local environment.

  2. Make changes to the JSON configuration files.

  3. Push to staging and run your test sets to validate.

  4. Review results and iterate if tests fail.

  5. Promote to production after validation passes.

This workflow supports both manual changes and automated optimization. Teams can use Agent Forge directly for planned configuration updates, or set up automated pipelines that use Agent Forge to deploy and test changes as part of a continuous improvement process.

Analytics

The forge analyze command group provides SQL-based exploration of workspace data directly from the CLI, removing the need for external analytics tools.

Query Commands

  • forge analyze query - Execute ad-hoc SQL SELECT queries (inline or from a file). Results are capped and queries are time-bounded.

  • forge analyze describe - Preview a query's output schema without executing it; useful for validating JOINs and checking column types.

  • forge analyze tables - List available tables in the workspace schema. Supports SQL LIKE patterns for filtering.

  • forge analyze schema - Describe a table's columns: names, data types, and comments.

  • forge analyze sample - Preview sample rows from a table (default 5, max 20).

  • forge analyze detail - Show rich table metadata: row count, size, partitioning, column nullability, and data freshness.

  • forge analyze profile - Profile a column's data distribution: cardinality, null rate, and min/max values.

  • forge analyze catalog - Display the full data catalog reference offline, without a database connection.
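To make the profile statistics concrete, the sketch below computes cardinality, null rate, and min/max for a single column held in an in-memory list. It only illustrates what the numbers mean. forge analyze profile computes them against the workspace database, and its exact output fields may differ. The `ColumnProfile` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnProfile:
    row_count: int
    null_rate: float      # fraction of rows that are None
    cardinality: int      # number of distinct non-null values
    min_value: Optional[Any]
    max_value: Optional[Any]


def profile_column(values: Sequence[Any]) -> ColumnProfile:
    """Compute basic distribution statistics for one column."""
    non_null = [v for v in values if v is not None]
    row_count = len(values)
    return ColumnProfile(
        row_count=row_count,
        null_rate=(row_count - len(non_null)) / row_count if row_count else 0.0,
        cardinality=len(set(non_null)),
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
    )
```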

Query Templates

Pre-built analytics query templates for common patterns like conversation volume, tool performance, and metric trends:

All commands support --json for structured output, enabling integration with scripts and CI/CD pipelines.

Voice Simulation

The forge platform sim command group provides CLI access to VoiceSim for exploring the voice configuration space:

  • forge platform sim create - Create a new simulation run

  • forge platform sim sample - Sample and evaluate N configuration points

  • forge platform sim evaluate - Evaluate a specific configuration point against a scenario

  • forge platform sim status - Get run status and the best result

  • forge platform sim summary - Show an aggregated summary with the best result per scenario and penalty frequency

  • forge platform sim points - List scored points (by score or chronologically)

  • forge platform sim complete - Mark a run as finished

See Voice Simulation for conceptual background.

Platform Simulation Testing

For testing agent configurations through platform simulation sessions (distinct from VoiceSim configuration tuning):

  • forge platform sim smoke-test - Single-turn sanity check via a tracked platform session

  • forge platform sim bridge - Multi-scenario AI-driven testing using tracked simulation runs

  • forge platform sim run-create - Create a tracked simulation run

  • forge platform sim run-list - List simulation runs with filtering

  • forge platform sim run-complete - Mark a simulation run as complete

Smoke-test creates a session, sends a test message, and reports the agent's response. Bridge generates diverse scenarios, runs multi-turn conversations with an LLM caller persona, and tracks results. Both create tracked runs that appear in the Agent Performance dashboard.

Text Conversation Smoke Tests

Agent Forge provides commands for verifying that text conversation endpoints are working correctly. These are useful during initial setup, after configuration changes, or as part of a deployment validation pipeline.

Create and Send Message (REST)

Create a durable text conversation, then send user messages through the REST turns endpoint and display the agent's response:

send-message requires an existing conversation ID. Use conversation create first and pass the returned ID on subsequent calls to continue the same conversation thread.

WebSocket Smoke Test

The forge platform conversation text-ws-smoke command opens a streaming text session, sends a single message, and waits for the agent response. It requires an entity ID for context (--entity-id). Use it to verify the streaming text path end-to-end; use the REST send-message command when you want to drive a multi-turn durable conversation. The command accepts --service-id, --message, --conversation-id, and --entity-id.

Conversation Quality Check

The forge quality check command scans workspace conversations for agent behavioral issues - stuck loops, degenerate output, repetition, and other quality problems. It queries production conversation data directly and runs pattern-based detectors to surface problematic interactions.

Detectors

  • Character degeneration - Repeated characters, low-entropy output, stuttering patterns

  • Stuck agent loops - The agent repeats the same response while the caller changes topics

  • Repetitive patterns - High similarity across sliding message windows

  • Word salad - Incoherent output patterns such as or-chains and excessive word repetition

  • Phantom success - The agent claims a tool call succeeded when the tool actually returned an error

  • Wrong tool inputs - A tool is called with parameters that do not match what the caller actually asked for

  • Ungrounded claims - The agent asserts capabilities or facts not supported by the configured entity definitions

  • Safety / PII - The agent leaks sensitive information or provides unsafe guidance

Some detectors (phantom success, wrong tool inputs, ungrounded claims, and safety/PII) require an LLM key to run.

Results include conversation IDs, timestamps, detector names, and severity. Use --verbose to see the actual message excerpts that triggered each finding.
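The pattern-based detectors are not specified in detail here. As a rough illustration, the sketch below approximates two of them. Character degeneration is approximated as low Shannon entropy or a long run of one character. Repetitive patterns are approximated as high similarity between adjacent sliding windows of messages. All thresholds are hypothetical. `difflib.SequenceMatcher` stands in for whatever similarity measure the real detector uses.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import List


def shannon_entropy(text: str) -> float:
    """Bits per character; degenerate output like 'aaaaaa' scores near 0."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def longest_run(text: str) -> int:
    """Length of the longest run of a single repeated character."""
    best = current = 0
    previous = None
    for ch in text:
        current = current + 1 if ch == previous else 1
        previous = ch
        best = max(best, current)
    return best


def is_degenerate(message: str, min_entropy: float = 2.5, max_run: int = 8, min_length: int = 20) -> bool:
    """Flag long character runs, or low entropy on messages long enough to judge."""
    if longest_run(message) >= max_run:
        return True
    # Entropy is unreliable on very short messages ("Yes."), so only check longer ones.
    return len(message) >= min_length and shannon_entropy(message) < min_entropy


def repetitive_windows(messages: List[str], window: int = 2, threshold: float = 0.9) -> List[int]:
    """Start indices of windows that closely match the window immediately before them."""
    hits = []
    for start in range(window, len(messages) - window + 1):
        previous = " ".join(messages[start - window:start])
        current = " ".join(messages[start:start + window])
        if SequenceMatcher(None, previous, current).ratio() >= threshold:
            hits.append(start)
    return hits
```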

See Voice Simulation and Drift Detection for related quality monitoring capabilities.

Tool Testing

The forge platform tool-test commands let you test context graph tools without making phone calls:

  • forge platform tool-test resolve - List available tools for a service, with their input schemas

  • forge platform tool-test execute - Execute a tool with custom parameters, with an optional dry-run mode

Text Conversation Testing

The forge platform conversation command group tests text conversations through the REST and WebSocket APIs without a phone or browser.

  • forge platform conversation create - Create a durable text conversation for a service, with optional entity context

  • forge platform conversation send-message - Send a single message to an existing conversation via the REST API and print the agent's response

  • forge platform conversation text-ws-smoke - Open a streaming text session, send one message, and require an agent response (requires --entity-id for context). Useful for verifying the streaming text path end-to-end.

All conversation commands support --json for structured output and --env for environment selection.

CLI Updates

Agent Forge includes a built-in update mechanism:

The CLI can also check for updates automatically in the background. When updates are available, the CLI prompts before applying. Uncommitted local changes are stashed during the update and restored afterward.

Metric Versioning

Metrics support version tracking. Each metric can have multiple versions, with the latest_version field tracking the current iteration. This enables teams to evolve evaluation criteria over time while maintaining a history of how metrics were defined at each point. Older metric configurations are automatically migrated to the versioned schema.

Authentication Methods

Agent Forge supports two authentication methods, selected automatically based on the environment configuration.

Platform Identity

Platform Identity is used when the environment configuration includes an identity URL. It replaces the need for a static API key for interactive CLI use.

It uses device code authentication following RFC 8628. When you run forge auth login, the CLI requests a device code from the identity service, opens your browser to an approval page, and polls for authorization. Once you approve, the access token and refresh token are cached in your system keyring. Subsequent commands use the cached token and refresh it silently when it expires.
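The polling half of an RFC 8628 flow can be sketched as follows. The error codes (`authorization_pending`, `slow_down`, `access_denied`, `expired_token`) and the 5-second back-off on `slow_down` come from RFC 8628 itself. `request_token` is a hypothetical stand-in for the HTTP call to the identity service's token endpoint, so the loop can run without a network. This is not Forge's code.

```python
import time
from typing import Callable, Dict


class DeviceFlowError(Exception):
    pass


def poll_for_token(
    request_token: Callable[[str], Dict[str, str]],
    device_code: str,
    interval: int = 5,
    expires_in: int = 600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll until the user approves, denies, or the device code expires.

    request_token(device_code) returns either a token response containing
    "access_token" or an error response such as {"error": "authorization_pending"}.
    """
    deadline = clock() + expires_in
    while clock() < deadline:
        sleep(interval)
        response = request_token(device_code)
        if "access_token" in response:
            return response  # may also carry a refresh_token
        error = response.get("error")
        if error == "authorization_pending":
            continue  # user has not approved yet; keep polling
        if error == "slow_down":
            interval += 5  # RFC 8628: increase the polling interval by 5 seconds
            continue
        if error in ("access_denied", "expired_token"):
            raise DeviceFlowError(error)
        raise DeviceFlowError(f"unexpected response: {response}")
    raise DeviceFlowError("expired_token")
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.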

API Key

Static bearer token authentication. Generate an API key from the Developer Console (Settings > API Keys) and add it to your environment file. API keys do not expire and are suitable for CI/CD pipelines and automated scripts where interactive login is not possible.

Platform API Commands

Agent Forge provides broad CLI coverage for common Platform API workflows, enabling agent building and workspace management without the web interface. The forge platform command group covers the active resources listed below.

Setup

Configure authentication using one of the methods above. For API key authentication, set PLATFORM_API_URL, PLATFORM_WORKSPACE_ID, and PLATFORM_API_KEY in your environment file (.env.platform.<env>).

E2E Agent Building

Build a complete agent from the CLI in four steps:

  1. Create agent and agent version with identity, background, and behaviors

  2. Create context graph and version with states, transitions, and exit conditions

  3. Create service linking the agent and context graph together

  4. Add skills (optional) for LLM-backed micro-agent capabilities

Resource Management

CLI support is organized into these resource groups:

  • Core - workspace, agent, context-graph, service, version-set, persona, skill

  • Voice & Text - call, conversation, voice-settings, operator

  • Data - integration, fhir, function, settings

  • Surfaces - surface

  • Testing - sim, simulation, tool-test

  • Operations - api-key, plus selected settings and status commands

All commands support --json for structured output and --env for environment selection.

Bulk Push

Push local entity configurations to the platform in a single operation:

Supports selective push by entity type (-e agent, -e context-graph, -e service).

Trigger Management

The forge platform trigger command group manages scheduled action triggers - cron-based automation that dispatches workspace actions on a recurring basis.

  • forge platform trigger create - Create a trigger with a cron schedule and action binding

  • forge platform trigger list - List triggers, with active/inactive filtering

  • forge platform trigger get - Get trigger details, including the next fire time

  • forge platform trigger update - Update trigger configuration

  • forge platform trigger delete - Delete a trigger

  • forge platform trigger pause - Pause a trigger's schedule

  • forge platform trigger resume - Resume a paused trigger

  • forge platform trigger fire - Manually fire a trigger for testing

  • forge platform trigger runs - View trigger execution history

Triggers bind cron schedules to actions. When the schedule fires, the action is dispatched immediately. Each execution is tracked as an event with AUTOMATION source provenance. See Outbound for how triggers fit into the platform's automated contact patterns.
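To reason about a trigger's next fire time, it helps to see how a five-field cron expression is evaluated. The sketch below supports `*`, single values, ranges (`a-b`), lists (`a,b`), and steps (`*/n`, `a/n`). It finds the next matching minute by scanning forward. It is a simplification, not the platform's scheduler, in two ways. It requires both day-of-month and day-of-week to match, whereas classic cron ORs them when both are restricted. It also ignores time zones.

```python
from datetime import datetime, timedelta
from typing import Set

# Field order: minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(spec: str, low: int, high: int) -> Set[int]:
    """Expand one cron field into the set of values it allows."""
    allowed: Set[int] = set()
    for part in spec.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            end = high if step_text else start  # "5/15" means 5, 20, 35, 50
        allowed.update(range(start, end + 1, step))
    return {v for v in allowed if low <= v <= high}


def next_fire(expression: str, after: datetime) -> datetime:
    """Return the first whole minute strictly after `after` that matches."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    minutes, hours, days, months, weekdays = (
        parse_field(spec, lo, hi) for spec, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # give up after scanning one year
        cron_weekday = (candidate.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        if (
            candidate.minute in minutes
            and candidate.hour in hours
            and candidate.day in days
            and candidate.month in months
            and cron_weekday in weekdays
        ):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no fire time within a year for {expression!r}")
```

For example, `0 9 * * 1-5` evaluated on a Friday after 09:00 skips the weekend and lands on Monday at 09:00.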

Platform Functions

The forge platform function command group manages platform functions - declarative SQL, Python, AI, and table-valued (UDTF) functions that agents can call mid-conversation. Table-valued functions return rows rather than a single value.

  • forge platform function register - Register a new platform function with its definition and metadata

  • forge platform function list - List all registered functions in the workspace

  • forge platform function test - Execute a function with test parameters and inspect the result

  • forge platform function delete - Remove a function registration

  • forge platform function query - Run an open-scope SQL query against workspace data

  • forge platform function catalog - Display the full function catalog with signatures and descriptions

  • forge platform function sync - Sync function definitions between local files and the platform

See Platform Functions for conceptual background.

Escalation Policy

The forge platform service escalation-policy command configures how a service handles escalation triggers - what happens when the agent determines a call should be escalated to a human operator, forwarded to another number, or ended.

Three presets cover the common cases where all triggers should route to the same action type. For fine-grained control, --body accepts a partial policy that merges with the current configuration - you can update individual triggers without re-stating the entire policy.
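The partial-update behavior of --body can be pictured as a recursive merge. Keys present in the partial policy replace or extend the current ones, and everything else is kept. This is a minimal sketch under two assumptions. First, the policy is a JSON object keyed by trigger name. Second, the merge is deep rather than shallow. The trigger and action names below are hypothetical, not the platform's schema.

```python
import copy
from typing import Any, Dict


def merge_policy(current: Dict[str, Any], partial: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge a partial policy into the current one without mutating either."""
    merged = copy.deepcopy(current)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace wholesale
    return merged


# Hypothetical policy shape, for illustration only.
current = {
    "triggers": {
        "caller_requests_human": {"action": "transfer_to_operator"},
        "emergency": {"action": "forward", "number": "<on-call-number>"},
        "abusive_caller": {"action": "end_call"},
    }
}
# Change one trigger; the other two are carried over unchanged.
partial = {"triggers": {"abusive_caller": {"action": "transfer_to_operator"}}}
updated = merge_policy(current, partial)
```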

See Operators and Escalation for conceptual background on escalation triggers and actions.

Metrics Management

The forge platform metrics command group manages workspace metric definitions and runs evaluations from the CLI.

  • forge platform metrics settings - View or update workspace-level metric configuration

  • forge platform metrics define - Create or update a metric definition with scoring criteria

  • forge platform metrics evaluate - Run metric evaluation against one or more conversations

Simulation Coverage

The forge platform simulation command group manages branch-and-bound simulation coverage runs that systematically explore context graph state space.

  • forge platform simulation run create - Create a new coverage run for a service

  • forge platform simulation run list - List coverage runs for a service

  • forge platform simulation run complete - Complete a run

  • forge platform simulation session create - Create a session within a coverage run

  • forge platform simulation session step - Step a session forward with a simulated user message

  • forge platform simulation session fork - Fork a session into children at a decision point, each with a different message

  • forge platform simulation session score - Score a session against configured metrics

  • forge platform simulation graph show - Retrieve the coverage knowledge graph with topology overlay and ghost nodes

  • forge platform simulation graph paths - List observed paths through the coverage graph

See Simulation Coverage for conceptual background.

Insights

The forge platform insights command group provides conversational data exploration from the CLI, wrapping the platform's Insights Agent capabilities.

  • forge platform insights sql - Execute a SQL query against workspace data and return formatted results

  • forge platform insights schema - Describe available tables and columns in the workspace schema

  • forge platform insights digest - Generate an AI-powered digest summarizing recent workspace activity and trends

  • forge platform insights suggestions - Get suggested queries based on the workspace's data and recent activity

Trace Analysis

The forge platform trace command group provides call trace analysis from the CLI, wrapping the platform's trace analysis capabilities.

  • forge platform trace list - List call traces with filters for date range, service, quality score, and direction

  • forge platform trace get - Get detailed trace analysis for a specific call, including emotional arc, decision moments, component attribution, and coaching recommendations

Trace output uses rich formatting with colored outcome indicators and structured digest sections for quick scanning of call quality issues.

Surface E2E Testing

The forge platform surface e2e command tests the full surface lifecycle from the CLI - surface creation, spec retrieval, form rendering, and data submission - in a single command. It validates that branding settings, field rendering, and data flow are working correctly end-to-end.

This is useful for verifying surface configuration changes (branding, field types, sections) before deploying to production. The command exercises the same API paths and rendering pipeline that patients use.

Simulation Testing

The forge simulation command group provides coverage-optimized simulation testing against context graphs. It automatically steers simulated conversations toward unvisited states, behaviors, and tools to maximize test coverage.

How It Works

Each simulation turn follows a scoring loop:

  1. The platform generates recommended user responses (graph-unaware)

  2. An LLM classifier predicts which state each response would transition to

  3. A scorer ranks responses by expected coverage value using graph structure

  4. The highest-scoring response is sent as the simulated user message

  5. Coverage state is updated based on the agent's response
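Here is a toy version of steps 3 to 5. It assumes the classifier output from step 2 is already available as a mapping from candidate response to predicted next state. The scoring rule is an illustrative guess at what frontier-style scoring might mean, not the platform's algorithm: prefer unvisited states, then break ties by how many unvisited neighbors the predicted state opens up. The graph, states, and responses are made up.

```python
from typing import Dict, List, Set


def score_candidate(predicted_state: str, graph: Dict[str, List[str]], visited: Set[str]) -> float:
    """Higher is better: reaching a new state is worth 1, plus 0.1 per unvisited neighbor."""
    novelty = 0.0 if predicted_state in visited else 1.0
    unvisited_neighbors = sum(1 for s in graph.get(predicted_state, []) if s not in visited)
    return novelty + 0.1 * unvisited_neighbors


def pick_response(predictions: Dict[str, str], graph: Dict[str, List[str]], visited: Set[str]) -> str:
    """Steps 3-4: rank candidate responses by expected coverage value and return the best."""
    return max(predictions, key=lambda r: score_candidate(predictions[r], graph, visited))


graph = {
    "greeting": ["verify_identity", "general_question"],
    "verify_identity": ["schedule", "cancel"],
    "general_question": ["greeting"],
    "schedule": [],
    "cancel": [],
}
visited = {"greeting", "general_question"}
predictions = {  # step 2 output: candidate response -> predicted next state
    "What are your hours?": "general_question",
    "I need to cancel my appointment.": "verify_identity",
}
choice = pick_response(predictions, graph, visited)
# Step 5 (simplified): mark the predicted state visited; a real loop would
# record the transition the agent actually made.
visited.add(predictions[choice])
```

Because the candidates are generated without knowledge of the graph (step 1), all of the steering happens in this scoring step.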

Commands

  • forge simulation run - Execute a simulation with configurable sessions, turn budgets, and coverage targets

  • forge simulation plan - Generate a target spec from a natural-language objective (e.g., "test the cancellation flow end-to-end")

  • forge simulation bridge - Generate scenario variations from a natural-language objective, run multi-turn conversations with LLM-driven personas, and track coverage using interaction insights

  • forge simulation evaluate - Compare metric scores across simulation runs, including a before/after diff mode

  • forge simulation cleanup - Delete ephemeral test users created by simulation runs

Configuration

Simulations are highly configurable:

  • Sessions (default: 3) - Number of parallel conversations

  • Max turns (default: 20) - Maximum turns per session

  • Budget (default: 100) - Total turn budget across all sessions

  • Algorithm (default: frontier) - Scoring algorithm: frontier, heatmap, or random

  • Temperament (default: random) - Simulated user personality: cooperative, neutral, frustrated, confused, skeptical, or random
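If you script simulation runs, it can help to validate settings locally before invoking the CLI. The dataclass below mirrors the defaults and allowed values listed above. The class and field names are hypothetical, not Forge's configuration schema. `max_turns_used` reflects that the per-session cap and the global budget both limit the total. With the defaults, 3 sessions × 20 turns = 60 turns, which is below the budget of 100.

```python
from dataclasses import dataclass

ALGORITHMS = {"frontier", "heatmap", "random"}
TEMPERAMENTS = {"cooperative", "neutral", "frustrated", "confused", "skeptical", "random"}


@dataclass(frozen=True)
class SimulationSettings:
    sessions: int = 3           # parallel conversations
    max_turns: int = 20         # per-session turn cap
    budget: int = 100           # total turns across all sessions
    algorithm: str = "frontier"
    temperament: str = "random"

    def __post_init__(self) -> None:
        for name in ("sessions", "max_turns", "budget"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")
        if self.algorithm not in ALGORITHMS:
            raise ValueError(f"algorithm must be one of {sorted(ALGORITHMS)}")
        if self.temperament not in TEMPERAMENTS:
            raise ValueError(f"temperament must be one of {sorted(TEMPERAMENTS)}")

    def max_turns_used(self) -> int:
        """Upper bound on turns: the smaller of the budget and sessions * max_turns."""
        return min(self.budget, self.sessions * self.max_turns)
```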

Target specs can be generated from natural-language objectives using forge simulation plan, which translates goals into structured coverage targets based on the context graph structure.

Simulation Bridge

The forge simulation bridge command combines scenario generation with multi-turn conversation execution. You describe what you want to test in natural language, and the bridge generates diverse scenario variations, runs each as a full conversation with an LLM-driven persona, and collects interaction insights after every turn for coverage tracking.

Each scenario includes a persona background, temperament (cooperative, frustrated, confused, skeptical, or neutral), and instructions that guide the simulated caller's behavior throughout the conversation. The bridge tracks which context graph states, tools, and dynamic behaviors were exercised across all scenarios, giving you coverage visibility without manually designing each test case.

The bridge also pulls interaction insights after each agent turn - the same detailed reasoning audit available for production calls - so you can see which memories were active, what state transitions occurred, and which tools were considered at every step of every scenario.

Result Persistence and Reports

Simulation bridge results are persisted locally across runs, enabling trend analysis and regression detection. After a run completes, you can generate summary reports with pass/fail counts, score distributions, and failure breakdowns. Comparing current results against previous runs shows whether a configuration change improved or degraded coverage.
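Comparing a run against a previous one comes down to matching scenarios by name and flagging those that went from passing to failing. This sketch assumes each persisted run can be loaded as a mapping of scenario name to pass/fail. Forge's local result format is not documented here, so that data shape and the `RunComparison` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunComparison:
    regressions: List[str] = field(default_factory=list)    # passed before, fail now
    fixes: List[str] = field(default_factory=list)          # failed before, pass now
    new_scenarios: List[str] = field(default_factory=list)  # not present in the previous run
    previous_pass_rate: float = 0.0
    current_pass_rate: float = 0.0


def pass_rate(results: Dict[str, bool]) -> float:
    return sum(results.values()) / len(results) if results else 0.0


def compare_runs(previous: Dict[str, bool], current: Dict[str, bool]) -> RunComparison:
    """Match scenarios by name and classify how each one changed."""
    report = RunComparison(previous_pass_rate=pass_rate(previous), current_pass_rate=pass_rate(current))
    for name in sorted(current):
        if name not in previous:
            report.new_scenarios.append(name)
        elif previous[name] and not current[name]:
            report.regressions.append(name)
        elif not previous[name] and current[name]:
            report.fixes.append(name)
    return report
```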

Tag simulation scenarios for selective execution - for example, forge simulation bridge --tag scheduling runs only scheduling-related scenarios. Tags let you build a reusable test library that grows over time as you discover edge cases worth preserving.

Changelog Command

The forge changelog command provides cross-entity change traceability - tracking what changed across agents, context graphs, behaviors, and metrics over time. This gives teams visibility into configuration drift without relying on external version control tooling.

When to Use Agent Forge

  • Managing configurations across environments: Keep staging and production in sync with a controlled promotion process.

  • Bulk updates: Modify multiple agents, behaviors, or evaluation criteria in a single operation.

  • Scripted deployments: Integrate Agent Forge into CI/CD pipelines for automated testing and deployment.

  • Audit and rollback: Maintain a complete history of configuration changes with the ability to revert.

  • Data exploration: Query workspace data, profile tables, and run analytics templates without leaving the CLI.

  • Building agents from scratch: Use Platform API commands to create agents, context graphs, and services entirely from the CLI.

  • Coverage testing: Run simulation tests that automatically explore unvisited states and edge cases.

  • Change traceability: Track configuration changes across all entity types with the changelog command.

Use the Platform API developer guide for setup, authentication, and workspace configuration details. This reference page covers the Agent Forge command surface.
