Agent Forge CLI
CLI tool for syncing, versioning, and promoting agent configurations across staging and production environments.
Agent Forge is the CLI tool for managing agent configurations on the Amigo platform. It lets you create, update, version, and promote agent components programmatically rather than through the web interface.
Installation
Agent Forge ships as a single binary with no runtime dependencies. The installer detects your OS and architecture, downloads the correct binary, verifies the SHA256 checksum, and places it on your PATH.
# macOS / Linux / WSL
curl -fsSL https://forge.platform.amigo.ai/install.sh | sh
# Windows PowerShell
irm https://forge.platform.amigo.ai/install.ps1 | iex
Pre-built binaries are available for macOS (Intel and Apple Silicon), Linux (amd64 and arm64), and Windows (amd64). No Python, no package manager, no dependency resolution required.
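The checksum step the installer performs can be illustrated with a minimal sketch. This is not the installer's code; the function name and the stand-in payload are hypothetical, and the published checksum would normally come from the release rather than being computed locally.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True when the SHA256 digest of data equals expected_hex."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids revealing where the two strings first differ
    return hmac.compare_digest(actual, expected_hex.strip().lower())


payload = b"example binary contents"              # stand-in for a downloaded binary
published = hashlib.sha256(payload).hexdigest()   # stand-in for the published checksum
print(sha256_matches(payload, published))         # True
print(sha256_matches(payload + b"x", published))  # False
```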
After installation, configure credentials for your workspace:
# Create environment file
cp .env.platform.example .env.platform.<your-env>
# Edit with your Platform API URL, workspace ID, and API key or identity URL
# Verify
forge auth status --platform --env <your-env>
What Agent Forge Does
Agent Forge treats agent configurations as code. You sync configurations to local JSON files, make changes, and push them back to the platform. This gives you version control, reproducibility, and the ability to script deployment workflows.
Authentication
Agent Forge supports two authentication surfaces that correspond to the two API backends:
Legacy backend API - Uses the legacy identity device flow when configured, or static API key credentials.
Platform API - Uses platform identity device code authentication. Activate this path with the --platform flag: forge auth login --platform.
Both flows follow the same user experience. Forge displays a short user code and opens your browser to an approval page. You verify that the code shown in the browser matches the code in your terminal, then approve the request.
The approval page requires your browser session to be scoped to the same workspace the device code targets. If your session is scoped to a different workspace, the page redirects you to workspace selection and returns you to the approval page automatically once you choose the correct workspace. If you are not signed in at all, the sign-in flow preserves the approval page as the return destination through authentication and workspace selection, so you land back on the approval page without re-opening the CLI link. A session scoped to the wrong workspace, or a session with no workspace selected, cannot approve the code.
After approval, Forge receives an access token and a refresh token automatically, with no manual token management. These flows work in headless environments, SSH sessions, and CI pipelines where a browser cannot be opened inline.
The platform identity device code flow is workspace-scoped. When you initiate a login, Forge sends the configured workspace ID along with the device code request. The identity service binds the code to that workspace - the approver in the browser must hold a session scoped to the same workspace, and the resulting CLI token is scoped to it. If the approver's session targets a different workspace, approval is rejected with a prompt to select the correct workspace. This workspace enforcement applies at both the approval step and the token exchange step, ensuring that credentials are always tied to the intended workspace and preventing cross-workspace token misuse.
Tokens for each surface are cached independently in the system keyring. An expired access token is silently refreshed using the stored refresh token without requiring re-authentication.
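The polling half of a device code login can be sketched as follows. This is a minimal illustration of the error handling described in RFC 8628 (authorization_pending, slow_down), not Forge's implementation; the exchange callable, the demo token values, and the default intervals are hypothetical stand-ins for the real identity service call.

```python
import time
from typing import Callable, Dict


def poll_for_token(
    exchange: Callable[[], Dict[str, str]],
    interval: float = 5.0,
    expires_in: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll the token endpoint until the user approves or the code expires."""
    waited = 0.0
    while waited < expires_in:
        response = exchange()
        error = response.get("error")
        if error is None:
            return response  # approved: the response carries the tokens
        if error == "slow_down":
            interval += 5.0  # RFC 8628: increase the polling interval by 5 seconds
        elif error != "authorization_pending":
            raise RuntimeError(f"device authorization failed: {error}")
        sleep(interval)
        waited += interval
    raise TimeoutError("device code expired before approval")


# Simulated identity service: pending, then slow_down, then approval.
responses = iter([
    {"error": "authorization_pending"},
    {"error": "slow_down"},
    {"access_token": "at-demo", "refresh_token": "rt-demo"},
])
tokens = poll_for_token(lambda: next(responses), sleep=lambda seconds: None)
print(tokens["access_token"])  # at-demo
```

Injecting exchange and sleep keeps the loop testable without a network or real delays.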
Environment Configuration
Platform API authentication reads from .env.platform.<env> (preferred) or falls back to .env.<env>. The following variables control the platform auth path:
PLATFORM_API_URL (required): Platform API URL for the target environment
PLATFORM_WORKSPACE_ID (required): Workspace to authenticate against
PLATFORM_API_KEY (one of PLATFORM_API_KEY or IDENTITY_URL is required): Static API key (no login required)
IDENTITY_URL (one of PLATFORM_API_KEY or IDENTITY_URL is required): Platform identity service URL (enables device code login)
If PLATFORM_API_KEY is set, Forge uses it as a static bearer token. If IDENTITY_URL is set instead, Forge uses the device code flow via forge auth login --platform.
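The file lookup order and the choice between the two auth paths can be sketched as below. This is an illustration under stated assumptions, not Forge's code: the KEY=VALUE file syntax, the function names, the example URLs, and the choice to prefer the API key when both variables are set are all assumptions.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_env_file(env: str, directory: Path) -> Optional[Path]:
    """Prefer .env.platform.<env>, then fall back to .env.<env>."""
    for name in (f".env.platform.{env}", f".env.{env}"):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def select_auth_mode(values: Dict[str, str]) -> str:
    """Return 'api_key' or 'device_code' based on which variable is set."""
    for required in ("PLATFORM_API_URL", "PLATFORM_WORKSPACE_ID"):
        if not values.get(required):
            raise ValueError(f"missing required variable: {required}")
    if values.get("PLATFORM_API_KEY"):  # assumption: the key wins if both are set
        return "api_key"
    if values.get("IDENTITY_URL"):
        return "device_code"
    raise ValueError("set PLATFORM_API_KEY or IDENTITY_URL")


example = """
# hypothetical values for illustration only
PLATFORM_API_URL=https://api.example.invalid
PLATFORM_WORKSPACE_ID=ws-demo
IDENTITY_URL=https://identity.example.invalid
"""
print(select_auth_mode(parse_env(example)))  # device_code
```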
Legacy backend authentication continues to read from .env.<env> using API key or identity configuration values.
Forge-native configuration fields are automatically translated to platform-native equivalents at deployment time. For example, audio filler phrases defined in Forge tool specs are converted to the platform's progress hint format, so agents configured through Forge work without manual migration.
Auth Commands
The --platform flag is available on login, logout, and status subcommands. Without it, all auth commands operate on the legacy backend credentials.
Agent Forge manages the following entity types:
Agents: Persona, background, directives, and communication style
Context graphs: Problem structure, states, transitions, and safety boundaries
Dynamic behaviors: Runtime behaviors with triggers and response logic
Metrics: Evaluation criteria, scoring rubrics, and custom metric definitions
Personas: Synthetic user profiles for simulation testing (the primary way to manage personas)
Scenarios: Test situations for simulation testing
Services: Link an agent and context graph into a deployable unit
Tools: Versioned code packages (called Actions in the conceptual docs)
Unit test sets: Groups of tests with success criteria
Unit tests: Individual test cases
User dimensions: Attributes that segment users for evaluation and analysis
Pre-Sync Validation
Agent Forge validates context graphs before syncing to the platform and surfaces warnings for common authoring mistakes. Validation runs automatically during sync-to-remote with no additional configuration.
Canonical Value Lint
The canonical value lint detects phone numbers, email addresses, and URLs hardcoded into context graph state prose. Inline canonical values cause silent data drift - when graphs are cloned or updated, hardcoded digits can be accidentally mutated, and the agent reads incorrect information to callers.
The validator scans prose fields in every state (descriptions, instructions, boundary constraints, exit conditions, and action descriptions) and emits a warning for each match, identifying the state, field, and value. It catches phone numbers in digit form (e.g., 555-010-1234), phone numbers in spelled-out TTS form (e.g., "five five five zero one zero..."), email addresses, and URLs.
To fix a warning, move the canonical value into structured context - such as a location entity in the world model or a workspace setting - and reference it abstractly in the state prose.
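A lint of this kind can be approximated with a few regular expressions. The sketch below is a simplified illustration, not the validator Forge ships: the patterns, the field names, and the warning tuple layout are assumptions, and real phone and URL detection would need broader coverage.

```python
import re
from typing import Dict, List, Tuple

DIGIT_WORDS = r"(?:zero|oh|one|two|three|four|five|six|seven|eight|nine)"
PATTERNS = {
    "url": re.compile(r"https?://[^\s)>\]]+"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"(?<!\d)(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}(?!\d)"),
    # seven or more consecutive spelled-out digits, as a TTS-style phone number
    "spoken_phone": re.compile(
        rf"\b{DIGIT_WORDS}(?:[\s,]+{DIGIT_WORDS}){{6,}}\b", re.IGNORECASE
    ),
}


def lint_state(state: str, fields: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (state, field, kind, value) for each hardcoded canonical value."""
    warnings = []
    for field, prose in fields.items():
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(prose):
                value = match.group(0).rstrip(".,;:")  # drop trailing punctuation
                warnings.append((state, field, kind, value))
    return warnings


for warning in lint_state(
    "greeting",
    {"instructions": "Tell the caller to dial 555-010-1234 or email frontdesk@example.com."},
):
    print(warning)
```

A state that refers to "the clinic phone number from the location entity" produces no warnings, which is the abstract reference style the fix above recommends.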
Core Operations
Sync to Local
Pull configurations from the platform to your local file system:
Platform Insights
Query workspace data, explore schema metadata, and get health digests through the platform insights service:
Call Trace Analysis
Access deep call understanding from the intelligence pipeline, including emotional arcs, key decision moments, coaching recommendations, and signal-response alignment:
Simulation Caller and Entity Context
The session-create, smoke-test, and bridge simulation commands accept caller context flags. Use --caller-id to set a simulated caller phone number in E.164 format (e.g. +16479718862). Use --entity-id to bind the session directly to a known world entity. Direct entity context is useful for regression tests that should always run against the same known patient or account fixture. Omit both flags to simulate an unknown caller.
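Before passing a value to --caller-id, a script can check that it is well-formed E.164: a plus sign, a non-zero leading digit, and at most 15 digits in total. The helper below is a minimal sketch with a hypothetical name; it checks shape only and does not confirm that the number is assigned.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero
E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True when number has the E.164 shape."""
    return E164.fullmatch(number) is not None


print(is_e164("+16479718862"))   # True
print(is_e164("647-971-8862"))   # False
```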
Trace analysis provides:
Emotional arc - How caller sentiment evolved across the conversation
Key decision moments - Critical points with quality assessment and causal attribution
Coaching recommendations - Actionable improvements tied to specific call moments
Counterfactuals - Alternative actions that could have changed the outcome
Signal-response alignment - Whether the agent responded appropriately to caller signals
Interaction dynamics - Turn-taking quality, rapport trajectory, and repair effectiveness
Sync to Remote
Push local changes back to the platform:
Before applying changes, Agent Forge shows exactly what will be modified so you can review before confirming.
Environment Support
Agent Forge supports separate staging and production environments. Changes are deployed to staging first, validated through testing, and then promoted to production.
Typical Workflow
1. Pull current configurations from the platform to your local environment.
2. Make changes to the JSON configuration files.
3. Push to staging and run your test sets to validate.
4. Review results and iterate if tests fail.
5. Promote to production after validation passes.
This workflow supports both manual changes and automated optimization. Teams can use Agent Forge directly for planned configuration updates, or set up automated pipelines that use Agent Forge to deploy and test changes as part of a continuous improvement process.
Analytics
The forge analyze command group provides SQL-based exploration of workspace data directly from the CLI, replacing the need for external analytics tools.
Query Commands
forge analyze query: Execute ad-hoc SQL SELECT queries (inline or from file). Results are capped and queries are time-bounded.
forge analyze describe: Preview a query's output schema without executing it. Useful for validating JOINs and checking column types.
forge analyze tables: List available tables in the workspace schema. Supports SQL LIKE patterns for filtering.
forge analyze schema: Describe a table's columns (names, data types, and comments).
forge analyze sample: Preview sample rows from a table (default 5, max 20).
forge analyze detail: Rich table metadata (row count, size, partitioning, column nullability, data freshness).
forge analyze profile: Profile a column's data distribution (cardinality, null rate, min/max values).
forge analyze catalog: Display the full data catalog reference offline without a database connection.
Query Templates
Pre-built analytics query templates for common patterns like conversation volume, tool performance, and metric trends:
All commands support --json for structured output, enabling integration with scripts and CI/CD pipelines.
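A script that consumes --json output might wrap the CLI call as sketched below. The helper name is hypothetical, and the shape of the JSON Forge returns is not documented here, so the demo uses a stand-in command that prints a made-up payload rather than calling Forge.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_json(command: List[str]) -> Any:
    """Run a CLI command with --json appended and parse its stdout."""
    completed = subprocess.run(
        command + ["--json"], capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


# Stand-in command so the sketch runs without Forge installed. With Forge on
# PATH the call would look like run_json(["forge", "analyze", "tables"]).
fake_cli = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'tables': ['conversations']}))",
]
print(run_json(fake_cli))
```

Because check=True raises on a non-zero exit code, a failing command stops the pipeline instead of yielding an empty result.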
Voice Simulation
The forge platform sim command group provides CLI access to VoiceSim for exploring voice configuration space:
forge platform sim create: Create a new simulation run
forge platform sim sample: Sample and evaluate N configuration points
forge platform sim evaluate: Evaluate a specific configuration point against a scenario
forge platform sim status: Get run status and best result
forge platform sim summary: Aggregated summary with best-per-scenario and penalty frequency
forge platform sim points: List scored points (by score or chronologically)
forge platform sim complete: Mark a run as finished
See Voice Simulation for conceptual background.
Platform Simulation Testing
For testing agent configurations through platform simulation sessions (distinct from VoiceSim configuration tuning):
forge platform sim smoke-test: Single-turn sanity check via a tracked platform session
forge platform sim bridge: Multi-scenario AI-driven testing using tracked simulation runs
forge platform sim run-create: Create a tracked simulation run
forge platform sim run-list: List simulation runs with filtering
forge platform sim run-complete: Mark a simulation run as complete
Smoke-test creates a session, sends a test message, and reports the agent's response. Bridge generates diverse scenarios, runs multi-turn conversations with an LLM caller persona, and tracks results. Both create tracked runs that appear in the Agent Performance dashboard.
Text Conversation Smoke Tests
Agent Forge provides commands for verifying that text conversation endpoints are working correctly. These are useful during initial setup, after configuration changes, or as part of a deployment validation pipeline.
Create and Send Message (REST)
Create a durable text conversation, then send user messages through the REST turns endpoint and display the agent's response:
send-message requires an existing conversation ID. Use conversation create first and pass the returned ID on subsequent calls to continue the same conversation thread.
WebSocket Smoke Test
The forge platform conversation text-ws-smoke command opens a streaming text session, sends a single message, and waits for the agent response. It requires an entity ID for context (--entity-id). Use it to verify the streaming text path end-to-end; use the REST send-message command when you want to drive a multi-turn durable conversation. The command accepts --service-id, --message, --conversation-id, and --entity-id.
Conversation Quality Check
The forge quality check command scans workspace conversations for agent behavioral issues - stuck loops, degenerate output, repetition, and other quality problems. It queries production conversation data directly and runs pattern-based detectors to surface problematic interactions.
Detectors
Character degeneration: Repeated characters, low entropy output, stuttering patterns
Stuck agent loops: Agent repeats the same response while the caller changes topics
Repetitive patterns: High similarity across sliding message windows
Word salad: Incoherent output patterns like or-chains and excessive word repetition
Phantom success: Agent claims a tool call succeeded when the tool actually returned an error
Wrong tool inputs: A tool is called with parameters that do not match what the caller actually asked for
Ungrounded claims: The agent asserts capabilities or facts not supported by the configured entity definitions
Safety / PII: The agent leaks sensitive information or provides unsafe guidance
Some detectors (phantom success, wrong tool inputs, ungrounded claims, and safety/PII) require an LLM key to run.
Results include conversation IDs, timestamps, detector names, and severity. Use --verbose to see the actual message excerpts that triggered each finding.
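Two of the pattern-based ideas above, low-entropy output and high similarity across sliding message windows, can be sketched with the standard library. This is an illustration only: the thresholds, window size, and function names are assumptions, not the detectors Forge runs.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import List


def char_entropy(text: str) -> float:
    """Shannon entropy in bits per character; 0.0 for empty text."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def is_degenerate(text: str, min_entropy: float = 2.0) -> bool:
    """Flag longer messages whose character distribution is suspiciously narrow."""
    return len(text) >= 20 and char_entropy(text) < min_entropy


def repeated_windows(messages: List[str], window: int = 2, threshold: float = 0.9) -> List[int]:
    """Indices where a window of messages nearly repeats the window before it."""
    hits = []
    for i in range(window, len(messages) - window + 1):
        previous = " ".join(messages[i - window:i])
        current = " ".join(messages[i:i + window])
        if SequenceMatcher(None, previous, current).ratio() >= threshold:
            hits.append(i)
    return hits


agent_messages = [
    "Can I get your name?",
    "Sorry, I didn't catch that.",
    "Can I get your name?",
    "Sorry, I didn't catch that.",
    "Thanks, booking now.",
]
print(repeated_windows(agent_messages))  # [2]
print(is_degenerate("a" * 30))           # True
```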
See Voice Simulation and Drift Detection for related quality monitoring capabilities.
Tool Testing
The forge platform tool-test commands let you test context graph tools without making phone calls:
forge platform tool-test resolve: List available tools for a service with input schemas
forge platform tool-test execute: Execute a tool with custom parameters and optional dry run mode
Text Conversation Testing
The forge platform conversation command group tests text conversations through the REST and WebSocket APIs without a phone or browser.
forge platform conversation create: Create a durable text conversation for a service and optional entity context
forge platform conversation send-message: Send a single message to an existing conversation via the REST API and print the agent's response
forge platform conversation text-ws-smoke: Open a streaming text session, send one message, and require an agent response (requires --entity-id for context). Useful for verifying the streaming text path end-to-end.
All conversation commands support --json for structured output and --env for environment selection.
CLI Updates
Agent Forge includes a built-in update mechanism:
The CLI can also check for updates automatically in the background. When updates are available, the CLI prompts before applying. Uncommitted local changes are stashed during the update and restored afterward.
Metric Versioning
Metrics support version tracking. Each metric can have multiple versions, with the latest_version field tracking the current iteration. This enables teams to evolve evaluation criteria over time while maintaining a history of how metrics were defined at each point. Older metric configurations are automatically migrated to the versioned schema.
Authentication
Agent Forge supports two authentication methods, selected automatically based on the environment configuration.
Platform Identity (Recommended)
Device code authentication following RFC 8628. When you run forge auth login, the CLI requests a device code from the identity service, opens your browser to an approval page, and polls for authorization. Once approved, the access token and refresh token are cached in your system keyring. Subsequent commands use the cached token and refresh silently when it expires.
Platform Identity is used when the environment configuration includes an identity URL. It replaces the need for a static API key for interactive CLI use.
API Key
Static bearer token authentication. Generate an API key from the Developer Console (Settings > API Keys) and add it to your environment file. API keys do not expire and are suitable for CI/CD pipelines and automated scripts where interactive login is not possible.
Platform API Commands
Agent Forge provides broad CLI coverage for common Platform API workflows, enabling agent building and workspace management without the web interface. The forge platform command group covers the active resources listed below.
Setup
Configure authentication using one of the methods above. For API key authentication, add to your environment file:
E2E Agent Building
Build a complete agent from the CLI in four steps:
1. Create agent and agent version with identity, background, and behaviors
2. Create context graph and version with states, transitions, and exit conditions
3. Create service linking the agent and context graph together
4. Add skills (optional) for LLM-backed micro-agent capabilities
Resource Management
CLI support is organized into these resource groups:
Core: workspace, agent, context-graph, service, version-set, persona, skill
Voice & Text: call, conversation, voice-settings, operator
Data: integration, fhir, function, settings
Surfaces: surface
Testing: sim, simulation, tool-test
Operations: api-key, selected settings and status commands
All commands support --json for structured output and --env for environment selection.
Bulk Push
Push local entity configurations to the platform in a single operation:
Supports selective push by entity type (-e agent, -e context-graph, -e service).
Trigger Management
The forge platform trigger command group manages scheduled action triggers - cron-based automation that dispatches workspace actions on a recurring basis.
forge platform trigger create: Create a trigger with cron schedule and action binding
forge platform trigger list: List triggers with active/inactive filtering
forge platform trigger get: Get trigger details including next fire time
forge platform trigger update: Update trigger configuration
forge platform trigger delete: Delete a trigger
forge platform trigger pause: Pause a trigger's schedule
forge platform trigger resume: Resume a paused trigger
forge platform trigger fire: Manually fire a trigger for testing
forge platform trigger runs: View trigger execution history
Triggers bind cron schedules to actions. When the schedule fires, the action is dispatched immediately. Each execution is tracked as an event with AUTOMATION source provenance. See Outbound for how triggers fit into the platform's automated contact patterns.
Platform Functions
The forge platform function command group manages platform functions - declarative SQL, Python, AI, and table-valued (UDTF) functions that agents can call mid-conversation. Table-valued functions return rows rather than a single value.
forge platform function register: Register a new platform function with its definition and metadata
forge platform function list: List all registered functions in the workspace
forge platform function test: Execute a function with test parameters and inspect the result
forge platform function delete: Remove a function registration
forge platform function query: Run an open-scope SQL query against workspace data
forge platform function catalog: Display the full function catalog with signatures and descriptions
forge platform function sync: Sync function definitions between local files and the platform
See Platform Functions for conceptual background.
Escalation Policy
The forge platform service escalation-policy command configures how a service handles escalation triggers - what happens when the agent determines a call should be escalated to a human operator, forwarded to another number, or ended.
Three presets cover the common cases where all triggers should route to the same action type. For fine-grained control, --body accepts a partial policy that merges with the current configuration - you can update individual triggers without re-stating the entire policy.
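The effect of a partial policy can be pictured as a recursive dictionary merge. The sketch below assumes deep-merge semantics and uses a made-up policy shape; the trigger names, action names, and field names are hypothetical, and the platform's actual merge rules may differ.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_policy(current: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge patch into a copy of current; patch values win."""
    merged = deepcopy(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical policy shape, for illustration only.
current_policy = {
    "triggers": {
        "caller_request": {"action": "operator"},
        "safety_concern": {"action": "end_call"},
    }
}
partial = {"triggers": {"caller_request": {"action": "forward", "number": "+16479718862"}}}

updated = merge_policy(current_policy, partial)
print(updated["triggers"]["caller_request"]["action"])  # forward
print(updated["triggers"]["safety_concern"]["action"])  # end_call
```

Only the trigger named in the partial body changes; the other trigger keeps its existing action, which is the behavior the paragraph above describes.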
See Operators and Escalation for conceptual background on escalation triggers and actions.
Metrics Management
The forge platform metrics command group manages workspace metric definitions and runs evaluations from the CLI.
forge platform metrics settings: View or update workspace-level metric configuration
forge platform metrics define: Create or update a metric definition with scoring criteria
forge platform metrics evaluate: Run metric evaluation against one or more conversations
Simulation Coverage
The forge platform simulation command group manages branch-and-bound simulation coverage runs that systematically explore context graph state space.
forge platform simulation run create: Create a new coverage run for a service
forge platform simulation run list: List coverage runs for a service
forge platform simulation run complete: Complete a run
forge platform simulation session create: Create a session within a coverage run
forge platform simulation session step: Step a session forward with a simulated user message
forge platform simulation session fork: Fork a session into children at a decision point, each with a different message
forge platform simulation session score: Score a session against configured metrics
forge platform simulation graph show: Retrieve the coverage knowledge graph with topology overlay and ghost nodes
forge platform simulation graph paths: List observed paths through the coverage graph
See Simulation Coverage for conceptual background.
Insights
The forge platform insights command group provides conversational data exploration from the CLI, wrapping the platform's Insights Agent capabilities.
forge platform insights sql: Execute a SQL query against workspace data and return formatted results
forge platform insights schema: Describe available tables and columns in the workspace schema
forge platform insights digest: Generate an AI-powered digest summarizing recent workspace activity and trends
forge platform insights suggestions: Get suggested queries based on the workspace's data and recent activity
Trace Analysis
The forge platform trace command group provides call trace analysis from the CLI, wrapping the platform's trace analysis capabilities.
forge platform trace list: List call traces with filters for date range, service, quality score, and direction
forge platform trace get: Get detailed trace analysis for a specific call, including emotional arc, decision moments, component attribution, and coaching recommendations
Trace output uses rich formatting with colored outcome indicators and structured digest sections for quick scanning of call quality issues.
Surface E2E Testing
The forge platform surface e2e command tests the full surface lifecycle from the CLI - surface creation, spec retrieval, form rendering, and data submission - in a single command. It validates that branding settings, field rendering, and data flow are working correctly end-to-end.
This is useful for verifying surface configuration changes (branding, field types, sections) before deploying to production. The command exercises the same API paths and rendering pipeline that patients use.
Simulation Testing
The forge simulation command group provides coverage-optimized simulation testing against context graphs. It automatically steers simulated conversations toward unvisited states, behaviors, and tools to maximize test coverage.
How It Works
Each simulation turn follows a scoring loop:
1. The platform generates recommended user responses (graph-unaware)
2. An LLM classifier predicts which state each response would transition to
3. A scorer ranks responses by expected coverage value using graph structure
4. The highest-scoring response is sent as the simulated user message
5. Coverage state is updated based on the agent's response
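The ranking step of the loop can be sketched as follows. This is a guess at what a frontier-style heuristic might look like, not Forge's scorer: the scoring formula, the graph, and the state names are hypothetical, and a dictionary lookup stands in for the LLM classifier.

```python
from typing import Callable, Dict, List, Set


def pick_response(
    candidates: List[str],
    predict_state: Callable[[str], str],
    visited: Set[str],
    graph: Dict[str, List[str]],
) -> str:
    """Choose the candidate whose predicted state opens the most unvisited territory."""

    def score(candidate: str) -> int:
        state = predict_state(candidate)
        gain = 0 if state in visited else 1
        # frontier bonus: unvisited states reachable from the predicted state
        gain += sum(1 for nxt in graph.get(state, []) if nxt not in visited)
        return gain

    return max(candidates, key=score)


graph = {
    "greeting": ["schedule", "cancel"],
    "schedule": ["confirm"],
    "cancel": ["confirm_cancel", "retention"],
}
visited = {"greeting", "schedule", "confirm"}
predicted = {"I'd like to book a visit": "schedule", "I need to cancel": "cancel"}

choice = pick_response(list(predicted), predicted.__getitem__, visited, graph)
print(choice)  # I need to cancel
visited.add(predicted[choice])  # update coverage once the turn completes
```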
Commands
forge simulation run: Execute a simulation with configurable sessions, turn budgets, and coverage targets
forge simulation plan: Generate a target spec from a natural-language objective (e.g., "test the cancellation flow end-to-end")
forge simulation bridge: Generate scenario variations from a natural-language objective, run multi-turn conversations with LLM-driven personas, and track coverage using interaction insights
forge simulation evaluate: Compare metric scores across simulation runs, including before/after diff mode
forge simulation cleanup: Delete ephemeral test users created by simulation runs
Configuration
Simulations are highly configurable:
Sessions (default: 3): Number of parallel conversations
Max turns (default: 20): Maximum turns per session
Budget (default: 100): Total turn budget across all sessions
Algorithm (default: frontier): Scoring algorithm, one of frontier, heatmap, or random
Temperament (default: random): Simulated user personality, one of cooperative, neutral, frustrated, confused, skeptical, or random
Target specs can be generated from natural-language objectives using forge simulation plan, which translates goals into structured coverage targets based on the context graph structure.
Simulation Bridge
The forge simulation bridge command combines scenario generation with multi-turn conversation execution. You describe what you want to test in natural language, and the bridge generates diverse scenario variations, runs each as a full conversation with an LLM-driven persona, and collects interaction insights after every turn for coverage tracking.
Each scenario includes a persona background, temperament (cooperative, frustrated, confused, skeptical, or neutral), and instructions that guide the simulated caller's behavior throughout the conversation. The bridge tracks which context graph states, tools, and dynamic behaviors were exercised across all scenarios, giving you coverage visibility without manually designing each test case.
The bridge also pulls interaction insights after each agent turn - the same detailed reasoning audit available for production calls - so you can see which memories were active, what state transitions occurred, and which tools were considered at every step of every scenario.
Result Persistence and Reports
Simulation bridge results are persisted locally across runs, enabling trend analysis and regression detection. After a run completes, you can generate summary reports with pass/fail counts, score distributions, and failure breakdowns. Comparing current results against previous runs shows whether a configuration change improved or degraded coverage.
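Regression detection between two persisted runs amounts to comparing per-scenario outcomes. The sketch below assumes each run is reduced to a mapping from scenario name to pass/fail; that storage format, the scenario names, and the function name are hypothetical.

```python
from typing import Dict, List


def compare_runs(previous: Dict[str, bool], current: Dict[str, bool]) -> Dict[str, List[str]]:
    """Classify scenarios by how their pass/fail status changed between two runs."""
    shared = sorted(set(previous) & set(current))
    return {
        "regressed": [name for name in shared if previous[name] and not current[name]],
        "fixed": [name for name in shared if not previous[name] and current[name]],
        "new": sorted(set(current) - set(previous)),
    }


last_week = {"reschedule": True, "cancel": False, "billing": True}
today = {"reschedule": False, "cancel": True, "billing": True, "refill": True}
print(compare_runs(last_week, today))
```

A non-empty regressed list is a natural signal for failing a CI job after a configuration change.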
Tag simulation scenarios for selective execution - for example, forge simulation bridge --tag scheduling runs only scheduling-related scenarios. Tags let you build a reusable test library that grows over time as you discover edge cases worth preserving.
Changelog Command
The forge changelog command provides cross-entity change traceability - tracking what changed across agents, context graphs, behaviors, and metrics over time. This gives teams visibility into configuration drift without relying on external version control tooling.
When to Use Agent Forge
Managing configurations across environments: Keep staging and production in sync with a controlled promotion process.
Bulk updates: Modify multiple agents, behaviors, or evaluation criteria in a single operation.
Scripted deployments: Integrate Agent Forge into CI/CD pipelines for automated testing and deployment.
Audit and rollback: Maintain a complete history of configuration changes with the ability to revert.
Data exploration: Query workspace data, profile tables, and run analytics templates without leaving the CLI.
Building agents from scratch: Use Platform API commands to create agents, context graphs, and services entirely from the CLI.
Coverage testing: Run simulation tests that automatically explore unvisited states and edge cases.
Change traceability: Track configuration changes across all entity types with the changelog command.
Use the Platform API developer guide for setup, authentication, and workspace configuration details. This reference page covers the Agent Forge command surface.

