Agent Teams Guide
Agent Teams (Claude Code v2.1.32+ Research Preview) enables multi-agent coordination for parallelizing audit, sprint, security review, and full delivery workflows in claude-craft. This guide covers setup, usage, cost analysis, and known limitations.
Prerequisites
| Requirement | Minimum Version | Check |
|---|---|---|
| Claude Code | v2.1.32+ | claude --version |
| Environment variable | CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 | echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS |
| Claude model | Opus 4.6 (recommended for leader) | Model selector in Claude Code |
Enable Agent Teams:

```bash
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```

When to Use Agent Teams
Agent Teams adds coordination overhead. Use it only when the parallelization benefit outweighs the cost.
| Scenario | Recommendation | Why |
|---|---|---|
| Full audit, 2+ tech stacks | USE PARALLEL | 45-62% time savings, +30-37% token overhead |
| Full audit, 1 tech stack | STAY SEQUENTIAL | No time savings; overhead wasted |
| Pre-merge check, 2+ stacks | USE PARALLEL | Tests + linting + security run in parallel |
| Pre-commit check | STAY SEQUENTIAL | Already fast (< 2 min); overhead not justified |
| Sprint, 3+ independent stories | USE PARALLEL | Stories processed concurrently |
| Sprint, 1-2 stories | STAY SEQUENTIAL | Coordination cost exceeds benefit |
| Full lifecycle (write + implement), 3+ stories | USE PARALLEL (team-delivery) | ~2.2x speedup, cross-phase context, file domain mapping |
| Full lifecycle, < 3 stories | STAY SEQUENTIAL | @product-owner + team-sprint --ralph-mode is simpler |
| BMAD quality gates | STAY SEQUENTIAL | Gates are fast (< 30s each); micro-optimization |
| Small projects (< 50 files) | STAY SEQUENTIAL | Sequential audit completes in ~2 min |
Rule of thumb: Use Agent Teams when you have 2+ independent work streams that each take more than 3 minutes.
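The rule of thumb above can be expressed as a small decision helper. This is an illustrative sketch, not part of claude-craft; `should_parallelize` is a hypothetical function name:

```shell
#!/bin/sh
# Decide whether Agent Teams is worth the coordination overhead:
# parallelize only with 2+ independent streams of > 3 minutes each.
should_parallelize() {
  streams=$1        # number of independent work streams
  minutes_each=$2   # estimated duration of each stream (minutes)
  if [ "$streams" -ge 2 ] && [ "$minutes_each" -gt 3 ]; then
    echo "USE PARALLEL"
  else
    echo "STAY SEQUENTIAL"
  fi
}

should_parallelize 3 15   # full audit across 3 stacks
should_parallelize 1 20   # single stack: no parallelization benefit
```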
Fast Mode Guard (Blocking Confirmation)
All /team:* commands include a mandatory blocking guard when Fast Mode (/fast) is active. The leader agent MUST:
- Detect if Fast Mode is active (lightning bolt indicator)
- Display a comparative dashboard (standard vs fast pricing)
- Show a blocking warning with estimated costs for both modes
- Wait for explicit user confirmation before proceeding
- Abort if the user declines, suggesting `/fast` to switch back to standard mode
This prevents accidentally running multi-agent operations at 6x cost ($30/M input, $150/M output per agent).
Budget Guard (--max-cost)
All /team:* commands support --max-cost=<dollars> to set a maximum budget:
```
# Abort if estimated parallel cost exceeds $2
/team:audit --max-cost=2.00

# Budget guard with dry-run preview
/team:sprint Sprint-3 --max-cost=5.00 --dry-run
```

If the estimated parallel cost exceeds the budget:
- The leader displays `OVER BUDGET: estimated $X.XX > budget $Y.YY`
- Execution is aborted (workers are NOT launched)
- The leader suggests reducing the number of stacks/stories or using `--sequential`
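The guard's abort logic amounts to a numeric comparison performed before any worker is spawned. A minimal sketch (the `check_budget` helper and its exit-code convention are illustrative, not the actual implementation):

```shell
#!/bin/sh
# Abort before launching workers if the estimate exceeds --max-cost.
# Returns 0 (proceed) or 1 (abort); prints the OVER BUDGET line on abort.
check_budget() {
  estimated=$1   # estimated parallel cost in dollars, e.g. 3.10
  budget=$2      # --max-cost value, e.g. 2.00
  over=$(awk -v e="$estimated" -v b="$budget" 'BEGIN { print (e > b) ? 1 : 0 }')
  if [ "$over" -eq 1 ]; then
    echo "OVER BUDGET: estimated \$$estimated > budget \$$budget"
    return 1
  fi
  return 0
}

check_budget 3.10 2.00 || echo "Execution aborted (workers NOT launched)"
```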
Cost Analysis
Before launching a team operation, use the cost dashboard to see estimated costs:
```bash
# Show cost comparison
Tools/AgentTeams/lib/cost-dashboard.sh --techs 2 --checks 5 --worker-model sonnet

# With Haiku workers (lower cost, slightly more token overhead)
Tools/AgentTeams/lib/cost-dashboard.sh --techs 3 --worker-model haiku

# Dry-run mode (plain text)
Tools/AgentTeams/lib/cost-dashboard.sh --techs 2 --dry-run
```

Realistic Performance Expectations
Based on empirical analysis (audit-pipeline.md, devils-advocate.md):
| Metric | Optimistic | Realistic | Pessimistic |
|---|---|---|---|
| Time speedup | 3x | 1.5-2.5x | 1.2x |
| Token overhead | +14% | +20-37% | +50% |
| Dollar cost savings (Haiku workers) | 70% | 40-60% | 20% |
Do not expect 5-8x speedup. Amdahl's Law limits practical speedup because ~30% of work is inherently sequential (technology detection, result aggregation, report generation). With realistic coordination overhead, expect 1.5-2.5x speedup for 2-3 technology projects.
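The 1.5-2.5x figure follows directly from Amdahl's Law: with a sequential fraction s and N parallel workers, speedup = 1 / (s + (1 - s)/N). A quick check with s = 0.30:

```shell
#!/bin/sh
# Amdahl's Law: speedup = 1 / (s + (1 - s) / N)
# where s is the inherently sequential fraction of the work.
amdahl_speedup() {
  s=$1   # sequential fraction, e.g. 0.30
  n=$2   # number of parallel workers
  awk -v s="$s" -v n="$n" 'BEGIN { printf "%.2f\n", 1 / (s + (1 - s) / n) }'
}

amdahl_speedup 0.30 2      # two parallel stacks
amdahl_speedup 0.30 3      # three parallel stacks
amdahl_speedup 0.30 1000   # theoretical ceiling as N grows
```

Even with unlimited workers, the ceiling at s = 0.30 is 1/0.30 ≈ 3.3x, which is why 5-8x claims are unrealistic for this workload.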
Token Overhead Breakdown
Each parallel agent incurs overhead from:
| Overhead Source | Per Agent | Notes |
|---|---|---|
| System prompt / context loading | ~2,000 tokens | Agent loads CLAUDE.md, project rules |
| Shared context re-reading | ~6,000 tokens | Agent reads project files leader already analyzed |
| Context duplication (model-dependent) | +15-60% of context | Haiku: 60%, Sonnet: 30%, Opus: 15% |
| Task coordination | ~3,000 tokens | TaskCreate, messaging, status checks |
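Putting the table together, the per-agent overhead for a given shared-context size can be estimated as follows. The numbers are the table's rough figures, not measurements, and `agent_overhead` is an illustrative helper:

```shell
#!/bin/sh
# Per-agent token overhead = fixed costs + model-dependent context duplication.
# Duplication rates from the table: haiku 60%, sonnet 30%, opus 15%.
agent_overhead() {
  model=$1           # haiku | sonnet | opus
  context_tokens=$2  # shared context size in tokens
  case "$model" in
    haiku)  rate=60 ;;
    sonnet) rate=30 ;;
    opus)   rate=15 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  # 2000 (system prompt) + 6000 (shared context re-read) + 3000 (coordination)
  echo $(( 2000 + 6000 + 3000 + context_tokens * rate / 100 ))
}

agent_overhead sonnet 20000   # 11000 fixed + 6000 duplication = 17000 tokens
```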
Model Pricing Reference
| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|---|---|---|---|
| Opus 4.6 | $15.00 | $75.00 | Team leader, complex reasoning |
| Sonnet 4.5 | $3.00 | $15.00 | Worker agents, balanced cost/quality |
| Haiku 4.5 | $0.25 | $1.25 | Cost-optimized workers, simple checks |
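Converting token counts to dollars with the table's per-million rates is straightforward. A sketch (real billing may also treat cache reads/writes differently, which this ignores):

```shell
#!/bin/sh
# Dollar cost for an (input, output) token pair at the table's rates.
token_cost() {
  model=$1; in_tok=$2; out_tok=$3
  case "$model" in
    opus)   in_rate=15.00; out_rate=75.00 ;;
    sonnet) in_rate=3.00;  out_rate=15.00 ;;
    haiku)  in_rate=0.25;  out_rate=1.25 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  awk -v i="$in_tok" -v o="$out_tok" -v ri="$in_rate" -v ro="$out_rate" \
      'BEGIN { printf "%.2f\n", i / 1e6 * ri + o / 1e6 * ro }'
}

token_cost sonnet 100000 20000   # 0.30 input + 0.30 output = 0.60
```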
Team Templates
Claude-craft provides 4 team templates for common parallel workflows.
team-audit: Full Audit Team
Parallelizes the audit across multiple technology stacks.
Architecture:
```
audit-leader (Opus)
       |
       +---------------+---------------+
       |               |               |
symfony-auditor   react-auditor   python-auditor
   (Sonnet)         (Sonnet)        (Sonnet)
```

```mermaid
sequenceDiagram
    participant L as audit-leader (Opus)
    participant A1 as symfony-auditor
    participant A2 as react-auditor
    participant A3 as python-auditor
    L->>L: Detect technologies
    L->>L: Cost estimation
    par Fan-Out
        L->>A1: TaskCreate: Audit Symfony
        L->>A2: TaskCreate: Audit React
        L->>A3: TaskCreate: Audit Python
    end
    par Parallel Execution
        A1->>A1: Architecture + Quality + Tests + Security
        A2->>A2: Architecture + Quality + Tests + Security
        A3->>A3: Architecture + Quality + Tests + Security
    end
    A1-->>L: TaskUpdate: completed + result.json
    A2-->>L: TaskUpdate: completed + result.json
    A3-->>L: TaskUpdate: completed + result.json
    Note over L: Barrier: wait for all
    L->>L: Aggregate results
    L->>L: Generate unified report
    L-->>A1: shutdown_request
    L-->>A2: shutdown_request
    L-->>A3: shutdown_request
```

When to use: Projects with 2+ detected technology stacks.
Agent roles:
| Agent | Model | Responsibility |
|---|---|---|
| audit-leader | Opus | Detect technologies, spawn auditors, aggregate scores, generate report |
| {tech}-auditor | Sonnet/Haiku | Run 4 audit categories (architecture, quality, testing, security) |
Expected performance (2-tech project):
| Metric | Sequential | Parallel |
|---|---|---|
| Time | ~16 min | ~9 min |
| Tokens | ~135K | ~181K |
| Cost (Opus leader + Sonnet workers) | ~$4.46 | ~$2.06 |

Note that parallel execution uses more tokens yet costs less in dollars: the sequential run happens entirely on the Opus leader, while the parallel run offloads most of the work to cheaper Sonnet workers.
team-sprint: Sprint Development Team
Parallelizes story processing during sprint execution.
When to use: Sprints with 3+ independent stories (no blocking dependencies between them).
```mermaid
sequenceDiagram
    participant C as sprint-conductor (Opus)
    participant D1 as dev-worker-1 (Sonnet)
    participant D2 as dev-worker-2 (Sonnet)
    C->>C: Load sprint backlog
    C->>C: Check story independence
    C->>D1: TaskCreate: Implement US-001
    C->>D2: TaskCreate: Implement US-002
    loop Dynamic Queue
        D1->>D1: TDD: Red → Green → Refactor
        D1-->>C: TaskUpdate: completed
        C->>C: Validate DoD
        C->>D1: TaskCreate: Implement US-003
        D2->>D2: TDD: Red → Green → Refactor
        D2-->>C: TaskUpdate: completed
        C->>C: Validate DoD
        C->>D2: TaskCreate: Implement US-004
    end
    Note over C: All stories processed
    C->>C: Generate sprint summary
    C-->>D1: shutdown_request
    C-->>D2: shutdown_request
```

Agent roles:
| Agent | Model | Responsibility |
|---|---|---|
| sprint-leader | Opus | Claim stories, assign to workers, track progress |
| dev-worker-N | Sonnet | Implement a single story with TDD cycle |
Constraints:
- Maximum 4 agents total (1 leader + 3 workers)
- Stories must have `status: ready-for-dev` and no `blocked_by` dependencies
- Only the sprint-leader writes to `sprint-status.yaml` (single-writer pattern)
team-security: Security Review Team
Parallelizes security audits across technology stacks with dedicated OWASP checkers.
When to use: Security-critical projects with 2+ technology stacks requiring comprehensive OWASP review.
```mermaid
sequenceDiagram
    participant L as security-lead (Opus)
    participant CR as Code Reviewer
    participant DA as Deps Auditor
    participant IR as Infra Reviewer
    L->>L: Reconnaissance + threat model
    par 3-Way Fan-Out
        L->>CR: TaskCreate: Code vulnerabilities
        L->>DA: TaskCreate: Dependency audit
        L->>IR: TaskCreate: Infra review
    end
    par Parallel Analysis
        CR->>CR: OWASP checks, injection, XSS
        DA->>DA: CVE scan, license compliance
        IR->>IR: Docker, secrets, config
    end
    CR-->>L: findings[]
    DA-->>L: findings[]
    IR-->>L: findings[]
    Note over L: Barrier: wait for all
    L->>L: Cross-reference findings
    L->>L: Build attack chains
    L->>L: Prioritize by severity × exploitability
    L->>L: Generate remediation plan
    L-->>CR: shutdown_request
    L-->>DA: shutdown_request
    L-->>IR: shutdown_request
```

Agent roles:
| Agent | Model | Responsibility |
|---|---|---|
| security-leader | Opus | Coordinate security review, generate consolidated report |
| {tech}-security-auditor | Sonnet | Run per-stack security checks, dependency audits |
team-delivery: Delivery Team (Full Lifecycle)
Orchestrates the complete sprint cycle: Phase 1 writes EPICs/US/tasks with cross-review, Phase 2 implements them in parallel using the file domain map produced in Phase 1. A single Delivery Lead (opus) orchestrates both phases, preserving full context.
Architecture:
```
Phase 1 (Writing):                 Phase 2 (Implementation):

delivery-lead (Opus)               delivery-lead (Opus) — same agent
        |                                  |
  +-----+-----+                     +------+------+
  |     |     |                     |      |      |
Writer Reviewer Architect         dev-1  dev-2  dev-3
(Sonnet)(Haiku)(Sonnet)          (Sonnet)(Sonnet)(Sonnet)

~~~ shutdown Phase 1 workers → spawn Phase 2 workers ~~~
```

```mermaid
sequenceDiagram
    participant L as delivery-lead (Opus)
    participant W as Writer (Sonnet)
    participant R as Reviewer (Haiku)
    participant A as Architect (Sonnet)
    participant D1 as dev-1 (Sonnet)
    participant D2 as dev-2 (Sonnet)
    rect rgb(230, 245, 255)
        Note over L,A: Phase 1: Writing
        L->>W: Write EPIC + US
        W-->>L: artifacts
        L->>R: Validate quality (INVEST 6/6)
        R-->>L: approved / rejected
        L->>A: Validate tech + file domains
        A-->>L: file domain map
        alt Rejected
            L->>W: Rewrite with feedback (max 2x)
        end
    end
    L->>L: Write phase-handoff.yaml
    L-->>W: shutdown_request
    L-->>R: shutdown_request
    L-->>A: shutdown_request
    rect rgb(230, 255, 230)
        Note over L,D2: Phase 2: Implementation
        L->>L: Load waves from handoff
        par Wave 1 (independent stories)
            L->>D1: TaskCreate: Implement US-001
            L->>D2: TaskCreate: Implement US-002
        end
        D1-->>L: completed
        D2-->>L: completed
        Note over L: Wave 2 (dependent stories)
        L->>D1: TaskCreate: Implement US-003
        D1-->>L: completed
    end
    L->>L: Generate delivery report
    L-->>D1: shutdown_request
    L-->>D2: shutdown_request
```

When to use: Full sprint cycle (writing + implementation) with 3+ stories to write AND implement.
Agent roles:
| Agent | Phase | Model | Responsibility |
|---|---|---|---|
| delivery-lead | Both | Opus | Orchestrate pipeline, validate gates, assign work |
| writer | 1 | Sonnet | Create EPICs, US (INVEST+3C+Gherkin), tasks |
| reviewer | 1 | Haiku | Validate quality (INVEST 6/6, AC coverage, slicing) — classification task, haiku suffices |
| architect | 1 | Sonnet | Validate tech feasibility, produce file domain map |
| dev-worker-N | 2 | Sonnet | Implement a story with TDD cycle |
Key differentiators vs team-sprint:
| Feature | team-delivery | team-sprint |
|---|---|---|
| Story writing | Built-in (Phase 1) | Requires pre-written stories |
| File domain map | Computed by Architect | Heuristic at runtime |
| Cross-review | Writer → Reviewer → Architect | None |
| BMAD gates | PRD + Backlog + Sprint Ready + DoD | DoD only |
| Parallelization waves | Pre-computed from domain map | Ad-hoc independence check |
Expected performance (5 stories):
| Metric | Sequential | Team Delivery |
|---|---|---|
| Time | ~120 min | ~55 min |
| Tokens | ~850K | ~1,125K |
| Cost (Opus lead + Sonnet workers) | ~$28 | ~$17 |
Constraints:
- Maximum 5 agents total (1 lead + 3 writers OR 3 dev workers per phase)
- Phase transition takes ~30s (shutdown + respawn)
- Only the delivery-lead writes to `sprint-status.yaml` (single-writer pattern)
- Stories with file domain overlap are sequenced into waves (not parallelized)
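The wave rule reduces to a pairwise check: two stories may share a wave only if their file domains are disjoint. A minimal sketch, where `domains_overlap` and the story domains are illustrative (the real map is produced by the Architect in Phase 1):

```shell
#!/bin/sh
# Return 0 (true) if two space-separated file-domain lists share any path.
domains_overlap() {
  for f in $1; do
    for g in $2; do
      [ "$f" = "$g" ] && return 0
    done
  done
  return 1
}

us_001="src/Auth/ src/User/"
us_002="src/Billing/"
us_003="src/User/ src/Profile/"

domains_overlap "$us_001" "$us_002" || echo "US-001 + US-002: same wave"
domains_overlap "$us_001" "$us_003" && echo "US-001 + US-003: sequenced into separate waves"
```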
Known Limitations
Research Preview Status
Agent Teams is a Research Preview feature (v2.1.32+). This means:
| Limitation | Impact | Mitigation |
|---|---|---|
| API may change without notice | Templates may break on update | Abstraction layer isolates claude-craft from API changes |
| Maximum ~4 agents recommended | Large teams have quadratic coordination cost | Cap teams at 1 leader + 3 workers |
| No force-kill for agents | Stuck agent blocks pipeline | Timeout watchdog with graceful degradation |
| Cooperative shutdown only | Teammates can reject shutdown requests | Design workflows that complete naturally |
Context Compaction Risk (#23620)
Claude Code's context compaction feature can cause the team leader to lose awareness of team state after compaction. All /team:* commands include mitigation instructions:
- Periodic re-read: The leader re-reads `TaskList` every 5 worker completions to refresh team awareness
- Inactivity detection: If no status updates for >3 minutes, force a full `TaskList` re-read
- Phase transition recovery (delivery): The leader re-reads `phase-handoff.yaml` at the start of Phase 2
Synchronization Barrier Cadence
Team leaders use an adaptive polling cadence for the synchronization barrier:
| Phase | Poll Interval | Condition |
|---|---|---|
| Active | 30 seconds | Workers are updating tasks |
| Idle | 60 seconds | After 3 consecutive polls without status change |
| Hook-based | Event-driven | Using TeammateIdle/TaskCompleted hooks (v2.1.33+) |
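The polling cadence can be expressed as a tiny state function: the interval widens after three consecutive unchanged polls. An illustrative helper (the hook-based mode replaces polling entirely):

```shell
#!/bin/sh
# Poll interval in seconds, given how many consecutive polls saw no
# status change. Active: 30s; idle (3+ unchanged polls): 60s.
poll_interval() {
  unchanged_polls=$1
  if [ "$unchanged_polls" -ge 3 ]; then
    echo 60
  else
    echo 30
  fi
}

poll_interval 0   # workers actively updating tasks
poll_interval 3   # idle: back off to reduce coordination tokens
```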
Per-Task Timeout Chain
Each task type has a specific timeout based on 1.5x the estimated duration:
| Task Type | Estimated Duration | Timeout |
|---|---|---|
| Audit (per stack) | 1.5 min | 2.25 min |
| Sprint (per story) | 15 min | 22.5 min |
| Security (per dimension) | 2 min | 3 min |
| Delivery (per story) | 20 min | 30 min |
When a worker exceeds its timeout, the leader marks the task as failed and continues with partial results.
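The timeout column is simply 1.5x the estimated duration; a one-liner reproduces the table:

```shell
#!/bin/sh
# Timeout = 1.5 x estimated duration (minutes).
task_timeout() {
  awk -v est="$1" 'BEGIN { printf "%g\n", est * 1.5 }'
}

task_timeout 1.5   # audit per stack  -> 2.25
task_timeout 15    # sprint per story -> 22.5
task_timeout 20    # delivery story   -> 30
```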
No kill -9 for Agents
Unlike bash subprocesses, there is no way to forcibly terminate a misbehaving agent. If an agent enters an infinite loop or hangs:
- Send a `shutdown_request` via `SendMessage`
- If the agent rejects or does not respond, the leader must wait for the agent's context window to fill
- The timeout watchdog (if configured) will flag the agent as unresponsive and continue without it
Single Chrome Browser for Recette
QA Recette (/qa:recette) uses a single Chrome browser instance. Parallel test execution is not possible -- only test planning can be parallelized. This is a hard constraint of the Chrome integration architecture.
Sprint-Status YAML Concurrency
Multiple agents must never write to sprint-status.yaml simultaneously. The single-writer pattern ensures only the team leader updates shared state. Workers report results via task metadata (TaskUpdate), and the leader applies changes sequentially.
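The single-writer pattern can be sketched as: workers drop per-task results, and only the leader folds them into the shared state file, one at a time. File names and the YAML layout below are hypothetical; in reality workers report via `TaskUpdate` metadata rather than files:

```shell
#!/bin/sh
# Workers write results/<story>.status; only the leader touches sprint-status.yaml.
results_dir=$(mktemp -d)
status_file="$results_dir/sprint-status.yaml"
echo "stories:" > "$status_file"

# Simulated worker reports (stand-ins for TaskUpdate metadata).
echo "done" > "$results_dir/US-001.status"
echo "done" > "$results_dir/US-002.status"

# Leader: the only process that appends to the shared file, sequentially.
for report in "$results_dir"/*.status; do
  story=$(basename "$report" .status)
  echo "  $story: $(cat "$report")" >> "$status_file"
done

cat "$status_file"
```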
Troubleshooting
Agent Fails to Start
Symptom: Agent spawning times out or fails silently.
Causes:
- `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` not set
- Claude Code version < 2.1.32
- Too many agents already running (resource limits)
Fix:
```bash
# Verify environment
echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS # Should output: 1
claude --version # Should be >= 2.1.32
```

High Token Usage
Symptom: Parallel execution costs significantly more tokens than expected.
Causes:
- Each agent loads full project context independently
- Worker model retries (especially Haiku on complex tasks)
- Large CLAUDE.md or reference files duplicated per agent
Fix:
- Use the cost dashboard before launching: `Tools/AgentTeams/lib/cost-dashboard.sh`
- Switch to Sonnet workers if Haiku produces too many retries
- For projects with very large CLAUDE.md, consider reducing reference file size
Agent Appears Stuck
Symptom: No TaskCompleted events after extended time.
Causes:
- Agent hit context window limit
- Docker command hanging inside agent
- Agent waiting for input that will never arrive
Fix:
- Check if the agent's task is still `in_progress` via `TaskList`
- Send a message to the agent asking for status
- If unresponsive, the leader should mark the task as failed and fall back to sequential execution
Result Aggregation Errors
Symptom: Final report has missing or inconsistent scores.
Causes:
- Worker agent crashed before reporting results
- Task metadata not properly set by worker
Fix:
- Check each worker's task status and metadata
- Re-run failed technology audit sequentially in the leader context
- The aggregation protocol handles partial results by marking missing categories as "SKIPPED"
Architecture Decisions
Why 4 Templates
The devil's advocate analysis (devils-advocate.md Section 3.1) identified that 9+ templates create unmaintained bloat. Four templates (audit, sprint, security, delivery) cover 90% of parallel use cases. The delivery template was added because the full sprint lifecycle (writing + implementation) is a distinct workflow that benefits from cross-phase context preservation and file domain mapping — capabilities not achievable by combining team-sprint with sequential writing.
Why Cap at 4 Agents
Teams larger than 4 agents experience quadratic coordination cost (Section 3.2). The team leader spends more time coordinating than any individual worker spends working. Empirically, 1 leader + 2-3 workers provides the best cost/benefit ratio.
Why an Abstraction Layer
The Agent Teams API is a Research Preview. An abstraction layer (ralph-teams-adapter.sh) isolates claude-craft from API changes, allowing the framework to switch between Agent Teams and bash subprocesses without modifying templates or workflows.
Cost Tools Reference
cost-estimator.sh
Calculates raw token and cost estimates for sequential vs parallel execution.
```bash
Tools/AgentTeams/lib/cost-estimator.sh --techs 2 --checks 5 --worker-model sonnet
```

Options:
| Flag | Default | Description |
|---|---|---|
| `--techs N` | 2 | Number of technology stacks |
| `--checks N` | 5 | Checks per technology |
| `--worker-model M` | sonnet | Worker model: haiku, sonnet, opus |
| `--leader-model M` | opus | Leader model: haiku, sonnet, opus |
| `--tokens-per-check N` | 12500 | Override per-check token estimate |
| `--task-type T` | audit | Task type: audit, sprint, security, delivery (affects context tokens) |
| `--fast-mode` | false | Use Fast Mode pricing (6x cost for Opus) |
| `--max-cost N` | - | Maximum budget in dollars; output includes WITHIN_BUDGET |
| `--auto-size` | false | Output RECOMMENDED_WORKERS based on task type |
| `--dry-run` | false | Show config without calculating |
| `--help` | - | Show usage |
Output: Key=value pairs (machine-readable) for consumption by other scripts. Includes FAST_MODE_WARNING, WITHIN_BUDGET, and RECOMMENDED_WORKERS fields when relevant flags are used.
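Because the output is key=value pairs, a wrapper can consume it with plain `grep`/`cut`. In this sketch the estimator output is simulated with a here-doc, and the `PARALLEL_COST`/`SEQUENTIAL_COST` field names are assumptions, not documented fields:

```shell
#!/bin/sh
# Consume cost-estimator.sh style key=value output.
output=$(cat <<'EOF'
PARALLEL_COST=2.06
SEQUENTIAL_COST=4.46
WITHIN_BUDGET=yes
RECOMMENDED_WORKERS=2
EOF
)

# Look up one key in the key=value stream.
get() { printf '%s\n' "$output" | grep "^$1=" | cut -d= -f2; }

if [ "$(get WITHIN_BUDGET)" = "yes" ]; then
  echo "Launching $(get RECOMMENDED_WORKERS) workers"
else
  echo "Aborting: over budget"
fi
```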
cost-dashboard.sh
Displays a visual comparison table before launching a team operation.
```bash
Tools/AgentTeams/lib/cost-dashboard.sh --techs 3 --worker-model haiku
```

Options: Same as `cost-estimator.sh`, plus `--width N` for display width and `--max-cost N` for budget guard display.
Recommendation logic: Uses parallel only if time_saved > 30% AND extra_cost < 50%.
Budget guard display: When `--max-cost` is provided, the dashboard shows `Within budget: YES` or `Within budget: NO - OVER BUDGET` alongside the cost estimates.
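The recommendation rule is a two-condition predicate; a sketch with an illustrative `recommend` helper:

```shell
#!/bin/sh
# Recommend parallel only if time saved > 30% AND extra token cost < 50%.
recommend() {
  time_saved_pct=$1
  extra_cost_pct=$2
  if [ "$time_saved_pct" -gt 30 ] && [ "$extra_cost_pct" -lt 50 ]; then
    echo "PARALLEL"
  else
    echo "SEQUENTIAL"
  fi
}

recommend 45 35   # typical 2-stack audit
recommend 20 40   # not enough time saved
recommend 45 60   # token overhead too high
```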
Further Reading
- Autonomous Sprint Conductor - Ralph overnight sprint execution
- BMAD Practical Guide - Quality gates and story workflow
- Scripts Reference - All shell script documentation
