Spec: specs/093-agent-collaboration-supervision/ (see feature.yaml, research.yaml, and plan.yaml).
This document describes the collaboration & supervision fabric that lets Shep agents talk to each other while they work, lets a delegated supervisor agent monitor and intervene on the user's behalf, and surfaces every agent question (interactive or background) in a single unified inbox.
The whole surface is gated behind the collaboration feature flag.
With the flag off, behavior is byte-identical to a vanilla Shep install.
The user question that motivated this feature was simple:
These map onto three intertwined capabilities sharing one infrastructure:

| Capability | Domain entity | Port | Storage |
|---|---|---|---|
| Agent-to-agent messaging | AgentMessage | IAgentMessageBus | agent_messages |
| Unified question pipeline | AgentQuestion | IAgentQuestionService | agent_questions |
| Delegated supervisor agent | SupervisorPolicy, SupervisorDecision | ISupervisorAgent | supervisor_policies, supervisor_decisions |

All three follow Shep's standard pipeline: TypeSpec model → SQLite migration → repository → application port → use case → infrastructure adapter → SSE event kind → CLI + Web surface.
Inter-agent traffic is hub-and-spoke, not peer-to-peer. The bus rejects peer addressing in v1; messages target one of:

- broadcast (per app/feature)
- supervisor
- user
- agentRunId, only as a reply (matched via correlationId)

When a SupervisorPolicy exists for the app/feature scope, the supervisor
evaluates routed messages and may emit follow-on decisions. When no policy is
configured, a NullSupervisor records the message in activity_log but
performs no policy evaluation.
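
The v1 addressing rule (hub targets only, with agentRunId allowed solely as a correlated reply) can be sketched as a small validator. This is an illustration under assumptions, not the bus implementation: the MessageTarget shape, the validateTarget name, and the error strings are hypothetical; only the four target kinds and the correlationId reply rule come from this document.

```typescript
// Hypothetical target shape for illustration; the real AgentMessage model is defined in TypeSpec.
type MessageTarget =
  | { kind: "broadcast"; appId: string; featureId?: string }
  | { kind: "supervisor" }
  | { kind: "user" }
  | { kind: "agentRun"; agentRunId: string; correlationId?: string };

type ValidationResult = { ok: true } | { ok: false; reason: string };

// knownCorrelationIds stands in for "correlation ids of messages this run has received".
function validateTarget(
  target: MessageTarget,
  knownCorrelationIds: readonly string[],
): ValidationResult {
  if (target.kind !== "agentRun") {
    // broadcast, supervisor, and user are always addressable.
    return { ok: true };
  }
  // Addressing another run is only allowed as a reply to an earlier message.
  if (target.correlationId === undefined) {
    return { ok: false, reason: "peer addressing requires a correlationId" };
  }
  if (knownCorrelationIds.indexOf(target.correlationId) === -1) {
    return { ok: false, reason: "correlationId does not match a prior message" };
  }
  return { ok: true };
}
```

A caller would run this check before persisting a message, so a rejected peer address never reaches agent_messages.
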
This matches the team-execution protocol draft (the EM hub) and keeps audit
centralized. Reply round-trips are direct via correlationId so
high-frequency exchanges don't bottleneck on the hub.

    ┌──────────┐         ┌──────────┐         ┌──────────┐
    │ Agent A  │         │ Agent B  │         │ Agent C  │
    │ (run-1)  │         │ (run-2)  │         │ (run-3)  │
    └────┬─────┘         └────┬─────┘         └────┬─────┘
         │                    │                    │
         │ publish/listen     │ publish/listen     │
         └──────────┬─────────┴──────────┬─────────┘
                    │                    │
                    ▼                    ▼
           ┌─────────────────────────────────────┐
           │  IAgentMessageBus (SQLite-backed)   │
           │  agent_messages table, WAL, polled  │
           └────────────────┬────────────────────┘
                            │
               ┌────────────┴────────────┐
               ▼                         ▼
      ┌──────────────────┐      ┌──────────────────┐
      │ ISupervisorAgent │      │ SSE event stream │
      │ (policy hub)     │      │ (UI / CLI)       │
      └──────────────────┘      └──────────────────┘

The bus is cross-process by virtue of shared SQLite: every worktree
process opens the same ~/.shep/<repo-hash>/shep.db in WAL mode, so two
parallel feature agents in different worktrees coordinate by reading and
writing agent_messages without any new IPC primitive.
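
Because the bus is polled rather than pushed, each listening process needs a cursor so that it only sees rows it has not yet handled. The sketch below shows that idea against an in-memory array standing in for the agent_messages table. The MessageRow fields, the pollSince helper, the PollingListener class, and the monotonically increasing seq column are all assumptions made for illustration; a real adapter would issue the equivalent SQLite query instead.

```typescript
// In-memory stand-in for the agent_messages table. Assumption: rows carry a
// monotonically increasing sequence number (for example an integer primary key).
interface MessageRow {
  seq: number;
  fromRunId: string;
  body: string;
}

// Equivalent in spirit to:
//   SELECT * FROM agent_messages WHERE seq > ? ORDER BY seq
function pollSince(table: readonly MessageRow[], cursor: number): MessageRow[] {
  return table.filter((row) => row.seq > cursor).sort((a, b) => a.seq - b.seq);
}

// One listener per process; it remembers the last row it delivered.
class PollingListener {
  private cursor = 0;
  private readonly table: readonly MessageRow[];

  constructor(table: readonly MessageRow[]) {
    this.table = table;
  }

  // Call on every poll tick; returns only rows not seen on earlier ticks.
  tick(): MessageRow[] {
    const fresh = pollSince(this.table, this.cursor);
    for (const row of fresh) {
      this.cursor = Math.max(this.cursor, row.seq);
    }
    return fresh;
  }
}
```

Keeping the cursor per listener means two processes reading the same table never need to coordinate with each other, only with the shared rows.
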
Every SupervisorPolicy carries an autonomyLevel that controls how much
authority the supervisor exercises:

| Level | Supervisor can | User must |
|---|---|---|
| advisory (default) | recommend (advise / escalate) | always act on every gate |
| cosign | recommend or pre-approve (approve / reject) | also approve before the gate passes |
| autonomous | close the gate directly (approve / reject) via the existing approve/reject use cases | nothing (but can override at any time) |

autonomyLevel is the per-policy default. Per-gate overrides live in
gateAuthorityJson (prd / plan / merge → autonomy override) so a user
can run the supervisor in advisory mode for PRD review but autonomous for
merges, all in the same policy.
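
Resolving the effective autonomy for a gate could look like the following sketch. It assumes gateAuthorityJson is a JSON string holding an object keyed by gate name; that storage format, the PolicyLike shape, and the resolveAutonomy name are assumptions, not confirmed details of the implementation.

```typescript
type AutonomyLevel = "advisory" | "cosign" | "autonomous";
type Gate = "prd" | "plan" | "merge";

// Hypothetical slice of SupervisorPolicy; only the two field names come from the document.
interface PolicyLike {
  autonomyLevel: AutonomyLevel;
  gateAuthorityJson?: string | null;
}

const AUTONOMY_LEVELS: readonly string[] = ["advisory", "cosign", "autonomous"];

function isAutonomyLevel(value: unknown): value is AutonomyLevel {
  return typeof value === "string" && AUTONOMY_LEVELS.indexOf(value) !== -1;
}

// Returns the per-gate override when one is present and valid,
// otherwise the policy-wide default.
function resolveAutonomy(policy: PolicyLike, gate: Gate): AutonomyLevel {
  if (!policy.gateAuthorityJson) {
    return policy.autonomyLevel;
  }
  try {
    const overrides: unknown = JSON.parse(policy.gateAuthorityJson);
    if (typeof overrides === "object" && overrides !== null) {
      const candidate = (overrides as Record<string, unknown>)[gate];
      if (isAutonomyLevel(candidate)) {
        return candidate;
      }
    }
  } catch (_err) {
    // Malformed JSON: fall back to the policy default rather than guessing.
  }
  return policy.autonomyLevel;
}
```

Falling back to the default on malformed or unknown overrides keeps a bad override from silently granting more authority than the policy default.
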
advisory is the default for a brand-new install, because promoting the
supervisor to autonomous-by-default would silently change the trust contract
that today's approval gates establish.
When the supervisor and the user disagree on a gate, the user's decision is final. The supervisor's vote is recorded for audit but cannot override.
The invariant is enforced inside ApproveAgentRunUseCase and
RejectAgentRunUseCase: when the actor namespace is supervisor:<id>, the use
case looks up any prior user decision on the same gate; if one exists, the
supervisor's call is rejected with the rationale stored in the decision row.
This pattern keeps the existing waiting_approval state machine as the
single source of truth: the supervisor acts AS an actor inside it, not
alongside it. No new pause primitive was invented.
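
The prior-user-decision check described above can be sketched as a pure function. The GateDecision shape, the checkUserAlwaysWins name, and the rationale wording are assumptions for illustration; only the actor namespaces (user:<id>, supervisor:<id>) and the rule itself come from this document.

```typescript
type GateVerdict = "approve" | "reject";

// Hypothetical record of a decision already taken on a gate.
interface GateDecision {
  gateId: string;
  actor: string; // "user:<id>" or "supervisor:<id>"
  verdict: GateVerdict;
}

type GuardResult = { accepted: true } | { accepted: false; rationale: string };

function hasNamespace(actor: string, namespace: string): boolean {
  return actor.indexOf(namespace + ":") === 0;
}

// Rejects a supervisor action when a user has already decided the same gate.
function checkUserAlwaysWins(
  incoming: GateDecision,
  prior: readonly GateDecision[],
): GuardResult {
  if (!hasNamespace(incoming.actor, "supervisor")) {
    // The guard only constrains supervisor actors; user actions pass through.
    return { accepted: true };
  }
  for (const decision of prior) {
    if (decision.gateId === incoming.gateId && hasNamespace(decision.actor, "user")) {
      return {
        accepted: false,
        rationale: "user already chose '" + decision.verdict + "' on gate " + incoming.gateId,
      };
    }
  }
  return { accepted: true };
}
```

The returned rationale is what a caller would store in the decision row when the supervisor's call is rejected.
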

    sequenceDiagram
        autonumber
        participant FA as Feature Agent
        participant Q as AgentQuestion (gate mirror)
        participant S as Supervisor (LangGraph)
        participant DB as activity_log + supervisor_decisions
        participant N as INotificationService
        participant U as User
        FA->>FA: hits approval gate (status = waiting_approval)
        FA->>Q: emit AgentQuestion (kind = blocking)
        FA->>S: EvaluateSupervisorDecisionUseCase(input)
        S->>S: load SupervisorPolicy (feature → app fallback)
        alt policy missing or disabled
            S-->>N: (no-op, fall through to user)
            N->>U: WaitingApproval notification (existing flow)
        else autonomy = advisory
            S->>DB: write SupervisorDecision (advise / escalate)
            N->>U: AgentQuestionBlocking notification + advice
            U->>FA: ApproveAgentRunUseCase(actor = user:<id>)
        else autonomy = cosign
            S->>DB: write SupervisorDecision (approve)
            N->>U: AgentQuestionBlocking notification ("supervisor co-signed")
            U->>FA: ApproveAgentRunUseCase(actor = user:<id>)
            FA->>FA: gate closes only after BOTH supervisor & user approve
        else autonomy = autonomous
            S->>DB: write SupervisorDecision (approve / reject)
            S->>FA: ApproveAgentRunUseCase(actor = supervisor:<id>)
            N-->>U: AgentQuestionPending (informational, no block)
            U-->>FA: optional override (UserAlwaysWins guard wins)
        end
        FA->>FA: resume / abort based on gate outcome

Key properties:

- The supervisor reuses the existing waiting_approval → Approve / Reject flow. The supervisor is just a new actor namespace.
- If supervisor evaluation fails, the result is a SupervisorDecision { verdict: 'escalate' } and a SupervisorFailed notification, so the human path proceeds immediately (FR-22).
- The user can call RejectAgentRunUseCase(actor = user:<id>) at any point; the prior-user-decision guard then refuses any subsequent supervisor action on the same gate.

AgentQuestion is the single surface for every agent-to-human ask, no
matter which execution mode raised it.
Two write paths converge here:

- Interactive sessions: canUseTool interception in claude-code-interactive-executor.service.ts already excludes AskUserQuestion from auto-allowed tools, so every invocation hits the callback. The callback now calls AskAgentQuestionUseCase and awaits a Deferred registered in an in-process DeferredQuestionRegistry. When AnswerAgentQuestionUseCase resolves the row, the registry resolves the Promise and the SDK callback returns to the agent.
- Background runs: when feature-agent-worker.ts transitions a run to waiting_approval, it emits a parallel AgentQuestion of kind = blocking so the same gate appears in the unified inbox alongside interactive questions.

Both paths converge on AnswerAgentQuestionUseCase:

- For an interactive question, it resolves the Deferred so the SDK callback returns.
- For a mirrored gate, it calls ApproveAgentRunUseCase / RejectAgentRunUseCase with the appropriate actor.

Every AgentQuestion has a kind:

| Kind | UX |
|---|---|
| info | streams to the per-feature activity feed only; no notification |
| question | queued in the inbox; notification fires at user-controlled urgency; agent may auto-resolve to defaultAnswer after expiresAt |
| blocking | always raises a notification within ≤ 2s (NFR-10); the agent is paused until answered or cancelled |

The three tiers map cleanly onto the spec-014 vocabulary
([TASK-READY]/[BLOCKED]/[USER-UPDATE]).
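
The interactive write path hinges on a Deferred that one side awaits and another side resolves. A minimal sketch of that pattern follows. The class name QuestionRegistrySketch, its method names, and the use of a plain string as the answer type are assumptions; the actual DeferredQuestionRegistry may differ.

```typescript
// A Deferred exposes a Promise together with the function that resolves it.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve: (value: T) => void = () => undefined;
  // The executor runs synchronously, so resolve is assigned before we return.
  const promise = new Promise<T>((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Hypothetical in-process registry keyed by question id.
class QuestionRegistrySketch {
  private readonly pending = new Map<string, Deferred<string>>();

  // Called by the asking side; the returned Promise settles once answered.
  register(questionId: string): Promise<string> {
    const deferred = createDeferred<string>();
    this.pending.set(questionId, deferred);
    return deferred.promise;
  }

  // Called by the answering side; returns false if nobody was waiting.
  answer(questionId: string, value: string): boolean {
    const deferred = this.pending.get(questionId);
    if (deferred === undefined) {
      return false;
    }
    this.pending.delete(questionId);
    deferred.resolve(value);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

In this sketch the asking callback would `await registry.register(id)` and the answer use case would call `registry.answer(id, value)`; answering an unknown id is reported rather than thrown.
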
Delegated authority without explainability is the fastest way to lose user trust, so every supervisor decision stores a full rationale.
A SupervisorDecision row carries:

- verdict: approve | reject | escalate | advise
- rationaleText: free-form prose written by the evaluator
- modelId: the LLM the evaluator ran on (e.g. claude-sonnet-4)
- promptVersion: version stamp on the evaluator prompt
- ruleRef (optional): the policy rule that fired
- confidence (optional): 0–1 self-reported confidence
- sourceEventKind / sourceEventId: what triggered the evaluation
- supervisorRunId: the supervisor's own agent_runs row

Each row is mirrored into activity_log (the immutable audit table from
migration 064) with actor_id = "supervisor:<id>", so the supervisor's
decisions appear next to user actions in the same chronological feed and can
never be silently rewritten.
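
As a rough illustration of the row and its audit mirror, the sketch below types the fields listed above and derives an activity-log entry from a decision. The TypeScript types, the supervisorId field, and the ActivityLogEntrySketch shape (action, detail) are assumptions; only the field names in the list and the actor_id = "supervisor:<id>" convention come from this document.

```typescript
type SupervisorVerdict = "approve" | "reject" | "escalate" | "advise";

// Field names follow the list above; the types are assumptions.
interface SupervisorDecisionSketch {
  supervisorId: string; // hypothetical: used only to build the actor id
  verdict: SupervisorVerdict;
  rationaleText: string;
  modelId: string;
  promptVersion: string;
  ruleRef?: string;
  confidence?: number; // self-reported, expected in [0, 1]
  sourceEventKind: string;
  sourceEventId: string;
  supervisorRunId: string;
}

// Hypothetical audit entry; the real activity_log columns may differ.
interface ActivityLogEntrySketch {
  actor_id: string;
  action: string;
  detail: string;
}

function toActivityLogEntry(decision: SupervisorDecisionSketch): ActivityLogEntrySketch {
  const c = decision.confidence;
  // The negated range check also rejects NaN.
  if (c !== undefined && !(c >= 0 && c <= 1)) {
    throw new RangeError("confidence must be between 0 and 1");
  }
  return {
    actor_id: "supervisor:" + decision.supervisorId,
    action: "supervisor_decision:" + decision.verdict,
    detail: JSON.stringify({
      rationaleText: decision.rationaleText,
      modelId: decision.modelId,
      promptVersion: decision.promptVersion,
    }),
  };
}
```

Storing modelId and promptVersion in the mirrored detail is what lets a later reader see which evaluator produced a verdict.
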
In the web UI, the "Why?" drawer
(supervisor-decision-why-drawer.tsx)
opens on every gate or question that has a decision attached. It renders the
full chronological audit (verdict + rationale + model/prompt versions) so the
user can inspect why the supervisor did what it did, even months later after
the model has rotated.
SupervisorPolicy is keyed by (appId, featureId NULLABLE) and resolves
feature-first, then app-fallback. A user can:

- set an app-wide policy (/application/<id>/supervisor), and
- override it for a single feature (/application/<id>/supervisor?feature=<featureId>).

This matches every existing scoping decision in Shep (Settings, ApprovalGates, AgentDefinition all cascade app → feature) so there is one mental model for all configuration.
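
Feature-first, app-fallback resolution over the (appId, featureId NULLABLE) key can be sketched as follows. The ScopedPolicy shape and the resolvePolicy name are assumptions for illustration; a repository would normally do this lookup in SQL.

```typescript
// Hypothetical slice of SupervisorPolicy: featureId === null marks the app-level row.
interface ScopedPolicy {
  appId: string;
  featureId: string | null;
  autonomyLevel: string;
}

// Returns the feature-level policy when one exists, otherwise the app-level
// policy, otherwise null (no supervisor configured for this scope).
function resolvePolicy(
  policies: readonly ScopedPolicy[],
  appId: string,
  featureId: string | null,
): ScopedPolicy | null {
  let appLevel: ScopedPolicy | null = null;
  for (const policy of policies) {
    if (policy.appId !== appId) {
      continue;
    }
    if (featureId !== null && policy.featureId === featureId) {
      return policy; // the feature-specific policy wins immediately
    }
    if (policy.featureId === null) {
      appLevel = policy; // remember the fallback and keep scanning
    }
  }
  return appLevel;
}
```
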
The CLI has parity with web:

- shep supervisor configure: write/update a policy
- shep supervisor enable / shep supervisor disable: flip the toggle
- shep supervisor status: show resolved policy
- shep supervisor approve / shep supervisor reject: drive a gate from a scripted/cron context

The web surface lives at:

- /application/[id]/supervisor: config form (autonomy, model, prompt version, per-gate authority)
- /agent-questions: unified inbox across all apps

Both consume the same use cases (ConfigureSupervisorUseCase,
GetSupervisorPolicyUseCase, ListAgentQuestionsUseCase,
AnswerAgentQuestionUseCase, etc.). No business logic lives in CLI or Web;
they are thin adapters over the use-case API, per
.claude/rules/code-quality.md.
Supervisor escalations and pending questions reach the user through the
existing INotificationService. The collaboration fabric adds five new
NotificationEventType values (defined in
tsp/common/enums/notification.tsp):

- agent_question_pending
- agent_question_blocking
- agent_message_blocked
- supervisor_escalated
- supervisor_failed

Each kind is exposed as a boolean in Settings.notifications.events, so users
can mute any of them per channel (in-app / desktop / browser). No parallel
delivery channel was built; the existing notification surface is the single
mental model for every agent-driven alert.
Three new event kinds are streamed through the existing
StreamAgentEventsUseCase (default 2s poll, opt-in 500ms for
blocking-priority subscriptions):

- agent_message
- agent_question
- supervisor_decision

Each is computed by a dedicated helper (computeMessageDeltas,
computeQuestionDeltas, computeDecisionDeltas) that mirrors the existing
computeFeatureDeltas / computePrDeltas / computeStatusDeltas shape, so
the web Service Worker fans them out to every tab through the same multiplex
without any transport changes.
The whole surface lives behind FeatureFlags.collaboration:

- TypeSpec: FeatureFlags in tsp/domain/entities/settings.tsp.
- SQLite: feature_flag_collaboration on settings (migration 091).
- Environment: NEXT_PUBLIC_FLAG_COLLABORATION (DB-primary; env is fallback).

With the flag off:

- … { enabled: false }).

This satisfies NFR-14: byte-identical default behavior.
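
The stated resolution order (DB-primary; env is fallback) could be expressed as a small pure function. This is a sketch under assumptions: the isCollaborationEnabled name, the nullable dbValue parameter, and the accepted truthy spellings ("true" and "1") are hypothetical; only the variable name NEXT_PUBLIC_FLAG_COLLABORATION and the precedence come from this document.

```typescript
// dbValue: the stored flag, or null/undefined when no value has been persisted.
// env: passed in as a plain record so the sketch does not depend on Node's process object.
function isCollaborationEnabled(
  dbValue: boolean | null | undefined,
  env: Record<string, string | undefined>,
): boolean {
  // The database value is primary: whenever it exists, it decides.
  if (dbValue !== null && dbValue !== undefined) {
    return dbValue;
  }
  // Otherwise fall back to the environment variable.
  const raw = env["NEXT_PUBLIC_FLAG_COLLABORATION"];
  // Assumption: only these spellings turn the flag on; anything else is off.
  return raw === "true" || raw === "1";
}
```

With neither source set, the function returns false, which matches the flag-off default described above.
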
| Layer | Path |
|---|---|
| TypeSpec models | tsp/agents/agent-message.tsp, agent-question.tsp, supervisor-policy.tsp, supervisor-decision.tsp |
| Generated types | packages/core/src/domain/generated/output.ts |
| Value objects | packages/core/src/domain/value-objects/supervisor-actor.ts |
| Output ports | packages/core/src/application/ports/output/agents/ (agent-message-bus, agent-question-service, supervisor-agent) |
| Repository ports | packages/core/src/application/ports/output/repositories/ (agent-message-repository, agent-question-repository, supervisor-policy-repository, supervisor-decision-repository) |
| Use cases | packages/core/src/application/use-cases/agents/ (send-agent-message, ask-agent-question, answer-agent-question, cancel-agent-question, list-agent-questions, escalate-to-user, configure-supervisor, enable-supervisor, disable-supervisor, get-supervisor-policy, evaluate-supervisor-decision) |
| SSE deltas | packages/core/src/application/use-cases/agents/stream-agent-events/compute-{message,question,decision}-deltas.ts |
| Supervisor agent | packages/core/src/infrastructure/services/agents/supervisor-agent/ (supervisor-graph.ts, supervisor-agent-worker.ts, evaluator-prompt.ts, stub-supervisor-executor.ts) |
| Approval-gate hooks | packages/core/src/application/use-cases/agents/approve-agent-run.use-case.ts, reject-agent-run.use-case.ts |
| SQLite migrations | 087-create-agent-messages.ts, 088-create-agent-questions.ts, 089-create-supervisor-policies.ts, 090-create-supervisor-decisions.ts, 091-add-feature-flag-collaboration.ts |
| CLI commands | src/presentation/cli/commands/supervisor/, src/presentation/cli/commands/agent/{message,questions}/ |
| Web routes | src/presentation/web/app/application/[id]/supervisor/page.tsx, src/presentation/web/app/agent-questions/page.tsx |
| Web components | src/presentation/web/components/{supervisor,agent-questions,agent-activity}/ |