
Multi-Prism Controller

The Multi-Prism Controller is the coordination layer that turns Prism from a single-operator memory substrate into a multi-agent, multi-operator control plane. It is specified in SPEC-030 and implements the architectural commitments from the Prism architecture paper: MCP at the boundary, gRPC bidirectional streams inside, no peer-to-peer control paths, capability-scoped tokens, and lease-based resource claiming.

This is not a separate service to deploy. The controller is a role that one MCP server instance claims per project, using a master election model inspired by NetBIOS master browser election. The same code runs in every MCP server; the difference is a runtime flag.

The identity scope a controller operates within is the four-level hierarchy defined by SPEC-093: Tenant → Org → Department → Project. Every project carries tenant_id + org_id + department_id + pid, with FK constraints enforcing the chain at insertion. Personal-install (LAN) backwards compatibility is preserved through flat-LAN seeding: when no Department is specified, a default Department is created under the default Org under the default Tenant, so existing single-user installs continue to work without operator action.
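The FK chain and flat-LAN seeding described above can be sketched with an in-memory SQLite database. This is a minimal illustration, not the SPEC-093 schema: the table names, the `'default'` identifiers, and the helpers `open_db`, `seed_flat_lan`, and `create_project` are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from SPEC-093.
SCHEMA = """
CREATE TABLE tenant (tenant_id TEXT PRIMARY KEY);
CREATE TABLE org (
    org_id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL REFERENCES tenant(tenant_id)
);
CREATE TABLE department (
    department_id TEXT PRIMARY KEY,
    org_id TEXT NOT NULL REFERENCES org(org_id)
);
CREATE TABLE project (
    pid TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL REFERENCES tenant(tenant_id),
    org_id TEXT NOT NULL REFERENCES org(org_id),
    department_id TEXT NOT NULL REFERENCES department(department_id)
);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn

def seed_flat_lan(conn):
    """Create the default Tenant -> Org -> Department chain if it is missing."""
    conn.execute("INSERT OR IGNORE INTO tenant VALUES ('default')")
    conn.execute("INSERT OR IGNORE INTO org VALUES ('default', 'default')")
    conn.execute("INSERT OR IGNORE INTO department VALUES ('default', 'default')")

def create_project(conn, pid, tenant_id="default", org_id="default", department_id=None):
    """Insert a project; with no department given, fall back to the seeded default."""
    if department_id is None:
        seed_flat_lan(conn)
        department_id = "default"
    conn.execute(
        "INSERT INTO project VALUES (?, ?, ?, ?)",
        (pid, tenant_id, org_id, department_id),
    )
```

With this sketch, `create_project(conn, "p1")` succeeds on an empty database, while a project that names a Department that does not exist is rejected at insertion with `sqlite3.IntegrityError`.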

The master election model

Every MCP server instance on a project participates in the controller election. The election happens inside prism_start as a side effect of registration: no new verb, no configuration.

Claude Desktop always wins when present. CD is the operator console, the human-facing surface where the operator steers. It gets deterministic election priority the same way a domain controller gets priority in a NetBIOS master browser election. If CD is not running, the first agent to call prism_start on that project becomes master.

The master opens a gRPC bidirectional stream to the Prism backend. Through this stream it receives real-time push events: lease contention alerts, approval requests, state-change notifications. Non-master instances (peers) continue using normal HTTP for verb calls and pick up controller obligations asynchronously through the nudge table at their next prism_start.

Figure: Two operator consoles (User A and User B), each driving multiple execution clients (Claude and Codex), all coordinating on one shared project through the Prism control plane. The central plane holds identity and project scope, task graph and policy engine, shared canonical memory, approval and merge gates, and audit log. A dashed barrier between the two operator stacks is labeled 'No peer-to-peer control path': every interaction flows through Prism.

The master does not relay messages to peers. The backend is the only router. All authority flows through Prism: the master is the preferred recipient for real-time events, not the authority over them. The brokered-authority principle holds at every layer.

The election itself is a small state machine that fires inside prism_start. The single-master invariant is enforced at the database level (a partial UNIQUE constraint on tenant_id, project_id, is_master=true), so no application-level race can produce two masters. CD’s priority is implemented via a Lua CAS script in Redis (SPEC-032 §5.2.1) that demotes a non-CD incumbent atomically.
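The election rules described above can be sketched as a pure function over the incumbent master and the arriving session. This is an illustrative sketch only: the `Session` type, the `"claude_desktop"` surface label, and the choice that an incumbent CD keeps master when a second CD arrives are assumptions, not details from SPEC-030 or SPEC-032.

```python
from dataclasses import dataclass
from typing import Optional

CD_SURFACE = "claude_desktop"  # hypothetical surface label

@dataclass(frozen=True)
class Session:
    session_id: str
    surface: str

def elect(incumbent: Optional[Session], arriving: Session) -> Session:
    """Return the session that is master after `arriving` calls prism_start."""
    if incumbent is None:
        # No master yet: the first session to register wins.
        return arriving
    if arriving.surface == CD_SURFACE and incumbent.surface != CD_SURFACE:
        # Claude Desktop preempts a non-CD incumbent.
        return arriving
    # Otherwise the incumbent keeps master and the arrival registers as a peer.
    return incumbent
```

In the real system the transition is made atomic by the Redis CAS and the database constraint; the function above only captures the decision, not the concurrency control.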

The controller registration table

Every agent on every project registers in a Postgres-backed routing table when it calls prism_start. The table tracks who is active, who is master, what surface they are running on, and when they last sent a heartbeat. A partial UNIQUE constraint enforces single-master per project at the database level — no application-layer race is possible. Released registrations are never deleted. They get a released_at timestamp, preserving full session history on disk. This follows the same append-never-delete pattern used throughout Prism for specs, ADRs, and leases. You can inspect the routing table at any time:
prism_status
This returns who is master, who is registered as a peer, what editor surfaces they are on, and whether the master’s gRPC stream is active. Think of it as net view for your agent network.
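A partial unique index is what makes a second concurrent master impossible. The sketch below uses SQLite in place of Postgres to show the idea. The table name, column names, and the `released_at IS NULL` clause in the index are assumptions for illustration; the source only states that the constraint covers tenant_id, project_id, and is_master=true.

```python
import sqlite3

def open_registry():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE controller_registration (
        session_id TEXT PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        project_id TEXT NOT NULL,
        surface TEXT NOT NULL,
        is_master INTEGER NOT NULL DEFAULT 0,
        last_heartbeat TEXT,
        released_at TEXT
    );
    -- At most one active master row per (tenant, project).
    CREATE UNIQUE INDEX one_master_per_project
        ON controller_registration (tenant_id, project_id)
        WHERE is_master = 1 AND released_at IS NULL;
    """)
    return conn

def register(conn, session_id, project_id, surface, want_master):
    """Try to register as master; fall back to peer if the slot is taken."""
    sql = ("INSERT INTO controller_registration "
           "(session_id, tenant_id, project_id, surface, is_master, last_heartbeat) "
           "VALUES (?, 'default', ?, ?, ?, datetime('now'))")
    try:
        conn.execute(sql, (session_id, project_id, surface, 1 if want_master else 0))
        return "master" if want_master else "peer"
    except sqlite3.IntegrityError:
        # The index rejected a second master: register as a peer instead.
        conn.execute(sql, (session_id, project_id, surface, 0))
        return "peer"

def release(conn, session_id):
    """Mark a registration released without deleting the row."""
    conn.execute(
        "UPDATE controller_registration SET released_at = datetime('now') "
        "WHERE session_id = ?",
        (session_id,),
    )
```

Because released rows only gain a timestamp, the session history stays queryable while the master slot becomes free for the next registration.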

Context switching — prism_start gets smarter

When you call prism_start on a different project than the one you are currently registered on, the system detects the context switch automatically. Behind the scenes it soft-closes the old project (writing a wrap-session nudge so no work is silently lost), releases your registration on the old project, and registers you on the new one — running the master election for the new project in the same step. From the user’s perspective: you say prism_start on a different project. Everything else is invisible. No prism_switch verb, no manual deregistration, no master re-election commands. The system figures out what needs to happen.
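The context-switch sequence can be sketched with plain dictionaries standing in for the registration and nudge tables. Everything here is hypothetical scaffolding (the `state` layout, the `wrap_session` nudge kind, and the simplified first-to-arrive election); it only illustrates the order of operations described above.

```python
def new_state():
    return {
        "registered": {},  # session_id -> project_id
        "masters": {},     # project_id -> session_id
        "nudges": [],      # stand-in for the nudge table
    }

def prism_start(state, session_id, project_id):
    """Register on `project_id`, soft-closing any previous project first."""
    old = state["registered"].get(session_id)
    if old is not None and old != project_id:
        # 1. Soft-close the old project so no work is silently lost.
        state["nudges"].append(
            {"project": old, "session": session_id, "kind": "wrap_session"}
        )
        # 2. Release the old registration, freeing the master slot if held.
        if state["masters"].get(old) == session_id:
            del state["masters"][old]
    # 3. Register on the new project and run the election in the same step.
    state["registered"][session_id] = project_id
    state["masters"].setdefault(project_id, session_id)
    return "master" if state["masters"][project_id] == session_id else "peer"
```

A session that was master on the old project leaves that slot empty; the next prism_start on the old project fills it.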

Operator-driven master changes

The election state machine above handles the common case — first session wins, Claude Desktop wins on arrival, peers register quietly. SPEC-082 v0.3 adds three operator-facing verbs that turn the remaining cases — “I want a specific identity to be master right now,” “another agent’s session is stale and needs to be removed” — into explicit, idempotent operations rather than restart races.
| Verb | Caller | Purpose |
| --- | --- | --- |
| prism_master_handoff(pid, to_identity, to_session_id?) | Current master | Cooperative transfer. Caller MUST be current master. |
| prism_master_claim(pid, to_identity, operator_id, operator_password, to_session_id?) | Anyone with operator credentials | Operator-authorized preempt. Caller proves authority via SPEC-038 §3.2 credentials; the target identity is independent of the caller. |
| prism_session_deregister(session_id) | Anyone | Surgical cleanup of a single controller row. Idempotent; releases both the Postgres row and the Redis hash + sessions-set entry. |
Both prism_master_handoff and prism_master_claim resolve their target through a deterministic 6-step algorithm:
  1. If to_session_id is provided, that exact controller row is selected (must match pid + identity, must be active). Authoritative override.
  2. Otherwise the candidate set is active rows for to_identity whose heartbeat is within the freshness threshold (default 30s).
  3. An empty candidate set returns target_not_registered.
  4. A non-empty set with all rows heartbeat-aged-out returns target_stale.
  5. Exactly one fresh candidate is selected.
  6. Multiple fresh candidates returns target_ambiguous, listing the candidate session_ids. The operator disambiguates by re-calling with to_session_id.
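The six steps above can be sketched as a single function. This is one reading of the algorithm, not the SPEC-082 implementation: the `Row` type and its fields are hypothetical, the function treats "empty candidate set" as "no active rows for the identity" and "stale" as "active rows exist but none is fresh", and it returns target_not_registered when an explicit to_session_id does not match (the source does not say which error applies there).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FRESHNESS_SECONDS = 30.0  # default freshness threshold from the text

@dataclass
class Row:
    session_id: str
    pid: str
    identity: str
    active: bool
    heartbeat_age: float  # seconds since the last heartbeat

def resolve_target(
    rows: List[Row], pid: str, to_identity: str, to_session_id: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Return (status, session_ids) for a handoff or claim target."""
    # Step 1: an explicit session id is an authoritative override.
    if to_session_id is not None:
        for r in rows:
            if (r.session_id == to_session_id and r.pid == pid
                    and r.identity == to_identity and r.active):
                return ("ok", [r.session_id])
        return ("target_not_registered", [])  # assumption, see lead-in
    # Steps 2-3: active rows for the identity; none means not registered.
    active = [r for r in rows if r.pid == pid and r.identity == to_identity and r.active]
    if not active:
        return ("target_not_registered", [])
    # Step 4: rows exist, but every heartbeat has aged out.
    fresh = [r for r in active if r.heartbeat_age <= FRESHNESS_SECONDS]
    if not fresh:
        return ("target_stale", [])
    # Step 5: exactly one fresh candidate is selected.
    if len(fresh) == 1:
        return ("ok", [fresh[0].session_id])
    # Step 6: several fresh candidates; the operator must disambiguate.
    return ("target_ambiguous", [r.session_id for r in fresh])
```

When the result is target_ambiguous, re-calling with one of the returned session ids as `to_session_id` takes the step 1 path and resolves to exactly that row.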
State updates are Redis-first per SPEC-032: the Session Manager facade flips the master slot atomically through a Lua CAS, then the Postgres controller_status rows are reconciled. prism_status reads from Redis, so it reflects the new master immediately on return. Concurrent calls are linearised by the CAS; the loser receives stale_master and can retry against the new state.

Both verbs reuse the existing MasterPreempted system signal; there is no MasterChanged enum. The payload extension carries previous_master_*, new_master_*, reason ("preempt" for prism_master_claim or election-driven preempt, "handoff" for prism_master_handoff), and by_operator=<operator_id> on the operator-authorized path. See Signal Mesh — System Signals for the full payload contract.

prism_session_deregister is the cleanup lever for stale rows that don’t represent a live session, typically because a peer crashed before its session manager could release the Redis state, or because the row predates the controller-row leak fix in PR #150. It does not change master state on its own; if the deregistered session happened to hold master, election re-runs on the next prism_start.
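The compare-and-set linearisation of concurrent master changes can be illustrated with an in-memory stand-in for the Redis master slot. This sketch does not use Redis or Lua; the `MasterSlot` class and its method are hypothetical and exist only to show why the second of two racing callers sees stale_master.

```python
import threading

class MasterSlot:
    """In-memory stand-in for the master slot; a lock plays the role of the atomic CAS."""

    def __init__(self, master):
        self._lock = threading.Lock()
        self.master = master

    def compare_and_set(self, expected, new):
        """Swap the master only if it is still the one the caller observed."""
        with self._lock:
            if self.master != expected:
                return "stale_master"  # another call won the race; retry on new state
            self.master = new
            return "ok"
```

Two callers that both observed `s1` as master race to replace it: the first swap succeeds, the second returns stale_master and must re-read the slot before retrying.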

Consensus-first parallelism

Master election handles “one agent holds the gRPC stream”; leases handle “one agent at a time on a resource.” The middle layer — “how do many agents drive a single multi-step decision without serialising into one chat thread?” — is what SPEC-078 v0.2 codifies as consensus-first parallelism. It’s the orchestration discipline the eight-agent mesh runs on, and it’s what made Plan #10 ratify six governance artefacts in 3.5 hours instead of taking days. The mechanism is a 3-tier consensus workflow tuned to the architectural risk of the work in flight:
| Risk tier | Trigger | Workflow |
| --- | --- | --- |
| Low | Mechanical change inside an existing pattern (rename, dependency bump, doc fix) | Single-agent execute; PR review by lane owner. No consensus needed. |
| Medium | New scope inside an established lane (feature in shipped surface, schema migration on existing tables) | Driver routes to a designated reviewer. ReviewRequested → ReviewCompleted with structured findings. Single review pass unless findings are blocking. |
| High / architectural | Cross-lane scope, new authority surface, governance artefact, ratification arc | Multi-pass review chain: driver → architect for technical adequacy → governance/methodology for risk-tier authority and supersession → PO ratification. Each pass returns ReviewCompleted with a structured verdict (approved, approved_with_minor_nit, findings_block_ratification). Driver folds amendments into the next version, re-routes for confirmation, then ratifies. |
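The three tiers above can be sketched as a lookup from risk tier to review chain plus a loop that stops at the first blocking verdict. The reviewer labels, the `run_review_chain` helper, and the empty chain for the low tier (whose PR review happens outside the consensus workflow) are assumptions for illustration; only the verdict strings come from the table.

```python
from typing import Callable, List, Tuple

# Hypothetical reviewer labels per tier; the low tier needs no consensus pass.
REVIEW_CHAIN = {
    "low": [],
    "medium": ["designated_reviewer"],
    "high": ["architect", "governance", "product_owner"],
}
BLOCKING = "findings_block_ratification"

def run_review_chain(
    tier: str, review: Callable[[str], str]
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Route through each reviewer in order; stop when one blocks ratification."""
    verdicts = []
    for reviewer in REVIEW_CHAIN[tier]:
        verdict = review(reviewer)  # stands in for ReviewRequested -> ReviewCompleted
        verdicts.append((reviewer, verdict))
        if verdict == BLOCKING:
            # The driver folds amendments into the next version and re-routes.
            return False, verdicts
    return True, verdicts
```

A blocked chain returns the verdicts collected so far, which is the context a driver would need to amend the artefact before re-routing it.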
The 3-tier workflow + the typed signal vocabulary + method fragments (see Hybrid RAG — Beyond the four legs) compose into a parallel-execution discipline that doesn’t require a master-of-masters. Multiple agents can drive different scopes in parallel — each as a low / medium / high arc — and the brokered-authority model keeps them from colliding because every cross-scope interaction is a typed signal with a persisted publish path. Two seed method fragments shipped with SPEC-078 v0.2 codify the orchestration invariants:
  • method.completion.done-definition — “completion” means merged + deployed + tested, not merged alone. Reviewers and ratifiers gate on this definition rather than re-litigating it per arc.
  • method.parallel.ownership-contract — explicit write ownership, single-driver-per-domain, signal-mediated handoffs. The contract that makes “no peer-to-peer command authority” work in practice.
Plan #10 is the worked example. Three waves of governance artefacts (ADR #47 + SPEC-077 + ADR #48 in Wave 1; SPEC-078 v0.2 + SPEC-079 v0.2 in Wave 2; SPEC-080 v0.2 in Wave 3) ratified through the high-tier workflow. The architect’s review chains ran nine review cycles at approximately ten minutes each without becoming a bottleneck — the structured signal payload carried full review context across the chain instead of scattering it across chat messages and email threads. See Plan #10 case study for the full timeline.

Leases and capability tokens

When multiple agents work on the same project, they need to claim resources without colliding. The controller mediates this through leases and capability tokens.

A lease is a time-bounded claim on a resource: an entity, a file scope, a worktree. Only one agent can hold a lease on a given resource at a time, enforced by a partial UNIQUE constraint in Postgres. Leases expire automatically if not renewed. If a second agent requests a lease that is already held, the controller pushes a contention event to the master, which presents the conflict to the human for resolution.

A capability token scopes what an agent is authorized to do. Tokens are short-lived (minutes), opaque, and stored in Postgres. An agent refreshes its token via the gRPC stream. The token’s scope defines which resource types and actions the agent can perform: “can read and write specs” or “can read ADRs but not modify them.” The controller validates the token on every mutation before allowing it through to the intent queue.
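Lease exclusivity, expiry, and token scope checks can be sketched in memory. The real system enforces leases with a Postgres constraint and stores tokens server-side; the `LeaseTable` class, the `token_allows` helper, and the dictionary shape of a scope are hypothetical and only illustrate the rules described above.

```python
import time

class LeaseTable:
    """One holder per resource, with time-bounded claims."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # resource -> (holder, expires_at)

    def acquire(self, resource, agent, ttl_seconds):
        """Claim or renew a lease; return False on contention."""
        now = self._clock()
        held = self._leases.get(resource)
        if held is not None and held[1] > now and held[0] != agent:
            return False  # held by another agent: a contention event would be raised
        # Free, expired, or a renewal by the current holder.
        self._leases[resource] = (agent, now + ttl_seconds)
        return True

def token_allows(scope, resource_type, action):
    """Check a token scope shaped like {'spec': {'read', 'write'}, 'adr': {'read'}}."""
    return action in scope.get(resource_type, set())
```

Injecting the clock keeps the expiry logic testable: advancing a fake clock past the lease deadline lets a second agent acquire the same resource without waiting.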

Approval flow

Some mutations require human approval before they proceed. The controller handles this through two paths:
  • When Claude Desktop is master (the common case), approval requests arrive in real time via the gRPC stream. CD presents the request inline (“Agent Donna wants to update SPEC-020 status to superseded. Approve?”) and the human responds. The approval flows back through the stream and the mutation proceeds.
  • When CD is not running, the approval falls back to the nudge table. The next prism_start on any surface shows the pending approval, and the human resolves it there. Slower, but functional.
The real-time path exists for speed; the async path exists for correctness.

How the backend routes events

The backend is the only router in the system. When a state mutation lands, through the existing intent queue from SPEC-026, the backend knows every registered agent on that project from the controller registration table. It routes the event through two delivery surfaces simultaneously:
  • The master’s gRPC stream gets a real-time push.
  • Every registered peer gets a nudge row written to the nudge table (SPEC-029).
One source of truth (Postgres), two delivery mechanisms (stream for speed, nudges for reliability), zero peer-to-peer relay.

Master gets the fast path (push). Peers get the durable path (pull on next verb). If the master drops offline mid-stream, its next prism_start drains the nudges it missed, so real-time push is never the only delivery path. Zero lost events without a circular peer-to-peer mesh.

If the master dies (editor closes, machine sleeps), the backend detects the stream disconnect via gRPC keepalive. It marks the registration as released. Events continue accumulating in the nudge table. When the next agent calls prism_start, it re-elects a master, the new master opens a fresh stream, and catches up from the nudge table.
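The two delivery surfaces can be sketched with a list standing in for the gRPC stream and a dictionary standing in for the nudge table. The `Router` class is hypothetical, and writing a nudge for the master while its stream is down is an assumption about how "events continue accumulating" is realised.

```python
from collections import defaultdict

class Router:
    """Backend-side fan-out: push to the master's stream, nudge everyone else."""

    def __init__(self, master, peers):
        self.master = master
        self.peers = set(peers)
        self.stream_up = True
        self.pushed = []                 # events sent on the master's stream
        self.nudges = defaultdict(list)  # session_id -> pending events

    def route(self, event):
        targets = set(self.peers)
        if self.stream_up:
            self.pushed.append(event)    # fast path: real-time push
        else:
            targets.add(self.master)     # assumption: durable fallback for the master
        for session_id in targets:
            self.nudges[session_id].append(event)  # durable path: nudge row

    def drain(self, session_id):
        """What the next prism_start would pick up for this session."""
        return self.nudges.pop(session_id, [])
```

Peers never receive anything from the master in this model; both paths originate at the router, which mirrors the no-relay rule.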

Scale and deployment

The controller election runs on a single Postgres instance. No distributed consensus, no etcd, no Raft. A Postgres row with a partial UNIQUE constraint IS the election. This handles the target scale comfortably: fewer than 10 users with fewer than 10 agents each, per project.

Deployment adds one new container to the existing docker-compose stack: backend-grpc, running from the same backend image with a different entrypoint. It shares the same database URL, the same Neo4j driver, and has zero shared in-memory state with the HTTP backend. The MCP server configuration in each editor does not change.

If Prism ever needs to go beyond LAN scale, the protocol stays the same. The Postgres instance moves to a hosted service. No protocol change needed, just an infrastructure swap.

What this means for existing users

If you are running Prism today as a single operator with one agent, nothing changes. prism_start still works exactly as before. You gain prism_status as a new verb, but you will never need it until you add a second agent. The controller registration happens silently — you become master of your own project with no visible difference in behavior. The controller becomes meaningful when you add a second agent. The moment Donna joins a project that Lola is already working on, the registration table shows two entries, one is master, and the coordination infrastructure activates. No configuration, no setup, no new processes to launch.

Where to go next

Vision

The thesis — where the market is going and why Prism’s architecture is ahead of it.

Tri-Graph Architecture

The knowledge representation layer that the controller governs.

Installation

Get Prism running — the controller activates automatically.
Last modified on May 22, 2026