Status: complete · Version 1 · Filed 2026-04-23
Goal
Ship the Postgres-backed controller registration table, election logic embedded in prism_start / prism_wrap, and the new prism_status verb. No gRPC, no streams, no leases, no capability tokens: those are Phases 2–6.
Immediately useful to Frank on land: prism_status shows live agent topology per project (who’s master, who’s peer, who’s stale). Lola + Donna dogfood the election from day one.
Scope — in
- Migration 01X_spec030_phase1_controller_registrations.py: controller_registrations table + three indexes + the agent_surface CHECK constraint.
- SQLAlchemy model + Pydantic schemas for registration.
- Backend service layer: register_controller, release_controller, list_registrations, update_heartbeat.
- Backend router: POST /api/v1/controller/register, POST /api/v1/controller/{id}/release, GET /api/v1/controller/{pid}/status.
- MCP enhancements to existing verbs: prism_start (minting flow gains registration + election + context-switch detection); prism_wrap (existing wrap flow gains deregistration).
- MCP new verb: prism_status(pid), a read-only routing table view.
- Backend middleware hook: update last_heartbeat for the caller’s active registration on every authenticated verb call (drives peer staleness; see Decisions §D2).
- Background sweep: release registrations whose last_heartbeat is >10 min stale (configurable via PRISM_CONTROLLER_PEER_STALE_MINUTES). Runs on backend-http startup and every 60s.
- Smoke test mcp/smoke_spec030_phase1.py: seven scenarios per §6.
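A minimal sketch of the stale sweep and its env-var threshold, using the stdlib sqlite3 module as a stand-in for Postgres so it runs anywhere. The column names (last_heartbeat, released_at, and a release_reason column carrying the stale_heartbeat reason mentioned in smoke scenario 7) and the epoch-seconds timestamps are assumptions; the real migration would presumably use timestamptz. It follows the scope line above and releases any active registration; since other parts of the plan say "peer" sweep, an extra is_master = 0 filter may be what is intended.

```python
import os
import sqlite3
from datetime import datetime, timedelta
from typing import Mapping

DEFAULT_STALE_MINUTES = 10  # the plan's ">10 min" default


def stale_threshold(env: Mapping[str, str] = os.environ) -> timedelta:
    """Read PRISM_CONTROLLER_PEER_STALE_MINUTES, falling back to 10 minutes."""
    raw = env.get("PRISM_CONTROLLER_PEER_STALE_MINUTES")
    return timedelta(minutes=int(raw) if raw else DEFAULT_STALE_MINUTES)


def sweep_stale(conn: sqlite3.Connection, now: datetime, threshold: timedelta) -> int:
    """Release active registrations whose last_heartbeat is older than now - threshold.

    Timestamps are epoch seconds in this sketch. Returns the number of rows released.
    """
    cutoff = (now - threshold).timestamp()
    with conn:  # one short transaction per sweep; commits on success
        cur = conn.execute(
            """
            UPDATE controller_registrations
               SET released_at = ?, release_reason = 'stale_heartbeat'
             WHERE released_at IS NULL
               AND last_heartbeat < ?
            """,
            (now.timestamp(), cutoff),
        )
    return cur.rowcount
```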
Scope — out
- gRPC service (backend-grpc entrypoint): Phase 3.
- Proto codegen: Phase 2.
- Leases + capability tokens: Phase 4.
- Approval flow: Phase 5.
- Observability (OpenTelemetry, metrics): Phase 6.
- SPEC-029 controller nudge kinds: Phase 5 (arrive with approvals).
- BIOS/CLAUDE.md updates for the agent contract around the controller: defer to Phase 3 (nothing to surface until streams exist).
Decisions I’m making (judgment calls flagged for Lola’s review in the PR)
D1. CD preemption trigger
Implement: CD preempts when Claude Desktop calls prism_start(pid) on the contested PID. It does not preempt merely because a CD process exists, nor because of CD activity in other projects.
Rationale: covers Frank’s “don’t yank master when CD is open but idle” concern. Matches the NetBIOS analogy (a machine becomes master by participating, not by existing).
Spec text is ambiguous; will flag in PR and ask Lola to confirm the interpretation for SPEC-030 v0.2.
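The D1 rule reduces to a small pure decision made inside prism_start. This is a sketch rather than the service’s actual code: decide_on_start and its string return values are hypothetical names. Because the function only runs when a caller invokes prism_start on this PID, a CD process that merely exists, or is busy in another project, can never reach the preempt branch.

```python
from typing import Optional

CLAUDE_DESKTOP = "claude_desktop"


def decide_on_start(caller_surface: str, master_surface: Optional[str]) -> str:
    """Return "master", "preempt" or "peer" for a prism_start(pid) call.

    master_surface is the agent_surface of the PID's active master, or None.
    """
    if master_surface is None:
        return "master"  # no active master: first caller wins
    if caller_surface == CLAUDE_DESKTOP and master_surface != CLAUDE_DESKTOP:
        return "preempt"  # D1: CD takes over by calling prism_start on this PID
    return "peer"  # non-CD arrival, or CD vs an existing CD master (D3)
```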
D2. Peer heartbeat write path
Implement: backend updates last_heartbeat on every authenticated verb call carrying the caller’s session_id. Peer registrations auto-release after >10 min of no verb traffic.
Rationale: Postgres-native (no new mechanism, no MCP timer, no polling endpoint), exploits existing verb traffic, low write cost (one indexed UPDATE per verb). Staleness threshold configurable via env var.
Masters use gRPC keepalive in Phase 3+; Phase 1 masters heartbeat via the same verb-call path (they still make verb calls; from Phase 3 they will also hold a stream).
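A sketch of the D2 write path: the authenticated-verb middleware runs one UPDATE keyed on session_id against active rows. sqlite3 stands in for Postgres here; the trimmed schema, the partial index name, and the exact update_heartbeat signature are assumptions (the plan names controller_service.update_heartbeat(session_id) but not its body).

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE controller_registrations (
    id             INTEGER PRIMARY KEY,
    session_id     TEXT NOT NULL,
    last_heartbeat REAL NOT NULL,
    released_at    REAL
);
-- Hypothetical index so the per-verb UPDATE only has to look at active rows.
CREATE INDEX ix_controller_reg_active_session
    ON controller_registrations (session_id) WHERE released_at IS NULL;
"""


def update_heartbeat(conn: sqlite3.Connection, session_id: str,
                     now: Optional[float] = None) -> bool:
    """Bump last_heartbeat on the caller's active registration.

    Returns False when the session has no active registration, so verb calls
    from unregistered sessions cost a single no-op UPDATE.
    """
    ts = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "UPDATE controller_registrations SET last_heartbeat = ? "
            "WHERE session_id = ? AND released_at IS NULL",
            (ts, session_id),
        )
    return cur.rowcount > 0
```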
D3. Multi-CD tiebreaker
Implement: if multiple claude_desktop registrations contend, earliest registered_at wins. Subsequent CD arrivals register as peers.
Rationale: deterministic + matches the “first-come” fallback for non-CD. Avoids CD-vs-CD preemption loops if Frank runs CD on two machines.
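Combined with D1’s CD priority, the D3 tiebreaker can be expressed as a single sort key. pick_master and Contender are hypothetical names, and applying the key to a re-election after a master wraps is one reading of the plan, not something it spells out. The final session_id tiebreak for identical timestamps is an addition not in the plan, included only so the result stays deterministic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Contender:
    session_id: str
    agent_surface: str
    registered_at: datetime


def pick_master(contenders: Sequence[Contender]) -> Optional[Contender]:
    """Pick the master among a PID's active registrations; everyone else is a peer.

    Sort key: CD before any other surface (False sorts before True), then the
    earliest registered_at (D3 for CD, first-come for non-CD).
    """
    if not contenders:
        return None
    return min(
        contenders,
        key=lambda c: (c.agent_surface != "claude_desktop", c.registered_at, c.session_id),
    )
```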
D4. agent_surface enum lock
Implement: agent_surface VARCHAR(32) NOT NULL CHECK (agent_surface IN ('claude_desktop', 'claude_code', 'codex', 'cursor', 'other')).
Rationale: typo defense + trust-boundary hardening. CHECK constraint is trivial in initial migration, expensive to add retroactively. Extend the set via future migration when new surfaces ship.
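The D4 constraint exercised against SQLite through the stdlib sqlite3 module. The DDL mirrors the plan’s CHECK clause; the other columns are trimmed for illustration, and register is a hypothetical helper. In the real Alembic migration the same clause would sit on the table definition (for example via SQLAlchemy’s CheckConstraint).

```python
import sqlite3

# Mirrors the D4 clause. SQLite accepts VARCHAR(32) but does not enforce the
# length; Postgres does. Other columns are trimmed for brevity.
DDL = """
CREATE TABLE controller_registrations (
    id            INTEGER PRIMARY KEY,
    pid           TEXT NOT NULL,
    agent_surface VARCHAR(32) NOT NULL
        CHECK (agent_surface IN ('claude_desktop', 'claude_code', 'codex', 'cursor', 'other'))
)
"""


def register(conn: sqlite3.Connection, pid: str, agent_surface: str) -> None:
    """Insert a registration; a surface outside the allowed set raises IntegrityError."""
    with conn:
        conn.execute(
            "INSERT INTO controller_registrations (pid, agent_surface) VALUES (?, ?)",
            (pid, agent_surface),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    register(conn, "PID-PGR01", "claude_desktop")  # accepted
    try:
        register(conn, "PID-PGR01", "claude-desktop")  # typo: hyphen, not underscore
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The comparison is case-sensitive, so "Codex" is rejected just like a misspelling.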
D5. Context switch transaction boundary
Implement: two sequential transactions, no compensation.
- TXN1: soft-close old PID (write wrap_session nudge per SPEC-029, set released_at on old registration).
- TXN2: election + insert on new PID.
Rationale: if TXN2 fails, TXN1 is not rolled back; the old PID stays closed and the agent recovers by retrying (prism_start on new PID). Simpler, and avoids cross-scope transaction anti-patterns.
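A sketch of D5’s two-transaction shape, again on sqlite3. context_switch, the elect_and_insert callback, and the 'context_switch' release reason are hypothetical. What matters is that TXN1 commits before TXN2 starts, so a TXN2 failure leaves the old registration released and nothing needs compensating.

```python
import sqlite3
import time
from typing import Callable

SCHEMA = """
CREATE TABLE controller_registrations (
    id             INTEGER PRIMARY KEY,
    pid            TEXT NOT NULL,
    session_id     TEXT NOT NULL,
    is_master      INTEGER NOT NULL DEFAULT 0,
    released_at    REAL,
    release_reason TEXT
);
"""

ElectFn = Callable[[sqlite3.Connection, str, str], None]


def context_switch(conn: sqlite3.Connection, session_id: str,
                   old_pid: str, new_pid: str, elect_and_insert: ElectFn) -> None:
    """Soft-close the old PID, then elect on the new PID, as two transactions."""
    # TXN1: release the old registration (the wrap_session nudge write belongs here too).
    with conn:
        conn.execute(
            "UPDATE controller_registrations "
            "SET released_at = ?, release_reason = 'context_switch' "
            "WHERE session_id = ? AND pid = ? AND released_at IS NULL",
            (time.time(), session_id, old_pid),
        )
    # TXN2: election + insert on the new PID. If this raises, only TXN2 rolls back.
    with conn:
        elect_and_insert(conn, session_id, new_pid)
```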
D6. Preemption is Postgres-authoritative; gRPC notification is a courtesy
Phase 1 has no gRPC streams, so preemption in Phase 1 just updates is_master=false on the old row and inserts the new master with is_master=true, both in one transaction; a partial UNIQUE index enforces the single-master invariant. The old master discovers its demotion on its next prism_status call or next verb response (which can carry a you_were_preempted=true flag).
Phase 3 adds the MasterPreempted gRPC push event for immediate notification; Phase 1 is polling/lazy.
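The partial UNIQUE index is what makes D6 safe: demote-then-insert inside one transaction keeps at most one active master per PID, and any attempt to add a second master fails outright. SQLite supports partial indexes with near-identical syntax to Postgres, so it serves as a runnable stand-in. The index name, the preempted_at column used to back a you_were_preempted flag, and the helper names are all assumptions.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE controller_registrations (
    id            INTEGER PRIMARY KEY,
    pid           TEXT NOT NULL,
    session_id    TEXT NOT NULL,
    agent_surface TEXT NOT NULL,
    is_master     INTEGER NOT NULL DEFAULT 0,
    preempted_at  REAL,  -- hypothetical: lets the old master learn it was demoted
    released_at   REAL
);
-- At most one active master per PID (SPEC-024-style partial UNIQUE).
CREATE UNIQUE INDEX uq_one_active_master_per_pid
    ON controller_registrations (pid)
    WHERE is_master = 1 AND released_at IS NULL;
"""


def preempt(conn: sqlite3.Connection, pid: str, session_id: str,
            agent_surface: str, now: Optional[float] = None) -> None:
    """Demote the current master and insert the caller as master, atomically."""
    ts = time.time() if now is None else now
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE controller_registrations SET is_master = 0, preempted_at = ? "
            "WHERE pid = ? AND is_master = 1 AND released_at IS NULL",
            (ts, pid),
        )
        conn.execute(
            "INSERT INTO controller_registrations (pid, session_id, agent_surface, is_master) "
            "VALUES (?, ?, ?, 1)",
            (pid, session_id, agent_surface),
        )


def was_preempted(conn: sqlite3.Connection, session_id: str) -> bool:
    """Lazy discovery: checked on the old master's next verb call or prism_status."""
    row = conn.execute(
        "SELECT preempted_at FROM controller_registrations "
        "WHERE session_id = ? AND released_at IS NULL",
        (session_id,),
    ).fetchone()
    return row is not None and row[0] is not None
```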
Files touched
Backend (Python/FastAPI)
- backend/alembic/versions/01X_spec030_phase1_controller_registrations.py: new migration.
- backend/app/models/controller_registration.py: new SQLAlchemy model.
- backend/app/schemas/controller_registration.py: new Pydantic schemas (Create, Out, StatusResponse).
- backend/app/services/controller_service.py: new; register / release / list / elect / heartbeat / stale-sweep.
- backend/app/routers/controller.py: new; 3 endpoints.
- backend/app/main.py: register new router; wire stale-sweep background task on startup.
- backend/app/auth/authforge.py: extend to call controller_service.update_heartbeat(session_id) in the authenticated-verb middleware path.
- backend/app/workers/controller_sweep_worker.py: new; 60s loop that releases stale peer registrations.
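controller_sweep_worker.py is described only as a 60s loop. One way to shape it, assuming an asyncio backend: sweep once at startup, then every interval, running the blocking DB call through asyncio.to_thread so it stays off the event loop. run_sweep_loop and the stop event are hypothetical; main.py might launch it with asyncio.create_task from a FastAPI lifespan handler, though how main.py wires startup work today isn’t stated.

```python
import asyncio
import logging
from typing import Callable, Optional

log = logging.getLogger("controller_sweep_worker")


async def run_sweep_loop(sweep: Callable[[], int], interval_s: float = 60.0,
                         stop: Optional[asyncio.Event] = None) -> None:
    """Sweep once immediately (startup), then every interval_s seconds until stop is set."""
    if stop is None:
        stop = asyncio.Event()
    while not stop.is_set():
        try:
            # The sweep does blocking DB I/O, so run it in a worker thread.
            released = await asyncio.to_thread(sweep)
            if released:
                log.info("released %d stale controller registrations", released)
        except Exception:
            # Log and keep looping: one failed sweep should not kill the worker.
            log.exception("controller registration sweep failed")
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop request; sweep again
```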
MCP (Python, pre-SPEC-028)
- mcp/server.py: enhance prism_start to call /controller/register after session mint; detect context switch; enhance prism_wrap to call /controller/{id}/release; add new verb prism_status.
- mcp/client.py: add register_controller, release_controller, get_controller_status HTTP client methods.
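A sketch of the three mcp/client.py methods using only urllib. The endpoint paths come from the backend router list in the scope section; the base-URL handling, the bearer-token header, and the JSON field names in the register payload are assumptions, not the existing client’s contract.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


class ControllerClient:
    """Sketch of the controller calls mcp/client.py gains (hypothetical shape)."""

    def __init__(self, base_url: str, token: str, timeout_s: float = 10.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.timeout_s = timeout_s

    def _build(self, method: str, path: str,
               body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            f"{self.base_url}/api/v1/controller{path}",
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {self.token}",  # assumed auth scheme
                "Content-Type": "application/json",
            },
        )

    def _send(self, req: urllib.request.Request) -> Any:
        with urllib.request.urlopen(req, timeout=self.timeout_s) as resp:
            return json.load(resp)

    def register_controller(self, pid: str, session_id: str, agent_surface: str) -> Any:
        # POST /api/v1/controller/register (payload field names are assumptions)
        body = {"pid": pid, "session_id": session_id, "agent_surface": agent_surface}
        return self._send(self._build("POST", "/register", body))

    def release_controller(self, registration_id: int) -> Any:
        # POST /api/v1/controller/{id}/release
        return self._send(self._build("POST", f"/{registration_id}/release"))

    def get_controller_status(self, pid: str) -> Any:
        # GET /api/v1/controller/{pid}/status, with the PID percent-encoded
        quoted = urllib.parse.quote(pid, safe="")
        return self._send(self._build("GET", f"/{quoted}/status"))
```

Keeping request construction in _build lets the paths be unit-tested without a running backend.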
Smoke test
mcp/smoke_spec030_phase1.py: seven scenarios:
- Fresh start with no existing master → caller becomes master.
- Second prism_start (non-CD) on same PID → peer registration.
- CD prism_start with non-CD master active → CD preempts.
- Master calls prism_wrap → deregistered, next start re-elects.
- prism_start on different PID → old registration released + wrap_session nudge written + new PID election runs.
- prism_status returns correct master/peer/caller shape.
- Stale sweep: manually backdate last_heartbeat on a peer row, invoke sweep, verify released_at set with reason stale_heartbeat.
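Scenario 6 checks the master/peer/caller shape of prism_status, which the plan doesn’t pin down, so every field name below is hypothetical. The stale flag reuses the 10-minute default so the view answers the “who’s stale” question from the goal, and build_status is a pure function over already-fetched active rows, which keeps it easy to assert against in the smoke test.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Row:
    session_id: str
    agent_surface: str
    is_master: bool
    last_heartbeat: float  # epoch seconds


def build_status(pid: str, rows: List[Row], caller_session_id: str,
                 now: float, stale_after_s: float = 600.0) -> Dict[str, Any]:
    """Shape a read-only routing-table view from a PID's active registrations."""

    def view(r: Row) -> Dict[str, Any]:
        return {
            "session_id": r.session_id,
            "agent_surface": r.agent_surface,
            "stale": now - r.last_heartbeat > stale_after_s,
        }

    master = next((r for r in rows if r.is_master), None)
    caller = next((r for r in rows if r.session_id == caller_session_id), None)
    return {
        "pid": pid,
        "master": view(master) if master else None,
        "peers": [view(r) for r in rows if not r.is_master],
        "caller": {
            "session_id": caller_session_id,
            # None means the caller has no active registration on this PID.
            "role": None if caller is None else ("master" if caller.is_master else "peer"),
        },
    }
```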
PR sequencing
Single PR. Estimated ~200 lines of new Python + ~150 lines of tests + migration. Too small to split meaningfully; one PR is easier to review. PR title: feat(spec-030): Phase 1 — controller registrations + election + prism_status
Deployment (once PR lands)
- Mac (local dev loop): migration auto-applies on next start-backend.sh; smoke test locally.
- server1: rsync backend/ + mcp/ via the existing deploy path (SSH as donna@server1, per memory reference_server1_ssh.md).
- docker compose -f docker-compose.server.yml build backend, then docker compose -f docker-compose.server.yml up -d backend.
- Wait for healthy (~16s per observed pattern).
- Smoke test from Mac against server1.
- Lola + Donna both call prism_start on PID-PGR01; verify prism_status shows both with correct surfaces and the CD-wins election result.
Acceptance criteria (Phase 1 subset of SPEC-030 §24)
- controller_registrations table exists with columns + three indexes + the agent_surface CHECK constraint.
- prism_start registers the caller and runs the election per decisions §D1, §D3.
- CD preempts non-CD masters per §D1.
- prism_status returns the routing table.
- prism_wrap sets released_at on the caller’s registration.
- Context switch (prism_start(different_pid)) releases old registration + writes wrap_session nudge per §D5.
- Stale peer sweep releases registrations with last_heartbeat >10 min stale.
- Smoke test 7/7 scenarios pass locally and against server1.
Blockers / risks
- Nothing blocking.
- One dependency: SPEC-029’s nudges table must be live for §D5’s wrap_session nudge to be written. SPEC-029 is filed at v0.2 but not yet implemented; if the nudges table doesn’t exist yet, the context-switch path fails with a clear DB error. Mitigation: Phase 1 either implements SPEC-029’s minimal wrap_session nudge slice first, OR gracefully no-ops the nudge write if the table is absent (log a warning). Lean: no-op with warning; full SPEC-029 ships on its own timeline.
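The leaning mitigation (no-op with a warning) is easiest to get right by probing for the table rather than catching the error: in Postgres a failed statement aborts the surrounding transaction, which here would be TXN1. This sketch uses SQLite’s catalog; on Postgres the equivalent probe is SELECT to_regclass('nudges') IS NOT NULL. The nudges column names are assumptions, since SPEC-029 isn’t implemented yet.

```python
import logging
import sqlite3

log = logging.getLogger("context_switch")


def nudges_table_exists(conn: sqlite3.Connection) -> bool:
    # SQLite catalog lookup; Postgres would use to_regclass('nudges') instead.
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'nudges'"
    ).fetchone()
    return row is not None


def write_wrap_session_nudge(conn: sqlite3.Connection, pid: str, session_id: str) -> bool:
    """Write the wrap_session nudge, or no-op with a warning if the table is absent.

    Meant to run inside the caller's TXN1, so it does not commit on its own.
    """
    if not nudges_table_exists(conn):
        log.warning("nudges table missing; skipping wrap_session nudge for %s on %s",
                    session_id, pid)
        return False
    conn.execute(
        "INSERT INTO nudges (pid, session_id, kind) VALUES (?, ?, 'wrap_session')",
        (pid, session_id),
    )
    return True
```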
Follow-ups (outside Phase 1, tracked)
- Pre-flight transaction audit (~2h grep-read of mutation verbs) — blocking for Phase 4, will run in background during Phase 2/3.
- Items 3 (agent_surface trust disclosure), 4 (approval timeouts), 6 (NOTIFY 8000-byte cap), 7 (channel split) — fold into SPEC-030 v0.2 when Lola next revises; all are Phase 3+ surface.
References
- SPEC-030 v0.1 (the spec this phase implements): spec_id 12757c47-e895-4ba6-89be-f9151f9d48ef.
- SPEC-029 v0.2: nudges table + wrap_session kind used by context switch.
- SPEC-024: projection-retirement pattern (released_at + partial UNIQUE) applied to registrations and leases.
- SPEC-023: session_id mint lifecycle, reused for registration scoping.
- SPEC-021: prism_start bootstrap contract; registration is a new bootstrap side-effect.

