Status: accepted · Version v0.2.1 · Filed 2026-05-03
SPEC-072 — Per-Session Daemon Lifecycle — Spawn, Termination, Cleanup
Status: draft (v0.2 — revised per Texi review 2026-05-03 02:39Z)
Author: Donna (Engineering)
Reviewer: Texi (Architect)
PO: Lola
Date: 2026-05-03
Relationship to SPEC-070: narrows the launch / termination contract of SPEC-070 v0.2’s daemon. Replaces §B4 (per-OS launcher integration via LaunchAgent / systemd-user / Task Scheduler). The daemon binary architecture (FSM, IPC server, WS client, plugin loader, metrics — PRs #51/#57/#58/#59/#61/#66/#67) is unchanged.
1. Motivation
SPEC-070 v0.2 §B4 specified that the per-persona daemon would be registered with the host OS as a long-lived service (LaunchAgent on macOS, systemctl --user on Linux, Task Scheduler logon trigger on Windows). The daemon would be brought up at user-login time and torn down via prism_persona_destroy, surviving across editor restarts and idle periods.
Reviewed against actual usage, this design shows two failures of fit:
- The daemon runs when no Prism is in use. A user who logs in but doesn’t open any editor for the day still has a daemon process consuming a WS connection, heartbeating the backend, and occupying registration rows. The system pays for an “always-on listener” whose only consumer (the editor surface) isn’t running.
- OS-launcher integration is heavy and platform-specific. Each OS requires a different installer path. The install lane carries the cross-OS complexity. Every new OS adds another launcher template.
The original argument for OS-launching was the “Lola idle for hours, teammate sends urgent mail, daemon catches it” case. Empirically that case is much weaker than it sounded at ratification. Bootstrap drain via prism_start already returns queued signals on the next editor open, so the daemon adds only the narrow window of “system notification while no editor is open”. The Desktop surface cannot deliver that notification to the LLM anyway, because it has no notifications/claude/channel support.
This SPEC moves the daemon’s lifecycle inside Prism’s own usage envelope: the daemon’s lifetime is the bootstrapped Prism session, spawned on prism_start and terminated on prism_wrap. The structural value SPEC-070 actually delivers (single durable WS subscription, FSM-managed reconnect, signal continuity within an active session) is preserved.
2. Goals
- The daemon’s lifetime is bounded by the Prism session. It spawns on prism_start and terminates on prism_wrap, or when the shim process dies abruptly without a wrap.
- The daemon never runs when there’s no active Prism session.
- No OS-launcher integration. The cross-platform contract is identical.
- 1:1 relationship: one bootstrapped session, one daemon. No shared-resource machinery (no lockfile, no connection counting, no linger window).
- Backend orphan-row cleanup is automatic via the existing routing-registry liveness probe (PR #56).
- Shim-side handle tracking makes bootstraps idempotent. Repeated prism_start calls without an intervening prism_wrap do not duplicate daemons.
3. Non-goals
- No multi-session sharing. If a future workflow puts the same persona on two editor surfaces simultaneously (each with its own bootstrapped session), two daemons spawn. Signals fan out via the existing publish path; the recipient sees the doorbell twice. Duplicate but not broken.
- No always-on listener. “Lola idle 4h, teammate sends mail” → bootstrap drain on next session.
- No OS service / LaunchAgent / systemd integration. SPEC-070 §B4 templates are retired.
- No daemon spawned by the installer: prism install does not start the daemon.
4. The unit of life: one bootstrapped session, one daemon
The MCP shim is a long-lived process inside the editor (Desktop / Code / Codex / etc.). It survives prism_start / prism_wrap / re-bootstrap cycles for the entire editor’s lifetime (and beyond, per PR #81’s relauncher staleness check). The daemon is not scoped to the shim process; it is scoped to the bootstrapped session that the shim is currently hosting.
Lifecycle contract per shim instance:
| Event | Daemon state |
|---|---|
| Shim process starts (editor launch) | No daemon yet |
| prism_start called → session registered | Shim spawns daemon, retains the ChildProcess handle as a module-level singleton |
| Shim is bootstrapped (between prism_start and prism_wrap) | Daemon is alive, holding WS subscription + UDS server |
| prism_wrap called | Shim clears its singleton handle, then closes the daemon’s stdin pipe → daemon’s EOF handler initiates graceful shutdown |
| Repeat prism_start after wrap | Singleton handle is null → fresh daemon spawn |
| Repeat prism_start without wrap (idempotent re-bootstrap) | Singleton handle non-null → no-op (existing daemon stays) |
| Shim process dies abruptly (Cmd+Q, crash, kernel kill) without prism_wrap | OS closes the stdin pipe → daemon’s EOF handler runs as backstop |
| Daemon dies | Shim’s doorbell client sees UDS errors; backend’s liveness probe at 60s marks the daemon row stale |
The unit is one daemon per bootstrapped session. The shim’s singleton handle prevents duplication across re-bootstrap. Stdin EOF is the only normative parent-death backstop for the unclean-exit case.
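The lifecycle table can be read as a small transition function over the shim’s handle. The sketch below encodes it that way. The state, event, and action names (`no_daemon`, `close_stdin`, `replay`, and so on) are hypothetical labels for illustration. The shim-death row is left out because no shim remains to take an action.

```typescript
// Illustrative model of the per-shim lifecycle table. In the real shim the
// state is implicit in the `daemon_process` handle (null = no daemon).
type DaemonState = "no_daemon" | "alive";
type ShimEvent = "prism_start" | "prism_wrap" | "daemon_exit";
type ShimAction = "spawn" | "no_op" | "close_stdin";

function onEvent(state: DaemonState, event: ShimEvent): [DaemonState, ShimAction] {
  switch (event) {
    case "prism_start":
      // Idempotent re-bootstrap: a live handle means there is nothing to do.
      return state === "alive" ? ["alive", "no_op"] : ["alive", "spawn"];
    case "prism_wrap":
      // Wrap clears the handle and sends EOF; with no daemon there is nothing to close.
      return state === "alive" ? ["no_daemon", "close_stdin"] : ["no_daemon", "no_op"];
    case "daemon_exit":
      // Crash or completed shutdown: the exit listener clears the handle.
      return ["no_daemon", "no_op"];
  }
}

// Replays a sequence of events and collects the action taken for each one.
function replay(events: ShimEvent[]): ShimAction[] {
  let state: DaemonState = "no_daemon";
  const actions: ShimAction[] = [];
  for (const e of events) {
    const [next, action] = onEvent(state, e);
    state = next;
    actions.push(action);
  }
  return actions;
}
```

For example, `replay(["prism_start", "prism_start", "prism_wrap", "prism_start"])` yields spawn, no-op, close, spawn. This matches the idempotent re-bootstrap and wrap-then-restart rows.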
5. Spawn — on prism_start
mcp-node/src/verbs/lifecycle.ts:prism_start gains a daemon-spawn step after session registration succeeds. The shim retains the ChildProcess handle in module-level state:
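```ts
import { spawn, type ChildProcess } from "node:child_process";
// daemonBinPath() and shimDoorbellSocketPath() are shim helpers defined elsewhere.

// module-level singleton
let daemon_process: ChildProcess | null = null;

async function spawnDaemonForSession(): Promise<void> {
  if (daemon_process) {
    // Idempotent: re-bootstrap inside the same shim without an intervening
    // wrap is a no-op. The existing daemon continues serving the
    // (re-)registered session.
    return;
  }
  const child = spawn(process.execPath, [daemonBinPath()], {
    stdio: ["pipe", "ignore", "ignore"], // shim writes to daemon's stdin; stdout/stderr ignored
    detached: false, // implementation detail; not load-bearing for correctness
    env: {
      ...process.env,
      PRISM_DAEMON_SHIM_PID: String(process.pid),
      PRISM_DAEMON_SHIM_DOORBELL_PATH: shimDoorbellSocketPath(),
    },
  });
  daemon_process = child;
  child.unref(); // shim's event loop can exit independently of daemon's
  child.on("error", (err) => {
    // Async spawn failures (e.g. EACCES) are emitted here. With no listener, the
    // "error" event would crash the shim. Non-fatal: log (plus metric) and run daemon-less.
    console.error(`prism: daemon spawn failed: ${err.message}`);
    if (daemon_process === child) daemon_process = null;
  });
  child.on("exit", () => {
    // Clear only if the handle still points at this child. A late exit from a
    // daemon that was already wrapped must not null out a newer daemon's handle.
    if (daemon_process === child) daemon_process = null;
  });
}
```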
stdio: ["pipe", ...] is the load-bearing detail. The shim holds a writable stream to the daemon’s stdin. When the shim closes the pipe (either explicitly on prism_wrap or implicitly via process death), the daemon receives EOF and shuts down. This is the entire parent-side termination mechanism, identical across Mac / Linux / Windows.
detached: false is implementation hygiene only — same process group, easier debugging — but is not part of the correctness story. Stdin EOF is the only normative termination signal. Process-group membership is not a portable lifecycle guarantee.
unref() allows the shim’s event loop to exit if the shim shuts down first (uncommon — the shim usually outlives the daemon).
The exit listener clears the singleton handle so a subsequent prism_start (after the daemon has gone away) will re-spawn cleanly.
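Clearing the handle from the exit listener has one subtlety. prism_wrap nulls the handle up front and does not wait for the old daemon to exit, so a quick prism_start can spawn a replacement before the old daemon’s exit event arrives. If the listener cleared the handle unconditionally, that late event would null out the new daemon’s handle, and the next prism_start would spawn a duplicate.

Below is a minimal sketch of an identity-guarded listener. It uses plain EventEmitter objects as stand-ins for ChildProcess. `adopt`, `release`, and `currentHandle` are hypothetical names.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for ChildProcess: all this sketch needs is the "exit" event.
type ChildLike = EventEmitter;

// Mirrors the module-level `daemon_process` singleton.
let current: ChildLike | null = null;

// Called right after spawn(): store the handle and wire the exit listener.
function adopt(child: ChildLike): void {
  current = child;
  child.on("exit", () => {
    // Clear only if the singleton still refers to this particular child.
    if (current === child) current = null;
  });
}

// Called on prism_wrap: clear up front and hand back the old child.
function release(): ChildLike | null {
  const prev = current;
  current = null;
  return prev;
}

function currentHandle(): ChildLike | null {
  return current;
}
```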
daemonBinPath() resolves to mcp-node/dist/daemon/server.js per Texi’s packaging answer — the daemon ships inside the same mcp-node dist artifact, not a separate repo-root daemon tree. Same runtime, same installer surface, same version.
Spawn-failure policy: the spawn can fail in two ways. spawn() may throw synchronously, or the ChildProcess may later emit an "error" event (for example, exec permission denied). In either case the shim logs to stderr and emits a prism_daemon_spawn_failed{reason} metric (Texi calibration). The shim then continues without daemon backing, the same posture as today’s behavior when no daemon is running. No system signal to the operator.
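Below is a sketch of the non-fatal policy. It assumes Node’s usual failure modes: spawn() can throw synchronously (e.g. on invalid arguments), while OS-level failures such as ENOENT or EACCES arrive later as an "error" event on the returned ChildProcess. Both paths should end in the same log line and metric.

`spawnFn`, `emitMetric`, and `spawnNonFatal` are hypothetical names. The spawn function is injected so the policy can be exercised without launching a process. A real shim would also clear its singleton handle in the failure path.

```typescript
import { EventEmitter } from "node:events";

type MetricFn = (name: string, labels: { reason: string }) => void;

// Extracts an errno-style code ("ENOENT", "EACCES", ...) when one is present.
function reasonOf(err: unknown): string {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" ? code : "unknown";
}

// Runs `spawnFn` under the non-fatal policy: any failure is logged and
// counted, and the caller simply proceeds without a daemon.
function spawnNonFatal(
  spawnFn: () => EventEmitter,
  emitMetric: MetricFn,
  log: (msg: string) => void = (msg) => console.error(msg),
): EventEmitter | null {
  const fail = (err: unknown): void => {
    const reason = reasonOf(err);
    log(`daemon spawn failed (${reason}); continuing without daemon`);
    emitMetric("prism_daemon_spawn_failed", { reason });
  };
  try {
    const child = spawnFn();
    // An "error" event with no listener would be thrown and crash the shim.
    child.on("error", fail);
    return child;
  } catch (err) {
    fail(err); // synchronous failure: no child was created
    return null;
  }
}
```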
6. Termination — on prism_wrap, and on shim death
6.1 Clean termination via prism_wrap
mcp-node/src/verbs/lifecycle.ts:prism_wrap gains a daemon-shutdown step before session deregistration:
```ts
async function shutdownDaemonForSession(): Promise<void> {
  if (!daemon_process) return; // no daemon to shut down
  const proc = daemon_process;
  daemon_process = null; // clear singleton up front so a concurrent prism_start can spawn fresh
  // If the daemon already exited, closing the pipe may surface EPIPE as an
  // "error" event rather than a throw; swallow it, there is nothing to do.
  proc.stdin?.on("error", () => {});
  try {
    proc.stdin?.end(); // close the pipe → daemon's EOF handler fires
  } catch {
    /* daemon already exited; nothing to do */
  }
  // We don't await the daemon's exit; its graceful shutdown runs async.
  // Backend's liveness probe handles the registration row regardless.
}
```
Closing the daemon’s stdin is the same mechanism as the shim-death backstop — just initiated explicitly rather than via OS process teardown. The daemon doesn’t need to know whether stdin closed because the shim wrapped or because the shim died; its EOF handler does the same thing either way.
6.2 Daemon-side EOF handler
The daemon binary’s entry point opens a stdin reader at process start:
```ts
process.stdin.on("end", () => initiateGracefulShutdown("stdin_eof"));
process.stdin.on("error", () => initiateGracefulShutdown("stdin_error"));
process.stdin.resume(); // start reading; we don't expect data, just EOF
```
initiateGracefulShutdown(reason) performs:
- Mark the FSM SHUTTING_DOWN (new terminal state added to the ADR-41 FSM)
- Stop accepting new UDS clients
- Close all WS connections to the backend (sends a WS close frame so the backend’s _forward_pubsub task observes the disconnect cleanly)
- Call prism_wrap(kind=daemon) to deregister the daemon row (best-effort; the backend liveness probe handles it if this fails)
- Flush any in-flight metrics
- process.exit(0)
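The shutdown sequence has three properties worth making explicit:
- It runs at most once, since stdin "end" and "error" can both fire.
- The SHUTTING_DOWN state is terminal, so reconnect logic must stop.
- A failed deregister must not block the metrics flush or the exit.

Below is a sketch under those assumptions. `ShutdownDeps`, `DaemonLifecycle`, and `shouldReconnect` are hypothetical names. The FSM states shown are only the subset needed here, not the full ADR-41 state set.

```typescript
// Hypothetical dependency surface; the real daemon's UDS server, WS client,
// deregister call, and metrics pipeline live elsewhere.
interface ShutdownDeps {
  closeUds(): Promise<void>;     // stop accepting new UDS clients
  closeWs(): Promise<void>;      // send WS close frames to the backend
  deregister(): Promise<void>;   // prism_wrap(kind=daemon)
  flushMetrics(): Promise<void>;
  exit(code: number): void;      // process.exit in the real daemon
}

type FsmState = "CONNECTED" | "RECONCILING" | "SHUTTING_DOWN";

class DaemonLifecycle {
  state: FsmState = "CONNECTED";
  shutdownReason: string | null = null;
  private readonly deps: ShutdownDeps;

  constructor(deps: ShutdownDeps) {
    this.deps = deps;
  }

  // Reconnect logic gates on this: SHUTTING_DOWN is terminal.
  shouldReconnect(): boolean {
    return this.state !== "SHUTTING_DOWN";
  }

  async initiateGracefulShutdown(reason: string): Promise<void> {
    // Only the first trigger counts; later calls return immediately.
    if (this.state === "SHUTTING_DOWN") return;
    this.state = "SHUTTING_DOWN";
    this.shutdownReason = reason;
    await this.deps.closeUds();
    await this.deps.closeWs();
    try {
      await this.deps.deregister();
    } catch {
      // Best-effort: the backend liveness probe reclaims the row if this fails.
    }
    await this.deps.flushMetrics();
    this.deps.exit(0);
  }
}
```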
6.3 Shim-death backstop
If the shim process dies abruptly without calling prism_wrap (Cmd+Q without wrap, segfault, OOM, kernel kill), the OS closes the shim’s open file descriptors — including the stdin pipe to the daemon. The daemon’s stdin reader fires the same EOF handler. This is the only parent-death mechanism the spec relies on; it is identical across all three target platforms.
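Both termination paths rest on the same OS behavior: once the writing side of a pipe is closed, the reader gets EOF. The sketch below checks that with a real pipe rather than a mock.

A test process cannot easily kill itself, so the sketch relies on spawnSync closing the child’s stdin after writing its (empty) input. From the child’s side this is the same EOF it would see after a wrap or after shim death. The inline child script is a hypothetical stand-in for the daemon entry point, not the actual daemon.

```typescript
import { spawnSync } from "node:child_process";

// Stripped-down stand-in for the daemon entry point, as plain JS so it can
// run under `node -e`. It reports why it would shut down instead of actually
// shutting anything down.
const daemonStub = [
  "process.stdin.on('end', () => process.stdout.write('stdin_eof'));",
  "process.stdin.on('error', () => process.stdout.write('stdin_error'));",
  "process.stdin.resume();",
].join("\n");

// spawnSync writes `input` (here: nothing) to the child's stdin pipe and then
// closes the write end. The child observes a plain EOF, the same thing it
// observes when the shim ends the pipe on prism_wrap or when the OS closes
// the shim's descriptors after a crash.
const result = spawnSync(process.execPath, ["-e", daemonStub], {
  input: "",
  encoding: "utf8",
  timeout: 10_000, // safety net: without EOF the child would never exit
});

console.log(result.status, result.stdout);
```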
7. Edge cases
| Case | Behavior |
|---|---|
| prism_wrap (clean session end) | Shim closes daemon’s stdin → daemon EOF handler fires → graceful shutdown. Singleton handle cleared in shim. |
| prism_start after prism_wrap (same shim, new session) | Singleton handle is null → spawn fresh daemon for the new session |
| prism_start repeated without intervening wrap (idempotent re-bootstrap) | Singleton handle non-null → spawn no-op. Existing daemon continues. |
| Shim Cmd+Q (editor close) without wrap | OS closes shim’s fds → stdin pipe to daemon closes → daemon EOF handler fires |
| Shim segfault / OOM | Same as Cmd+Q — OS-level fd cleanup catches it |
| Daemon crash | daemon_process.on("exit") fires in shim; singleton cleared. Shim’s doorbell client (createShimDoorbellClient) sees UDS connect errors and runs daemon-less for the rest of this session. Next prism_start (after wrap+rebootstrap) spawns fresh. |
| Both shim + daemon crash | Backend’s liveness probe at 60s marks both stale. Next bootstrap does startup_drain and re-spawns. |
| Host suspend (laptop lid close) | Backend stops getting heartbeats. Liveness probe marks stale. On wake, daemon’s WS reconnects via FSM RECONCILING → CONNECTED; shim’s heartbeat resumes. Existing SPEC-070 / PR #56 contract; no new code path. |
| Two editors open simultaneously, same persona | Each editor’s shim hosts its own bootstrapped session, each spawning its own daemon. Signals to the persona fan out via the existing publish path; recipient sees the doorbell from both daemons. Duplicate-but-not-broken. Acceptable for v1; revisit only if observed in practice. |
8. Backend cleanup of fragments
Daemon registers as kind=daemon (PR #52) and heartbeats over the existing transport. Cleanup relies on three terminal paths. The first two exist already; the third is an aspirational optimization.
- Graceful deregister: the daemon’s initiateGracefulShutdown calls prism_wrap(kind=daemon). Backend marks the row released_at = NOW().
- Liveness probe (PR #56): if the heartbeat falls behind the 60s threshold, send-time resolution treats the daemon as not_available_stale and the routing-registry sweep marks the row stale.
- gRPC heartbeat continuity: a bidirectional gRPC stream (via the prism-server-backend-grpc container) would provide a faster signal than HTTP polling, since a broken stream is immediate evidence of daemon death. This is an aspirational optimization, not a v1 gate (Texi calibration). v1 ships with HTTP heartbeat.
No new sweeper or cleanup verb required.
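The first two paths amount to a small classification over a daemon row. The sketch below assumes a row with a last-heartbeat timestamp and a nullable released_at. The field names, the "released" label, and the strict greater-than comparison at the 60s boundary are illustrative assumptions, not the backend’s actual schema or code.

```typescript
// Hypothetical projection of an agent_sessions row with kind=daemon.
interface DaemonRow {
  lastHeartbeatMs: number;      // epoch milliseconds of the last heartbeat
  releasedAtMs: number | null;  // set by graceful deregister (released_at)
}

type Liveness = "available" | "released" | "not_available_stale";

const STALE_THRESHOLD_MS = 60_000; // PR #56 liveness threshold

function classifyDaemon(row: DaemonRow, nowMs: number): Liveness {
  // Path 1: graceful deregister already released the row.
  if (row.releasedAtMs !== null) return "released";
  // Path 2: liveness probe; heartbeat silence beyond the threshold means stale.
  const silentForMs = nowMs - row.lastHeartbeatMs;
  return silentForMs > STALE_THRESHOLD_MS ? "not_available_stale" : "available";
}
```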
9. Cross-platform contract
The stdin-EOF mechanism works identically on macOS, Linux, and Windows because it operates at the OS pipe level. No prctl(PR_SET_PDEATHSIG) (Linux-only), no kqueue EVFILT_PROC (macOS-only), no Windows Job Object: none of these are needed, and they are explicitly not part of this SPEC’s correctness contract.
daemonBinPath() resolution:
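- All platforms: resolved relative to the running shim module’s location: path.join(path.dirname(fileURLToPath(import.meta.url)), "daemon", "server.js"). import.meta.url is a file: URL, so it must be converted with fileURLToPath (from node:url) before any path operations. Calling path.dirname on the raw URL string is not portable.
- Spawn: spawn(process.execPath, [daemonBinPath()], ...) invokes the same Node binary the shim is running under, with no cross-platform shell wrapper.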
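Below is a sketch of daemonBinPath() that takes the module URL as a parameter so it can be exercised outside the shim. In the shim the argument would be import.meta.url.

The `levelsUp` parameter is a hypothetical addition. The relative path depends on where the defining module is compiled: a module in mcp-node/dist/ itself needs no adjustment, while one in a subdirectory such as dist/verbs/ needs one "..".

```typescript
import * as path from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// `moduleUrl` is import.meta.url in the real shim; `levelsUp` is how many
// directories the defining module sits below mcp-node/dist/.
function daemonBinPath(moduleUrl: string, levelsUp = 0): string {
  // Convert the file: URL to a native path first (Windows drive letters,
  // percent-encoding), then walk up to dist/ and into daemon/.
  const moduleDir = path.dirname(fileURLToPath(moduleUrl));
  const distDir = path.resolve(moduleDir, ...Array.from({ length: levelsUp }, () => ".."));
  return path.join(distDir, "daemon", "server.js");
}

// Example: a module compiled to mcp-node/dist/verbs/lifecycle.js.
const exampleUrl = pathToFileURL(path.resolve("mcp-node", "dist", "verbs", "lifecycle.js")).href;
const resolved = daemonBinPath(exampleUrl, 1); // <cwd>/mcp-node/dist/daemon/server.js
```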
prism install writes the same env block on every platform. No additional templates or per-OS install scripts are introduced by this SPEC.
10. What changes vs SPEC-070 v0.2
| SPEC-070 §B4 element | Status under SPEC-072 |
|---|---|
| launchctl bootstrap LaunchAgent template | retired |
| systemctl --user enable unit file template | retired |
| schtasks /create Task Scheduler template | retired |
| prism_persona_create invokes OS launcher | retired |
| prism_persona_destroy calls IPC shutdown + 5s-then-kill escalation | retired as a daemon path; persona-destroy still cleans persona rows but doesn’t talk to a daemon |
| Per-OS launcher integration matrix | retired |
| Linger-enabled programmatically on Linux | retired |
| Daemon binary (FSM, IPC, WS, plugins, metrics) | unchanged |
| kind=daemon registration discriminator (PR #52) | unchanged |
| Routing-registry liveness probe (PR #56) | unchanged; handles fragment cleanup |
| Daemon→shim doorbell over UDS | unchanged |
| Plugin contract (SurfacePlugin interface) | unchanged |
| Open TODO 5s-shutdown-then-kill in prism_persona_destroy (project_open_todo_kill_fallback) | retired |
| Daemon binary packaging path | changed — moves to mcp-node/dist/daemon/server.js (sibling of mcp-node/dist/server.js); daemon ships in the same dist artifact as the shim per Texi calibration |
Lafonda’s in-flight worktree feat/spec-070-b4-install-lane-2026-05-02 becomes a no-op for daemon launch. Install-lane work narrows to: ensure mcp-node/dist/daemon/server.js ships in prism install output, and the env block contains PRISM_DAEMON_SHIM_PID / PRISM_DAEMON_SHIM_DOORBELL_PATH keys (latter already present per claude_code.ts:resolveShimDoorbellSocketPath).
11. Implementation plan
Phase A — Shim spawn integration (Donna)
- A1. Module-level daemon_process: ChildProcess | null singleton in mcp-node/src/verbs/lifecycle.ts
- A2. spawnDaemonForSession() helper called after session registration in prism_start. Idempotent against re-spawn.
- A3. Daemon path resolution helper (daemonBinPath()) using path.dirname(fileURLToPath(import.meta.url)) + relative resolve to daemon/server.js
- A4. Spawn-failure handling: log to stderr, emit prism_daemon_spawn_failed{reason} metric, non-fatal
Phase B — Shim-side wrap shutdown + daemon stdin-EOF handler (Donna)
- B1. shutdownDaemonForSession() in the prism_wrap path: clears singleton, closes daemon stdin
- B2. Daemon entry point opens a process.stdin reader on start
- B3. EOF / error handlers call initiateGracefulShutdown(reason)
- B4. FSM gains SHUTTING_DOWN terminal state; reconnect logic gates on it
- B5. Graceful shutdown sequence (UDS close → WS close → deregister → metrics flush → exit)
Phase C — Retire SPEC-070 §B4 (Lafonda + Donna)
- C1. Lafonda: close the feat/spec-070-b4-install-lane-2026-05-02 worktree without merging the launcher templates
- C2. Lafonda + Donna: move daemon entry to mcp-node/dist/daemon/server.js packaging; ensure prism install ships the file
- C3. Donna: remove or repurpose prism_persona_destroy’s daemon-talking path
- C4. Update SPEC-070 (v0.3 or annotation) to point at SPEC-072 for launch lifecycle
Phase D — Smoke (Donna + Porsche)
- D1. End-to-end smoke: shim bootstrap → prism_start → daemon spawn → WS connection visible in prism_status daemon view → prism_wrap → daemon exits within 1s → backend row marked released. Then prism_start again in the same shim → fresh daemon. Then kill the shim → daemon exits via the stdin EOF backstop.
- D2. Porsche: dashboard shows daemon spawn / death events as observable mesh state (folds into SPEC-071 §11 card)
Each phase is independently shippable. Phase A + B together are the minimum shipping unit; C is cleanup; D is verification.
12. Resolved during review
Texi review 2026-05-03 02:39Z answered all v0.1 open questions:
- Daemon binary packaging: ships at mcp-node/dist/daemon/server.js (sibling of mcp-node/dist/server.js). Same shipped artifact as the shim, same versioning unit. Adopted in §10.
- Spawn-failure policy: non-fatal, log to stderr, emit metric. No system signal escalation. Adopted in §5.
- gRPC heartbeat upgrade: aspirational, not a v1 gate. v1 ships with HTTP heartbeat. Adopted in §8 item 3.
13. References
- SPEC-070 v0.2: daemon binary architecture (foundation; this SPEC narrows §B4 only)
- ADR-41: daemon FSM; gains the SHUTTING_DOWN terminal state via this SPEC
- ADR-42: project attach lifecycle (unchanged)
- ADR-43: daemon-via-shim (filed by Texi during SPEC-070 ratification; this SPEC provides the concrete launch contract)
- PR #52: agent_sessions.kind=daemon discriminator (unchanged)
- PR #56: routing-registry liveness probe (provides the 60s stale threshold for fragment cleanup)
- PR #57: daemon binary skeleton
- PR #67: backend wake-up (daemon WS receipt is notification, not delivery)
- PR #81: mcp-launcher staleness check (relevant: the launcher is what keeps the shim long-lived across editor restarts; the unit of life is therefore the bootstrapped session, not the shim process)
- project_open_todo_kill_fallback: retired by this SPEC
- feedback_frank_spidy_sense_pattern: encapsulation > resource economy; drives the 1:1 choice
- feedback_optimize_later: get architecture uniform first; multi-surface-per-persona is unobserved
- project_lane_split_install_vs_deploy: Lafonda owns install lane scope reduction in §11 Phase C
- SPEC-071: Signal Bus QoS (independent SPEC; daemon participates as a kind=daemon recipient via the same delivery semantics)
14. Revision log
- v0.1 (2026-05-03 02:35Z): initial draft circulated for review
- v0.2 (2026-05-03 02:42Z): Texi review revisions:
  - Finding 1: corrected the unit of life. The daemon is scoped to the bootstrapped session, NOT the shim process. Spawn on prism_start (idempotent against re-bootstrap), terminate on prism_wrap (the shim closes the daemon’s stdin, firing the same EOF mechanism). Stdin EOF on shim death remains the unclean-exit backstop. §4-§7 rewritten.
  - Finding 2: removed the “process-group cleanup belt-and-suspenders” claim from the correctness story. Stdin EOF is the only normative parent-death mechanism. detached: false is now flagged as implementation hygiene only.
  - Finding 3: added the module-level singleton handle daemon_process in the shim. Re-bootstrap without wrap is a no-op (existing daemon continues); wrap clears the handle so the next prism_start spawns fresh. §5-§7 carry the rule.
  - Plus calibration: daemon packaging at mcp-node/dist/daemon/server.js, spawn-failure metric, gRPC aspirational rather than a v1 gate.