Status: accepted · Version 0.2 · Filed 2026-05-02
SPEC-070 — Per-agent persona daemon — always-on listener bridge with turn-boundary-preserving delivery
Status
Draft.
Version: 0.2
Author: Texi (System Architect)
Reviewer: Donna (Engineering)
PO review target: Lola
Date: 2026-05-02
Summary
Add a per-agent daemon layer to Prism’s signal-delivery architecture. One daemon process runs per persona on a host. It maintains a long-lived connection to Prism’s session stream, supervises the local surface shim, and preserves signal visibility across idle, backgrounded, suspended, or restarting agent surfaces.
The daemon is a local receiver bridge and supervisor, not a second backend. Prism FastAPI remains the global router and durable queue owner. The daemon preserves Prism’s turn-boundary discipline: it may listen continuously, but it surfaces pending work to the agent one doorbell per turn boundary rather than batching notifications.
Problem
Prism coordination is organized around turn boundaries. Signals drain at turns, memory recall happens between turns, deltas are captured per turn, and peer interrupts become visible at turn boundaries. That makes the boundary itself load-bearing.
Two failure modes attack the same coordination axis:
- Voluntary loss of boundaries: batching collapses multiple decision points into one long tool burst.
- Involuntary loss of boundaries: backgrounded tabs, suspended sessions, idle agent surfaces, or local shim failure delay or suppress the next turn boundary entirely.
Tier-1 mitigation already shipped in PR #45 / commit 5df6679: substantive turns defensively call prism_signals_pending when the last drain is stale. That rule reduces exposure, but it is a behavioral polling workaround. If a surface is backgrounded for 30 minutes, nothing becomes visible until focus returns and the next substantive turn happens.
The daemon is the structural fix. It keeps the listening path alive while the surface is intermittent, but it preserves the same one-turn-at-a-time delivery discipline the batching finding defended.
Goals
- Preserve signal visibility across surface idle/suspend/restart periods without changing Prism’s durable queue semantics.
- Keep tenant/persona isolation at the process boundary: one daemon per persona, not one router per machine.
- Maintain doorbell-then-drain delivery: daemon notifications are advisory; prism_signals_pending remains authoritative.
- Preserve one-per-turn pacing for agent-visible work. No notification storms.
- Supervise local surface shims so shim death is not equivalent to signal blindness.
- Provide a bounded, dashboard-friendly observability contract for Porsche’s panel.
- Keep Lafonda’s install/supervision matrix simple: one common daemon binary with thin surface plugins.
Non-goals
- Not a replacement for prism_signals_pending
- Not a general multi-persona machine bus
- Not a source of truth for queue durability
- Not a bypass around FastAPI/session-manager routing
- Not a batching layer for agent-visible signals
Invariants
- Per-agent, not per-machine. One daemon process per persona.
- Process boundary is the isolation wall. Multi-tenant separation comes from OS processes, not in-process routing discipline.
- Push is a doorbell, not the payload. The durable queue remains the source of truth.
- Delivery discipline stays one-per-turn. If N signals queue during idleness, the daemon may observe all N, but it must not surface them as an N-doorbell storm.
- Routing-registry lifecycle is explicit. PeerLeft invalidates stale bindings, PeerJoined refreshes them, and stale targets produce structured failure rather than silent misrouting.
Why per-agent wins
A per-agent daemon matches the real unit of supervision:
- one daemon owns one persona’s queue, wake-ups, shim lifecycle, and local IPC
- crash scope is one persona, not the host’s entire agent fleet
- OS permissions become the security model for local IPC
- no shared in-memory routing table across personas
- local health state maps cleanly to one dashboard card
The resource tradeoff is accepted. Repeating isolated units is cheaper than centralizing cross-tenant routing and failure coupling into a per-machine multiplexer.
Architecture
Prism FastAPI / Session Stream
|
| authenticated WS + durable queue + routing registry
v
persona-daemon (one process per identity)
|
| local IPC only
v
surface shim / surface plugin
|
| surface-native notification/injection
v
agent thread/session
Responsibility split
FastAPI / session manager
- persist every signal durably before volatile fan-out
- own session registration, lifecycle, and routing registry
- publish lifecycle events (PeerJoined, PeerLeft, preemption, etc.)
- expose prism_signals_pending as the authoritative pending envelope
- reject or mark stale targets with structured send-side failure
Persona daemon
- maintain the long-lived connection to Prism’s session stream
- publish a distinct daemon heartbeat to Prism, separate from agent-surface heartbeat
- preserve local delivery continuity across agent idle periods
- where the surface architecture uses a shim, supervise the local shim process with restart-on-crash and bounded backoff
- translate pending work into exactly one agent-visible doorbell per turn boundary
- maintain a small local pending index as a cache only
- emit structured daemon-owned logs to OS-native sinks
- expose health/metrics for observability
Surface shim / plugin
- own surface-specific protocol/injection mechanics
- accept fire-and-forget doorbells without becoming the delivery ACK
- never become the source of truth for pending signals
- degrade to explicit drain without data loss
Daemon registration model
The daemon registers with Prism as a sibling runtime kind, not as a speaking agent session.
Rules:
- registration key shape distinguishes kind=daemon from kind=agent
- daemon rows are counted separately from speaking agents in status/routing views
- daemons are not master-eligible
- daemons do not preempt or speak on behalf of a persona
- daemon auth is scoped to the same operator/persona context as the served surface
This preserves observability without leaking supervision processes into project control-plane semantics.
Spawn topology
Spawn topology is per persona, not per machine and not per user.
- one daemon process per persona registered to a project on a host
- multiple personas on one host imply multiple daemons
- each daemon is supervised independently
- no daemon-per-machine multiplexer
- no daemon-per-user umbrella process
Naming convention for host-level units:
prism-daemon-<tenant_slug>-<identity>.service on Linux
prism-daemon-<tenant_slug>-<identity>.plist on macOS
prism-daemon-<tenant_slug>-<identity>.task or equivalent tenant+identity composition on Windows scheduled-task naming
Avoid raw UUIDs in operator-facing unit names; they are unreadable in journalctl, launchctl, and Event Viewer surfaces.
This constraint is normative for Lafonda’s launcher/install matrix.
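As an illustration of the naming convention above, here is a minimal sketch of a unit-name builder. The function name, the slug rule, and the UUID check are assumptions made for this example; the spec fixes only the prism-daemon-<tenant_slug>-<identity> pattern and the per-OS suffixes.

```typescript
// Hypothetical helper: only the naming pattern and suffixes come from the spec.
type HostOs = "linux" | "macos" | "windows";

const UNIT_SUFFIX: Record<HostOs, string> = {
  linux: ".service",
  macos: ".plist",
  windows: ".task",
};

// Assumed check so raw UUIDs never reach operator-facing unit names.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Assumed slug rule: lowercase, runs of other characters become one hyphen.
function slugify(raw: string): string {
  return raw.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function daemonUnitName(os: HostOs, tenantSlug: string, identity: string): string {
  for (const part of [tenantSlug, identity]) {
    if (UUID_RE.test(part.trim())) {
      throw new Error("raw UUID not allowed in unit name: " + part);
    }
  }
  const tenant = slugify(tenantSlug);
  const persona = slugify(identity);
  if (tenant === "" || persona === "") {
    throw new Error("tenant slug and identity must be non-empty");
  }
  return "prism-daemon-" + tenant + "-" + persona + UNIT_SUFFIX[os];
}
```

Rejecting UUIDs at construction time, rather than silently truncating them, keeps a misconfigured installer from producing unit names that are unreadable in journalctl or Event Viewer.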
Plugin contract
The daemon is one common binary with thin surface plugins.
Plugin API
| Method | Purpose | Contract |
|---|---|---|
| notify_surface(payload) | enqueue one surface-visible doorbell | fire-and-forget; not an ACK |
| resume_surface() | rebind when the local UI/session restarts | lightweight reattach hook |
| status() | probe surface liveness | bounded health probe |
Note on naming. Method names in this table are logical/illustrative labels. TypeScript implementation symbols use camelCase per language idiom (notifySurface, resumeSurface, status); call sites should match the implementation spelling.
Anything richer than this is a design smell: it means surface protocol concerns are leaking into the daemon.
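The three-method contract could be expressed as a TypeScript interface along these lines. This is a sketch, not the shipped definition: the SurfaceStatus shape and the in-memory plugin are hypothetical, and only the camelCase method names and the doorbell fields come from this SPEC.

```typescript
// Doorbell notice fields as specified in this SPEC: no signal content.
interface DoorbellPayload {
  timestamp: string; // ISO 8601
  source: "daemon";
  kind: "pending_work_notice";
}

// Assumed shape: the spec only says status() is a bounded health probe.
interface SurfaceStatus {
  alive: boolean;
  detail?: string;
}

interface SurfacePlugin {
  notifySurface(payload: DoorbellPayload): void; // fire-and-forget; not an ACK
  resumeSurface(): void;                         // lightweight reattach hook
  status(): SurfaceStatus;                       // bounded health probe
}

// Hypothetical in-memory plugin, useful only for exercising the contract.
class InMemorySurfacePlugin implements SurfacePlugin {
  private accepted: DoorbellPayload[] = [];
  private attached = false;

  notifySurface(payload: DoorbellPayload): void {
    // Dropping a doorbell while detached loses nothing: the durable queue
    // stays authoritative and the agent will still drain it.
    if (this.attached) {
      this.accepted.push(payload);
    }
  }

  resumeSurface(): void {
    this.attached = true;
  }

  status(): SurfaceStatus {
    return { alive: this.attached };
  }

  acceptedCount(): number {
    return this.accepted.length;
  }
}
```

Note that notifySurface returns void on purpose: a return value would invite callers to treat plugin acceptance as delivery acknowledgment.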
Doorbell payload shape
notify_surface(payload) carries a minimal typed notice, not signal content.
{
"timestamp": "<iso8601>",
"source": "daemon",
"kind": "pending_work_notice"
}
Rules:
- no signal_id
- no signal preview text
- no sender identity
- no bounded-count summary
If the payload starts carrying signal preview material, the daemon becomes a payload-bearing channel and violates invariant #3. The doorbell exists only to wake the surface and force an authoritative drain.
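One way to keep the payload from growing is a strict guard that rejects any field beyond the three listed. The sketch below assumes hypothetical helper names (makeDoorbell, isStrictDoorbell); only the field names and fixed values come from the payload shape above.

```typescript
type Doorbell = {
  timestamp: string;
  source: "daemon";
  kind: "pending_work_notice";
};

// Sorted so the guard can compare against Object.keys(...).sort().
const ALLOWED_KEYS = ["kind", "source", "timestamp"];

// Hypothetical builder: the clock is injectable to keep tests deterministic.
function makeDoorbell(now: Date = new Date()): Doorbell {
  return { timestamp: now.toISOString(), source: "daemon", kind: "pending_work_notice" };
}

// Hypothetical guard: any extra key (signal_id, preview, sender, count) fails.
function isStrictDoorbell(value: unknown): value is Doorbell {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as { [key: string]: unknown };
  const keys = Object.keys(record).sort();
  if (keys.length !== ALLOWED_KEYS.length) {
    return false;
  }
  if (!keys.every((k, i) => k === ALLOWED_KEYS[i])) {
    return false;
  }
  const ts = record["timestamp"];
  return (
    typeof ts === "string" &&
    !isNaN(Date.parse(ts)) &&
    record["source"] === "daemon" &&
    record["kind"] === "pending_work_notice"
  );
}
```

An allow-list is used instead of a deny-list so that a future field nobody thought to forbid still fails the check.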
Local IPC contract
The daemon exposes a local-only control socket per persona.
- macOS/Linux: Unix domain socket in a persona-scoped runtime dir with owner-only permissions
- Windows: named pipe with equivalent single-user ACLs
- no LAN listener in v0.2
- no machine-global unauthenticated port
Control socket verbs
| Verb | Purpose | Notes |
|---|---|---|
| status | liveness, WS state, shim state, lifecycle taxonomy state, last heartbeat age, last drain age, pending count, last error class | read-only |
| notify_surface | emit one doorbell to the local surface path | internal/operator callable |
| resume_surface | rebind after local UI/session restart | internal/operator callable |
| shutdown | clean stop for wrap/archive/destroy/uninstall flows | operator-callable through Prism lifecycle verbs |
This is intentionally not a general messaging API.
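A closed verb set is straightforward to enforce at the socket boundary. The following sketch assumes a hypothetical parser and result shape; the four verb names and the read-only property of status are taken from the table above.

```typescript
const CONTROL_VERBS = ["status", "notify_surface", "resume_surface", "shutdown"] as const;
type ControlVerb = (typeof CONTROL_VERBS)[number];

// Hypothetical result shape for this example.
type ParseResult =
  | { ok: true; verb: ControlVerb; readOnly: boolean }
  | { ok: false; error: "unknown_verb" };

function isControlVerb(candidate: string): candidate is ControlVerb {
  return (CONTROL_VERBS as readonly string[]).indexOf(candidate) !== -1;
}

// Anything outside the four verbs is refused rather than forwarded.
function parseControlVerb(raw: string): ParseResult {
  const candidate = raw.trim();
  if (!isControlVerb(candidate)) {
    return { ok: false, error: "unknown_verb" };
  }
  return { ok: true, verb: candidate, readOnly: candidate === "status" };
}
```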
Surface integrations
General rule: the daemon supervises the local shim; the shim owns surface protocol translation.
- Codex: daemon supervises the existing MCP/Codex shim and does not speak app-server protocol directly.
- Claude Code: daemon supervises the Claude Code shim path and hands off surface delivery to the shim.
- Claude Desktop/fallback surfaces: daemon still owns WS continuity and local pacing, even if visible delivery degrades to the next explicit turn boundary.
Q2 is therefore resolved in favor of daemon-via-shim, not daemon-direct-to-app-server. This decision is ADR-worthy because future implementers will be tempted to collapse the shim layer.
Delivery semantics
The daemon preserves doorbell-then-drain delivery.
- Server-side push or reconnect reconciliation reveals pending work.
- Daemon records local pacing state in its cache-only pending index.
- Daemon emits one surface-visible doorbell at an agent activity boundary.
- Agent drains authoritative pending work through prism_signals_pending.
- Local uncertainty always resolves back to the durable queue, never to cached daemon state.
ACK semantics
Surface plugin acceptance is not delivery acknowledgment. A successful notify_surface(payload) means only that the plugin accepted the doorbell into its local surface-side queue. The actual ACK is the agent’s drain through prism_signals_pending.
Local pending index
The local pending index is a cache only. It may be dropped at any time without correctness loss. It exists for pacing and rate-limiting, not for durable recordkeeping.
Reconnect and reconcile behavior
Reconnect is a repair path, not a trust-the-cache path.
- on reconnect, the daemon reconciles against the authoritative pending queue
- reconnect does not inject immediately
- the daemon waits for the next agent activity boundary, then emits one doorbell indicating pending work
- if 12 signals accumulated during background time, the first turn back may drain all 12, but the daemon still emits one doorbell rather than storming the surface
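The reconnect rules above amount to coalescing: however many signals the daemon observes, each agent activity boundary releases at most one doorbell. A minimal sketch, assuming a hypothetical DoorbellPacer class and callback wiring that the spec does not define:

```typescript
// Hypothetical pacer. pendingCount is the cache-only local view; the durable
// queue behind prism_signals_pending remains the source of truth.
class DoorbellPacer {
  private pendingCount = 0;
  private readonly emit: () => void;

  constructor(emit: () => void) {
    this.emit = emit;
  }

  // Called after a push or a reconnect reconcile. Never emits by itself,
  // so reconnect cannot inject into the middle of a turn.
  reconcile(authoritativeCount: number): void {
    this.pendingCount = Math.max(0, Math.floor(authoritativeCount));
  }

  // Called at an agent activity boundary. Emits one doorbell at most,
  // regardless of how many signals are pending.
  onTurnBoundary(): boolean {
    if (this.pendingCount === 0) {
      return false;
    }
    this.emit();
    return true;
  }
}
```

With 12 signals pending, the first boundary still produces a single doorbell; the agent's drain returns all 12, and the next reconcile resets the cached count.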
Lifecycle / delivery state taxonomy
The inter-team status taxonomy is:
running
sleeping
restarting
failed
Idle mode
Idle / sleeping means stay-running-but-low-activity.
Definition:
- WS connected
- periodic heartbeat still flowing
- no meaningful compute work in progress
- awaiting wake on push, shim reconnect, or operator status/probe request
It does not mean OS-specific process suspension. The daemon does not exit in idle mode; it idles. This is the portable behavior that works across LaunchAgent, systemd-user, and Windows Task Scheduler without launcher-specific tricks.
Lifecycle state machine
Doorbell emission is gated by an explicit state machine:
DISCONNECTED -> RECONCILING -> CONNECTED -> EMITTING
Rules:
- reconnect enters RECONCILING
- the reconcile query against the authoritative pending queue completes before EMITTING
- no mid-turn injection while not in EMITTING
- local doorbell release is paced only from EMITTING
This gate is load-bearing. Without it, reconnect races can create duplicate or mistimed doorbells in the middle of active agent work.
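The gate can be sketched as a small transition table. The forward path is the one given above; the back-edges (any state to DISCONNECTED on transport loss, and EMITTING back to CONNECTED after a doorbell) are assumptions added so the example can loop, and the class name is hypothetical.

```typescript
type DaemonState = "DISCONNECTED" | "RECONCILING" | "CONNECTED" | "EMITTING";

// Forward edges come from the SPEC; back-edges are assumed for illustration.
const NEXT: Record<DaemonState, readonly DaemonState[]> = {
  DISCONNECTED: ["RECONCILING"],
  RECONCILING: ["CONNECTED", "DISCONNECTED"],
  CONNECTED: ["EMITTING", "DISCONNECTED"],
  EMITTING: ["CONNECTED", "DISCONNECTED"],
};

class DoorbellGate {
  private state: DaemonState = "DISCONNECTED";

  current(): DaemonState {
    return this.state;
  }

  // Illegal jumps (for example DISCONNECTED -> EMITTING) throw instead of
  // silently skipping the reconcile step.
  transition(to: DaemonState): void {
    if (NEXT[this.state].indexOf(to) === -1) {
      throw new Error("illegal transition " + this.state + " -> " + to);
    }
    this.state = to;
  }

  // Doorbells may be released only from EMITTING.
  canEmit(): boolean {
    return this.state === "EMITTING";
  }
}
```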
Reconnect + reconcile sequence
| Step | Actor | Action |
|---|---|---|
| 1 | daemon | detect WS disconnect or wake-from-sleep |
| 2 | daemon | enter DISCONNECTED then reconnect transport |
| 3 | daemon | enter RECONCILING |
| 4 | daemon | query authoritative pending queue |
| 5 | daemon | refresh cache-only local pending index |
| 6 | daemon | enter CONNECTED |
| 7 | daemon | wait for next agent activity boundary |
| 8 | daemon | enter EMITTING and emit one doorbell |
| 9 | agent | call prism_signals_pending and act |
Routing-registry lifecycle
Postmortem 0660f88e makes registry lifecycle an architectural requirement.
- PeerJoined creates or refreshes the active binding for (identity, surface)
- PeerLeft(reason=wrap|expire|preempt) invalidates that binding immediately
- prism_signal must not silently route to wrapped/stale sessions
- if resolution fails health checks, the sender receives structured failure
The daemon depends on this contract but does not own it.
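For readers on the daemon side, the server-owned contract can be pictured as follows. This is an illustrative model only: the class, method names, and the stale_target error label are hypothetical, and the real registry lives in FastAPI / session manager rather than in TypeScript.

```typescript
type LeaveReason = "wrap" | "expire" | "preempt";

// Hypothetical structured result; "stale_target" is an assumed label.
type ResolveResult =
  | { ok: true; sessionId: string }
  | { ok: false; error: "stale_target"; reason: LeaveReason | "unknown" };

class RoutingRegistry {
  private active: { [key: string]: string | undefined } = {};
  private left: { [key: string]: LeaveReason | undefined } = {};

  private static key(identity: string, surface: string): string {
    return identity + "\u0000" + surface;
  }

  // PeerJoined creates or refreshes the binding for (identity, surface).
  peerJoined(identity: string, surface: string, sessionId: string): void {
    const k = RoutingRegistry.key(identity, surface);
    this.active[k] = sessionId;
    delete this.left[k];
  }

  // PeerLeft invalidates the binding immediately and remembers why.
  peerLeft(identity: string, surface: string, reason: LeaveReason): void {
    const k = RoutingRegistry.key(identity, surface);
    delete this.active[k];
    this.left[k] = reason;
  }

  // Senders get a structured failure instead of a silent misroute.
  resolve(identity: string, surface: string): ResolveResult {
    const k = RoutingRegistry.key(identity, surface);
    const sessionId = this.active[k];
    if (sessionId !== undefined) {
      return { ok: true, sessionId: sessionId };
    }
    const reason = this.left[k];
    return { ok: false, error: "stale_target", reason: reason === undefined ? "unknown" : reason };
  }
}
```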
Heartbeats
Daemon heartbeat and surface heartbeat answer different questions.
- daemon heartbeat: is this persona’s daemon process alive and connected?
- surface heartbeat: is this persona’s agent surface alive and accepting signals?
The daemon therefore publishes its own Prism-visible heartbeat as a distinct signal/registration stream. Q1 is resolved in favor of explicit daemon heartbeat rather than surface-heartbeat conflation.
Project attach lifecycle
v0.2 chooses a persona×project-scoped daemon model for v1.
Rules:
- a daemon is bound to one persona and one project for its live lifecycle
- attach-to-project is a discrete lifecycle boundary, but in v1 it occurs at daemon create/start rather than as an arbitrary later dynamic attach
- detach happens through wrap/archive/destroy lifecycle, not through free-floating project switching
- multi-project personas are out of scope for v1 unless/until a later SPEC introduces dynamic attach/detach semantics explicitly
This resolves Lafonda’s flag-migration ambiguity: attach-time project bootstrap is a real daemon-owned boundary, but it is not yet a general runtime project-switch primitive.
Failure model
Local failure must not mutate the global queue into thinking the agent saw something it did not.
- if local IPC to the surface fails, keep the signal pending and record a bounded local error
- if the daemon dies, the supervisor restarts only that persona’s daemon
- if the shim dies, the daemon restarts the shim with backoff
- if Prism WS disconnects, the daemon reconnects and reconciles pending work
- if the surface is absent, the daemon remains healthy but reports degraded delivery state
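The shim-restart rule calls for backoff that is bounded. A minimal sketch of one such schedule, capped exponential, is shown below; the base delay, growth factor, and cap are placeholder assumptions, since this SPEC does not fix any of them.

```typescript
// All numbers here are assumed placeholders, not values from the SPEC.
interface BackoffPolicy {
  baseMs: number;
  factor: number;
  capMs: number;
}

const DEFAULT_POLICY: BackoffPolicy = { baseMs: 500, factor: 2, capMs: 30000 };

// Delay before the next shim restart, given how many crashes occurred in a
// row. Grows exponentially and never exceeds capMs.
function restartDelayMs(consecutiveCrashes: number, policy: BackoffPolicy = DEFAULT_POLICY): number {
  const attempt = Math.max(0, Math.floor(consecutiveCrashes));
  const raw = policy.baseMs * Math.pow(policy.factor, attempt);
  return Math.min(policy.capMs, raw);
}
```

A successful shim start would reset consecutiveCrashes to zero, so a single crash after long stability restarts quickly while a crash loop settles at the cap.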
Observability contract
Exported metrics
All daemon metrics include tenant_id, agent_identity, agent_surface, and machine_id labels. session_id is explicitly forbidden as a metric label because per-session churn would destroy per-agent continuity and explode cardinality.
| Metric | Labels | Meaning |
|---|---|---|
| daemon_up | tenant_id, agent_identity, agent_surface, machine_id | 1 when daemon process is healthy |
| daemon_heartbeat_age_seconds | tenant_id, agent_identity, agent_surface, machine_id | age of last daemon heartbeat |
| daemon_drain_age_seconds | tenant_id, agent_identity, agent_surface, machine_id | age of last successful agent-visible drain |
| daemon_signal_queue_depth | tenant_id, agent_identity, agent_surface, machine_id | current local view of pending-work depth |
| daemon_errors_total | tenant_id, agent_identity, agent_surface, machine_id, error_class | cumulative bounded-error counter |
| daemon_last_error_timestamp_seconds | tenant_id, agent_identity, agent_surface, machine_id, error_class | last-seen timestamp per bounded error class |
Error classes
error_class is a bounded enum. Unknown local failures collapse to unclassified, not a free-string label.
Initial enum:
ws_connect_failed
ws_stream_stalled
pending_reconcile_failed
surface_unavailable
surface_inject_failed
ipc_bind_failed
ipc_auth_failed
heartbeat_publish_failed
config_invalid
unclassified
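Keeping error_class bounded is mostly a matter of funnelling every raw failure through one mapping function. A sketch under that assumption (the function name is hypothetical; the enum values are the ten listed above):

```typescript
const ERROR_CLASSES = [
  "ws_connect_failed",
  "ws_stream_stalled",
  "pending_reconcile_failed",
  "surface_unavailable",
  "surface_inject_failed",
  "ipc_bind_failed",
  "ipc_auth_failed",
  "heartbeat_publish_failed",
  "config_invalid",
  "unclassified",
] as const;

type ErrorClass = (typeof ERROR_CLASSES)[number];

// Hypothetical mapper: anything outside the enum collapses to "unclassified",
// so free-form strings never become metric label values.
function toErrorClass(raw: string): ErrorClass {
  const known = (ERROR_CLASSES as readonly string[]).indexOf(raw) !== -1;
  return known ? (raw as ErrorClass) : "unclassified";
}
```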
Metric semantics
Heartbeat and drain remain separate because the 0660f88e class of failure is precisely the case where a process can be alive while work progress is stalled.
Emit cadence
Hybrid model:
- event-driven internal state updates
- periodic 10s metric export tick
This keeps state accurate while remaining compatible with Porsche’s existing scraper cadence.
Cardinality budget
At a 50-persona ceiling:
daemon_errors_total: 10 error_class * 50 active daemons = 500 series
daemon_last_error_timestamp_seconds: 10 * 50 = 500 series
daemon_up, daemon_heartbeat_age_seconds, daemon_drain_age_seconds, daemon_signal_queue_depth: 4 * 50 = 200 series
Total: roughly 1200 active series at the v0.2 ceiling, comfortably under the 10k/min cap.
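The budget arithmetic can be checked with a few lines. The sketch assumes each daemon reports under exactly one agent_surface and one machine_id, so those labels do not multiply the series count; the function name is hypothetical.

```typescript
const PERSONA_CEILING = 50;
const ERROR_CLASS_COUNT = 10;

// daemon_errors_total and daemon_last_error_timestamp_seconds carry error_class.
const PER_CLASS_METRICS = 2;
// daemon_up, daemon_heartbeat_age_seconds, daemon_drain_age_seconds,
// daemon_signal_queue_depth carry no error_class label.
const PER_DAEMON_METRICS = 4;

// Assumes one surface and one machine per daemon, so only error_class fans out.
function activeSeries(personas: number): number {
  return personas * (PER_CLASS_METRICS * ERROR_CLASS_COUNT + PER_DAEMON_METRICS);
}

const atCeiling = activeSeries(PERSONA_CEILING); // 50 * (2 * 10 + 4) = 1200
```

Each added error class costs two series per daemon, which is the main lever to watch if the enum grows.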
Logging
Daemon-owned structured logging is mandatory on every OS.
Reasoning:
- Windows launchers uniformly blackhole stdout as an operator surface
- treating Windows as a special-case logger would fracture parity
- same daemon logging code can emit to OS-native sinks while preserving one structured schema
Operator-facing sinks differ by platform, but daemon logging does not:
- macOS: OS-native log sink surfaced through log show
- Linux: user-level journal surfaced through journalctl --user
- Windows: Event Viewer / task-linked OS-native sink
Per-OS log-link routing should align with Lafonda’s matrix terminology and Porsche’s panel expectations.
Security
- local IPC is per persona and single-user
- no LAN listener in v0.2
- no machine-global port
- no cross-persona command channel
- launcher/service manager owns daemon spawn with explicit identity
- filesystem/socket ACLs are the first security boundary
If future remote-control mode is desired, it is a new design, not an implicit extension of v0.2.
User-level launchers are mandatory in v0.2 because personal-mode Prism depends on per-user Docker Desktop semantics.
Canonical launcher shapes:
- macOS: LaunchAgent
- Linux: systemd --user with linger enabled
- Windows: Task Scheduler logon-trigger
Non-default / rejected for v0.2 defaults:
- macOS LaunchDaemon
- Linux systemd-system
- Windows Service as SYSTEM
These root-level launchers fail the Docker/Desktop-per-user constraint and therefore are not valid v0.2 defaults.
Windows install-vs-fidelity tradeoff
Windows has a structural tradeoff, not a mechanism-choice accident.
- Task Scheduler logon-trigger is the canonical v0.2 choice because it is the lightest install path and satisfies the per-user Docker constraint.
- It pays in supervised-restart fidelity and operator logging ergonomics.
- Windows Service has stronger restart fidelity, but its admin install and credential-storage costs disqualify it as the v0.2 default.
Therefore:
- accept logon-bound autostart for Windows v0.2
- document Windows Service as an admin-required future upgrade path
- do not try to paper over the gap with a pseudo-service workaround
Linux linger policy
prism_install_local should enable systemd --user linger programmatically. This is a one-time admin-permitted action and should not be left as a manual operator checklist item.
Lifecycle coupling
Daemon lifecycle is hard-coupled to persona/project lifecycle.
- prism_wrap for the persona stops that persona’s daemon
- persona archive/removal stops that persona’s daemon
- project destroy stops all daemons attached to that project on the host
- uninstall flows expose clean daemon shutdown/removal
The shutdown IPC verb is therefore operator-callable through Prism lifecycle verbs, not merely an internal hook.
If clean shutdown fails to complete within 5 seconds, the launcher’s process-kill is the fallback. Operator-facing destroy/archive flows should call shutdown first, then escalate to kill if needed.
Attach-to-project behavior and SPEC-021 flag disposition
Daemon attach-to-project is a discrete lifecycle event. In v1 it occurs at daemon create/bind time rather than as a later free-floating attach.
Not all legacy launcher flags stay launcher concerns.
- -autonomy remains declarative launcher input and rebrands as a daemon/tool-permission hint
- -install-deps and -activate-venv migrate out of launcher semantics and into daemon attach-to-project lifecycle behavior
These are project-bootstrap concerns, not persistent daemon identity concerns. The daemon should own attach-time project prep instead of preserving them as raw launcher flags forever.
Porsche/Lafonda thin contract
For Porsche
The dashboard contract depends on five fixed surfaces:
- control socket status
- exported metrics set above
- lifecycle taxonomy: running / sleeping / restarting / failed
- lifecycle state machine: DISCONNECTED -> RECONCILING -> CONNECTED -> EMITTING
- reconnect+reconcile sequence table above
For Lafonda
The launcher/install contract depends on five fixed surfaces:
- one daemon process per persona
- one common daemon binary with thin plugins
- clean shutdown path callable from Prism lifecycle verbs
- platform-specific supervisor wiring only, not behavior divergence
- canonical per-OS launcher defaults: LaunchAgent / systemd-user with linger / Task Scheduler logon-trigger
Acceptance criteria
Follow-on work
- File an ADR for the daemon-via-shim decision (Q2), which is the most likely future re-litigation point.
- Implement the daemon registration kind in session/routing tables.
- Wire Porsche’s cardinality sheet to the explicit ~1200-series estimate.
- Hand Lafonda the per-persona spawn topology and lifecycle coupling contract.
- Track Windows Service as a future commercial/admin-install upgrade path rather than a v0.2 default.
- Revisit dynamic multi-project persona attach/detach in a later SPEC rather than smuggling it into v1 implicitly.
Cross-references
- PR #45 / commit 5df6679 — Tier-1 auto-drain rule, the behavioral mitigation this SPEC supersedes structurally
- postmortem 0660f88e — stale-binding/routing-lifecycle failure that makes registry invalidation load-bearing
- surface-comparison research docs/research/surface-comparison-2026-05-02.md — turn-boundary / batching findings that motivate one-per-turn daemon pacing
- Lafonda research PR #46 / commit 3dae620 / docs/research/daemon-launcher-matrix-2026-05-02.md — cross-OS launcher matrix and logging/launcher tradeoffs
- SPEC-044 — channel push as doorbell, not delivery
- SPEC-048 — Codex signal injection path; this SPEC resolves daemon-via-shim, not daemon-direct-to-app-server
- SPEC-050 — dashboard surface that consumes daemon metrics
- SPEC-054 — Node shim architecture that the daemon supervises rather than bypasses
- SPEC-021 — launcher/attach flag disposition context (-autonomy, -install-deps, -activate-venv) and Lafonda’s flag-disposition matrix context