Status: accepted · Version 0.2 · Filed 2026-05-02

SPEC-070 — Per-agent persona daemon — always-on listener bridge with turn-boundary-preserving delivery

Status

Draft.
Version: 0.2
Author: Texi (System Architect)
Reviewer: Donna (Engineering)
PO review target: Lola
Date: 2026-05-02

Summary

Add a per-agent daemon layer to Prism’s signal-delivery architecture. One daemon process runs per persona on a host. It maintains a long-lived connection to Prism’s session stream, supervises the local surface shim, and preserves signal visibility across idle, backgrounded, suspended, or restarting agent surfaces. The daemon is a local receiver bridge and supervisor, not a second backend. Prism FastAPI remains the global router and durable queue owner. The daemon preserves Prism’s turn-boundary discipline: it may listen continuously, but it surfaces pending work to the agent one doorbell per turn boundary rather than batching notifications.

Problem

Prism coordination is organized around turn boundaries. Signals drain at turns, memory recall happens between turns, deltas are captured per turn, and peer interrupts become visible at turn boundaries. That makes the boundary itself load-bearing. Two failure modes attack the same coordination axis:
  1. Voluntary loss of boundaries: batching collapses multiple decision points into one long tool burst.
  2. Involuntary loss of boundaries: backgrounded tabs, suspended sessions, idle agent surfaces, or local shim failure delay or suppress the next turn boundary entirely.
Tier-1 mitigation already shipped in PR #45 / commit 5df6679: substantive turns defensively call prism_signals_pending when the last drain is stale. That rule reduces exposure, but it is a behavioral polling workaround. If a surface is backgrounded for 30 minutes, nothing becomes visible until focus returns and the next substantive turn happens. The daemon is the structural fix. It keeps the listening path alive while the surface is intermittent, but it preserves the same one-turn-at-a-time delivery discipline the batching finding defended.

Goals

  1. Preserve signal visibility across surface idle/suspend/restart periods without changing Prism’s durable queue semantics.
  2. Keep tenant/persona isolation at the process boundary: one daemon per persona, not one router per machine.
  3. Maintain doorbell-then-drain delivery: daemon notifications are advisory, prism_signals_pending remains authoritative.
  4. Preserve one-per-turn pacing for agent-visible work. No notification storms.
  5. Supervise local surface shims so shim death is not equivalent to signal blindness.
  6. Provide a bounded, dashboard-friendly observability contract for Porsche’s panel.
  7. Keep Lafonda’s install/supervision matrix simple: one common daemon binary with thin surface plugins.

Non-goals

  • Not a replacement for prism_signals_pending
  • Not a general multi-persona machine bus
  • Not a source of truth for queue durability
  • Not a bypass around FastAPI/session-manager routing
  • Not a batching layer for agent-visible signals

Invariants

  1. Per-agent, not per-machine. One daemon process per persona.
  2. Process boundary is the isolation wall. Multi-tenant separation comes from OS processes, not in-process routing discipline.
  3. Push is a doorbell, not the payload. The durable queue remains the source of truth.
  4. Delivery discipline stays one-per-turn. If N signals queue during idleness, the daemon may observe all N, but it must not surface them as an N-doorbell storm.
  5. Routing-registry lifecycle is explicit. PeerLeft invalidates stale bindings, PeerJoined refreshes them, and stale targets produce structured failure rather than silent misrouting.

Why per-agent wins

A per-agent daemon matches the real unit of supervision:
  • one daemon owns one persona’s queue, wake-ups, shim lifecycle, and local IPC
  • crash scope is one persona, not the host’s entire agent fleet
  • OS permissions become the security model for local IPC
  • no shared in-memory routing table across personas
  • local health state maps cleanly to one dashboard card
The resource tradeoff is accepted. Repeating isolated units is cheaper than centralizing cross-tenant routing and failure coupling into a per-machine multiplexer.

Architecture

Prism FastAPI / Session Stream
        |
        | authenticated WS + durable queue + routing registry
        v
persona-daemon (one process per identity)
        |
        | local IPC only
        v
surface shim / surface plugin
        |
        | surface-native notification/injection
        v
agent thread/session

Responsibility split

FastAPI / session manager

  • persist every signal durably before volatile fan-out
  • own session registration, lifecycle, and routing registry
  • publish lifecycle events (PeerJoined, PeerLeft, preemption, etc.)
  • expose prism_signals_pending as the authoritative pending envelope
  • reject or mark stale targets with structured send-side failure

Persona daemon

  • maintain the long-lived connection to Prism’s session stream
  • publish a distinct daemon heartbeat to Prism, separate from agent-surface heartbeat
  • preserve local delivery continuity across agent idle periods
  • where the surface architecture uses a shim, supervise the local shim process with restart-on-crash and bounded backoff
  • translate pending work into exactly one agent-visible doorbell per turn boundary
  • maintain a small local pending index as a cache only
  • emit structured daemon-owned logs to OS-native sinks
  • expose health/metrics for observability

Surface shim / plugin

  • own surface-specific protocol/injection mechanics
  • accept fire-and-forget doorbells without becoming the delivery ACK
  • never become the source of truth for pending signals
  • degrade to explicit drain without data loss

Daemon registration model

The daemon registers with Prism as a sibling runtime kind, not as a speaking agent session. Rules:
  • registration key shape distinguishes kind=daemon from kind=agent
  • daemon rows are counted separately from speaking agents in status/routing views
  • daemons are not master-eligible
  • daemons do not preempt or speak on behalf of a persona
  • daemon auth is scoped to the same operator/persona context as the served surface
This preserves observability without leaking supervision processes into project control-plane semantics.

Spawn topology

Spawn topology is per persona, not per machine and not per user.
  • one daemon process per persona registered to a project on a host
  • multiple personas on one host imply multiple daemons
  • each daemon is supervised independently
  • no daemon-per-machine multiplexer
  • no daemon-per-user umbrella process
Naming convention for host-level units:
  • prism-daemon-<tenant_slug>-<identity>.service on Linux
  • prism-daemon-<tenant_slug>-<identity>.plist on macOS
  • prism-daemon-<tenant_slug>-<identity>.task or equivalent tenant+identity composition on Windows scheduled-task naming
Avoid raw UUIDs in operator-facing unit names; they are unreadable in journalctl, launchctl, and Event Viewer surfaces. This constraint is normative for Lafonda’s launcher/install matrix.
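A minimal sketch of the naming rule, for illustration only. The helper name unitName, the HostOs type, and the UUID check are assumptions introduced here; the SPEC fixes only the resulting name shapes and the no-raw-UUID constraint.

```typescript
// Hypothetical helper: composes an operator-readable unit name from
// tenant slug and identity, following the per-OS shapes listed above.
type HostOs = "linux" | "macos" | "windows";

const UNIT_SUFFIX: Record<HostOs, string> = {
  linux: ".service",
  macos: ".plist",
  windows: ".task",
};

// Matches a canonical 8-4-4-4-12 hex UUID.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function unitName(tenantSlug: string, identity: string, os: HostOs): string {
  for (const part of [tenantSlug, identity]) {
    if (UUID_RE.test(part)) {
      // Raw UUIDs are unreadable in journalctl, launchctl, and Event Viewer.
      throw new Error("raw UUID not allowed in unit name: " + part);
    }
  }
  return "prism-daemon-" + tenantSlug + "-" + identity + UNIT_SUFFIX[os];
}
```

For example, unitName("acme", "texi", "linux") yields prism-daemon-acme-texi.service, where "acme" and "texi" are placeholder values.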

Plugin contract

The daemon is one common binary with thin surface plugins.

Plugin API

| Method | Purpose | Contract |
| --- | --- | --- |
| notify_surface(payload) | enqueue one surface-visible doorbell | fire-and-forget; not an ACK |
| resume_surface() | rebind when the local UI/session restarts | lightweight reattach hook |
| status() | probe surface liveness | bounded health probe |
Note on naming. Method names in this table are logical/illustrative labels. TypeScript implementation symbols use camelCase per language idiom (notifySurface, resumeSurface, status); call sites should match the implementation spelling.
Anything richer than this is a design smell: it means surface protocol concerns are leaking into the daemon.
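The three-method contract could be expressed roughly as the interface below. This is a sketch under stated assumptions: the SurfaceStatus values and the InMemorySurfacePlugin class are hypothetical and exist only to exercise the contract; only the camelCase method names come from the SPEC.

```typescript
// Minimal doorbell type for this sketch; fields follow the SPEC's payload shape.
type Doorbell = { timestamp: string; source: "daemon"; kind: "pending_work_notice" };

// Assumed probe results; the SPEC does not define the status() return values.
type SurfaceStatus = "alive" | "unavailable";

interface SurfacePlugin {
  notifySurface(payload: Doorbell): void; // fire-and-forget; returning is not an ACK
  resumeSurface(): void;                  // lightweight reattach hook
  status(): SurfaceStatus;                // bounded health probe
}

// Hypothetical in-memory plugin used only to demonstrate the contract.
class InMemorySurfacePlugin implements SurfacePlugin {
  private readonly queue: Doorbell[] = [];
  private attached = false;

  notifySurface(payload: Doorbell): void {
    // Accepting the doorbell locally says nothing about agent-side delivery.
    this.queue.push(payload);
  }

  resumeSurface(): void {
    this.attached = true;
  }

  status(): SurfaceStatus {
    return this.attached ? "alive" : "unavailable";
  }

  // Test-only accessor; not part of the plugin contract.
  queued(): number {
    return this.queue.length;
  }
}
```

Note that notifySurface returns void on purpose: there is no return value a caller could mistake for a delivery acknowledgment.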

Doorbell payload shape

notify_surface(payload) carries a minimal typed notice, not signal content.
{
  "timestamp": "<iso8601>",
  "source": "daemon",
  "kind": "pending_work_notice"
}
Rules:
  • no signal_id
  • no signal preview text
  • no sender identity
  • no bounded-count summary
If the payload starts carrying signal preview material, the daemon becomes a payload-bearing channel and violates invariant #3. The doorbell exists only to wake the surface and force an authoritative drain.
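One way to keep the payload from growing is a strict validator that rejects any key beyond the three shown. The sketch below is illustrative; isDoorbellPayload and makeDoorbell are hypothetical names, not part of the SPEC.

```typescript
interface DoorbellPayload {
  timestamp: string; // ISO 8601
  source: "daemon";
  kind: "pending_work_notice";
}

const ALLOWED_KEYS = ["timestamp", "source", "kind"];

// Returns true only for an object with exactly the three allowed keys.
// Extra keys such as signal_id, preview text, sender, or counts are rejected.
function isDoorbellPayload(value: unknown): value is DoorbellPayload {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const keys = Object.keys(record);
  if (keys.length !== ALLOWED_KEYS.length) return false;
  if (!keys.every((k) => ALLOWED_KEYS.indexOf(k) !== -1)) return false;
  const ts = record["timestamp"];
  return (
    typeof ts === "string" &&
    !isNaN(Date.parse(ts)) &&
    record["source"] === "daemon" &&
    record["kind"] === "pending_work_notice"
  );
}

// Hypothetical constructor for a well-formed doorbell.
function makeDoorbell(now: Date = new Date()): DoorbellPayload {
  return { timestamp: now.toISOString(), source: "daemon", kind: "pending_work_notice" };
}
```

Rejecting unknown keys, rather than ignoring them, makes a drift toward payload-bearing doorbells fail loudly in tests.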

Local IPC contract

The daemon exposes a local-only control socket per persona.
  • macOS/Linux: Unix domain socket in a persona-scoped runtime dir with owner-only permissions
  • Windows: named pipe with equivalent single-user ACLs
  • no LAN listener in v0.2
  • no machine-global unauthenticated port

Control socket verbs

| Verb | Purpose | Notes |
| --- | --- | --- |
| status | liveness, WS state, shim state, lifecycle taxonomy state, last heartbeat age, last drain age, pending count, last error class | read-only |
| notify_surface | emit one doorbell to the local surface path | internal/operator callable |
| resume_surface | rebind after local UI/session restart | internal/operator callable |
| shutdown | clean stop for wrap/archive/destroy/uninstall flows | operator-callable through Prism lifecycle verbs |
This is intentionally not a general messaging API.
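A closed verb set can be enforced at the parsing edge. The following is a sketch, assuming a hypothetical StatusReply shape and handler; the SPEC lists what status reports but does not define field names or a wire format.

```typescript
const CONTROL_VERBS = ["status", "notify_surface", "resume_surface", "shutdown"] as const;
type ControlVerb = (typeof CONTROL_VERBS)[number];

// Hypothetical field names for the read-only status reply.
interface StatusReply {
  alive: boolean;
  wsState: "connected" | "disconnected";
  shimState: "running" | "restarting" | "absent";
  lifecycle: "running" | "sleeping" | "restarting" | "failed";
  heartbeatAgeSeconds: number;
  drainAgeSeconds: number;
  pendingCount: number;
  lastErrorClass: string | null;
}

// Returns the verb if it is one of the four known verbs, otherwise null.
function parseVerb(raw: string): ControlVerb | null {
  const candidate = raw.trim();
  for (const verb of CONTROL_VERBS) {
    if (verb === candidate) return verb;
  }
  return null;
}

// Side effects of the mutating verbs are elided in this sketch.
function handle(raw: string, snapshot: StatusReply): StatusReply | "accepted" | "rejected" {
  const verb = parseVerb(raw);
  if (verb === null) return "rejected"; // anything outside the table is refused
  if (verb === "status") return snapshot; // read-only
  return "accepted";
}
```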

Surface integrations

General rule: the daemon supervises the local shim; the shim owns surface protocol translation.
  • Codex: daemon supervises the existing MCP/Codex shim and does not speak app-server protocol directly.
  • Claude Code: daemon supervises the Claude Code shim path and hands off surface delivery to the shim.
  • Claude Desktop/fallback surfaces: daemon still owns WS continuity and local pacing, even if visible delivery degrades to the next explicit turn boundary.
Q2 is therefore resolved in favor of daemon-via-shim, not daemon-direct-to-app-server. This decision is ADR-worthy because future implementers will be tempted to collapse the shim layer.

Delivery semantics

The daemon preserves doorbell-then-drain delivery.
  1. Server-side push or reconnect reconciliation reveals pending work.
  2. Daemon records local pacing state in its cache-only pending index.
  3. Daemon emits one surface-visible doorbell at an agent activity boundary.
  4. Agent drains authoritative pending work through prism_signals_pending.
  5. Local uncertainty always resolves back to the durable queue, never to cached daemon state.
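The pacing rule in the steps above can be sketched as a small class. DoorbellPacer and its method names are hypothetical; the sketch assumes the daemon is told about turn boundaries by some surface-specific mechanism that is out of scope here.

```typescript
class DoorbellPacer {
  private pendingHint = 0; // cache only; may be dropped without correctness loss
  private readonly ring: () => void;

  constructor(ring: () => void) {
    this.ring = ring;
  }

  // Steps 1-2: a push or reconcile revealed `count` pending signals.
  observePending(count: number): void {
    this.pendingHint = Math.max(0, count);
  }

  // Step 3: called once per agent activity boundary; rings at most once
  // regardless of how many signals were observed since the last boundary.
  onTurnBoundary(): boolean {
    if (this.pendingHint === 0) return false;
    this.ring();
    return true;
  }

  // Steps 4-5: after the agent drains, the durable queue's answer
  // overwrites whatever the cache believed.
  onDrained(authoritativeRemaining: number): void {
    this.pendingHint = Math.max(0, authoritativeRemaining);
  }
}
```

If 12 signals are observed before the next boundary, onTurnBoundary still invokes ring exactly once; the agent learns the actual count only by draining.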

ACK semantics

Surface plugin acceptance is not delivery acknowledgment. A successful notify_surface(payload) means only that the plugin accepted the doorbell into its local surface-side queue. The actual ACK is the agent’s drain through prism_signals_pending.

Local pending index

The local pending index is a cache only. It may be dropped at any time without correctness loss. It exists for pacing and rate-limiting, not for durable recordkeeping.

Reconnect and reconcile behavior

Reconnect is a repair path, not a trust-the-cache path.
  • on reconnect, the daemon reconciles against the authoritative pending queue
  • reconnect does not inject immediately
  • the daemon waits for the next agent activity boundary, then emits one doorbell indicating pending work
  • if 12 signals accumulated during background time, the first turn back may drain all 12, but the daemon still emits one doorbell rather than storming the surface

Lifecycle / delivery state taxonomy

The inter-team status taxonomy is:
  • running
  • sleeping
  • restarting
  • failed

Idle mode

Idle / sleeping means stay-running-but-low-activity. Definition:
  • WS connected
  • periodic heartbeat still flowing
  • no meaningful compute work in progress
  • awaiting wake on push, shim reconnect, or operator status/probe request
It does not mean OS-specific process suspension. The daemon does not exit in idle mode; it idles. This is the portable behavior that works across LaunchAgent, systemd-user, and Windows Task Scheduler without launcher-specific tricks.

Lifecycle state machine

Doorbell emission is gated by an explicit state machine:
DISCONNECTED -> RECONCILING -> CONNECTED -> EMITTING
Rules:
  • reconnect enters RECONCILING
  • reconciliation against the authoritative pending queue completes before EMITTING
  • no injection outside EMITTING, which rules out mid-turn doorbells
  • local doorbell release is paced only from EMITTING
This gate is load-bearing. Without it, reconnect races can create duplicate or mistimed doorbells in the middle of active agent work.
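A minimal sketch of the gate as a transition table. The forward edges come from the state machine above; the edges back to DISCONNECTED and from EMITTING to CONNECTED are assumptions added so the sketch can loop, and DoorbellGate is a hypothetical name.

```typescript
type DaemonState = "DISCONNECTED" | "RECONCILING" | "CONNECTED" | "EMITTING";

// Allowed next states per current state. Only the forward chain is
// SPEC-defined; the remaining edges are assumptions of this sketch.
const ALLOWED: Record<DaemonState, readonly DaemonState[]> = {
  DISCONNECTED: ["RECONCILING"],
  RECONCILING: ["CONNECTED", "DISCONNECTED"],
  CONNECTED: ["EMITTING", "DISCONNECTED"],
  EMITTING: ["CONNECTED", "DISCONNECTED"],
};

class DoorbellGate {
  private state: DaemonState = "DISCONNECTED";

  current(): DaemonState {
    return this.state;
  }

  // Throws on any transition not listed in ALLOWED, so a reconnect race
  // cannot jump straight from DISCONNECTED to EMITTING.
  transition(next: DaemonState): void {
    if (ALLOWED[this.state].indexOf(next) === -1) {
      throw new Error("illegal transition " + this.state + " -> " + next);
    }
    this.state = next;
  }

  // Doorbells are released only from EMITTING.
  mayEmitDoorbell(): boolean {
    return this.state === "EMITTING";
  }
}
```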

Reconnect + reconcile sequence

| Step | Actor | Action |
| --- | --- | --- |
| 1 | daemon | detect WS disconnect or wake-from-sleep |
| 2 | daemon | enter DISCONNECTED, then reconnect transport |
| 3 | daemon | enter RECONCILING |
| 4 | daemon | query authoritative pending queue |
| 5 | daemon | refresh cache-only local pending index |
| 6 | daemon | enter CONNECTED |
| 7 | daemon | wait for next agent activity boundary |
| 8 | daemon | enter EMITTING and emit one doorbell |
| 9 | agent | call prism_signals_pending and act |

Routing-registry lifecycle

Postmortem 0660f88e makes registry lifecycle an architectural requirement.
  • PeerJoined creates or refreshes the active binding for (identity, surface)
  • PeerLeft(reason=wrap|expire|preempt) invalidates that binding immediately
  • prism_signal must not silently route to wrapped/stale sessions
  • if resolution fails health checks, the sender receives structured failure
The daemon depends on this contract but does not own it.
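The binding lifecycle can be illustrated with a toy registry. This is a sketch only: RoutingRegistry, the Resolution shape, and the "stale_target" error code are hypothetical, and the real registry lives server-side in the session manager.

```typescript
type LeaveReason = "wrap" | "expire" | "preempt";

// Hypothetical result shape; "stale_target" is an illustrative error code.
type Resolution =
  | { ok: true; sessionId: string }
  | { ok: false; error: "stale_target"; identity: string; surface: string };

class RoutingRegistry {
  private readonly bindings = new Map<string, string>();

  private static key(identity: string, surface: string): string {
    return JSON.stringify([identity, surface]);
  }

  // PeerJoined creates or refreshes the binding for (identity, surface).
  peerJoined(identity: string, surface: string, sessionId: string): void {
    this.bindings.set(RoutingRegistry.key(identity, surface), sessionId);
  }

  // PeerLeft invalidates the binding immediately, whatever the reason.
  peerLeft(identity: string, surface: string, _reason: LeaveReason): void {
    this.bindings.delete(RoutingRegistry.key(identity, surface));
  }

  // A missing binding yields structured failure instead of a silent misroute.
  resolve(identity: string, surface: string): Resolution {
    const sessionId = this.bindings.get(RoutingRegistry.key(identity, surface));
    if (sessionId === undefined) {
      return { ok: false, error: "stale_target", identity, surface };
    }
    return { ok: true, sessionId };
  }
}
```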

Heartbeats

Daemon heartbeat and surface heartbeat answer different questions.
  • daemon heartbeat: is this persona’s daemon process alive and connected?
  • surface heartbeat: is this persona’s agent surface alive and accepting signals?
The daemon therefore publishes its own Prism-visible heartbeat as a distinct signal/registration stream. Q1 is resolved in favor of explicit daemon heartbeat rather than surface-heartbeat conflation.

Project attach lifecycle

v0.2 chooses a persona×project-scoped daemon model for v1. Rules:
  • a daemon is bound to one persona and one project for its live lifecycle
  • attach-to-project is a discrete lifecycle boundary, but in v1 it occurs at daemon create/start rather than as an arbitrary later dynamic attach
  • detach happens through wrap/archive/destroy lifecycle, not through free-floating project switching
  • multi-project personas are out of scope for v1 unless/until a later SPEC introduces dynamic attach/detach semantics explicitly
This resolves Lafonda’s flag-migration ambiguity: attach-time project bootstrap is a real daemon-owned boundary, but it is not yet a general runtime project-switch primitive.

Failure model

Local failure must not mutate the global queue into thinking the agent saw something it did not.
  • if local IPC to the surface fails, keep the signal pending and record a bounded local error
  • if the daemon dies, the supervisor restarts only that persona’s daemon
  • if the shim dies, the daemon restarts the shim with backoff
  • if Prism WS disconnects, the daemon reconnects and reconciles pending work
  • if the surface is absent, the daemon remains healthy but reports degraded delivery state

Observability contract

Exported metrics

All daemon metrics include tenant_id, agent_identity, agent_surface, and machine_id labels. session_id is explicitly forbidden as a metric label because per-session churn would destroy per-agent continuity and explode cardinality.
| Metric | Labels | Meaning |
| --- | --- | --- |
| daemon_up | tenant_id, agent_identity, agent_surface, machine_id | 1 when daemon process is healthy |
| daemon_heartbeat_age_seconds | tenant_id, agent_identity, agent_surface, machine_id | age of last daemon heartbeat |
| daemon_drain_age_seconds | tenant_id, agent_identity, agent_surface, machine_id | age of last successful agent-visible drain |
| daemon_signal_queue_depth | tenant_id, agent_identity, agent_surface, machine_id | current local view of pending-work depth |
| daemon_errors_total | tenant_id, agent_identity, agent_surface, machine_id, error_class | cumulative bounded-error counter |
| daemon_last_error_timestamp_seconds | tenant_id, agent_identity, agent_surface, machine_id, error_class | last-seen timestamp per bounded error class |
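The label rules lend themselves to a mechanical check at export time. A sketch, assuming a hypothetical checkLabels helper that receives a flat label map:

```typescript
const REQUIRED_LABELS = ["tenant_id", "agent_identity", "agent_surface", "machine_id"];
const FORBIDDEN_LABELS = ["session_id"];

// Returns a list of problems; an empty list means the label set is acceptable.
function checkLabels(labels: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_LABELS) {
    if (!(name in labels)) problems.push("missing label: " + name);
  }
  for (const name of FORBIDDEN_LABELS) {
    if (name in labels) problems.push("forbidden label: " + name);
  }
  return problems;
}
```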

Error classes

error_class is a bounded enum. Unknown local failures collapse to unclassified, not a free-string label. Initial enum:
  • ws_connect_failed
  • ws_stream_stalled
  • pending_reconcile_failed
  • surface_unavailable
  • surface_inject_failed
  • ipc_bind_failed
  • ipc_auth_failed
  • heartbeat_publish_failed
  • config_invalid
  • unclassified
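In TypeScript, the bounded enum and its fallback could be sketched as a literal union with a normalizer. toErrorClass is a hypothetical helper name.

```typescript
const ERROR_CLASSES = [
  "ws_connect_failed",
  "ws_stream_stalled",
  "pending_reconcile_failed",
  "surface_unavailable",
  "surface_inject_failed",
  "ipc_bind_failed",
  "ipc_auth_failed",
  "heartbeat_publish_failed",
  "config_invalid",
  "unclassified",
] as const;

type ErrorClass = (typeof ERROR_CLASSES)[number];

// Maps any raw failure string onto the bounded enum; anything unknown
// collapses to "unclassified" so free-form text never becomes a label value.
function toErrorClass(raw: string): ErrorClass {
  for (const known of ERROR_CLASSES) {
    if (known === raw) return known;
  }
  return "unclassified";
}
```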

Metric semantics

Heartbeat and drain remain separate because the 0660f88e class of failure is precisely the case where a process can be alive while work progress is stalled.

Emit cadence

Hybrid model:
  • event-driven internal state updates
  • periodic 10s metric export tick
This keeps state accurate while remaining compatible with Porsche’s existing scraper cadence.

Cardinality budget

At a 50-persona ceiling:
  • daemon_errors_total: 10 error_class * 50 active daemons = 500 series
  • daemon_last_error_timestamp_seconds: 10 * 50 = 500 series
  • daemon_up, daemon_heartbeat_age_seconds, daemon_drain_age_seconds, daemon_signal_queue_depth: 4 * 50 = 200 series
Total: 500 + 500 + 200 = 1200 active series at the v0.2 ceiling, comfortably under the 10k/min cap.
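The budget arithmetic can be reproduced with a small helper, which is handy if the persona ceiling or the enum size changes. The sketch assumes, as the figures above implicitly do, one agent_surface and one machine_id per daemon; seriesBudget is a hypothetical name.

```typescript
function seriesBudget(personas: number, errorClasses: number) {
  const perClassMetrics = 2; // daemon_errors_total, daemon_last_error_timestamp_seconds
  const plainMetrics = 4;    // daemon_up, heartbeat age, drain age, queue depth
  const perClassSeries = perClassMetrics * errorClasses * personas;
  const plainSeries = plainMetrics * personas;
  return { perClassSeries, plainSeries, total: perClassSeries + plainSeries };
}

const budget = seriesBudget(50, 10);
// budget.perClassSeries is 1000, budget.plainSeries is 200, budget.total is 1200
```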

Logging

Daemon-owned structured logging is mandatory on every OS. Reasoning:
  • Windows launchers uniformly discard stdout, so stdout cannot serve as an operator surface
  • treating Windows as a special-case logger would fracture parity
  • same daemon logging code can emit to OS-native sinks while preserving one structured schema
Operator-facing sinks differ by platform, but daemon logging does not:
  • macOS: OS-native log sink surfaced through log show
  • Linux: user-level journal surfaced through journalctl --user
  • Windows: Event Viewer / task-linked OS-native sink
Per-OS log-link routing should align with Lafonda’s matrix terminology and Porsche’s panel expectations.

Security

  • local IPC is per persona and single-user
  • no LAN listener in v0.2
  • no machine-global port
  • no cross-persona command channel
  • launcher/service manager owns daemon spawn with explicit identity
  • filesystem/socket ACLs are the first security boundary
If future remote-control mode is desired, it is a new design, not an implicit extension of v0.2.

Cross-platform host contract

User-level launchers are mandatory in v0.2 because personal-mode Prism depends on per-user Docker Desktop semantics. Canonical launcher shapes:
  • macOS: LaunchAgent
  • Linux: systemd --user with linger enabled
  • Windows: Task Scheduler logon-trigger
Non-default / rejected for v0.2 defaults:
  • macOS LaunchDaemon
  • Linux systemd-system
  • Windows Service as SYSTEM
These root-level launchers fail the Docker/Desktop-per-user constraint and therefore are not valid v0.2 defaults.

Windows install-vs-fidelity tradeoff

Windows has a structural tradeoff, not a mechanism-choice accident.
  • Task Scheduler logon-trigger is the canonical v0.2 choice because it is the lightest install path and satisfies the per-user Docker constraint.
  • It pays in supervised-restart fidelity and operator logging ergonomics.
  • Windows Service has stronger restart fidelity, but its admin install and credential-storage costs disqualify it as the v0.2 default.
Therefore:
  • accept logon-bound autostart for Windows v0.2
  • document Windows Service as an admin-required future upgrade path
  • do not try to paper over the gap with a pseudo-service workaround

Linux linger policy

prism_install_local should enable systemd --user linger programmatically. This is a one-time admin-permitted action and should not be left as a manual operator checklist item.

Lifecycle coupling

Daemon lifecycle is hard-coupled to persona/project lifecycle.
  • prism_wrap for the persona stops that persona’s daemon
  • persona archive/removal stops that persona’s daemon
  • project destroy stops all daemons attached to that project on the host
  • uninstall flows expose clean daemon shutdown/removal
The shutdown IPC verb is therefore operator-callable through Prism lifecycle verbs, not merely an internal hook. If clean shutdown fails to complete within 5 seconds, the launcher’s process-kill is the fallback. Operator-facing destroy/archive flows should call shutdown first, then escalate to kill if needed.
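The shutdown-then-kill escalation reduces to a small decision function. This is a sketch: shutdownStep and the step names are hypothetical, and the caller is assumed to poll it with the time elapsed since the shutdown verb was sent.

```typescript
type ShutdownStep = "done" | "wait" | "kill";

const SHUTDOWN_GRACE_MS = 5000; // the 5-second window stated above

// Decides what a destroy/archive flow should do next after sending `shutdown`.
function shutdownStep(daemonExited: boolean, elapsedMs: number): ShutdownStep {
  if (daemonExited) return "done";
  // Still inside the grace window: keep waiting for a clean stop.
  if (elapsedMs < SHUTDOWN_GRACE_MS) return "wait";
  // Grace window exhausted: fall back to the launcher's process-kill.
  return "kill";
}
```

Keeping the decision pure leaves the actual kill to the platform launcher, which matches the rule that supervisor wiring is platform-specific while behavior is not.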

Attach-to-project behavior and SPEC-021 flag disposition

Daemon attach-to-project is a discrete lifecycle event. In v1 it occurs at daemon create/bind time rather than as a later free-floating attach. Not all legacy launcher flags stay launcher concerns.
  • -autonomy remains declarative launcher input and rebrands as a daemon/tool-permission hint
  • -install-deps and -activate-venv migrate out of launcher semantics and into daemon attach-to-project lifecycle behavior
These are project-bootstrap concerns, not persistent daemon identity concerns. The daemon should own attach-time project prep instead of preserving them as raw launcher flags forever.

Porsche/Lafonda thin contract

For Porsche

The dashboard contract depends on five fixed surfaces:
  1. control socket status
  2. exported metrics set above
  3. lifecycle taxonomy: running / sleeping / restarting / failed
  4. lifecycle state machine: DISCONNECTED -> RECONCILING -> CONNECTED -> EMITTING
  5. reconnect+reconcile sequence table above

For Lafonda

The launcher/install contract depends on five fixed surfaces:
  1. one daemon process per persona
  2. one common daemon binary with thin plugins
  3. clean shutdown path callable from Prism lifecycle verbs
  4. platform-specific supervisor wiring only, not behavior divergence
  5. canonical per-OS launcher defaults: LaunchAgent / systemd-user with linger / Task Scheduler logon-trigger

Acceptance criteria

  • daemon runtime exists as one common binary with thin surface plugins
  • one daemon process is supervised per persona, not per machine
  • daemon registration is distinct from speaking agent registration and is not master-eligible
  • daemon publishes a distinct heartbeat separate from surface heartbeat
  • reconnect path uses DISCONNECTED -> RECONCILING -> CONNECTED -> EMITTING
  • reconnect emits no immediate injection and no doorbell storm
  • local pending index is cache-only and correctness survives its loss
  • surface acceptance is not treated as delivery ACK
  • notify_surface payload is minimal and carries no signal preview content
  • all exported metrics include tenant_id, agent_identity, agent_surface, and machine_id
  • no exported metric includes session_id
  • bounded error_class enum enforced with unclassified fallback
  • lifecycle verbs can stop daemons on wrap/archive/destroy and escalate to kill on timeout
  • Codex integration supervises existing shim rather than speaking app-server protocol directly
  • daemon emits structured logs to OS-native sinks on all three OS families
  • Windows v0.2 uses Task Scheduler logon-trigger as canonical default and documents Windows Service as future upgrade path
  • Linux v0.2 enables linger programmatically during install
  • v1 process model is persona×project-scoped, with no dynamic multi-project attach/detach

Follow-on work

  1. File an ADR for the daemon-via-shim decision (Q2), which is the most likely future re-litigation point.
  2. Implement the daemon registration kind in session/routing tables.
  3. Wire Porsche’s cardinality sheet to the explicit ~1200-series estimate.
  4. Hand Lafonda the per-persona spawn topology and lifecycle coupling contract.
  5. Track Windows Service as a future commercial/admin-install upgrade path rather than a v0.2 default.
  6. Revisit dynamic multi-project persona attach/detach in a later SPEC rather than smuggling it into v1 implicitly.

Cross-references

  • PR #45 / commit 5df6679 — Tier-1 auto-drain rule, the behavioral mitigation this SPEC supersedes structurally
  • postmortem 0660f88e — stale-binding/routing-lifecycle failure that makes registry invalidation load-bearing
  • surface-comparison research docs/research/surface-comparison-2026-05-02.md — turn-boundary / batching findings that motivate one-per-turn daemon pacing
  • Lafonda research PR #46 / commit 3dae620 / docs/research/daemon-launcher-matrix-2026-05-02.md — cross-OS launcher matrix and logging/launcher tradeoffs
  • SPEC-044 — channel push as doorbell, not delivery
  • SPEC-048 — Codex signal injection path; this SPEC resolves daemon-via-shim, not daemon-direct-to-app-server
  • SPEC-050 — dashboard surface that consumes daemon metrics
  • SPEC-054 — Node shim architecture that the daemon supervises rather than bypasses
  • SPEC-021 — launcher/attach flag disposition context (-autonomy, -install-deps, -activate-venv) and Lafonda’s flag-disposition matrix context
Last modified on May 18, 2026