
SPEC-100 v0.1 — Loopback Diagnostic Verb + signal_type=Loopback

Status: draft v0.1. Texi architecture review RATIFIED as new diagnostic SPEC (signal 42f7347a, note d9f3761c, 2026-05-10). Approval is conditional on this draft capturing the minimum contract from her response. Awaiting Candi governance review for SPEC-number assignment.
Author: Donna
Reviewer (architecture): Texi
Reviewer (governance): Candi
Origin: Doorbell-render-miss class. Push delivery succeeds at the backend (publish_path: pushed_to_ws, delivered: true, surface_support: full) but the channel notification never surfaces in the recipient’s model context. Cluster of recurrences:
  • Postmortem 07942a99 (2026-05-10) — Cherry signal 5b4a3344 sat in pending queue 40 min while drain returned empty; found via prism_signal_trace.
  • This-session: Texi signal 3a241bdc reported delivered to Donna’s WS but never surfaced as a doorbell; only seen via operator-prompted prism_signals_pending drain.
There is currently no instrument that an operator (or persona) can run to characterize which layer of the push path drops a signal. This SPEC defines that instrument.

Summary

This SPEC adds a new self-addressed signal type (Loopback) and an operator-invoked verb (prism_loopback_test). Together they send a probe through the full delivery path (backend write → Redis publish → shim WS receipt → channel notification → model drain) and report per-layer timing, so that doorbell-render-miss class defects can be bisected to a specific layer. The verb is diagnostic only: it is not part of the bootstrap path, not a continuous heartbeat, and not a delivery guarantee. It produces evidence; remediation lives in companion SPECs (Fix #1 receipt-at-shim guarantee, Fix #3 redis-first hot path), which are explicitly pinned out of scope until loopback evidence identifies the dominant miss layer.

Scope (v0.1)

In-scope:
  1. New first-class signal_type=Loopback (added to _AGENT_SIGNAL_TYPES).
  2. New verb prism_loopback_test(pid) — operator-invoked, runs against the calling persona’s session.
  3. New CLI subcommand prism loopback-test — thin wrapper over the verb.
  4. Shim-side ephemeral ring buffer that records metadata only for every received WS frame (so the verb has shim-receipt evidence to read).
  5. Per-layer timestamp capture and a stable diagnostic envelope returned to the caller.
Out-of-scope (pinned, separate SPECs):
  • Fix #1: receipt-at-shim guarantee — backend retries push until shim acks. Not in this SPEC; this SPEC only measures shim receipt.
  • Fix #3: redis-first hot-path ordering — flipping signal_send SOR from PG to Redis per ADR-27. Not in this SPEC.
  • Cross-surface differential probe (claude_code vs codex vs cursor) — single-persona scope here; surface comparison is a follow-up.

Background — the layer model

A signal flows through five rule-bearing layers between sender and recipient:
| # | Layer | Owns the rule |
|---|-------|---------------|
| 1 | Backend WS publisher | publish_path, delivered:true semantics |
| 2 | Client WS subscriber (mcp-node shim) | frame receipt over WebSocket |
| 3 | Shim → host editor handoff | translates frame into MCP channel notification |
| 4 | MCP server instruction-block | model knows to call prism_signals_pending on doorbell |
| 5 | Model behavior | drains and acts |
A defect in any single layer produces the same external symptom: backend reports delivered, recipient never acted. Without layer-attributable evidence, every postmortem reduces to “we don’t know which one dropped.” This SPEC produces that evidence.

Contract — minimum (per Texi 2026-05-10)

C1 — Loopback envelope shape

A Loopback signal carries the following fields in payload:
| Field | Type | Purpose |
|-------|------|---------|
| loopback_id | UUID | Stable correlation key across all layers |
| nonce | hex-128 | Unique per probe; foils any caching/dedup paths |
| requested_by_identity | string | Caller’s persona identity at probe time |
| requested_by_session | UUID | Caller’s session_id at probe time |
| surface | string | One of claude_code, codex, cursor, … |
| machine_id | string | Caller’s machine identifier (hostname or env) |
The signal is self-addressed: to_identity == requested_by_identity, resolved to requested_by_session via standard recipient resolution.
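The C1 field list can be illustrated with a minimal Python sketch. The helper names (build_loopback_payload, build_loopback_signal) and the outer signal dictionary shape are hypothetical and not taken from the backend; only the payload field names and the self-addressing rule come from this SPEC.

```python
import secrets
import uuid


def build_loopback_payload(identity: str, session_id: str,
                           surface: str, machine_id: str) -> dict:
    """Assemble the C1 payload fields for a single probe (illustrative only)."""
    return {
        "loopback_id": str(uuid.uuid4()),
        # 16 random bytes = 128 bits, hex-encoded; fresh per probe.
        "nonce": secrets.token_hex(16),
        "requested_by_identity": identity,
        "requested_by_session": session_id,
        "surface": surface,
        "machine_id": machine_id,
    }


def build_loopback_signal(payload: dict) -> dict:
    """Wrap the payload in a self-addressed signal (outer shape is assumed)."""
    return {
        "signal_type": "Loopback",
        # Self-addressed: to_identity == requested_by_identity.
        "to_identity": payload["requested_by_identity"],
        "payload": payload,
    }
```

Generating the nonce per call rather than per session is what keeps two back-to-back probes from being treated as duplicates by any dedup path.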

C2 — Cross-layer correlation

Every layer that observes the signal must record at least:
  • loopback_id (from payload)
  • signal_id (assigned by backend on insert)
  • trace_id (assigned by backend on dispatch)
  • layer (one of the five above)
  • observed_at (UTC ISO-8601)
These five form the correlation tuple. The diagnostic verb joins on loopback_id to assemble the per-layer timeline.
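As a rough sketch of the join described above, the following groups correlation tuples by loopback_id and orders each group by observed_at. The function name assemble_timelines and the flat list-of-dicts input are assumptions made for illustration; the real verb may read these records from different stores per layer.

```python
from collections import defaultdict
from datetime import datetime


def assemble_timelines(records: list[dict]) -> dict[str, list[dict]]:
    """Group correlation tuples by loopback_id, sorted by observed_at."""
    timelines: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        timelines[rec["loopback_id"]].append(rec)
    for recs in timelines.values():
        # Timestamps are assumed to carry an explicit offset such as +00:00,
        # which datetime.fromisoformat accepts on all supported versions.
        recs.sort(key=lambda r: datetime.fromisoformat(r["observed_at"]))
    return dict(timelines)
```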

C3 — Reply correlation

If the probe expects an echo reply (default true), the reply is itself a signal_type=Loopback with in_reply_to set to the original Loopback signal_id (not trace_id, per feedback_in_reply_to_uses_signal_id.md).
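A small sketch of the reply rule, under stated assumptions: the helper name build_echo_reply and the outer signal shape are hypothetical, and carrying the original loopback_id in the reply payload is an assumption made so that the reply can be joined per C2. Only the in_reply_to rule is taken from this SPEC.

```python
def build_echo_reply(original: dict) -> dict:
    """Build an echo reply for a received Loopback signal (illustrative only)."""
    return {
        "signal_type": "Loopback",
        # C3: correlate on the original signal_id, not on trace_id.
        "in_reply_to": original["signal_id"],
        "to_identity": original["payload"]["requested_by_identity"],
        # Assumption: the reply reuses the probe's loopback_id for the C2 join.
        "payload": {"loopback_id": original["payload"]["loopback_id"]},
    }
```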

C4 — Result envelope (returned by prism_loopback_test)

{
  "loopback_id": "<uuid>",
  "outcome": "complete | partial | timeout",
  "missing_layers": ["shim_tee_receipt", ...],
  "layers": {
    "backend_write":         { "observed": true, "at": "<ts>" },
    "redis_publish":         { "observed": true, "at": "<ts>" },
    "shim_tee_receipt":      { "observed": false, "at": null  },
    "model_ack":             { "observed": false, "at": null  },
    "drain_seen":            { "observed": false, "at": null  },
    "reply_received":        { "observed": false, "at": null  }
  },
  "latencies_ms": {
    "write_to_publish":      12,
    "publish_to_shim":       null,
    "shim_to_drain":         null,
    "drain_to_reply":        null,
    "end_to_end":            null
  },
  "context": {
    "loopback_signal_id":  "<uuid>",
    "loopback_trace_id":   "<uuid>",
    "requested_by":        "Donna",
    "session":             "<uuid>",
    "surface":             "claude_code",
    "machine_id":          "mini3.home.lan",
    "started_at":          "<ts>",
    "completed_at":        "<ts>",
    "spec":                "SPEC-100"
  }
}
model_ack is optional — driven by whether the model consumed the doorbell. Its absence is informational, not a failure (the verb itself doesn’t require model-side participation to return). All other fields are required in the schema; nullable when not observed.
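To make the nullable fields concrete, here is a hedged sketch that derives missing_layers and latencies_ms from the layers block. The pairing of each latency key with two layers is inferred from the key names, end_to_end is assumed to span backend_write to reply_received, and treating every layer except model_ack as required follows the sample CLI output in C7; none of these is stated normatively in this SPEC. The names summarize and LATENCY_PAIRS are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: every layer except the optional model_ack counts as required.
REQUIRED_LAYERS = ["backend_write", "redis_publish", "shim_tee_receipt",
                   "drain_seen", "reply_received"]

# Assumption: layer pairs inferred from the latency key names in C4.
LATENCY_PAIRS = {
    "write_to_publish": ("backend_write", "redis_publish"),
    "publish_to_shim": ("redis_publish", "shim_tee_receipt"),
    "shim_to_drain": ("shim_tee_receipt", "drain_seen"),
    "drain_to_reply": ("drain_seen", "reply_received"),
    "end_to_end": ("backend_write", "reply_received"),
}


def _delta_ms(layers: dict, start: str, end: str):
    """Whole milliseconds between two layers, or None if either is unobserved."""
    a, b = layers[start]["at"], layers[end]["at"]
    if a is None or b is None:
        return None
    delta = datetime.fromisoformat(b) - datetime.fromisoformat(a)
    return delta // timedelta(milliseconds=1)


def summarize(layers: dict) -> dict:
    """Derive missing_layers and latencies_ms from a C4 layers block."""
    missing = [name for name in REQUIRED_LAYERS if not layers[name]["observed"]]
    latencies = {key: _delta_ms(layers, s, e) for key, (s, e) in LATENCY_PAIRS.items()}
    return {"missing_layers": missing, "latencies_ms": latencies}
```

A latency stays null whenever either endpoint is unobserved, which matches the partial example above where only write_to_publish has a value.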

C5 — Shim tee log (ephemeral ring buffer)

The shim records metadata only for every received WS frame in a per-persona ring buffer:
| Field | Type |
|-------|------|
| received_at | UTC ISO-8601 |
| signal_id | UUID |
| trace_id | UUID (if present in frame) |
| loopback_id | UUID (if signal_type=Loopback; else null) |
| signal_type | string |
| frame_size_bytes | int |
| nonce_hash | sha256 of nonce (Loopback only) |
Bounded by: last 100 frames OR 10 minutes wall time, whichever limit is reached first. No payload persistence. The field list above is exhaustive; the shim must not write payload bodies, recipient identity beyond to_identity (already in metadata), or any user-content text.
Storage: ~/.prism/tee-frames-{identity}.jsonl, append-only, atomic-tempfile-rename for ring rotation (mirrors signals-{identity}.jsonl convention).
Read access: the CLI polls a local shim endpoint for “did you receive loopback_id=X?” (see C6).
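The bounding and metadata-only rules can be sketched as an in-memory ring. This is an illustration in Python, not the shim’s implementation: the shim is a TypeScript component that persists to a .jsonl file, and the class name TeeRing, the frame dictionary shape, and the find helper are all assumptions. Only the seven field names and the 100-frame / 10-minute bounds come from this SPEC.

```python
import hashlib
from collections import deque
from datetime import datetime, timedelta, timezone

MAX_FRAMES = 100
MAX_AGE = timedelta(minutes=10)


class TeeRing:
    """In-memory ring of frame metadata, bounded by count and by age."""

    def __init__(self) -> None:
        # deque(maxlen=...) discards the oldest entry once 100 are held.
        self._entries: deque = deque(maxlen=MAX_FRAMES)

    def __len__(self) -> int:
        return len(self._entries)

    def _evict(self, now: datetime) -> None:
        # Drop entries older than the 10-minute wall-time bound.
        while self._entries and now - self._entries[0][0] > MAX_AGE:
            self._entries.popleft()

    def record(self, frame: dict, frame_size_bytes: int, now=None) -> dict:
        now = now or datetime.now(timezone.utc)
        payload = frame.get("payload") or {}
        is_loopback = frame.get("signal_type") == "Loopback"
        nonce = payload.get("nonce") if is_loopback else None
        # Only the seven C5 fields are copied; the payload body is never stored.
        entry = {
            "received_at": now.isoformat(),
            "signal_id": frame.get("signal_id"),
            "trace_id": frame.get("trace_id"),
            "loopback_id": payload.get("loopback_id") if is_loopback else None,
            "signal_type": frame.get("signal_type"),
            "frame_size_bytes": frame_size_bytes,
            "nonce_hash": hashlib.sha256(nonce.encode()).hexdigest() if nonce else None,
        }
        self._evict(now)
        self._entries.append((now, entry))
        return entry

    def find(self, loopback_id: str, now=None):
        """Return the matching entry, or None if absent or aged out."""
        self._evict(now or datetime.now(timezone.utc))
        for _, entry in self._entries:
            if entry["loopback_id"] == loopback_id:
                return entry
        return None
```

Hashing the nonce rather than storing it lets a reader confirm which probe a frame belonged to without the ring holding any payload-derived value in the clear.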

C6 — CLI polls local shim for receipt evidence (v0.1 split orchestration)

The diagnostic must determine “did the shim receive the WS frame?” without depending on model-side cooperation. The shim is not reachable from the central backend — shims live on operator machines (mini1, mini3, server1) behind the editor process. Therefore v0.1 splits orchestration between the backend (which observes server-side layers) and the CLI (which observes shim-side layers, since it runs on the same host as the shim):
  • Backend issues the probe (POST /{pid}/signal/loopback-issue) and returns a partial envelope with backend_write + redis_publish populated.
  • Shim exposes a local-only HTTP endpoint GET /tee-frames?loopback_id=<uuid> returning the matching ring entry or 404.
  • CLI receives the partial envelope, polls the local shim endpoint up to shim_poll_timeout_ms (default 2000ms), waits up to drain_timeout_ms for evidence of drain (optional, model-side), and assembles the full SPEC-100 §C4 envelope.
  • A successful shim match populates layers.shim_tee_receipt.observed=true. A timeout populates false and adds "shim_tee_receipt" to missing_layers.
Note for v0.2 review: if Fix #1 (receipt-at-shim guarantee — shim pushes ack upstream over WS) lands, the orchestration collapses to backend-only and §C6 simplifies to prism_loopback_test returning the full envelope server-side. v0.1’s split is a pragmatic accommodation to the present architecture, not a target end-state.
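The CLI-side half of the split might look roughly like the sketch below. It is written in Python for illustration, although the CLI itself lives in cli/src/index.ts. The fetch callable is a hypothetical stand-in for a GET to the local /tee-frames endpoint that returns the ring entry, or None on a 404; the function names and the 100 ms poll interval are likewise assumptions.

```python
import time
from typing import Callable, Optional


def poll_shim_receipt(fetch: Callable[[str], Optional[dict]],
                      loopback_id: str,
                      timeout_ms: int = 2000,
                      interval_ms: int = 100,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> Optional[dict]:
    """Poll until the shim reports the frame or the timeout elapses."""
    deadline = clock() + timeout_ms / 1000
    while True:
        entry = fetch(loopback_id)
        if entry is not None:
            return entry
        if clock() >= deadline:
            return None
        sleep(interval_ms / 1000)


def apply_shim_result(envelope: dict, entry: Optional[dict]) -> dict:
    """Fold the poll result into a partial C4 envelope, in place."""
    layer = envelope["layers"]["shim_tee_receipt"]
    if entry is None:
        layer.update(observed=False, at=None)
        if "shim_tee_receipt" not in envelope["missing_layers"]:
            envelope["missing_layers"].append("shim_tee_receipt")
    else:
        layer.update(observed=True, at=entry["received_at"])
    return envelope
```

Injecting sleep and clock keeps the loop testable without real waiting, which is useful for the exit-code matrix tests on the CLI side.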

C7 — CLI surface

prism loopback-test [--json] [--timeout=<ms>]
Default human-readable output (table):
prism loopback-test
  loopback_id: 7f3e2b94-...
  outcome:     partial
  layers:
    backend_write       ✓  +0ms
    redis_publish       ✓  +12ms
    shim_tee_receipt    ✗  (timeout after 2000ms)
    model_ack           —  (n/a)
    drain_seen          —  (not observed in 5000ms)
    reply_received      —  (not observed)
  end-to-end:           5012ms (timeout)
  missing layers:       shim_tee_receipt, drain_seen, reply_received
Exit codes:
  • 0: outcome == "complete" (every required layer observed within timeouts)
  • 1: outcome ∈ {"partial", "timeout"} (one or more required layers missing)
  • 2: invocation/config error (auth, network unreachable, missing pid, etc.)
--json emits the C4 envelope verbatim with stable schema (versioned via "spec": "SPEC-100" field).
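The exit-code contract reduces to a small mapping, sketched here in Python for illustration. InvocationError is a hypothetical exception standing in for auth, network, and missing-pid failures, and mapping an unrecognized outcome string to exit code 2 is an assumption, not something this SPEC specifies.

```python
EXIT_COMPLETE = 0
EXIT_INCOMPLETE = 1
EXIT_INVOCATION_ERROR = 2


class InvocationError(Exception):
    """Hypothetical marker for auth, network, or missing-pid failures."""


def exit_code_for(outcome: str) -> int:
    """Map a C4 outcome string to the C7 exit code."""
    if outcome == "complete":
        return EXIT_COMPLETE
    if outcome in ("partial", "timeout"):
        return EXIT_INCOMPLETE
    # Assumption: an outcome outside the C4 vocabulary is an invocation problem.
    return EXIT_INVOCATION_ERROR


def run_probe(probe) -> int:
    """Run a probe callable that returns a C4 envelope; return the exit code."""
    try:
        envelope = probe()
    except InvocationError:
        return EXIT_INVOCATION_ERROR
    return exit_code_for(envelope["outcome"])
```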

Implementation surfaces

| Component | File | Change |
|-----------|------|--------|
| Backend enum | backend/app/services/signal_service.py:51 | Add "Loopback" to _AGENT_SIGNAL_TYPES; defaults at lines 74–116 (category=INFO, delivery_class=sync, ttl=600s) |
| Backend verb | backend/app/routers/signal.py + backend/app/services/loopback_service.py (new module) + backend/app/schemas/loopback.py (new module) | Implement POST /{pid}/signal/loopback-issue per C4/C6: issues a self-addressed signal and returns a partial envelope (backend_write + redis_publish) |
| Backend trace | signal_service.py:980/1039/1193 | Existing record_signal_trace_event already covers backend layers, so no change is needed; the loopback verb consumes these |
| Shim tee | mcp-node/src/bootstrap/stream.ts:237 (after noteSignalFrame) | Append metadata to ring buffer; ring rotation; atomic write |
| Shim endpoint | mcp-node/src/server.ts (new local HTTP route) | GET /tee-frames?loopback_id=... |
| CLI | cli/src/index.ts (~line 2705 dispatcher + new function) | cmdLoopbackTest() per C7 |
| Tests | backend/tests/test_loopback_diagnostic.py, mcp-node/test/tee_frames.test.mjs, cli/test/loopback.test.ts | Per surface |

Testing

Unit-level:
  • Backend: round-trip a Loopback signal in-process, assert that every layer entry defined in C4 is present in the result envelope, and assert that the ring buffer matches.
  • Shim: tee buffer rotation at 100 frames + 10 min boundary; metadata-only assertion (regression test against payload leakage).
  • CLI: exit-code matrix (0/1/2) under all outcome states.
Integration (against deployed server1):
  • Operator runs prism loopback-test from each surface (claude_code Donna session, codex Texi session, etc.) and captures the result envelope. Per-surface drop-rate baseline establishes the empirical layer-attribution dataset that motivates Fix #1 and Fix #3.
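For the shim’s metadata-only regression test listed under the unit-level items, one possible shape of the check is sketched below in Python. The actual test is expected to live in mcp-node/test/tee_frames.test.mjs; the names ALLOWED_TEE_FIELDS and assert_metadata_only are hypothetical, and the allowed set simply mirrors the C5 field list.

```python
# The seven fields C5 declares exhaustive for a tee-log entry.
ALLOWED_TEE_FIELDS = frozenset({
    "received_at", "signal_id", "trace_id", "loopback_id",
    "signal_type", "frame_size_bytes", "nonce_hash",
})


def assert_metadata_only(entry: dict) -> None:
    """Fail if a tee-log entry carries any field outside the C5 list."""
    extra = set(entry) - ALLOWED_TEE_FIELDS
    assert not extra, f"tee entry leaked non-metadata fields: {sorted(extra)}"
```

Checking against an allow-list rather than a deny-list means a newly added payload field fails the test by default.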

Migration / rollout

  • v0.1 ships behind no flag (diagnostic verb, idempotent, observability-only).
  • Backwards-compat: existing signal types unaffected; tee buffer is additive.
  • Default-off invariant intact (Plan #10 gate): Loopback enum is always-on but verb produces no side effects beyond a single self-addressed signal per invocation.

Out of scope — explicit (pinned)

| Concern | Pinned to |
|---------|-----------|
| Backend retries WS push until shim acks (receipt-at-shim guarantee) | Fix #1: separate SPEC after loopback evidence lands |
| signal_send SOR flip from PG to Redis (hot-path ordering) | Fix #3: separate SPEC; multi-store-writes.md line 102 reconciliation rides on Fix #3 |
| Continuous heartbeat / cadence test | Out of scope; this is operator-invoked only |
| Cross-surface differential drop-rate harness | Follow-up to v0.1; needs per-surface coverage |

Cross-references

  • SPEC-034 — agent-to-agent signal delivery (referenced baseline; not amended)
  • ADR-27 — runtime state through SM (Redis-as-SOR for runtime state)
  • docs/architecture/multi-store-writes.md — line 102 (signal_send SOR; reconciliation pinned to Fix #3)
  • Postmortem 07942a99 — signals_pending vs trace divergence
  • Memory: feedback_in_reply_to_uses_signal_id.md
  • Memory: project_doorbell_render_miss_filed_2026-05-03.md

Open questions for v0.2

  1. Push vs poll for shim-receipt evidence (C6). v0.1 polls; v0.2 may flip to push if Fix #1 is sequenced immediately after this SPEC. Texi to call.
  2. Should Loopback delivery be exempt from coalescing? The shim today coalesces non-system signal types in the doorbell renderer. A Loopback that gets coalesced behind another doorbell would falsify a “shim received but didn’t surface” diagnosis. Default proposal: Loopback bypasses coalescing (treat as system-priority). Texi to call.
  3. Multi-instance shim (worktrees). If a persona has multiple coder sessions running in different worktrees, which shim’s tee log is canonical? Default proposal: the backend resolves the target via requested_by_session, and only that session’s shim is polled (by the local CLI in v0.1, per C6). Document the answer.
Last modified on June 7, 2026