Signal Visibility Remediation Plan
Date: 2026-05-03
Owner: Texi
Project: Prism / PID-PGR01
Status: recommendation
Objective
Close the remaining signal-visibility gap with a plan that is complete enough
to execute through SPEC, design, build, test, and deploy.
This plan is based on the research note:
docs/research/codex-piggyback-surfacing-2026-05-03.md
Problem Split
This is not one bug.
Problem A: cross-surface race in same-turn signal visibility
Observed live on 2026-05-03:
- a self-targeted prism_signal returned publish_path=buffered_for_piggyback
- the same response did not include pending_signals
- the signal appeared later, via an explicit prism_signals_pending call
Most likely path:
- backend persists signal
- backend publishes WS envelope
- backend marks delivered_at immediately after send_json
- the mcp-node follow-on piggyback poll sees no undelivered row
- the local strategy buffer has not necessarily been populated yet
Net effect:
- same-turn piggyback merge can miss a just-delivered signal
- the system claims piggyback semantics without actually surfacing the payload
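The suspected sequence can be modeled in a few lines. This is a sketch under stated assumptions: Backend, Row, and same_turn_merge are hypothetical in-memory stand-ins, not the real signal_service.py or mcp-node code, and the WS send itself is not modeled. It shows the window in which the row is already stamped delivered_at while the local buffer is still empty, so the same-turn merge returns nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Row:
    signal_id: str
    payload: str
    delivered_at: Optional[float] = None


@dataclass
class Backend:
    """Hypothetical model of the backend flow described above."""
    rows: Dict[str, Row] = field(default_factory=dict)

    def publish(self, signal_id: str, payload: str, now: float) -> None:
        self.rows[signal_id] = Row(signal_id, payload)  # 1. persist the signal
        # 2. send the WS envelope (in flight; not modeled here)
        self.rows[signal_id].delivered_at = now         # 3. stamp right after send_json

    def poll_undelivered(self) -> List[str]:
        # The follow-on piggyback poll only returns rows without delivered_at.
        return [r.payload for r in self.rows.values() if r.delivered_at is None]


def same_turn_merge(backend: Backend, local_buffer: List[str]) -> List[str]:
    # Post-verb merge as described in this plan: backend poll + local strategy buffer.
    return backend.poll_undelivered() + list(local_buffer)


backend = Backend()
backend.publish("sig-1", "hello", now=1.0)
local_buffer: List[str] = []  # the WS frame has not reached the local strategy yet
print(same_turn_merge(backend, local_buffer))  # []: stored and stamped, not surfaced
```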
Scope note:
- this is now treated as universal, not Codex-specific
- Donna observed the same signature on claude_code receivers, where signals
were marked pushed_to_ws yet sat silent until an explicit drain
Problem B: stale Codex delivery classification
Backend code still treats codex as piggyback-only:
backend/app/services/signal_service.py:_publish_path_for_surface
But current mcp-node Codex behavior is capability-based:
mcp-node/src/surfaces/codex.ts
mcp-node/src/strategies/app_server_inject.ts
When app-server injection is configured, Codex is not behaviorally equivalent
to a piggyback-only surface. Current publish_path labeling is therefore stale
and can mislead both operators and downstream logic.
Completion Definition
This work is complete only when all five are present:
- SPEC
- Design
- Build
- Test
- Deploy
Anything short of that is partial.
SPEC
Recommended next artifact:
- file a follow-on spec for “cross-surface same-turn signal-visibility race
closure + Codex-specific capability registration”
Suggested spec scope:
- Close the cross-surface race between WS push stamping and same-turn fallback
visibility.
- Replace static Codex surface inference with explicit session capability
registration.
- Define honest delivery semantics for:
  - attempted push
  - accepted by client bridge
  - surfaced to current turn
  - available via fallback drain
- Preserve backward compatibility for surfaces that remain piggyback-only.
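One way to keep these four delivery states honest is to record each as independent evidence rather than collapsing them into one label. The sketch below assumes hypothetical per-receiver timestamps (DeliveryRecord and honest_states are illustrative names, not existing Prism types). The only invariant it enforces is that bridge acceptance implies a push attempt, since a signal can still surface via a fallback drain without any push.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeliveryRecord:
    # Hypothetical per-signal, per-receiver evidence; None means "not observed".
    push_attempted_at: Optional[float] = None
    bridge_accepted_at: Optional[float] = None
    surfaced_at: Optional[float] = None
    fallback_available: bool = True  # persisted row is still drainable


def honest_states(rec: DeliveryRecord) -> List[str]:
    """Return every delivery state the recorded evidence supports."""
    if rec.bridge_accepted_at is not None and rec.push_attempted_at is None:
        raise ValueError("bridge acceptance recorded without a push attempt")
    states = []
    if rec.push_attempted_at is not None:
        states.append("attempted_push")
    if rec.bridge_accepted_at is not None:
        states.append("accepted_by_client_bridge")
    if rec.surfaced_at is not None:
        states.append("surfaced_to_current_turn")
    if rec.fallback_available:
        states.append("available_via_fallback_drain")
    return states


# A WS send alone supports "attempted_push", not "surfaced".
print(honest_states(DeliveryRecord(push_attempted_at=1.0)))
# ['attempted_push', 'available_via_fallback_drain']
```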
Suggested owner split:
- architecture/spec: Texi
- implementation: Donna
Design
D1. Close the race universally
The race fix should apply across surfaces, not just to Codex.
Recommended implementation:
- a backend Redis just-pushed cache keyed by receiving session and signal id
- a short TTL, e.g. a Redis hash just_pushed:{session_id} with field
<signal_id> -> full serialized PendingSignal payload, expiring in ~5s
- /signal/poll returns an additive field such as just_pushed_signals: []
- mcp-node consumes the field when present and ignores it when absent
Why this over a shim-local cache:
- one source of truth
- works for wildcard fan-out to N receivers
- clean additive rollout path
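A minimal sketch of the D1 cache, assuming a key-level TTL. It is an in-memory stand-in so it runs without a Redis server; the comments name the Redis commands each step would map to (hset, expire, and hgetall in redis-py). One detail worth pinning down in the spec: a plain Redis EXPIRE applies to the whole just_pushed:{session_id} hash, not to individual <signal_id> fields, so "expiring in ~5s" means the hash lives ~5s after its last write. Per-field expiry would need HEXPIRE, which Redis added in 7.4. JustPushedCache and its method names are hypothetical.

```python
import json
import time
from typing import Callable, Dict, List, Tuple


class JustPushedCache:
    """Hypothetical in-memory stand-in for Redis hashes just_pushed:{session_id}.

    TTL is key-level, as with HSET followed by EXPIRE: each write refreshes the
    expiry of the whole hash, and every field disappears when the key expires.
    """

    def __init__(self, ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: Dict[str, Tuple[float, Dict[str, str]]] = {}

    @staticmethod
    def _key(session_id: str) -> str:
        return f"just_pushed:{session_id}"

    def _live_fields(self, key: str) -> Dict[str, str]:
        expires_at, fields = self._data.get(key, (0.0, {}))
        if self.clock() >= expires_at:
            self._data.pop(key, None)  # expired or absent: behaves like a missing key
            return {}
        return fields

    def record(self, session_id: str, signal_id: str, payload: dict) -> None:
        key = self._key(session_id)
        fields = self._live_fields(key)
        fields[signal_id] = json.dumps(payload)                # HSET key signal_id payload
        self._data[key] = (self.clock() + self.ttl_s, fields)  # EXPIRE key ttl

    def read(self, session_id: str) -> List[dict]:
        # HGETALL key; an expired key reads as empty
        return [json.loads(v) for v in self._live_fields(self._key(session_id)).values()]


# Wildcard fan-out: one hash per receiving session, same signal_id in each.
now = [100.0]
cache = JustPushedCache(ttl_s=5.0, clock=lambda: now[0])
for receiver in ("sess-a", "sess-b"):
    cache.record(receiver, "sig-1", {"signal_id": "sig-1", "body": "hi"})
print(cache.read("sess-a"))  # [{'signal_id': 'sig-1', 'body': 'hi'}]
now[0] += 6.0
print(cache.read("sess-a"))  # []
```

Keying by receiving session is what lets one wildcard signal land in N independent hashes, each expiring on its own.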
D2. Register capabilities, not guesses
Replace static agent_surface -> publish_path inference with explicit session
capabilities recorded at bootstrap or registration time.
Minimum fields:
- push_capable: bool
- push_mode: claude_channel | codex_app_server | none
- push_visibility (optional): current_turn | side_channel | fallback_only
Why:
- codex is now a family of behaviors, not one behavior
- current backend labeling assumes too much from the surface string
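A sketch of capability-driven resolution, assuming the D2 fields are stored per session at registration. SessionCapabilities and resolve_publish_path are hypothetical names, and the returned strings reuse the buffered_for_piggyback and pushed_to_ws labels that appear in this plan; the real _publish_path_for_surface signature and value set may differ. The point is that two Codex sessions can resolve differently, and an unregistered session falls back to the conservative piggyback claim.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PushMode(str, Enum):
    CLAUDE_CHANNEL = "claude_channel"
    CODEX_APP_SERVER = "codex_app_server"
    NONE = "none"


class PushVisibility(str, Enum):
    CURRENT_TURN = "current_turn"
    SIDE_CHANNEL = "side_channel"
    FALLBACK_ONLY = "fallback_only"


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical record written at bootstrap/registration time."""
    push_capable: bool
    push_mode: PushMode
    push_visibility: Optional[PushVisibility] = None  # optional field

    def __post_init__(self) -> None:
        # Reject contradictory registrations instead of guessing.
        if self.push_capable != (self.push_mode is not PushMode.NONE):
            raise ValueError("push_capable must agree with push_mode")


def resolve_publish_path(caps: Optional[SessionCapabilities]) -> str:
    """Hypothetical stand-in for _publish_path_for_surface: no surface-string inference."""
    if caps is None or not caps.push_capable:
        return "buffered_for_piggyback"  # unregistered or non-push: conservative claim
    return "pushed_to_ws"


# Two Codex sessions: only one registered app-server push.
codex_app_server = SessionCapabilities(True, PushMode.CODEX_APP_SERVER)
codex_plain = SessionCapabilities(False, PushMode.NONE)
print(resolve_publish_path(codex_app_server))  # pushed_to_ws
print(resolve_publish_path(codex_plain))       # buffered_for_piggyback
```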
D3. Separate routing truth from UX truth
Current publish_path conflates transport intent and user-visible surfacing.
Recommended split:
- route_path: how the backend attempted delivery
- surface_path: how the receiving surface actually exposed it
Example values:
- route_path = ws_push | queue_only
- surface_path = claude_channel | codex_app_server | piggyback | startup_drain
Why:
- a WS frame sent to a Codex bridge is not the same thing as a signal surfaced
in the current model turn
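The D3 split can be expressed as two independent fields, where only the backend sets route_path and only a receiving surface's report sets surface_path. This is an illustrative sketch: SignalDeliveryView and mark_surfaced are hypothetical, and the "first surfacing wins" rule is one possible choice, picked here so a later fallback drain cannot relabel a signal that already surfaced another way.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RoutePath(str, Enum):
    WS_PUSH = "ws_push"
    QUEUE_ONLY = "queue_only"


class SurfacePath(str, Enum):
    CLAUDE_CHANNEL = "claude_channel"
    CODEX_APP_SERVER = "codex_app_server"
    PIGGYBACK = "piggyback"
    STARTUP_DRAIN = "startup_drain"


@dataclass
class SignalDeliveryView:
    signal_id: str
    route_path: RoutePath                       # set by the backend when it routes
    surface_path: Optional[SurfacePath] = None  # set only from a surface's report

    def mark_surfaced(self, path: SurfacePath) -> None:
        # First surfacing wins, so a later fallback drain cannot relabel it.
        if self.surface_path is None:
            self.surface_path = path

    @property
    def surfaced(self) -> bool:
        return self.surface_path is not None


view = SignalDeliveryView("sig-1", RoutePath.WS_PUSH)
print(view.surfaced)  # False: a WS frame was sent, but nothing surfaced yet
view.mark_surfaced(SurfacePath.PIGGYBACK)
view.mark_surfaced(SurfacePath.STARTUP_DRAIN)
print(view.route_path.value, view.surface_path.value)  # ws_push piggyback
```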
D4. Close the same-turn race
One of these designs should become canonical:
Option 1: client ack before delivery stamp
- backend does not mark delivered_at immediately after send_json
- receiving bridge acks only after local strategy accepts the envelope
- strongest semantic integrity
- highest implementation cost
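Option 1 amounts to moving the delivered_at stamp from send time to ack time. A minimal sketch, assuming a hypothetical ack(session_id, signal_id) call that the bridge makes after its local strategy accepts the envelope; AckingBackend is illustrative, and rows are keyed per receiver so wildcard fan-out stays correct. Until the ack arrives the row remains drainable, which is where the semantic integrity comes from; the cost is that the bridge must now send acks and the backend must tolerate missing or repeated ones.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RowKey = Tuple[str, str]  # (receiving session_id, signal_id)


@dataclass
class AckingBackend:
    """Hypothetical backend where delivered_at is stamped on bridge ack, not on send."""
    rows: Dict[RowKey, dict] = field(default_factory=dict)

    def publish(self, session_id: str, signal_id: str, payload: str) -> None:
        self.rows[(session_id, signal_id)] = {"payload": payload, "delivered_at": None}
        # The WS envelope would be sent here; delivered_at deliberately stays None.

    def ack(self, session_id: str, signal_id: str, now: float) -> bool:
        """Called by the bridge only after its local strategy accepted the envelope."""
        row = self.rows.get((session_id, signal_id))
        if row is None or row["delivered_at"] is not None:
            return False  # unknown or repeated ack: no-op
        row["delivered_at"] = now
        return True

    def poll_undelivered(self, session_id: str) -> List[str]:
        return [row["payload"] for (sid, _), row in self.rows.items()
                if sid == session_id and row["delivered_at"] is None]


backend = AckingBackend()
backend.publish("sess-a", "sig-1", "hello")
print(backend.poll_undelivered("sess-a"))  # ['hello']: not acked, still visible
print(backend.ack("sess-a", "sig-1", now=2.0), backend.ack("sess-a", "sig-1", now=3.0))  # True False
print(backend.poll_undelivered("sess-a"))  # []
```

Because the row stays drainable until the ack, a follow-on poll and the WS path can both see it in the same turn, so this option still depends on dedup by signal_id downstream.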
Option 2: short-lived just-pushed cache
- keep current WS publish flow
- backend tracks a short-lived Redis cache of signals pushed in the current
turn
- post-verb piggyback merge consults:
  - backend drain
  - local strategy buffer
  - just-pushed cache
- lowest disruption to current architecture
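The Option 2 merge step reduces to a union of three sources with dedup by signal_id. The sketch is in Python for consistency with the other examples here; the real merge lives in mcp-node/src/signalMerge.ts, and merge_pending and the dict shape (a signal_id key) are assumptions. Source order decides which copy is kept when the same signal arrives from more than one source.

```python
from typing import Dict, Iterable, List


def merge_pending(backend_drain: Iterable[dict],
                  local_buffer: Iterable[dict],
                  just_pushed: Iterable[dict]) -> List[dict]:
    """Union of the three sources; the first copy of each signal_id wins."""
    merged: Dict[str, dict] = {}  # dicts keep insertion order (Python 3.7+)
    for source in (backend_drain, local_buffer, just_pushed):
        for signal in source:
            merged.setdefault(signal["signal_id"], signal)
    return list(merged.values())


# The race case: the row is already stamped and the local buffer is still empty,
# so only the just-pushed cache has the signal.
merged = merge_pending(backend_drain=[], local_buffer=[],
                       just_pushed=[{"signal_id": "sig-1", "body": "hi"}])
print([s["signal_id"] for s in merged])  # ['sig-1']
```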
Option 3: special-case prism_signal
- the prism_signal response may include the just-sent self-targeted or
same-session envelope directly
- narrower fix
- does not solve the general “next arbitrary verb” race cleanly
Recommendation:
- implement Option 2 first
- evaluate Option 1 later if stronger semantics are needed everywhere
D5. Codex app-server is primary UX, piggyback is fallback durability
For Codex sessions with app-server injection enabled:
- primary user-facing path should be app-server injection
- piggyback should remain durability fallback and recovery path
Do not design future Codex logic around piggyback as the main UX path.
Build
B1. Backend
Touch points:
backend/app/services/signal_service.py
backend/app/routers/session_stream.py
Build tasks:
- Introduce a short-lived Redis just-pushed visibility mechanism.
- Expose an additive field on /signal/poll, e.g. just_pushed_signals: [], so
rollout stays backward-compatible.
- Replace _publish_path_for_surface(...) with capability-driven resolution.
- Stop reporting Codex as piggyback-only when session metadata says otherwise.
- Ensure explicit drains remain idempotent and dedup-safe.
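For the idempotent, dedup-safe drain requirement, the key property is that finding undelivered rows and stamping them happen as one atomic step, so repeated or concurrent drains cannot hand out the same row twice. The sketch uses a lock over an in-memory store; SignalStore is hypothetical. In a SQL store the same effect usually comes from a single conditional UPDATE ... WHERE delivered_at IS NULL (PostgreSQL can return the claimed rows with RETURNING) rather than a separate SELECT followed by an UPDATE.

```python
import threading
from typing import Dict, List, Tuple


class SignalStore:
    """Hypothetical in-memory stand-in for persisted signal rows."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[Tuple[str, str], dict] = {}  # (session_id, signal_id) -> row

    def add(self, session_id: str, signal_id: str, payload: str) -> None:
        with self._lock:
            # setdefault: re-adding the same signal for the same receiver is a no-op
            self._rows.setdefault((session_id, signal_id),
                                  {"payload": payload, "delivered_at": None})

    def drain(self, session_id: str, now: float) -> List[dict]:
        """Find and stamp undelivered rows inside one critical section."""
        with self._lock:
            claimed = []
            for (sid, signal_id), row in self._rows.items():
                if sid == session_id and row["delivered_at"] is None:
                    row["delivered_at"] = now
                    claimed.append({"signal_id": signal_id, "payload": row["payload"]})
            return claimed


store = SignalStore()
store.add("sess-a", "sig-1", "hello")
store.add("sess-a", "sig-1", "hello")  # duplicate insert is ignored
results: List[List[dict]] = []
threads = [threading.Thread(target=lambda: results.append(store.drain("sess-a", 1.0)))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(len(r) for r in results))  # 1: only one drain claims the row
```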
B2. mcp-node
Touch points:
mcp-node/src/server.ts
mcp-node/src/signalMerge.ts
mcp-node/src/bootstrap/stream.ts
mcp-node/src/surfaces/codex.ts
mcp-node/src/strategies/app_server_inject.ts
Build tasks:
- Merge from the new race-closure source in addition to backend poll and local
strategy buffer.
- Keep pending_signals attachment universal for non-lifecycle verbs.
- Preserve dedup by signal_id.
- Surface capability metadata during bootstrap/registration if backend adopts
explicit capability fields.
B3. No Codex-client fork assumed
Do not start by assuming an OpenAI-side client patch is required.
Current evidence says:
- Codex can preserve MCP result payloads
- at least one live failure occurred before a piggyback field even reached the
MCP response
Only revisit Codex-client behavior after Prism-side fixes land.
Test
T1. Unit / integration
- prism_signal self-targeted, same session, delivery_class=async:
  - response must include pending_signals when classified as piggyback fallback
  - no duplicate on subsequent explicit drain
- prism_signal self-targeted, same session, delivery_class=sync:
  - same expectations as async
  - verify timing does not regress for the active-conversation path
- pushed-via-WS then immediate follow-on verb:
  - no lost signal between WS stamp and piggyback merge
- Codex capability registration:
  - app-server enabled session does not report piggyback-only semantics
- dedup:
  - same signal seen from local buffer plus backend poll merges once
T2. Surface tests
- Claude Code improved:
  - channel path still works
  - silent doorbell/drop cases from this session no longer reproduce
- Codex app-server enabled:
  - signal appears through app-server path
  - fallback drain still works
- Codex app-server disabled:
  - signal remains piggyback-capable
  - same-turn response includes pending_signals
T3. Live operator test
Re-run the Donna/Texi round-trip protocol subset:
- self-targeted probe
- Donna -> Texi direct signal
- wildcard broadcast
- 5x burst
Success criteria:
- no missing signals
- no duplicate signals
- honest publish_path or successor fields
- Codex no longer depends on explicit prism_signals_pending for basic
same-turn visibility when fallback semantics are claimed
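The first two success criteria can be scored mechanically from the operator test. A sketch, assuming each probe carries a unique signal_id and that every surfacing on the receiver (piggyback, channel, app-server, or explicit drain) is logged by id; check_round_trip is a hypothetical helper, not an existing Prism test utility.

```python
from collections import Counter
from typing import Dict, Iterable, List


def check_round_trip(sent: Iterable[str], received: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the probe ids that were sent with every id the receiver surfaced."""
    sent_ids = set(sent)
    counts = Counter(received)
    received_ids = set(counts)
    return {
        "missing": sorted(sent_ids - received_ids),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        "unexpected": sorted(received_ids - sent_ids),
    }


# 5x burst where one signal was lost and another surfaced twice.
report = check_round_trip(
    sent=["burst-1", "burst-2", "burst-3", "burst-4", "burst-5"],
    received=["burst-1", "burst-2", "burst-2", "burst-4", "burst-5"],
)
print(report)  # {'missing': ['burst-3'], 'duplicates': ['burst-2'], 'unexpected': []}
```

A passing run is one where all three lists are empty.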
Deploy
Dp1. Sequence
- file/approve spec
- implement backend + mcp-node
- run local and LAN smokes
- deploy backend first
- deploy mcp-node / launcher consumers second
- re-run live Codex and Claude cross-surface signal tests
Rollout rule:
- backend field additions must be backward-compatible
- old mcp-node ignores new poll fields
- new mcp-node consumes them when present
- no deploy step should create a broken mixed-version window
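The rollout rule holds only if both sides treat the new field as optional. A sketch of the four mixed-version combinations, with hypothetical function names; the existing poll key is shown as signals purely as a placeholder, since the current /signal/poll response shape is not specified in this plan. The old parser reads only the keys it knows, and the new parser uses .get with a default so it also works against an old backend.

```python
import json
from typing import List

# "signals" is a placeholder for whatever key /signal/poll already returns.


def build_poll_response_new(signals: List[dict], just_pushed: List[dict]) -> str:
    # New backend: existing key unchanged, just_pushed_signals only added.
    return json.dumps({"signals": signals, "just_pushed_signals": just_pushed})


def build_poll_response_old(signals: List[dict]) -> str:
    return json.dumps({"signals": signals})


def parse_poll_old(body: str) -> List[dict]:
    # Old mcp-node: reads the keys it knows and ignores the rest.
    return json.loads(body).get("signals", [])


def parse_poll_new(body: str) -> List[dict]:
    # New mcp-node: consumes the new field when present, tolerates its absence.
    # Dedup by signal_id is left to the later merge step.
    data = json.loads(body)
    return data.get("signals", []) + data.get("just_pushed_signals", [])


new_body = build_poll_response_new([{"signal_id": "a"}], [{"signal_id": "b"}])
old_body = build_poll_response_old([{"signal_id": "a"}])
for parse in (parse_poll_old, parse_poll_new):
    for body in (old_body, new_body):
        print(parse.__name__, len(parse(body)))  # every combination parses cleanly
```

One caveat: this is only safe if the old mcp-node does not validate poll responses strictly (for example, by rejecting unknown keys), which is worth confirming before the backend deploy.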
Dp2. Rollout caution
This change touches delivery semantics and operator observability. Deploy with:
- strong logs around route/surface classification
- temporary metrics for race-hit detection
- explicit note in the changelog that Codex publish-path semantics changed
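For the temporary race-hit metric, one workable definition is a signal that only the just-pushed cache supplied in a given turn, meaning it would have been missed without the D1 fix. A sketch, assuming the merge step can see which source each signal_id came from; record_race_hits and the per-surface Counter are hypothetical and would map onto whatever metrics facility the backend or mcp-node already uses.

```python
from collections import Counter
from typing import Set

race_hits: Counter = Counter()  # hypothetical temporary metric, per receiving surface


def record_race_hits(surface: str, backend_ids: Set[str], local_ids: Set[str],
                     just_pushed_ids: Set[str]) -> int:
    """Count signals that only the just-pushed cache supplied this turn."""
    rescued = just_pushed_ids - backend_ids - local_ids
    race_hits[surface] += len(rescued)
    return len(rescued)


record_race_hits("codex", backend_ids=set(), local_ids=set(), just_pushed_ids={"sig-1"})
record_race_hits("claude_code", backend_ids={"sig-2"}, local_ids=set(),
                 just_pushed_ids={"sig-2"})
print(dict(race_hits))  # {'codex': 1, 'claude_code': 0}
```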
Final Recommendation
Proceed as a Prism-side remediation, not a Codex-blame exercise.
Priority order:
- fix the same-turn piggyback race
- replace stale Codex surface guessing with capability registration
- treat Codex app-server injection as the primary UX path
- only then reassess whether any remaining Codex prompt-surfacing gap still
needs a surface-specific fix
If only one change can ship first, ship the race closure. It is the shortest
path from today’s contradictory behavior to honest and testable delivery
semantics.
Last modified on June 7, 2026