Status: draft · Version 0.1 · Filed 2026-04-25
SPEC-037 v0.1 — Backend-side signal piggyback for reliable agent-to-agent delivery without LAN Redis exposure.
PROBLEM: Agent-to-agent signals (SPEC-034) are silently dropped between agents running off server1. The root cause is NOT an MCP protocol limitation: PiggybackStrategy, ChannelsPushStrategy, and the verb-response decorator are correctly implemented. The breakdown is in the SUBSCRIBER TRANSPORT: mcp/subscriber.py requires PRISM_REDIS_URL to be set, LAN clients don’t have it, so the subscriber never runs, the strategy buffer is never fed, and pending_signals[] is always empty.
EVIDENCE (2026-04-24/25 session): three signals (fb9e0370, 924ca769, 08aaa081) were all marked delivered_at via channels_push, but the agents never saw them. Recovery was possible only via a direct Postgres query, or a manual reset followed by re-running prism_start to trigger startup_drain.
FIX: backend appends pending_signals[] on every authenticated verb response. Stop pre-marking delivered_at on send. Eliminates subscriber-transport dependency. No LAN Redis exposure, no SSE endpoint, no firewall holes.
DESIGN:
§1 Backend drain: every authenticated verb response (excl. /controller/heartbeat) drains undelivered signals for caller’s identity, atomically marks delivered_at + delivery_method='piggyback', returns them in pending_signals[].
§2 Stop pre-marking delivered_at in send_signal, so that actual delivery is the only writer of delivered_at.
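A minimal sketch of the §1 drain, under stated assumptions: SQLite stands in for the real Postgres store, and the table name `signals`, the columns `recipient`, `payload`, and `created_at`, and the function name `drain_pending_signals` are all hypothetical. Only `signal_id`, `delivered_at`, `delivery_method`, and the 'piggyback' value come from this spec. The point it illustrates is that selecting and marking happen inside one transaction, so a signal is returned at most once.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the real signals table may differ.
SCHEMA = """
CREATE TABLE signals (
    signal_id       TEXT PRIMARY KEY,
    recipient       TEXT NOT NULL,
    payload         TEXT NOT NULL,
    created_at      TEXT NOT NULL,
    delivered_at    TEXT,
    delivery_method TEXT
)
"""

def drain_pending_signals(conn, recipient):
    """Return undelivered signals for `recipient` and mark them delivered.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below controls the transaction.
    """
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT signal_id, payload FROM signals "
            "WHERE recipient = ? AND delivered_at IS NULL "
            "ORDER BY created_at",
            (recipient,),
        ).fetchall()
        conn.executemany(
            "UPDATE signals SET delivered_at = ?, delivery_method = 'piggyback' "
            "WHERE signal_id = ? AND delivered_at IS NULL",
            [(now, signal_id) for signal_id, _ in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [{"signal_id": sid, "payload": payload} for sid, payload in rows]
```

On Postgres the same effect could likely be achieved with a single UPDATE ... RETURNING statement; the two-step form is used here only to keep the sketch runnable on any SQLite version.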
§3 Dedupe: verb response decorator merges backend pending + in-process strategy buffer by signal_id.
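The §3 merge can be sketched as follows. This is an illustration only: the function name `merge_pending_signals` and the dict-shaped signal records are assumptions, and the choice to prefer the backend copy when both sources carry the same signal_id is one reasonable policy, not something this spec fixes.

```python
def merge_pending_signals(backend_pending, buffered):
    """Merge two signal lists, keeping one entry per signal_id.

    Backend entries are visited first, so on a duplicate signal_id the
    backend copy wins. Order of first appearance is preserved.
    """
    merged = {}
    for signal in [*backend_pending, *buffered]:
        merged.setdefault(signal["signal_id"], signal)
    return list(merged.values())
```

With this shape, an in-cluster client that receives the same signal through both its subscriber buffer and the backend drain surfaces it to the agent once.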
§4 New prism_signals_pending verb for proactive polling.
§5 delivery_method taxonomy: piggyback (NEW backend), startup_drain (unchanged), broadcast (unchanged), channels_push (real push when SDK lands), subscriber_buffer (NEW client-side strategy drain).
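One way to keep the §5 taxonomy from drifting is a single enumeration shared by backend and client. The sketch below is hypothetical (the class name `DeliveryMethod` is not taken from the codebase); the five string values are the ones listed in §5.

```python
from enum import Enum

class DeliveryMethod(str, Enum):
    """Allowed values for the delivery_method field (per §5)."""
    PIGGYBACK = "piggyback"                  # new: backend drain on verb response
    STARTUP_DRAIN = "startup_drain"          # unchanged
    BROADCAST = "broadcast"                  # unchanged
    CHANNELS_PUSH = "channels_push"          # reserved for real push once the SDK lands
    SUBSCRIBER_BUFFER = "subscriber_buffer"  # new: client-side strategy drain
```

Parsing a stored value with `DeliveryMethod(value)` raises ValueError on anything outside the taxonomy, which would surface a mislabeled delivery early.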
§6 Three-phase rollout: this SPEC (universal drain), Phase 2 (real MCP channels push), Phase 3 (Claude Code hook).
CONTRACT: no silent signal loss between any two agents. delivered_at honest. No new infrastructure for LAN clients.
NON-GOALS: removing strategy abstraction, real MCP server-push, LAN Redis exposure, changing startup_drain.
BACKWARDS COMPAT: pre-upgrade clients ignore pending_signals (fall back to startup_drain). In-cluster subscribers continue working — dedupe handles overlap.
ROLLOUT: single commit. Files: backend/app/auth/authforge.py (drain), backend/app/services/signal_service.py (stop pre-mark + drain function), backend/app/routers/signal.py (new endpoint), mcp/server.py (verb + dedupe), mcp/client.py (client method). No migration. Deploy via rsync + compose rebuild.
RELATED: SPEC-034 (extends), SPEC-035 (complements — engagement routing picks target, this ensures target receives), SPEC-032 (reduces dependence). Sibling: SPEC-036 (scaffolder fix by Lola, distinct concern).