
SPEC-101 Stage 0 v0.3 — loss budget, recovery invariants, rollback

Gate. Required by SPEC-101 v0.3 §8 before the Stage 5 hot-path cutover lands. Defines what we accept as loss in each failure mode, what state every failure mode leaves the system in, and how to roll back each fine-grain feature flag.
v0.3 scope. The parent SPEC-101 v0.3 locks the architecture as: sender shim → Marconi → receiver shim on the hot path; Marconi → in-memory audit queue (parent §3.6) → Redis Stream → PG archiver on the audit fan-out. Redis is not on the signal-delivery flow. When Redis is down longer than the audit queue holds, audit-queue overwrite is the operator-visible loss event. Durability-recovery algorithms (“get closer to 100%”) are out of scope; revisit if real failures motivate it.

1. Loss budget

1.1 What “signal loss” means

A signal is lost if all three are true:
  1. The API returned 200 to the caller (Marconi accepted ownership).
  2. The recipient never received the in-process WS push (or for offline recipients, the signal never appeared in prism_signals_pending).
  3. There is no durable record in either the Redis Stream or PG audit (both downstream tiers missed it).
A signal that was delivered to the recipient but is missing from PG audit is degraded durability, not lost (it is recoverable from the Redis Stream). A signal that was NOT delivered but is durably persisted with outcome=no_subscriber or queued_offline counts as delivered as expected; that is the contract for offline recipients. Loss is fatal; degraded durability is operator-visible and recoverable from the upstream tier (the Redis Stream is upstream of PG).
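The definitions above can be read as a small classification function. The following is a minimal sketch only: the `SignalFacts` fields, the `Verdict` names, and `classify` are hypothetical and are not part of SPEC-101. It treats "received" as covering both the WS push and the appearance in prism_signals_pending.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    LOST = "lost"
    DEGRADED_DURABILITY = "degraded_durability"
    OK = "ok"


@dataclass(frozen=True)
class SignalFacts:
    """Observed facts about one signal (hypothetical field names)."""
    api_returned_200: bool    # Marconi accepted ownership
    recipient_received: bool  # WS push landed, or entry appeared in prism_signals_pending
    in_redis_stream: bool     # durable record in the Redis Stream tier
    in_pg_audit: bool         # durable record in the PG audit tier


def classify(facts: SignalFacts) -> Verdict:
    durable = facts.in_redis_stream or facts.in_pg_audit
    # Lost only when all three conditions from section 1.1 hold.
    if facts.api_returned_200 and not facts.recipient_received and not durable:
        return Verdict.LOST
    # Delivered but missing from PG audit: degraded durability, not loss.
    if facts.recipient_received and not facts.in_pg_audit:
        return Verdict.DEGRADED_DURABILITY
    # Everything else, including undelivered-but-durably-recorded signals.
    return Verdict.OK
```

Under this reading, a signal that was accepted, never received, and recorded in the Redis Stream classifies as OK, which matches the offline-recipient contract.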

1.2 Per-failure-mode loss budget

Scenario | Loss budget | Rationale
Backend graceful restart (SIGTERM) | 0 signals | Audit queue is flushed to Redis Stream during graceful shutdown
Backend hard kill (OOM, SIGKILL, host crash) | Up to audit-queue depth at kill time, that hadn’t been written to Redis yet | Network-grade durability; bounded by Redis-writer lag
Redis down (transient) | 0 signals | Audit queue buffers; writer resumes from last-acknowledged offset
Redis down longer than audit queue holds | Signals overwritten in the audit queue before Redis ingested them = lost. Operator-visible via marconi_audit_queue_overwrite_total > 0 (paging incident) | Sized correctly, audit queue absorbs realistic Redis outages; longer outages are the operator alert path
PG down | 0 signals | Redis Stream is the upstream cache; archiver resumes from consumer-group checkpoint
Network partition between backend and recipient | 0 signals lost; signals queue offline for recipient | Existing pending-signal semantics, preserved
Recipient WS disconnect mid-delivery | 0 signals lost; entry stays replay/drain-eligible until ACK evidence | R3 contract

1.3 Total loss-budget summary

Operator framing (Frank, 2026-05-10): “If Redis is down, messages are delivered but never recorded at this stage of the build.” That is the whole loss model. Delivery is the contract; durability is best-effort. The two failure modes that produce “delivered but not recorded” outcomes:
  • Backend hard-kill — in-flight audit-queue entries that hadn’t yet been written to Redis are lost. Bounded by Redis-writer lag (typically sub-second under normal load).
  • Redis offline longer than the audit queue holds — once the queue fills (max_age_seconds), oldest entries overwrite and are lost from the durable record.
Both are operator-visible (counters + alerts). Everything else is zero loss. Algorithms to recover the “delivered but not recorded” set are out of scope for v0.3.

2. Recovery invariants

For each failure mode, the system state after recovery is deterministic. R1, R2, R3, R5, R6, R8 below are testable invariants. (R4 and R7 in earlier drafts were spool-specific and were removed in v0.2.)

R1 — Backend graceful restart

Failure: SIGTERM, container restart, deploy.
During: in-memory tables empty until shims reconnect (~1-2s warmup window per parent §5).
After: routing/registration tables rehydrated from SessionStore (Redis); audit queue empty (warm only when new signals arrive); Redis Stream + PG archiver resume from their checkpoints.
Recoverable signals: all signals successfully written to Redis Stream before SIGTERM are delivered to PG via the normal pipeline.
Lost signals: 0 (audit queue is flushed to Redis Stream during graceful shutdown as a last-resort sync barrier).

R2 — Backend hard kill

Failure: OOM kill, SIGKILL, host crash, power loss.
After: same as R1, but in-flight audit-queue entries that hadn’t yet been written to Redis are lost.
Recoverable signals: Redis Stream entries up to last successful append.
Lost signals: bounded by Redis-writer lag at kill time (typically sub-second under normal load).

R3 — Recipient WS disconnect during signal delivery

Failure: shim WS drops between Marconi’s push and the recipient’s ack of the frame.
Delivery-evidence model: outcome=pushed records that Marconi attempted delivery on the WS handle; it does NOT constitute final delivery evidence. The signal becomes delivered (final) only when one of the following holds:
  • the shim/model returns an explicit application-level ACK frame, OR
  • a prism_signal_ack for the trace lands with ack_kind∈{model_acted, surface_observed}, OR
  • the recipient’s pending-signal drain consumes the entry on a subsequent session.
During: signal is in the audit queue (and downstream in the Redis Stream once written) with outcome=pushed, delivery_state=awaiting_ack (provisional). The pending-signal index keeps the entry replay/drain-eligible until ACK evidence is recorded.
After: recipient reconnects, re-registers, and either receives a fresh push for the still-eligible signal OR drains it via prism_signals_pending. Whichever arrives first promotes delivery_state to acked; the entry is then released from the pending-signal index.
Lost: 0. A signal that was pushed-without-ack is treated identically to one that was queued-offline; both are replay-eligible until ACK evidence exists.
Counter implications: marconi_signals_delivered_total{outcome=pushed} increments at push time; a separate marconi_signals_acked_total{ack_kind} increments on ACK. The gap between these counters is the in-flight-without-ack window and is operator-observable.
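The R3 delivery-evidence model can be sketched as a small state holder. This is an illustration under assumptions, not Marconi's implementation: the class `PendingSignalIndex`, its method names, and the `ack_frame` and `pending_drain` labels are hypothetical; only `model_acted` and `surface_observed` come from the text above.

```python
# ack_kind values accepted as final delivery evidence. "ack_frame" and
# "pending_drain" are hypothetical labels for the first and third bullets.
FINAL_ACK_KINDS = {"ack_frame", "model_acted", "surface_observed", "pending_drain"}


class PendingSignalIndex:
    """Tracks signals that were pushed but have no ACK evidence yet."""

    def __init__(self):
        self._awaiting_ack = set()  # trace ids with delivery_state=awaiting_ack
        self.pushed_total = 0       # stand-in for marconi_signals_delivered_total{outcome=pushed}
        self.acked_total = {}       # stand-in for marconi_signals_acked_total{ack_kind}

    def record_push(self, trace_id: str) -> None:
        # A push is provisional: the entry stays replay/drain-eligible.
        self._awaiting_ack.add(trace_id)
        self.pushed_total += 1

    def record_ack(self, trace_id: str, ack_kind: str) -> bool:
        if ack_kind not in FINAL_ACK_KINDS:
            raise ValueError(f"not final delivery evidence: {ack_kind!r}")
        if trace_id not in self._awaiting_ack:
            return False  # unknown or already acked; duplicates are ignored
        # First ACK evidence wins: promote to acked and release the entry.
        self._awaiting_ack.remove(trace_id)
        self.acked_total[ack_kind] = self.acked_total.get(ack_kind, 0) + 1
        return True

    def replay_eligible(self) -> list:
        return sorted(self._awaiting_ack)
```

A reconnecting recipient would be served from `replay_eligible()`; whichever ACK path lands first releases the entry, and later duplicates return False without touching the counters.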

R5 — Redis down

Failure: Redis container down, network partition, etc.
During: audit queue buffers; Redis writer fails on every batch with backoff; marconi_redis_writer_lag_seconds grows.
Sized correctly: audit queue max_age_seconds exceeds expected Redis recovery time; the writer drains the backlog when Redis returns.
Outage longer than queue holds: audit-queue overwrites begin; oldest entries are lost; marconi_audit_queue_overwrite_total increments and pages.
After Redis recovery: writer resumes from last-acknowledged offset, drains the backlog.
Lost: 0 if Redis recovers within audit-queue max_age_seconds; otherwise the overwritten entries.
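The buffering and overwrite behavior in R5 can be illustrated with a toy queue. This is a sketch under stated assumptions: the `AuditQueue` class, its methods, and the explicit `now` argument are hypothetical, and age-based eviction is only one reading of how max_age_seconds bounds the queue (parent §3.6 is authoritative).

```python
from collections import deque


class AuditQueue:
    """In-memory audit buffer that drops entries older than max_age_seconds."""

    def __init__(self, max_age_seconds: float):
        self.max_age_seconds = max_age_seconds
        self._entries = deque()   # (offset, enqueued_at, payload), oldest first
        self._next_offset = 0
        self.acked_offset = -1    # highest offset the Redis writer has confirmed
        self.overwrite_total = 0  # stand-in for marconi_audit_queue_overwrite_total

    def append(self, payload, now: float) -> int:
        self._expire(now)
        offset = self._next_offset
        self._next_offset += 1
        self._entries.append((offset, now, payload))
        return offset

    def _expire(self, now: float) -> None:
        # Entries still queued are unacknowledged by construction, so every
        # expiry here is a "delivered but not recorded" loss event.
        while self._entries and now - self._entries[0][1] > self.max_age_seconds:
            self._entries.popleft()
            self.overwrite_total += 1

    def pending(self, now: float) -> list:
        """Entries the writer still has to send, starting after acked_offset."""
        self._expire(now)
        return [(offset, payload) for offset, _, payload in self._entries]

    def ack(self, offset: int) -> None:
        """Record that Redis accepted everything up to and including offset."""
        self.acked_offset = max(self.acked_offset, offset)
        while self._entries and self._entries[0][0] <= self.acked_offset:
            self._entries.popleft()
```

With max_age_seconds=10, an outage that ends at t=8 drains with no loss, while an entry enqueued at t=0 and still unacknowledged at t=12 is dropped and counted.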

R6 — PG down

Failure: Postgres container down, schema migration window, etc.
During: archiver reads from Redis Stream succeed but PG batch INSERTs fail; archiver retries with backoff.
After PG recovery: archiver drains; idempotent UPSERT prevents duplicates.
Lost: 0 (Redis Stream is the source of truth at this tier).
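The idempotent UPSERT that makes archiver retries safe can be demonstrated with a runnable stand-in. The sketch below uses SQLite in memory purely so it runs without a database server; PostgreSQL accepts the same `INSERT ... ON CONFLICT (signal_id) DO UPDATE` form. The `outcome` and `payload` columns and the `archive_batch` helper are assumptions; only the `signals` table and the `signal_id` key appear in this document.

```python
import sqlite3

# Re-inserting the same signal_id overwrites the row instead of duplicating it.
UPSERT = """
INSERT INTO signals (signal_id, outcome, payload)
VALUES (?, ?, ?)
ON CONFLICT(signal_id) DO UPDATE SET
    outcome = excluded.outcome,
    payload = excluded.payload
"""


def archive_batch(conn: sqlite3.Connection, batch: list) -> None:
    # One transaction per batch: either the whole batch lands or none of it.
    with conn:
        conn.executemany(UPSERT, batch)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signals ("
    "signal_id TEXT PRIMARY KEY, outcome TEXT NOT NULL, payload TEXT NOT NULL)"
)

batch = [("sig-1", "pushed", "{}"), ("sig-2", "queued_offline", "{}")]
archive_batch(conn, batch)
archive_batch(conn, batch)  # simulated retry after a failed acknowledgement

row_count = conn.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
```

Replaying the same batch leaves `row_count` at 2, which is the property the archiver relies on when it re-reads stream entries after a PG outage.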

R8 — Redis Stream trimmed past archiver checkpoint

Failure: Redis MAXLEN trim is more aggressive than archiver’s catch-up speed (sustained archiver lag > 7d).
Detection: marconi_pg_archiver_lag_seconds > 7d (paging incident).
During: archiver reads return “stream trimmed”; trimmed window is unrecoverable to PG audit.
After: archiver resumes from the new stream head.
Lost: 0 signal deliveries; the trimmed window is an audit gap (operator-visible).
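The R8 detection threshold can be expressed as simple arithmetic on stream entry IDs. Auto-generated Redis Stream IDs have the form `<milliseconds>-<sequence>`, so the age of the oldest entry the archiver has not yet written is one plausible way to derive marconi_pg_archiver_lag_seconds. The function names and that lag definition are assumptions made for this sketch; the parent spec's metric inventory (§9) defines the real metric.

```python
from typing import Optional

RETENTION_SECONDS = 7 * 24 * 3600  # the rolling 7d window


def stream_id_ms(stream_id: str) -> int:
    """Millisecond timestamp part of an auto-generated stream ID."""
    return int(stream_id.split("-", 1)[0])


def archiver_lag_seconds(oldest_unarchived_id: Optional[str], now_ms: int) -> float:
    """Age of the oldest entry not yet written to PG; 0.0 when caught up."""
    if oldest_unarchived_id is None:
        return 0.0
    return max(0.0, (now_ms - stream_id_ms(oldest_unarchived_id)) / 1000.0)


def should_page(lag_seconds: float) -> bool:
    # Past the retention window, trimming can outrun the archiver (R8).
    return lag_seconds > RETENTION_SECONDS
```

An archiver that is one minute behind reports a lag of 60.0 and does not page; only lag beyond the retention window does.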

3. Per-flag rollback procedures

Every fine-grain feature flag from SPEC-101 v0.3 §8 has a written rollback recipe. Each rollback restores the previous state without data loss.

MARCONI_ROUTING_TABLE_READS / MARCONI_REGISTRATION_TABLE_READS (Stage 1)

Forward: routing/registration lookups served from process-local in-memory cache; SessionStore (Redis) fallback on miss. Cache coherence is maintained by direct write-through hooks (parent §5), with no Redis pubsub.
Rollback: flip flag → all lookups go directly to SessionStore (existing pre-Marconi behavior; the path that’s been live since ADR-27 / SPEC-049).
Data risk: none; the process-local cache is a perf layer atop SessionStore, and SessionStore remains the source of truth on rollback.
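The forward and rollback paths for these two flags can be sketched as a flag-gated read-through cache. This is illustrative only: `RoutingReader` is a hypothetical name, and a plain dict stands in for SessionStore (Redis).

```python
class RoutingReader:
    """Flag-gated lookup: process-local cache first, SessionStore on miss."""

    def __init__(self, session_store: dict, flag_enabled: bool):
        self.session_store = session_store  # stand-in for SessionStore (Redis)
        self.flag_enabled = flag_enabled    # e.g. MARCONI_ROUTING_TABLE_READS
        self._cache = {}

    def lookup(self, key):
        if not self.flag_enabled:
            # Rollback path: every read goes straight to the source of truth.
            return self.session_store.get(key)
        if key in self._cache:
            return self._cache[key]
        value = self.session_store.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def write(self, key, value) -> None:
        # Write-through hook: source of truth first, then the local cache.
        self.session_store[key] = value
        self._cache[key] = value
```

Because every write lands in SessionStore before the cache, flipping the flag off loses nothing: the cache is simply bypassed.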

MARCONI_AUDIT_QUEUE_WRITE (Stage 3)

Forward: every accepted signal is appended to Marconi’s in-memory audit queue (parent §3.6). Additive: legacy synchronous PG write still happens during Stages 3 and 4.
Rollback: flip flag → audit queue writes stop. Legacy synchronous PG write remains the source of truth (since Stage 5 has not cut over).
Data risk: none during Stages 3–4 (legacy path is the durable record). After Stage 5, MARCONI_AUDIT_QUEUE_WRITE cannot be safely rolled back without also rolling back MARCONI_HOT_PATH_SEND (no legacy path remains).

MARCONI_REDIS_STREAM_WRITER (Stage 3)

Forward: audit queue drains async to marconi:signals:{tenant} Redis Stream.
Rollback: flip flag → Redis Stream stops receiving new data; audit queue grows. Legacy synchronous PG write still records canonical signals.
Data risk: none during Stages 3–4. If sustained without rollback of upstream flags, audit queue overwrite (marconi_audit_queue_overwrite_total) becomes the loss event.
Coupling (post-Stage 5): must roll back in lockstep with MARCONI_HOT_PATH_SEND and MARCONI_PG_ARCHIVER_PRIMARY.

MARCONI_PG_ARCHIVER_SHADOW (Stage 4)

Forward: archiver runs in shadow mode, writing to a parallel signals_shadow table for byte-comparison against legacy synchronous-PG output.
Rollback: flip flag → shadow writes stop; main path unchanged.
Data risk: none; shadow is parallel.

MARCONI_PG_ARCHIVER_PRIMARY (Stage 4)

Forward: archiver-via-Redis-Stream becomes the canonical writer for the signals table. Legacy synchronous PG write becomes redundant.
Rollback: flip flag → archiver writes stop being canonical; legacy synchronous PG write resumes as the canonical record (idempotent UPSERT on signal_id on the archiver side handles overlap).
Data risk: none; both paths idempotent.
Coupling: must flip in lockstep with MARCONI_HOT_PATH_SEND (Stage 5 cutover).

MARCONI_HOT_PATH_SEND (Stage 5)

Forward: prism_signal API uses direct WS push from Marconi’s table; legacy synchronous PG write removed from send_signal. Audit fan-out (audit queue → Redis Stream → PG archiver primary) is the only durable record.
Rollback: flip flag → send_signal reverts to legacy synchronous-PG + Redis-pubsub-forwarder path (same code as today). Marconi cache + audit queue + archiver continue to run; legacy path takes over as canonical.
Data risk: none for new signals. Audit-queue entries from the in-memory window keep draining via the archiver (idempotent UPSERT prevents duplicates).
Coupling: must roll back in lockstep with MARCONI_AUDIT_QUEUE_WRITE, MARCONI_REDIS_STREAM_WRITER, and MARCONI_PG_ARCHIVER_PRIMARY. The four flags form a single Stage 5 cutover unit.
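The coupling notes can be turned into a pre-flip sanity check. The sketch below encodes one reading of them: once MARCONI_HOT_PATH_SEND is on, the audit fan-out is the only durable record, so the other three flags must also be on; with it off, the fan-out flags may be in any state (as in Stages 3 and 4). The function name and the dict-of-booleans flag representation are assumptions.

```python
HOT_PATH_FLAG = "MARCONI_HOT_PATH_SEND"

# Flags the hot path depends on for its only durable record.
FAN_OUT_FLAGS = (
    "MARCONI_AUDIT_QUEUE_WRITE",
    "MARCONI_REDIS_STREAM_WRITER",
    "MARCONI_PG_ARCHIVER_PRIMARY",
)


def cutover_unit_violations(flags: dict) -> list:
    """Return fan-out flags that are off while the hot path is on."""
    if not flags.get(HOT_PATH_FLAG, False):
        return []  # legacy synchronous PG write is still the durable record
    return [name for name in FAN_OUT_FLAGS if not flags.get(name, False)]
```

A deploy or flag-flip script could refuse to proceed whenever the returned list is non-empty.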

MARCONI_OBLIGATIONS_MEMORY_PRIMARY (Stage 5 sibling)

Forward: obligation index in-memory; SLA enforcement reads from memory; durability via Redis-handoff (audit queue → Redis Stream).
Rollback: flip flag → obligations operate via direct PG write (same code as today).
Data risk: in-memory obligations get persisted to PG via the archiver pipeline before rollback completes (1-2s window). For safety, rollback procedure includes a 5s drain wait + verification before flipping the read path.
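The drain wait and verification step can be sketched as a bounded polling loop. This treats the 5s figure as an upper bound on the wait rather than a fixed sleep, which is an interpretation; `wait_for_drain`, its parameters, and the `pending_count` callable are hypothetical names. The clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable


def wait_for_drain(
    pending_count: Callable[[], int],
    timeout_seconds: float = 5.0,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no in-memory obligations remain unpersisted, or time out.

    Returns True only when the drain was verified; the caller should flip
    the read path only on True.
    """
    deadline = clock() + timeout_seconds
    while True:
        if pending_count() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

A rollback script would call this with a function that reports how many obligations are still waiting on the archiver pipeline, and abort the flip if it returns False.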

MARCONI_VERB_<name> (Stage 5)

Forward: per-verb migration to memory-first.
Rollback: flip flag → verb reverts to its pre-migration code path.
Data risk: per-verb; documented per migration.

MARCONI_CROSS_INSTANCE_FORWARD (v0.4)

Forward: cross-instance routing via configured transport (sticky-partition or pubsub).
Rollback: flip flag → all cross-instance routes return no_subscriber (same as today’s single-process behavior).
Data risk: none for routing failures (signals queue offline as expected).

4. Stage 0 acceptance criteria

Original Stage 0 gate (PR #279, merged) — all checked. Re-stated under v0.3 for the record:
  • Sub-doc reviewed + ratified by Texi (8251f805 v0.1.1 → 4302012d v0.2 → re-review pending for v0.3.1 wording precision)
  • Loss-budget table (§1.2) signed off by Frank (operator)
  • Per-flag rollback procedures (§3) updated for v0.3 stage flag layout (this revision)
  • Recovery invariants (§2) translated into integration tests (file backend/tests/test_marconi_recovery_invariants.py); R1, R2, R3, R5, R6, R8 covered as xfail stubs filled per stage
  • OTEL metric inventory (parent SPEC §9) confirmed wired to dashboard before Stage 5 cutover (Stage 0 produces the dashboard mock; instrumentation lands stage-by-stage)
  • Disaster recovery runbook drafted (docs/runbooks/marconi-disaster-recovery.md) covering R1, R2, R3, R5, R6, R8
Stage 5 hot-path cutover gate adds:
  • Stage 4 archiver primary observed clean for ≥ 24h with marconi_pg_archiver_lag_seconds stable
  • Audit queue depth + Redis writer lag + invalidator-error counters wired to dashboard
  • Frank operator signoff on cutover window

5. References

  • SPEC-101 v0.2 — Marconi architecture (parent), three-tier scope reduction
  • ADR-56 — locks the MUST
  • Texi review signals: 37140b98 (v0.1 corrections), 2e1d6a4e (v0.1.2 sign-off), db9ab1ca (v0.1.1 sub-doc review with CRC + R3 precision; CRC concerns drop with v0.2 spool removal, R3 precision retained)
  • Postmortem 1736e40d — drift root-cause
  • Frank operator scope reduction (2026-05-10): “the spool is not in the critical path for what we are trying to solve. (1) signal path fast with nothing in its way. (2) cached in redis without interfering with message delivery for the rolling 7d window. (3) PG persistence for audit, recall, reporting, analysis.”
Last modified on June 7, 2026