SPEC-061 v0.2 — Drill-Down Detail Panels
Status: draft · Version: 0.2 · Filed: 2026-05-01
Parent / Depends-On
- SPEC-061 v0.1 — RED+USE Saturation & DB Performance (the cards this SPEC adds drill-downs to).
- SPEC-050 v0.3 — Prism Dashboard observability service (the dashboard this lives in).
Changelog
- 0.2: Added drill-down panels for every Saturation row, plus a top-level “current incidents” view opened from the rollup header. Each panel fetches detail on demand from a new family of backend endpoints under `/db_health/<topic>`. In v0.1 the rows were inert apart from hover tooltips; v0.2 answers “which service is being warned about, and what should I do?” with one click.
1. Problem
The v0.1 Saturation card shows that something is hot but answers none of the follow-up questions:
- Signal pipeline · 7 · oldest 140082s: which signals? From whom, to whom, and why are they stuck?
- Postgres · pool 60%: which connections are open, what queries are they running, is autovacuum behind?
- Heartbeat lag · p95 41s: which controllers are stale?
Today an operator has to drop into psql / redis-cli / kubectl logs to find out, which is exactly the kind of friction the dashboard exists to remove. Tooltips help but are not enough; sustained operator attention belongs in a panel, not a 200 ms hover.
2. Goals
- Every Saturation row is clickable. Click → modal panel with row-specific detail.
- Header rollup is clickable. Click → “current incidents” view: every warn/hot row, what each means in plain English, and the suggested next action.
- Detail data fetched on demand. No new periodic load on the backend; the modal makes one fetch when opened.
- Plain-English explanations. Each panel includes a one-paragraph “what this means and why it matters” header (so an operator who hasn’t read SPEC-061 can act on it).
- No destructive actions in v0.2. Force-drain, force-expire, and similar operator tools are scoped but not built here — they go to SPEC-061 v0.3 with proper guardrails (capability tokens per SPEC-049).
Non-Goals
- Live-streaming detail (each open is a one-shot fetch; close-and-reopen to refresh).
- Mobile layout polish for the modals.
- Cross-tenant aggregation (same scope as SPEC-061 v0.1).
3. Architecture
3.1 Backend additions
Five new GET endpoints, root-mounted alongside `/db_health` (same auth posture):
| Path | Purpose |
|---|---|
| `/db_health/signals` | List of `signal_queue` rows where `delivered_at IS NULL`. Each row: `signal_id`, `type`, `category`, from→to, `age_sec`, `recipient_state` (active / unregistered / paused), `payload_kind`. |
| `/db_health/postgres` | `pg_stat_activity` snapshot (`LIMIT 200`) + top 10 hot tables by `n_dead_tup` + slowest 5 queries from `pg_stat_statements` if installed. |
| `/db_health/redis` | Selected `INFO` sections (server, memory, clients, stats, persistence) + `LATENCY HISTORY` (top 5 events) + `CLIENT LIST` summary. |
| `/db_health/neo4j` | Heap GC stats, page-cache hit/miss counters, top transactions by age (`dbms.listTransactions()`), longest queries. |
| `/db_health/heartbeat` | All registered controllers sorted by lag, descending. Each row: `identity`, `surface`, `machine_id`, `last_seen_at`, `lag_sec`, `project_chain`. |

All five follow the same conventions as `/db_health`: single-shot, JSON response, < 200 ms typical, 5 s timeout per leg.
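On the dashboard side, `SignalPipelineDrill` needs a typed view of the `/db_health/signals` rows and something to fill its “why stuck” column. The sketch below is one way to do that, assuming the fields from the table are serialized as snake_case JSON and that from→to arrives as two fields; `from_id`, `to_id`, the explanation strings, and the age thresholds are all hypothetical placeholders, not settled names or copy.

```typescript
// Hypothetical dashboard-side mirror of one /db_health/signals row.
// Field names follow the §3.1 table; from_id / to_id are assumed names.
type RecipientState = "active" | "unregistered" | "paused";

interface StuckSignal {
  signal_id: string;
  type: string;
  category: string;
  from_id: string;
  to_id: string;
  age_sec: number;
  recipient_state: RecipientState;
  payload_kind: string;
}

// Placeholder copy for the "why stuck" column, keyed on recipient state.
function whyStuck(row: StuckSignal): string {
  switch (row.recipient_state) {
    case "unregistered":
      return `Recipient ${row.to_id} is not registered, so no consumer is reading this signal.`;
    case "paused":
      return `Recipient ${row.to_id} is paused; delivery should resume when it is unpaused.`;
    case "active":
      return `Recipient ${row.to_id} is active but has not consumed this signal yet.`;
  }
}

// Illustrative color buckets; the 5 min / 1 h thresholds are assumptions.
type AgeBucket = "fresh" | "warn" | "hot";

function ageBucket(ageSec: number): AgeBucket {
  if (ageSec >= 3600) return "hot";
  if (ageSec >= 300) return "warn";
  return "fresh";
}

// Oldest first, so the most urgent rows sit at the top of the table.
function sortByAgeDesc(rows: StuckSignal[]): StuckSignal[] {
  return [...rows].sort((a, b) => b.age_sec - a.age_sec);
}
```

Keying the explanation on `recipient_state` keeps the “why” derivable from data the endpoint already returns, so the drill needs no second fetch.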
3.2 Dashboard backend additions
Express proxy routes under `/dashboard/api/db_health/<topic>` that call the backend endpoints. All handlers live in one module, `dashboard/src/routes/api_db_health.ts`, registered next to the existing `api_*` routes. They are authenticated via the existing session middleware, and responses are cached server-side for 2 s to absorb burst clicks.
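A minimal sketch of the 2 s server-side cache, assuming a per-topic key. Two behaviors here go beyond what the SPEC states and are design choices, not requirements: the in-flight Promise is cached so a burst of clicks shares one backend fetch, and a failed load is evicted immediately so the next click retries. The clock is injectable so expiry can be tested without real time passing.

```typescript
// Per-topic TTL cache for the dashboard proxy (sketch, not the real module).
type Clock = () => number;

interface CacheEntry<T> {
  expiresAt: number;
  value: Promise<T>;
}

class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: Clock;

  constructor(ttlMs: number, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached Promise while it is fresh; otherwise starts one load
  // and caches the Promise itself, so concurrent callers share it.
  getOrLoad(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const value = load();
    this.entries.set(key, { expiresAt: this.now() + this.ttlMs, value });

    // Evict failures right away instead of serving them for the full TTL,
    // but only if this entry has not already been replaced.
    value.catch(() => {
      if (this.entries.get(key)?.value === value) this.entries.delete(key);
    });
    return value;
  }
}
```

In a handler this would be used roughly as `cache.getOrLoad(topic, () => fetchBackend(topic))`, where `fetchBackend` is a hypothetical helper that applies the 5 s timeout.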
3.3 Frontend additions
- Modal shell: `dashboard/web/src/components/SaturationDrillModal.tsx`. Overlay, ESC-to-close, click-outside-to-close, body scroll locked. The header shows the row label, current value, and pressure pill; the body is the row-specific drill content.
- Drill content components, one per row id, all under `dashboard/web/src/components/drills/`:
  - `SignalPipelineDrill.tsx`: table of stuck signals, color-coded by age, with a “why stuck” column.
  - `PostgresDrill.tsx`: three sub-sections (active queries, hot tables, slow queries).
  - `RedisDrill.tsx`: INFO blocks + latest LATENCY events + client list summary.
  - `Neo4jDrill.tsx`: heap chart + GC stats + active transactions + longest queries.
  - `HttpQueueDrill.tsx`: uvicorn worker breakdown, read from `/db_health/postgres` activity rows that look like API requests until `/db_health/http` is added in v0.3.
  - `ChannelBacklogDrill.tsx`: placeholder explaining “instrumentation pending SPEC-045 §4”.
  - `HeartbeatLagDrill.tsx`: roster table sorted by lag, descending.
- Incidents overview: `IncidentsOverviewDrill.tsx`, bound to the header click. Lists every warn/hot row with an explanation and a recommended next step; each item links into its row drill.
- Plain-English text: centralized in `dashboard/web/src/lib/saturationCopy.ts`. Every row has a `meaning`, a `whyItMatters`, and a per-pressure `suggestion`. Localized once; used in both the modal and the tooltip.
- Click wiring: `PressureRow` gains `onClick` and the `SaturationCard` header gains a click target; both push state into a Zustand `drillTarget` slice (`null | { kind: 'row', id: string } | { kind: 'incidents' }`). The modal subscribes to that slice and renders accordingly.
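The `drillTarget` slice is a three-way discriminated union, which keeps the modal's routing a plain type-narrowing exercise. The sketch below shows that routing plus a framework-free stand-in for the store (set + subscribe) so it runs without Zustand; it is not the Zustand API. The row ids in `ROW_DRILLS` are hypothetical, since the real ids come from the v0.1 Saturation card.

```typescript
// The drillTarget slice from §3.3 as a discriminated union.
type DrillTarget =
  | null
  | { kind: "row"; id: string }
  | { kind: "incidents" };

// Hypothetical row ids mapped to the drill component each one opens.
const ROW_DRILLS: Record<string, string> = {
  signal_pipeline: "SignalPipelineDrill",
  postgres: "PostgresDrill",
  redis: "RedisDrill",
  neo4j: "Neo4jDrill",
  http_queue: "HttpQueueDrill",
  channel_backlog: "ChannelBacklogDrill",
  heartbeat_lag: "HeartbeatLagDrill",
};

// Which component the modal shell renders; null means the modal is closed.
function drillFor(target: DrillTarget): string | null {
  if (target === null) return null;
  if (target.kind === "incidents") return "IncidentsOverviewDrill";
  return ROW_DRILLS[target.id] ?? null; // unknown ids render nothing
}

// Minimal stand-in for the Zustand slice: one value, set, and subscribe.
type Listener = (t: DrillTarget) => void;

function createDrillStore() {
  let drillTarget: DrillTarget = null;
  const listeners = new Set<Listener>();
  return {
    get: (): DrillTarget => drillTarget,
    setDrillTarget(t: DrillTarget): void {
      drillTarget = t;
      listeners.forEach((l) => l(t));
    },
    subscribe(l: Listener): () => void {
      listeners.add(l);
      return () => {
        listeners.delete(l);
      };
    },
  };
}
```

ESC and click-outside would both reduce to `setDrillTarget(null)`, so closing needs no separate state.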
3.4 No destructive actions in v0.2
All operator-action surfaces (force-drain a stuck signal, force-deregister a stale controller, terminate a long Neo4j transaction, etc.) are deliberately deferred to v0.3 because they require:
- Capability tokens (SPEC-049 §6: operator permission gating).
- Confirmation modal with consequences spelled out.
- Audit-log writes per action.
4. Concrete file plan
Backend (Python)
- `backend/app/services/db_health.py`: five new async functions: `list_stuck_signals()`, `pg_activity_detail()`, `redis_detail()`, `neo4j_detail()`, `heartbeat_detail()`. Each returns a structured dict; failure modes use the same `status`/`last_error` shape as the v0.1 leg samples.
- `backend/app/routers/db_health.py`: five new GET routes mirroring §3.1.
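Because every detail function shares the `status`/`last_error` failure shape, each drill can reuse one banner and one unwrap step. This SPEC does not restate the v0.1 shape, so the status values (`ok`, `error`, `timeout`) and the `data` field below are assumptions for illustration only.

```typescript
// Assumed dashboard-side view of the shared failure shape. The status values
// and the data field are hypothetical; v0.1 defines the real set.
type LegStatus = "ok" | "error" | "timeout";

interface DetailEnvelope<T> {
  status: LegStatus;
  last_error: string | null;
  data: T | null;
}

// Banner text shown above (or instead of) a drill's tables; null when healthy.
function statusBanner<T>(env: DetailEnvelope<T>): string | null {
  switch (env.status) {
    case "ok":
      return null;
    case "timeout":
      return "Backend leg timed out (5 s per-leg limit).";
    case "error":
      return `Could not load detail: ${env.last_error ?? "no error message recorded"}`;
  }
}

// Drills read data only from a healthy envelope.
function dataOrNull<T>(env: DetailEnvelope<T>): T | null {
  return env.status === "ok" ? env.data : null;
}
```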
Dashboard backend (TypeScript)
- `dashboard/src/routes/api_db_health.ts`: five proxy handlers; each fetches its backend endpoint with a 5 s timeout and an in-memory 2 s cache.
- `dashboard/src/index.ts`: register the routes.
Dashboard frontend (TypeScript / React)
- `dashboard/web/src/lib/api.ts`: five fetch wrappers (`fetchStuckSignals`, `fetchPostgresDetail`, `fetchRedisDetail`, `fetchNeo4jDetail`, `fetchHeartbeatDetail`).
- `dashboard/web/src/store/dashboard.ts`: `drillTarget` slice + `setDrillTarget(t)` action.
- `dashboard/web/src/lib/saturationCopy.ts`: copy table.
- `dashboard/web/src/components/SaturationDrillModal.tsx`: modal shell.
- `dashboard/web/src/components/drills/*.tsx`: 8 drill components (7 row drills + `IncidentsOverviewDrill`).
- `dashboard/web/src/components/charts/PressureRow.tsx`: `onClick` prop forwarded; `cursor: pointer` when a handler is provided.
- `dashboard/web/src/components/cards/SaturationCard.tsx`: wire the rollup header click and per-row clicks to `setDrillTarget`.
- `dashboard/web/src/App.tsx` (or a new `Layout.tsx` slot): render `<SaturationDrillModal />` once at the root, subscribed to `drillTarget`.
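One possible shape for `saturationCopy.ts`, with a lookup shared by the tooltip and the modal header, and the filter that `IncidentsOverviewDrill` would need (warn/hot rows only, hot first). The `ok` pressure level, the `heartbeat_lag` row id, and every string are placeholders, not agreed copy.

```typescript
// Pressure levels; "ok" is assumed alongside the warn/hot levels in the SPEC.
type Pressure = "ok" | "warn" | "hot";

interface RowCopy {
  meaning: string;
  whyItMatters: string;
  suggestion: Record<Pressure, string>;
}

// One illustrative entry; row id and text are hypothetical.
const SATURATION_COPY: Record<string, RowCopy> = {
  heartbeat_lag: {
    meaning: "How long since each registered controller last checked in.",
    whyItMatters: "A stale controller may have crashed or lost connectivity while still looking registered.",
    suggestion: {
      ok: "No action needed.",
      warn: "Open the roster and check the controllers with the highest lag.",
      hot: "Check the host of the most-lagged controller; it may need a restart.",
    },
  },
};

interface ResolvedCopy {
  meaning: string;
  whyItMatters: string;
  suggestion: string;
}

// Shared lookup for tooltip and modal; null when a row has no copy yet.
function copyFor(rowId: string, pressure: Pressure): ResolvedCopy | null {
  const c = SATURATION_COPY[rowId];
  if (!c) return null;
  return { meaning: c.meaning, whyItMatters: c.whyItMatters, suggestion: c.suggestion[pressure] };
}

interface RowState {
  id: string;
  pressure: Pressure;
}

interface Incident extends RowState {
  copy: ResolvedCopy | null;
}

// Incidents overview: drop "ok" rows, list hot before warn.
function currentIncidents(rows: RowState[]): Incident[] {
  const rank: Record<Pressure, number> = { hot: 0, warn: 1, ok: 2 };
  return rows
    .filter((r) => r.pressure !== "ok")
    .sort((a, b) => rank[a.pressure] - rank[b.pressure])
    .map((r) => ({ ...r, copy: copyFor(r.id, r.pressure) }));
}
```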
5. Phasing
- 5.1 — Backend (1 day). Five detail endpoints + service functions.
- 5.2 — Dashboard proxy (½ day). Express routes + tsc clean.
- 5.3 — Modal shell + Signal pipeline drill (½ day). The Signal pipeline row is the one currently firing, so ship it first to give v0.2 a meaningful demo.
- 5.4 — Remaining 6 drills + incidents overview (1 day).
- 5.5 — Plain-English copy + visual polish (½ day).
6. Open questions
- Should the modal be a full-page route (`/saturation/<row>`) or a true overlay? Lean: overlay for v0.2 (fast), route for v0.3 if we want shareable URLs to live incidents.
- Should we expose a “snooze” mechanism per row so a known-stale signal pipeline doesn’t dominate the rollup? Defer to v0.3, once we have an opinion on whether to filter in the rules engine (`saturation.ts`) or only at display time.
7. Risks / notes
- `pg_stat_statements` may be missing on personal-mode Postgres images. The PostgresDrill then renders “slow queries: extension not installed”; not a blocker.
- `dbms.listTransactions()` is a procedure from the Neo4j 4.x line; Neo4j 5 removed it in favor of `SHOW TRANSACTIONS`. On clusters where it is unavailable, the Neo4jDrill degrades gracefully.
- The 2 s server-side cache means that rapidly closing and reopening the same drill returns identical JSON within that window. Acceptable; a fresh fetch is one more open after the cache expires.
- Adding drill modals does not change the live SSE / push pipeline, so there is no regression risk to the v0.1 cards.
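The `pg_stat_statements` fallback is easiest to keep consistent if the backend sends an explicit “not available” value rather than omitting the section. A sketch of the drill side, assuming a hypothetical `slow_queries` field that is `null` when the extension is absent; `mean_ms` and `calls` are also assumed names. On the backend, one option is to check `pg_extension` for `extname = 'pg_stat_statements'` and also treat a query error as “not installed”, because the view errors out unless the module is loaded via `shared_preload_libraries`. Its timing columns were renamed in PostgreSQL 13 (`mean_time` became `mean_exec_time`), which argues for normalizing to one field server-side.

```typescript
// Hypothetical slice of the /db_health/postgres payload.
interface SlowQuery {
  query: string;
  mean_ms: number; // assumed to be normalized server-side across PG versions
  calls: number;
}

interface PostgresDetail {
  slow_queries: SlowQuery[] | null; // null = pg_stat_statements unavailable
}

// Lines for the "slow queries" sub-section of PostgresDrill.
function slowQueriesSection(detail: PostgresDetail): string[] {
  if (detail.slow_queries === null) return ["slow queries: extension not installed"];
  if (detail.slow_queries.length === 0) return ["slow queries: none recorded"];
  return detail.slow_queries
    .slice(0, 5) // the endpoint promises at most 5; guard anyway
    .map((q) => `${q.mean_ms.toFixed(1)} ms avg, ${q.calls} calls: ${q.query}`);
}
```

Distinguishing `null` (unavailable) from `[]` (nothing slow) keeps an empty table from being misread as a missing extension.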

