Status: accepted · ADR-26 · Filed 2026-04-27
Decision
Establish a tiered diagnostic tool library so agents stop regenerating ad-hoc shell/Python for repeatable operations. Local prereqs are Docker Desktop and Node only. Python is NOT a host prereq; it lives inside the backend container image, and the library architecture honors this split.
Tier 1: Shipped diag scripts (immediate scope).
- Container-side (Python, in backend image): modules under backend/app/diag/, invoked via docker exec prism-backend python -m app.diag.<name> --json. Python is guaranteed inside the image and never assumed on the host.
- Host-side (Node, in CLI): subcommands under cli/src/diag/, surfaced as prism diag <name>. Node is the committed host prereq alongside Docker Desktop.
- All diag scripts emit structured JSON for agent consumption; human-formatted output is never the primary format.
- Catalog: prism diag list enumerates the available tools, each with its surface (container or host) and a one-line description.
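A minimal sketch of the container-side shape, assuming a module under backend/app/diag/ that exposes build_report() and main(argv). The diag name "example", the report keys (diag, generated_at, ok, findings) and the exit-code convention are hypothetical illustrations, not an established Prism schema. The point it shows is that JSON is the primary output and the human-readable line is only a fallback.

```python
"""Sketch of a container-side Prism diag module (hypothetical layout).

Intended invocation:
    docker exec prism-backend python -m app.diag.<name> --json
"""
import argparse
import json
from datetime import datetime, timezone

DIAG_NAME = "example"  # hypothetical diag name


def run_checks() -> list:
    """Collect findings. A real diag would query Postgres, Redis, gRPC, etc. here."""
    return []  # hypothetical: each finding would be {"severity": ..., "message": ...}


def build_report() -> dict:
    """Assemble the structured report that agents consume."""
    findings = run_checks()
    return {
        "diag": DIAG_NAME,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "ok": not findings,
        "findings": findings,
    }


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description=f"Prism diag: {DIAG_NAME} (sketch)")
    parser.add_argument("--json", action="store_true",
                        help="emit structured JSON, the primary agent-facing output")
    args = parser.parse_args(argv)

    report = build_report()
    if args.json:
        print(json.dumps(report))
    else:
        # Secondary, human-readable one-liner.
        status = "ok" if report["ok"] else f"{len(report['findings'])} finding(s)"
        print(f"{DIAG_NAME}: {status}")
    # Non-zero exit lets callers branch on the result without parsing output.
    return 0 if report["ok"] else 1
```

Running it via python -m also needs the usual if __name__ == "__main__": raise SystemExit(main()) guard at the bottom of the module; it is left out here so the functions can be imported and exercised directly.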
Tier 2: Promotion to first-class MCP verbs.
- When a Tier-1 diag is used 3+ times across sessions, promote it to a first-class MCP verb (e.g. prism_diagnose <op>).
- First-class status means registration, a schema, visibility on the agent surface, and signal integration where useful.
- Best suited to cross-machine ops where ssh + docker exec would otherwise be repeated.
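The promotion gate can be checked mechanically. This sketch assumes diag invocations are logged as (diag, session_id) pairs, which is a hypothetical log shape, and reads "3+ times across sessions" as three distinct sessions, so repeated calls inside one session count once. DiagUse and promotion_candidates are illustrative names, not existing Prism code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List

PROMOTION_THRESHOLD = 3  # "used 3+ times across sessions"


@dataclass(frozen=True)
class DiagUse:
    """One logged Tier-1 diag invocation (hypothetical log record)."""
    diag: str
    session_id: str


def promotion_candidates(uses: Iterable[DiagUse],
                         threshold: int = PROMOTION_THRESHOLD) -> List[str]:
    """Return diags seen in at least `threshold` distinct sessions, sorted by name.

    Repeat calls inside one session count once, so a single noisy incident
    cannot promote a diag on its own.
    """
    sessions = defaultdict(set)
    for use in uses:
        sessions[use.diag].add(use.session_id)
    return sorted(name for name, ids in sessions.items() if len(ids) >= threshold)


log = [
    DiagUse("diag_alembic_drift", "s1"),
    DiagUse("diag_alembic_drift", "s2"),
    DiagUse("diag_alembic_drift", "s3"),
    DiagUse("diag_grpc_health", "s1"),  # three calls, but all in one session
    DiagUse("diag_grpc_health", "s1"),
    DiagUse("diag_grpc_health", "s1"),
]
print(promotion_candidates(log))  # ['diag_alembic_drift']
```

Counting distinct sessions rather than raw calls matches the "cross-session" wording in the Rationale: a diag hammered during one incident has not yet shown it is a repeatable pattern.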
Tier 3: Vector-stored recipes.
- Code-shape patterns that aren’t full tools, recalled via semantic_recall. Curated; retired (or promoted) once a stable pattern emerges.
Initial container-side diags (Python, backend image):
- diag_alembic_drift: compare the alembic_version row against the versions/ files; flag missing or orphaned revisions. Direct response to the live driver for this ADR: the gRPC container crash-looping on missing revision 023.
- diag_session_registrations: list active registrations per identity and flag dead-session leaks (the bug we’re watching live as multiple Donna/Texi sessions accumulate).
- diag_signal_queue: pending signals with age, recipient, and leak indicators (non-Signal rows mixed in, i.e. the SPEC-045 lifecycle leak we identified last session).
- diag_grpc_health: internal grpc_health_probe plus a listener-state report.
- diag_redis_session_plane: Redis keyspace audit (sessions, locks, TTLs).
- diag_master_election: current master per project and contention history.
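The core comparison in diag_alembic_drift can be sketched as pure logic, assuming the inputs are gathered separately: the version_num values from the alembic_version table, and a revision → down_revision map parsed from the versions/ files. The function name, the report keys, and the reading of "orphaned" as a file whose parent revision no file defines are all assumptions for illustration.

```python
from typing import Iterable, Mapping, Tuple, Union

# Alembic's down_revision is None (base), a revision id, or a tuple for merges.
DownRev = Union[None, str, Tuple[str, ...]]


def _parents(down: DownRev) -> Tuple[str, ...]:
    """Normalize a down_revision value to a tuple of parent revision ids."""
    if down is None:
        return ()
    if isinstance(down, str):
        return (down,)
    return tuple(down)


def alembic_drift(db_revisions: Iterable[str], files: Mapping[str, DownRev]) -> dict:
    """Compare alembic_version rows with the revision graph from versions/.

    db_revisions: version_num values read from the alembic_version table.
    files: revision -> down_revision, as parsed from each migration file.
    """
    db = set(db_revisions)
    known = set(files)
    referenced = {p for down in files.values() for p in _parents(down)}
    heads = sorted(known - referenced)
    return {
        # Recorded in the DB, but no migration file defines it.
        "missing": sorted(db - known),
        # Files whose parent revision no file defines: a broken chain.
        "orphaned": sorted(
            rev for rev, down in files.items()
            if any(p not in known for p in _parents(down))
        ),
        "heads": heads,
        "multiple_heads": len(heads) > 1,
        "db_at_head": db == set(heads),
    }


# Hypothetical state resembling the ADR driver: DB points at 023, files stop at 022.
report = alembic_drift(["023"], {"020": None, "021": "020", "022": "021"})
print(report["missing"])  # ['023']
```

Keeping the comparison free of database and filesystem access makes it easy to unit-test; the module's CLI wrapper would do the I/O and emit the returned dict as JSON.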
Initial host-side diags (Node, prism diag):
- prism diag containers: docker ps for all prism containers, with status, restart count, and the last N log lines.
- prism diag connectivity: probe host:port reachability for backend HTTP, gRPC (45051 → 50051), Redis, and Postgres. Direct response to Candi’s mini1 firewall question.
- prism diag bios: verify the CLAUDE.md / PRISM.md / AGENTS.md replicas against the Prism templates (drift detection).
- prism diag mcp: verify MCP server registration with Claude Code / Codex and show a config diff. Direct response to Texi’s launcher env-forwarding fix.
- prism diag dirs: verify $PROJECT_ROOT and $PRISM_ROOT, list registered projects, and check registry parity against the filesystem.
- prism diag launcher: replay coder.sh / coder.ps1 env forwarding to verify that all env vars reach the MCP subprocess.
Rationale
Token economy. Every regenerated 30–50 line snippet costs ~200 tokens generated + ~200 read + run cost. A prism diag X invocation is one tool call (~30 tokens, including args and the structured response). Compression is 10–50× on repetitive ops. With five agents (Donna, Texi, Candi, Lafonda, Desiree) independently burning tokens on the same patterns, the savings compound.
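The token arithmetic above, worked through with the ADR's own approximate figures. Run cost is left out because the ADR does not quantify it, and the rate of ten uses per agent is an illustrative assumption, not a measurement.

```python
# Approximate token figures from this ADR; run cost is excluded because
# the ADR does not quantify it.
SNIPPET_GENERATED = 200  # tokens to generate a 30-50 line ad-hoc snippet
SNIPPET_READ = 200       # tokens to read that snippet back
DIAG_CALL = 30           # one prism diag call, including args + structured response
AGENTS = 5               # Donna, Texi, Candi, Lafonda, Desiree


def compression_ratio() -> float:
    """Ad-hoc snippet cost divided by the cost of one diag call."""
    return (SNIPPET_GENERATED + SNIPPET_READ) / DIAG_CALL


def tokens_saved(uses_per_agent: int, agents: int = AGENTS) -> int:
    """Tokens saved when every agent runs the same op uses_per_agent times."""
    per_use = SNIPPET_GENERATED + SNIPPET_READ - DIAG_CALL
    return per_use * uses_per_agent * agents


print(f"compression: {compression_ratio():.1f}x")  # compression: 13.3x
# Hypothetical usage rate of 10 runs per agent:
print(f"saved at 10 uses per agent: {tokens_saved(10)} tokens")  # 18500 tokens
```

With these inputs the ratio lands near 13×, at the low end of the quoted 10–50× band; longer snippets or any nonzero run cost push it higher.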
Cross-agent consistency. Today Donna’s grep, Texi’s grep, and Candi’s grep can subtly differ — one tool means one diagnostic, no interpretation drift during incident response.
Operational learning compounds. Diagnostics promoted to scripts get reviewed, tested, and version-controlled, so each session stops rediscovering the same shape of problem. The tool library becomes the codified record of ops knowledge.
Reliability and portability. Container-side Python runs in a known image. Host-side Node runs against the explicitly committed prereq set (Docker Desktop + Node). There is no surprise dependency on host Python, jq, or other ad-hoc tools. This resolves the recent zsh-portability discussion at the architectural level: agents call prism diag X, not raw shell.
Promotion gate forces curation. Without escalation criteria the library becomes a graveyard of one-off scripts. Tier-2 promotion to MCP verb requires evidence of ≥3 cross-session uses — only proven patterns earn first-class status.
Alternatives Considered
- Use ad-hoc shell + host Python indefinitely. Rejected: high token cost, drift, no shared learning. It also presumes host Python, which is explicitly NOT a Prism prereq.
- Skip Tier-1 scripts; jump straight to MCP verbs. Rejected: promotion requires evidence of repeat use, which can’t be generated without a low-friction first tier. MCP verbs are heavyweight (schema, registration, signal integration), so building one for a one-off diag is overkill.
- Vector-stored recipes only (no scripts). Rejected as the primary mechanism: recipes still require per-invocation regeneration in agent context, so the tokenomics are worse than scripts. The vector store is right for code-shape patterns (Tier 3) and wrong for stable operations.
- Bundle all diag tools as a shell library on the host. Rejected: host shells vary (zsh on mini3, bash on Linux, PowerShell on Windows), and the portability cost is real. Node + Python-in-container avoids it cleanly.
- Single combined prism diag Python CLI on the host. Rejected: it would re-introduce the host Python prereq. Node-on-host + Python-in-container preserves the committed prereq surface.
