Status: draft · Version v0.2 · Filed 2026-05-01

spec_id: SPEC-060
version: v0.2
status: draft
authored_by: Donna
date: 2026-04-30

SPEC-060 — Operator-Experience Memory Capture

Status

Draft v0.2, for Frank’s review. v0.2 adds §3.6 (operator-input prompt capture) per Frank’s directive after the v0.1 review. Authored by Donna on 2026-04-30 after a memory-investigation failure: I spent ~30 minutes across multiple wrong queries trying to find what Texi shipped last week that produced the working Codex inline-statusline visual Frank confirmed. The mechanism (turn/steer + turn/start dual-path in app_server_inject.py, delta 2901d7ce) WAS in memory, but the operator-visible effect and Frank’s confirmation moment were not. Without those, the engineering breadcrumbs alone weren’t queryable for “what worked that I should restore.”

Origin

Frank asked: “Are we writing enough to memory to be useful?” The honest answer found via dogfooding semantic_recall this session: we’re writing a lot, but not the right things. Engineering layer is well-captured (file paths, verbs, smokes). Operator-experience layer is systematically missing. This spec closes five specific gaps that bit us repeatedly today.

The five gaps

Gap 1 — Operator-visible effects are under-documented

Delta 2901d7ce describes the working state as: “signals surface visibly in Codex through thread state aware delivery: turn/steer when a turn is active, turn/start when idle or non-steerable.” Accurate engineering description. Useless for retrieval by visual outcome. The next agent (me, today) searching for “Codex inline statusline persona project visible above input bar” never found this delta because none of those operator-facing words are in the text.

Gap 2 — Operator confirmation moments aren’t captured

Frank’s “I told her it finally worked” landed in a Codex conversation buffer that rolled minutes later. No durable record. The only “Frank confirmed render” line in memory across the entire corpus is in delta 71e2c369 (Lafonda’s Mac Claude Code wrap), and it exists only because Lafonda happened to type those three words in her own wrap free-form.

Gap 3 — Cross-cutting “what produced the visible result” narratives are absent

Each delta records a piece. No delta says “the visible inline render in Codex requires THESE THREE pieces together: (a) env forwarding via mcp_servers.prism.env.PRISM_CODEX_APP_SERVER_*, (b) turn/steer for active-turn injection, (c) turn/start for idle injection.” The architectural causation isn’t captured.

Gap 4 — Negative findings / dead ends are mostly absent

We capture what shipped. We rarely capture what was tried and ruled out. Texi spent time on title-watcher hypotheses today — work that would have been short-circuited if a prior delta had said “the title-watcher path was tried; we dropped it in favor of in-thread injection.”

Gap 5 — Operator inputs (corrections, confirmations, direction-pivots) aren’t captured (NEW in v0.2)

Frank’s prompts during a session contain the most authoritative labels in the system: “stop, you keep doing X again” / “perfect, that worked” / “wait, wrong layer”. These are training-grade signals — direct ground truth on whether agent behavior is correct. Today they live only in conversation buffers and roll off as soon as the conversation does. Five+ corrections this session alone; zero captured.

What these gaps share

The missing things are all operator-facing context, not engineering detail. The engineering layer is well-captured because the agents writing wraps are themselves engineers; that’s their native vocabulary. The UX/operator-experience layer is under-captured because nobody’s job is to translate engineering changes into operator-visible terms — and operator-input labels are under-captured because the operator (Frank) won’t reliably type a verb mid-conversation to label his own correction.

Goals

  1. Operator-visible effects are first-class memory. Every session_delta whose work touches a surface visible to the operator carries an explicit operator_visible_effect description in operator-facing language.
  2. Operator confirmations are durable. A dedicated mechanism captures Frank’s verdicts as structured, queryable records.
  3. Integration narratives have a home. A delta-class for “this combo of changes produces this visible result” exists.
  4. Negative findings are durable. Every wrap has a “what was tried and ruled out” section.
  5. Operator inputs (corrections / confirmations / pivots) are auto-captured. Agent-driven tag at capture time; operator review batched in dashboard.

Non-goals

  • Forcing operator-language on engineering-only deltas (backend refactor with no UX impact). Scoped to surface-touching work.
  • Rewriting historical deltas. SPEC-060 is forward-looking; backfill is out of scope.
  • Capturing every prompt verbatim. Routine acks, single-token responses, and conversational chatter are explicitly skipped — only state-change prompts.
  • Replacing free-form summaries. The new fields supplement, they don’t supplant.

Architecture

§3.1 — prism_confirm verb (Gap 2 fix)

Operator-driven verb for capturing affirmations or rejections. The operator types it in any agent’s terminal:
prism_confirm(
  pid="PID-PGR01",
  refers_to="delta:2901d7ce" | "signal:<id>" | "spec:SPEC-XYZ" | "commit:<sha>",
  verdict="works" | "broken" | "partial",
  notes="inline note appears above input bar in Codex TUI showing persona+project; rendered when turn/steer fires during active turn"
)
Persists to a new operator_confirmations table:
CREATE TABLE operator_confirmations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id       UUID NOT NULL REFERENCES tenants(id),
    project_id      UUID NOT NULL REFERENCES projects(id),
    refers_to_kind  TEXT NOT NULL,
    refers_to_id    TEXT NOT NULL,
    verdict         TEXT NOT NULL,
    notes           TEXT,
    confirmed_by    TEXT NOT NULL,
    confirmed_via   TEXT NOT NULL,
    confirmed_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ix_operator_confirmations_refers_to ON operator_confirmations(refers_to_kind, refers_to_id);
Rows are indexed and embedded into the trigraph as (:Confirmation)-[:CONFIRMS]->(:Entity) edges so semantic_recall surfaces them via the SPEC-059 graph leg.
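The verb takes a single refers_to string, while the table stores refers_to_kind and refers_to_id separately. A minimal sketch of that split, assuming the separator is the first colon and that the four kinds in the signature are the only valid ones; ConfirmationRow and build_confirmation are hypothetical names, not part of the spec:

```python
from dataclasses import dataclass
from typing import Optional

# Kinds and verdicts are the ones listed in the prism_confirm signature.
ALLOWED_KINDS = {"delta", "signal", "spec", "commit"}
ALLOWED_VERDICTS = {"works", "broken", "partial"}


@dataclass(frozen=True)
class ConfirmationRow:
    """Subset of operator_confirmations columns derived from the verb arguments."""
    refers_to_kind: str
    refers_to_id: str
    verdict: str
    notes: Optional[str] = None


def build_confirmation(refers_to: str, verdict: str, notes: Optional[str] = None) -> ConfirmationRow:
    """Hypothetical helper: split '<kind>:<id>' and validate before persisting."""
    kind, sep, ident = refers_to.partition(":")
    if not sep or kind not in ALLOWED_KINDS or not ident:
        raise ValueError(f"refers_to must look like '<kind>:<id>', got {refers_to!r}")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return ConfirmationRow(kind, ident, verdict, notes)


row = build_confirmation("delta:2901d7ce", "works")
print(row.refers_to_kind, row.refers_to_id)  # delta 2901d7ce
```

Rejecting malformed references at write time keeps the (refers_to_kind, refers_to_id) index useful for lookups by target.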

§3.2 — Mandatory operator_visible_effect on UX-surface deltas (Gap 1 fix)

Add a nullable field to session_deltas. It is optional at the schema level, but agents are expected to populate it on surface-touching wraps:
ALTER TABLE session_deltas ADD COLUMN operator_visible_effect TEXT;
Write-time guard (enforced by prism_wrap, not by a column constraint): agents writing wraps that touch the surface allowlist must populate this field. Allowlist:
  • bin/coder.{sh,ps1}, bin/statusline-claude-code.sh
  • cli/src/statusline.ts, cli/src/index.ts editor MCP block writes
  • mcp-node/src/strategies/, mcp-node/src/surfaces/, mcp-node/src/bootstrap/channel_bridge.ts
  • mcp-node/src/signalCache.ts, dashboard/web/
  • backend prism_start / prism_status / prism_whois rendering paths
If the wrap touches any allowlisted file AND operator_visible_effect is empty, prism_wrap returns a structured warning naming the gap. Non-blocking.
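The warning rule can be sketched as a pure function. This is a sketch under stated assumptions: the glob patterns are a transcription of the allowlist, the warning's keys (code, blocking, surface_files) are invented for illustration, and cli/src/index.ts is matched as a whole file, which is coarser than "editor MCP block writes". The backend rendering paths are left out because the spec gives no file paths for them.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional

# Transcribed from the allowlist; the glob form is an assumption.
SURFACE_ALLOWLIST = [
    "bin/coder.sh",
    "bin/coder.ps1",
    "bin/statusline-claude-code.sh",
    "cli/src/statusline.ts",
    "cli/src/index.ts",
    "mcp-node/src/strategies/*",
    "mcp-node/src/surfaces/*",
    "mcp-node/src/bootstrap/channel_bridge.ts",
    "mcp-node/src/signalCache.ts",
    "dashboard/web/*",
]


def visible_effect_warning(
    touched_files: Iterable[str], operator_visible_effect: Optional[str]
) -> Optional[dict]:
    """Return a non-blocking warning dict, or None when no warning applies."""
    hits = sorted(
        f for f in touched_files
        if any(fnmatchcase(f, pattern) for pattern in SURFACE_ALLOWLIST)
    )
    if hits and not (operator_visible_effect or "").strip():
        return {
            "code": "missing_operator_visible_effect",  # hypothetical warning code
            "blocking": False,
            "surface_files": hits,
        }
    return None


print(visible_effect_warning(["bin/coder.sh", "backend/app/main.py"], ""))
```

Because fnmatchcase lets `*` match path separators, `dashboard/web/*` covers nested files such as dashboard/web/src/pages/MemoryPage.tsx.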

§3.3 — Integration-narrative delta class (Gap 3 fix)

Add a new delta_kind enum value: 'integration_narrative'. Distinct from 'wrap' | 'checkpoint' | 'bootstrap_only' | 'other'. When multiple per-piece commits or wraps combine into a user-facing capability, the agent (or the next agent who notices) writes an integration_narrative delta whose body answers three questions in plain operator language:
  1. What does the operator see now that they didn’t before?
  2. What pieces had to land together to produce that? (list of delta IDs / commit SHAs / spec IDs)
  3. What would break it? (cross-cutting invariants that future ports/refactors must preserve)
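One way to keep the three questions from collapsing into free-form text is to give each its own field. The sketch below assumes a structured body; the class and field names are hypothetical, and the example values are drawn from Gap 3 and the §3.1 notes rather than from a real delta.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntegrationNarrative:
    """Hypothetical body shape for a delta with delta_kind='integration_narrative'."""
    operator_sees: str            # question 1, in operator language
    required_pieces: List[str]    # question 2: delta IDs / commit SHAs / spec IDs
    would_break_it: List[str]     # question 3: invariants to preserve
    delta_kind: str = "integration_narrative"

    def validate(self) -> None:
        """Raise ValueError if any of the three questions is left unanswered."""
        if not self.operator_sees.strip():
            raise ValueError("operator_sees must describe the visible result")
        if not self.required_pieces:
            raise ValueError("required_pieces must list at least one reference")
        if not self.would_break_it:
            raise ValueError("would_break_it must list at least one invariant")


narrative = IntegrationNarrative(
    operator_sees="inline note appears above input bar in Codex TUI showing persona+project",
    required_pieces=["delta:2901d7ce"],
    would_break_it=[
        "env forwarding via mcp_servers.prism.env.PRISM_CODEX_APP_SERVER_*",
        "turn/steer for active-turn injection",
        "turn/start for idle injection",
    ],
)
narrative.validate()
print(narrative.delta_kind)  # integration_narrative
```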

§3.4 — Negative-findings discipline (Gap 4 fix)

Extend the existing wrap structure with a structured tried_and_ruled_out JSONB array:
[
  {
    "approach": "title-watcher daemon to render persona+project in terminal title",
    "outcome": "visually clobbers richer initial title every 1s",
    "ruled_out_in_delta": "fd4d1895"
  }
]
The array is queryable, embeddable, and surfaceable, and it lets the dashboard plot dead-end frequency.
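A rough sketch of how dead-end frequency could be derived from the array, assuming each wrap is available as a dict with a tried_and_ruled_out key and that the three keys in the example entry are required. Grouping by exact approach text is a simplification; a real implementation would more likely cluster by embedding.

```python
from collections import Counter
from typing import Iterable

# Keys taken from the example entry; treating all three as required is an assumption.
REQUIRED_KEYS = {"approach", "outcome", "ruled_out_in_delta"}


def dead_end_frequency(wraps: Iterable[dict]) -> Counter:
    """Count how often each ruled-out approach appears across wraps."""
    counts: Counter = Counter()
    for wrap in wraps:
        for entry in wrap.get("tried_and_ruled_out") or []:
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"tried_and_ruled_out entry missing keys: {sorted(missing)}")
            counts[entry["approach"]] += 1
    return counts


title_watcher = {
    "approach": "title-watcher daemon to render persona+project in terminal title",
    "outcome": "visually clobbers richer initial title every 1s",
    "ruled_out_in_delta": "fd4d1895",
}
wraps = [
    {"tried_and_ruled_out": [title_watcher]},
    {"tried_and_ruled_out": [title_watcher]},
    {"tried_and_ruled_out": None},  # wrap with no dead ends recorded
]
print(dead_end_frequency(wraps).most_common(1)[0][1])  # 2
```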

§3.5 — Dashboard panel: confirmation rate

Adds one panel to the SPEC-059 §3.4 dashboard set:
| Panel | Source | Question it answers |
|---|---|---|
| Operator confirmation rate | operator_confirmations over time, grouped by verdict and confirmed_via | Are we shipping things the operator validates as working, or as broken? |
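The panel's aggregation might look like the following, assuming rows arrive as dicts, that "over time" means ISO-week buckets, and that the headline number is the share of "works" verdicts. None of these choices is fixed by the spec, and the confirmed_via values are illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional


def weekly_confirmation_counts(rows: Iterable[dict]) -> Counter:
    """Count confirmations per (ISO week, confirmed_via, verdict)."""
    counts: Counter = Counter()
    for row in rows:
        iso = row["confirmed_at"].isocalendar()  # (year, week, weekday)
        week = f"{iso[0]}-W{iso[1]:02d}"
        counts[(week, row["confirmed_via"], row["verdict"])] += 1
    return counts


def works_share(counts: Counter, week: str) -> Optional[float]:
    """Share of 'works' verdicts in a week; None when the week has no rows."""
    total = sum(n for (w, _, _), n in counts.items() if w == week)
    works = sum(n for (w, _, v), n in counts.items() if w == week and v == "works")
    return works / total if total else None


rows = [
    {"confirmed_at": datetime(2026, 4, 30), "confirmed_via": "donna", "verdict": "works"},
    {"confirmed_at": datetime(2026, 5, 1), "confirmed_via": "donna", "verdict": "works"},
    {"confirmed_at": datetime(2026, 5, 1), "confirmed_via": "texi", "verdict": "broken"},
    {"confirmed_at": datetime(2026, 5, 4), "confirmed_via": "texi", "verdict": "works"},
]
counts = weekly_confirmation_counts(rows)
print(counts[("2026-W18", "donna", "works")])  # 2
```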

§3.6 — Operator-input capture (Gap 5 fix, NEW in v0.2)

The mechanism that captures Frank’s corrections, confirmations, and direction-pivots without requiring him to type a verb.

Three prompt classes (closed enum)

| Class | Definition | Examples from this session |
|---|---|---|
| correction | Operator stops or redirects the agent’s current trajectory | “stop, you keep doing the same error”, “you continue to take the easy way out”, “your bashing again and going after git” |
| confirmation | Operator affirms agent output as correct, working, or matching intent | “lets GO!!!!”, “perfect”, “yes commit” |
| direction_pivot | Operator narrows or shifts the active task scope without rejecting prior work | “wait, that’s the title bar — we’re working on the Prompt”, “yes please send”, “fold into SPEC” |
Anything that doesn’t cleanly fit one of these three is not captured. This is intentional — routine acks (“ok”, “next”), conversational chatter, and information-dense substantive direction are out of scope. The three classes exist because they each map to a different downstream signal: corrections feed agent-improvement training data, confirmations feed prism_vs_native quality metrics, pivots feed retrospective task-scope analysis.

Capture mechanism — agent-driven auto-tag

The agent in conversation with Frank fires the new verb at the end of any turn where the prompt was state-changing:
prism_operator_input(
  pid="PID-PGR01",
  class="correction" | "confirmation" | "direction_pivot",
  prompt_text="<verbatim Frank typed>",
  triggered_action="<one line: what the agent was doing or about to do>",
  reverses_delta="<delta_id if a correction reverses a specific prior delta>",
  confidence="high" | "low"
)
confidence='high' when the prompt’s class is unambiguous from the language (“stop” / “perfect” / “wait, wrong layer”). confidence='low' when the agent is interpreting tone or context to assign the class; these rows are flagged for operator review. The agent does not write the row at all if the prompt is routine (acknowledgements such as “ok” or “next”, and other single-token responses that leave the trajectory unchanged). It writes only for prompts that visibly change the agent’s trajectory.
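The write-time gate can be sketched as a function that either builds the verb payload or returns None. Classification itself stays with the agent: here prompt_class is an input, with None meaning "not state-changing". The function name, the ROUTINE_ACKS set, and the normalization are assumptions for illustration only.

```python
from typing import Optional

CLASSES = {"correction", "confirmation", "direction_pivot"}
CONFIDENCES = {"high", "low"}
# Only the two acks named in the spec; a real list would be longer.
ROUTINE_ACKS = {"ok", "next"}


def operator_input_payload(
    pid: str,
    prompt_text: str,
    prompt_class: Optional[str],
    triggered_action: str,
    confidence: str,
    reverses_delta: Optional[str] = None,
) -> Optional[dict]:
    """Build prism_operator_input arguments, or None when nothing should be written."""
    normalized = prompt_text.strip().lower().rstrip(".!")
    if prompt_class is None or normalized in ROUTINE_ACKS:
        return None  # routine prompt: no row at all
    if prompt_class not in CLASSES:
        raise ValueError(f"unknown class: {prompt_class!r}")
    if confidence not in CONFIDENCES:
        raise ValueError(f"unknown confidence: {confidence!r}")
    payload = {
        "pid": pid,
        "class": prompt_class,
        "prompt_text": prompt_text,  # kept verbatim, not normalized
        "triggered_action": triggered_action,
        "confidence": confidence,
    }
    if reverses_delta is not None:
        payload["reverses_delta"] = reverses_delta
    return payload


print(operator_input_payload("PID-PGR01", "ok", None, "running smoke test", "high"))  # None
```

Note that a one-word prompt like “perfect” still produces a payload when the agent classifies it as a confirmation; length alone is not the filter.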

Schema

CREATE TABLE operator_inputs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id       UUID NOT NULL REFERENCES tenants(id),
    project_id      UUID NOT NULL REFERENCES projects(id),
    captured_via    TEXT NOT NULL,             -- agent persona that captured
    captured_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    class           TEXT NOT NULL CHECK (class IN ('correction','confirmation','direction_pivot')),
    prompt_text     TEXT NOT NULL,
    triggered_action TEXT,
    reverses_delta  UUID REFERENCES session_deltas(id),
    confidence      TEXT NOT NULL CHECK (confidence IN ('high','low')),
    operator_review TEXT CHECK (operator_review IN ('accepted','rejected') OR operator_review IS NULL),
    reviewed_at     TIMESTAMPTZ
);
CREATE INDEX ix_operator_inputs_class_created ON operator_inputs(class, captured_at DESC);
CREATE INDEX ix_operator_inputs_unreviewed ON operator_inputs(captured_at DESC) WHERE operator_review IS NULL;
Trigraph projection: (:OperatorInput {class})-[:LABELS]->(:Entity) edges from the input row to the delta or signal it triggered/reversed. SPEC-059’s graph leg surfaces them automatically when adjacent entities are queried.

Operator review loop

Dashboard panels (extending the §3.5 set):
| Panel | Source | Action |
|---|---|---|
| Unreviewed operator inputs | operator_inputs WHERE operator_review IS NULL ordered by captured_at desc | One-click accept/reject; rejected ones tag the agent that captured them as a misclassification source |
| Misclassification rate by capturing agent | operator_review='rejected' rate per captured_via | Shows which agent surfaces over-tag or mis-tag; informs which agents need refinement |
| Correction frequency by triggered_action | class='correction' grouped by triggered_action substring patterns | Identifies repeating agent failure modes (Frank’s most-stopped trajectories) |
Frank’s review burden: skim weekly, accept/reject in batches. Misclassifications become anti-examples agents can read on next bootstrap.
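The misclassification panel reduces to a per-agent ratio. The sketch below assumes the denominator is reviewed rows only (accepted plus rejected); the spec says "rejected rate per captured_via" without fixing the denominator, so treat that choice as an assumption. The persona names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable


def misclassification_rate(rows: Iterable[dict]) -> Dict[str, float]:
    """Rejected / reviewed per captured_via; unreviewed rows are ignored."""
    reviewed: Dict[str, int] = defaultdict(int)
    rejected: Dict[str, int] = defaultdict(int)
    for row in rows:
        review = row.get("operator_review")
        if review is None:
            continue  # still in the unreviewed queue
        reviewed[row["captured_via"]] += 1
        if review == "rejected":
            rejected[row["captured_via"]] += 1
    return {via: rejected[via] / n for via, n in reviewed.items()}


rows = [
    {"captured_via": "donna", "operator_review": "accepted"},
    {"captured_via": "donna", "operator_review": "rejected"},
    {"captured_via": "donna", "operator_review": None},
    {"captured_via": "texi", "operator_review": "accepted"},
]
print(misclassification_rate(rows))  # {'donna': 0.5, 'texi': 0.0}
```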

What this solves

  • Training-grade labels without operator friction at capture time.
  • Repeated-mistake detection (e.g., this session’s “Donna kept going git-first; Frank corrected 5 times” pattern would be a single SQL query in the dashboard).
  • prism_vs_native data — confirmation count vs correction count per agent surface is the cleanest single quality metric we’d have.

Invariants

  • Routine acks NEVER captured. Agent’s job to filter at write time.
  • Operator can manually fire the verb if they want — useful for retroactive labels (“that thing yesterday — that was a correction, btw”).
  • confidence='low' rows always require review before counting toward agent quality metrics.
  • Rejected misclassifications are NOT deleted — they become part of the training corpus showing what NOT to tag.
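The review invariants can be read as a predicate over operator_inputs rows. A minimal sketch, assuming rows are dicts and that rejected rows are excluded from quality metrics even though they are retained as anti-examples; the exclusion and the function names are assumptions, not stated rules.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def counts_toward_metrics(row: dict) -> bool:
    """Low-confidence rows count only once accepted; rejected rows never count."""
    review = row.get("operator_review")
    if review == "rejected":
        return False  # kept in the table, excluded from metrics (assumption)
    if row["confidence"] == "low":
        return review == "accepted"
    return True


def quality_counts(rows: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Confirmation vs correction counts per capturing agent surface."""
    out: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        if counts_toward_metrics(row) and row["class"] in ("confirmation", "correction"):
            out[row["captured_via"]][row["class"]] += 1
    return {via: dict(counter) for via, counter in out.items()}


rows = [
    {"captured_via": "donna", "class": "correction", "confidence": "high", "operator_review": None},
    {"captured_via": "donna", "class": "correction", "confidence": "low", "operator_review": None},
    {"captured_via": "donna", "class": "confirmation", "confidence": "low", "operator_review": "accepted"},
    {"captured_via": "donna", "class": "confirmation", "confidence": "high", "operator_review": "rejected"},
]
print(quality_counts(rows))  # {'donna': {'correction': 1, 'confirmation': 1}}
```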

Files changed

| File | Change | Why |
|---|---|---|
| backend/alembic/versions/027_spec060_operator_capture.py | New | operator_confirmations + operator_inputs tables, session_deltas.operator_visible_effect, delta_kind enum extension, tried_and_ruled_out JSONB |
| backend/app/models/operator_confirmation.py | New | ORM model |
| backend/app/models/operator_input.py | New | ORM model |
| backend/app/services/operator_capture_service.py | New | Combined service for confirmations + inputs; trigraph projection hooks |
| backend/app/routers/operator.py | New | POST /api/v1/{pid}/confirm + POST /api/v1/{pid}/operator_input |
| mcp-node/src/verbs/coordination.ts | Modified | New prism_confirm + prism_operator_input verbs |
| backend/app/services/spec_service.py etc. | Modified | Wrap warning emitter for missing operator_visible_effect on surface-allowlist wraps |
| dashboard/web/src/pages/MemoryPage.tsx | Modified | Add confirmation-rate + unreviewed-inputs + misclassification-rate + correction-frequency panels |
| tests/test_spec060_confirmations.py | New | Verb persists row; trigraph edge written |
| tests/test_spec060_inputs.py | New | Auto-tag verb persists row; review loop works; trigraph edge written |
| tests/test_spec060_visible_effect_warning.py | New | Allowlist wrap with empty effect emits warning |
No SPEC-058 or SPEC-059 collision. Builds on SPEC-059’s graph + temporal infra.

Acceptance criteria

  1. prism_confirm callable; row persists; semantic_recall returns confirmation when target is queried.
  2. prism_operator_input callable; row persists; trigraph edge written; review loop accept/reject works.
  3. Wrap touching bin/coder.sh (or any allowlisted file) with empty operator_visible_effect emits a non-blocking warning.
  4. At least 5 historical “working state” moments retroactively captured via prism_confirm in the first week post-deploy.
  5. The tried_and_ruled_out field is populated on all wraps in the first month touching launcher or signal-pipeline surfaces.
  6. Dashboard shows non-zero confirmation rate, unreviewed-inputs queue, and at least one captured correction within 24h of merge.
  7. Headline test: semantic_recall("Codex inline statusline working state Frank confirmed") returns at least one confirmation row in top-5, where today the same query returns engineering-vocabulary deltas only.
  8. Headline test §3.6: semantic_recall("Frank corrected Donna for going to git instead of memory") returns the relevant operator_input rows with class='correction'. Today this query returns nothing because the labels don’t exist.

Phased rollout

Phase 1 — prism_confirm + prism_operator_input verbs + tables. Lowest risk; new mechanisms, no existing code changes. Frank can start using them the day they land. Auto-capture starts producing data immediately as agents fire prism_operator_input on state-change prompts.
Phase 2 — operator_visible_effect field + surface-allowlist warning. Touches every wrap path; warning is non-blocking so risk is low.
Phase 3 — Integration-narrative kind + negative-findings discipline + dashboard panels. Builds on Phase 1+2 data. The unreviewed-inputs queue and misclassification dashboard land in this phase.

Out of scope

  • Backfilling historical deltas with operator_visible_effect or retroactive operator_inputs.
  • Auto-generating operator_visible_effect via LLM. Keep it human-authored to preserve fidelity.
  • Multi-operator support (multiple Franks). Single-operator assumption is fine for v1.
  • Verbatim conversation transcript capture. State-change prompts only.

Lessons learned (this session, embedded as Frank’s standing convention)

  1. Engineering vocabulary doesn’t retrieve by visual outcome. The mechanism Texi shipped was in memory — but searchable only via engineering terms (turn/steer, app_server_inject). Operator-visible effect captured at write time would have ranked it #1 on query #1.
  2. The operator’s affirmation IS the test result. No CI test, no smoke output, no internal verification matters as much as “Frank looked at it and said it works.” That datum is the most valuable single piece of memory the system can hold.
  3. Memory hygiene is a writer-side problem, not a reader-side problem. I kept “improving” my queries. The actual fix is improving what gets written.
  4. Operator corrections are training data the system already produces but throws away (NEW in v0.2). Five+ corrections this session, zero captured. The agent has full context at the moment the correction lands; the cost of prism_operator_input is one tool call per state-change prompt; the value is a permanent, queryable, training-grade label corpus.

References

  • Specs: SPEC-020 (tri-graph), SPEC-052 (per-identity signal cache), SPEC-057 (prism_start opening), SPEC-058 (signal delivery SSoT), SPEC-059 (tri-graph activation + memory tokenomics).
  • Triggering deltas: 2901d7ce (the working turn/steer + turn/start mechanism, accurate but UX-blind), 8b148959 (Texi’s reverse-engineering this week), 1e030e17 (regression diagnosis), journal #7 (terminology cleanup today).
  • Memories filed this session:
    • feedback_memory_first_for_historical — query memory before git/SQL
    • feedback_use_existing_verbs_not_manual — use verbs, not manual lane
    • feedback_quoting_across_shell_layers — shell-layer quoting hygiene
  • SPEC-060 is the structural follow-up that addresses the writer-side cause of the same retrieval failures these feedback memories tried to fix on the reader side.

Authorship

Donna (Claude Code, mini3, session 3f36b796). 2026-04-30. v0.1 filed at Frank’s direction immediately after the dogfood-failure diagnosis. v0.2 adds §3.6 per Frank’s “fold into SPEC” directive after he asked how labels are created — the operator-driven label capture is a critical part of writer-side memory hygiene that v0.1 didn’t address.
Last modified on May 3, 2026