Status: draft · Version v0.2 · Filed 2026-05-01
spec_id: SPEC-060
version: v0.2
status: draft
authored_by: Donna
date: 2026-04-30
SPEC-060 — Operator-Experience Memory Capture
Status
Draft v0.2 — for Frank’s review. v0.2 adds §3.6 (operator-input prompt capture) per Frank’s directive after v0.1 review.
Authored by Donna on 2026-04-30 after a memory-investigation failure: I spent ~30 minutes across multiple wrong queries trying to find what Texi shipped last week that produced the working Codex inline-statusline visual Frank confirmed. The mechanism (turn/steer + turn/start dual-path in app_server_inject.py, delta 2901d7ce) WAS in memory, but the operator-visible effect and Frank’s confirmation moment were not. Without those, the engineering breadcrumbs alone weren’t queryable for “what worked that I should restore.”
Origin
Frank asked: “Are we writing enough to memory to be useful?” The honest answer found via dogfooding semantic_recall this session: we’re writing a lot, but not the right things. Engineering layer is well-captured (file paths, verbs, smokes). Operator-experience layer is systematically missing. This spec closes five specific gaps that bit us repeatedly today.
The five gaps
Gap 1 — Operator-visible effects are under-documented
Delta 2901d7ce describes the working state as: “signals surface visibly in Codex through thread state aware delivery: turn/steer when a turn is active, turn/start when idle or non-steerable.”
Accurate engineering description. Useless for retrieval by visual outcome. The next agent (me, today) searching for “Codex inline statusline persona project visible above input bar” never found this delta because none of those operator-facing words are in the text.
Gap 2 — Operator confirmation moments aren’t captured
Frank’s “I told her it finally worked” landed in a Codex conversation buffer that rolled minutes later. No durable record. The only “Frank confirmed render” line in memory across the entire corpus is in delta 71e2c369 (Lafonda’s Mac Claude Code wrap), and it exists only because Lafonda happened to type those three words in her own wrap free-form.
Gap 3 — Cross-cutting “what produced the visible result” narratives are absent
Each delta records a piece. No delta says “the visible inline render in Codex requires THESE THREE pieces together: (a) env forwarding via mcp_servers.prism.env.PRISM_CODEX_APP_SERVER_*, (b) turn/steer for active-turn injection, (c) turn/start for idle injection.” The architectural causation isn’t captured.
Gap 4 — Negative findings / dead ends are mostly absent
We capture what shipped. We rarely capture what was tried and ruled out. Texi spent time on title-watcher hypotheses today — work that would have been short-circuited if a prior delta had said “the title-watcher path was tried; we dropped it in favor of in-thread injection.”
Gap 5 — Operator inputs (corrections, confirmations, pivots) aren’t captured
Frank’s prompts during a session contain the most authoritative labels in the system: “stop, you keep doing X again” / “perfect, that worked” / “wait, wrong layer”. These are training-grade signals — direct ground truth on whether agent behavior is correct. Today they live only in conversation buffers and roll off as soon as the conversation does. Five+ corrections this session alone; zero captured.
What these gaps share
The missing things are all operator-facing context, not engineering detail. The engineering layer is well-captured because the agents writing wraps are themselves engineers; that’s their native vocabulary. The UX/operator-experience layer is under-captured because nobody’s job is to translate engineering changes into operator-visible terms — and operator-input labels are under-captured because the operator (Frank) won’t reliably type a verb mid-conversation to label his own correction.
Goals
- Operator-visible effects are first-class memory. Every session_delta whose work touches a surface visible to the operator carries an explicit operator_visible_effect description in operator-facing language.
- Operator confirmations are durable. A dedicated mechanism captures Frank’s verdicts as structured, queryable records.
- Integration narratives have a home. A delta-class for “this combo of changes produces this visible result” exists.
- Negative findings are durable. Every wrap has a “what was tried and ruled out” section.
- Operator inputs (corrections / confirmations / pivots) are auto-captured. Agent-driven tag at capture time; operator review batched in dashboard.
Non-goals
- Forcing operator-language on engineering-only deltas (backend refactor with no UX impact). Scoped to surface-touching work.
- Rewriting historical deltas. SPEC-060 is forward-looking; backfill is out of scope.
- Capturing every prompt verbatim. Routine acks, single-token responses, and conversational chatter are explicitly skipped — only state-change prompts.
- Replacing free-form summaries. The new fields supplement, they don’t supplant.
Architecture
§3.1 — prism_confirm verb (Gap 2 fix)
Operator-driven verb for capturing affirmations or rejections. The operator types it in any agent’s terminal:
prism_confirm(
pid="PID-PGR01",
refers_to="delta:2901d7ce" | "signal:<id>" | "spec:SPEC-XYZ" | "commit:<sha>",
verdict="works" | "broken" | "partial",
notes="inline note appears above input bar in Codex TUI showing persona+project; rendered when turn/steer fires during active turn"
)
Persists to a new operator_confirmations table:
CREATE TABLE operator_confirmations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
project_id UUID NOT NULL REFERENCES projects(id),
refers_to_kind TEXT NOT NULL,
refers_to_id TEXT NOT NULL,
verdict TEXT NOT NULL,
notes TEXT,
confirmed_by TEXT NOT NULL,
confirmed_via TEXT NOT NULL,
confirmed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ix_operator_confirmations_refers_to ON operator_confirmations(refers_to_kind, refers_to_id);
Indexed and embedded into the trigraph as (:Confirmation)-[:CONFIRMS]->(:Entity) edges so semantic_recall surfaces them via the SPEC-059 graph leg.
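The verb takes a single refers_to string while the table stores refers_to_kind and refers_to_id separately, so something has to split and validate the value before the row is written. A minimal sketch of that step, assuming the four kinds and three verdicts shown in the signature form closed sets; parse_refers_to, validate_verdict, and Reference are hypothetical names, not part of the existing codebase:

```python
from typing import NamedTuple

# Kinds and verdicts are copied from the prism_confirm signature above.
# Treating them as closed sets is an assumption of this sketch.
ALLOWED_KINDS = {"delta", "signal", "spec", "commit"}
ALLOWED_VERDICTS = {"works", "broken", "partial"}


class Reference(NamedTuple):
    kind: str  # would map to operator_confirmations.refers_to_kind
    id: str    # would map to operator_confirmations.refers_to_id


def parse_refers_to(refers_to: str) -> Reference:
    """Split '<kind>:<id>' on the first colon and validate the kind."""
    kind, sep, ref_id = refers_to.partition(":")
    if not sep or not ref_id:
        raise ValueError(f"expected '<kind>:<id>', got {refers_to!r}")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown refers_to kind {kind!r}")
    return Reference(kind, ref_id)


def validate_verdict(verdict: str) -> str:
    """Reject verdicts outside works / broken / partial."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict {verdict!r}")
    return verdict
```

Splitting on the first colon only keeps any later colons inside the id, which matters if signal ids ever contain one.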
§3.2 — Mandatory operator_visible_effect on UX-surface deltas (Gap 1 fix)
Add a nullable field to session_deltas. It is optional at the schema level but expected on every surface-touching wrap:
ALTER TABLE session_deltas ADD COLUMN operator_visible_effect TEXT;
Wrap-time guard (a check in prism_wrap, not a database constraint): agents writing wraps that touch the surface allowlist are expected to populate this field. Allowlist:
- bin/coder.{sh,ps1}, bin/statusline-claude-code.sh
- cli/src/statusline.ts, cli/src/index.ts editor MCP block writes
- mcp-node/src/strategies/, mcp-node/src/surfaces/, mcp-node/src/bootstrap/channel_bridge.ts
- mcp-node/src/signalCache.ts, dashboard/web/
- backend prism_start / prism_status / prism_whois rendering paths
If the wrap touches any allowlisted file AND operator_visible_effect is empty, prism_wrap returns a structured warning naming the gap. Non-blocking.
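The check itself is small: intersect the wrap's touched files with the allowlist and warn only when the effect field is blank. A rough sketch under stated assumptions: the function names and the warning's shape are hypothetical, cli/src/index.ts is matched as a whole file even though the spec scopes it to the editor MCP block writes, and the backend rendering paths are omitted because the spec does not name them as files.

```python
from typing import Iterable, List, Optional

# Exact files from the allowlist above (bin/coder.{sh,ps1} expanded by hand).
SURFACE_FILES = {
    "bin/coder.sh",
    "bin/coder.ps1",
    "bin/statusline-claude-code.sh",
    "cli/src/statusline.ts",
    "cli/src/index.ts",  # simplification: the spec scopes this to MCP block writes
    "mcp-node/src/bootstrap/channel_bridge.ts",
    "mcp-node/src/signalCache.ts",
}
# Directory entries from the allowlist, matched by path prefix.
SURFACE_DIRS = (
    "mcp-node/src/strategies/",
    "mcp-node/src/surfaces/",
    "dashboard/web/",
)


def surface_files_touched(touched: Iterable[str]) -> List[str]:
    """Return the touched paths that fall inside the surface allowlist."""
    return sorted(
        p for p in touched
        if p in SURFACE_FILES or p.startswith(SURFACE_DIRS)
    )


def visible_effect_warning(
    touched: Iterable[str], operator_visible_effect: Optional[str]
) -> Optional[dict]:
    """Return a non-blocking warning when a surface wrap has no effect text."""
    hits = surface_files_touched(touched)
    if hits and not (operator_visible_effect or "").strip():
        return {
            "code": "missing_operator_visible_effect",  # hypothetical code
            "blocking": False,
            "surface_files": hits,
        }
    return None
```

Returning the matching files in the warning lets the agent see exactly which surface triggered it.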
§3.3 — Integration-narrative delta class (Gap 3 fix)
Add a new delta_kind enum value: 'integration_narrative'. Distinct from 'wrap' | 'checkpoint' | 'bootstrap_only' | 'other'.
When multiple per-piece commits or wraps combine into a user-facing capability, the agent (or the next agent who notices) writes an integration_narrative delta whose body answers three questions in plain operator language:
- What does the operator see now that they didn’t before?
- What pieces had to land together to produce that? (list of delta IDs / commit SHAs / spec IDs)
- What would break it? (cross-cutting invariants that future ports/refactors must preserve)
§3.4 — Negative-findings discipline (Gap 4 fix)
Extend the existing wrap structure with a structured tried_and_ruled_out JSONB array:
[
{
"approach": "title-watcher daemon to render persona+project in terminal title",
"outcome": "visually clobbers richer initial title every 1s",
"ruled_out_in_delta": "fd4d1895"
}
]
Queryable, embeddable, surfaceable. Plots dead-end frequency on the dashboard.
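To illustrate how the array could be checked at wrap time and counted for the dashboard, here is a small sketch. It assumes all three keys in the example entry are required, which the spec does not state, and both function names are hypothetical:

```python
import json
from collections import Counter
from typing import Iterable, List

# Assumption: every entry carries the three keys shown in the example above.
REQUIRED_KEYS = {"approach", "outcome", "ruled_out_in_delta"}


def validate_tried_and_ruled_out(raw: str) -> List[dict]:
    """Parse the JSON array and check each entry has the expected keys."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("tried_and_ruled_out must be a JSON array")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {i} is missing {sorted(missing)}")
    return entries


def dead_ends_per_delta(wraps: Iterable[List[dict]]) -> Counter:
    """Count ruled-out approaches per ruled_out_in_delta across wraps."""
    counts: Counter = Counter()
    for entries in wraps:
        for entry in entries:
            counts[entry["ruled_out_in_delta"]] += 1
    return counts
```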
§3.5 — Dashboard panel: confirmation rate
Adds one panel to the SPEC-059 §3.4 dashboard set:
| Panel | Source | Question it answers |
|---|---|---|
| Operator confirmation rate | operator_confirmations over time, grouped by verdict and confirmed_via | Are we shipping things the operator validates as working, or as broken? |
§3.6 — Operator-input prompt capture (Gap 5 fix)
The mechanism that captures Frank’s corrections, confirmations, and direction-pivots without requiring him to type a verb.
Three prompt classes (closed enum)
| Class | Definition | Examples from this session |
|---|---|---|
| correction | Operator stops or redirects the agent’s current trajectory | “stop, you keep doing the same error”, “you continue to take the easy way out”, “your bashing again and going after git” |
| confirmation | Operator affirms agent output as correct, working, or matching intent | “lets GO!!!!”, “perfect”, “yes commit” |
| direction_pivot | Operator narrows or shifts the active task scope without rejecting prior work | “wait, that’s the title bar — we’re working on the Prompt”, “yes please send”, “fold into SPEC” |
Anything that doesn’t cleanly fit one of these three is not captured. This is intentional — routine acks (“ok”, “next”), conversational chatter, and information-dense substantive direction are out of scope. The three classes exist because they each map to a different downstream signal: corrections feed agent-improvement training data, confirmations feed prism_vs_native quality metrics, pivots feed retrospective task-scope analysis.
Capture mechanism — agent-driven auto-tag
The agent in conversation with Frank fires the new verb at the end of any turn where the prompt was state-changing:
prism_operator_input(
pid="PID-PGR01",
class="correction" | "confirmation" | "direction_pivot",
prompt_text="<verbatim text Frank typed>",
triggered_action="<one line: what the agent was doing or about to do>",
reverses_delta="<delta_id if a correction reverses a specific prior delta>",
confidence="high" | "low"
)
confidence='high' when the prompt’s class is unambiguous from the language (“stop” / “perfect” / “wait, wrong layer”). confidence='low' when the agent is interpreting tone or context to assign the class — flag for operator review. The agent does not write the row at all if the prompt is routine (single-token responses, “ok”, “next”) — only for prompts that visibly change the agent’s trajectory.
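The write-or-skip decision and the confidence rule can be sketched as one payload builder. This assumes the agent has already classified the prompt (None meaning routine) and judged whether the class was unambiguous from the language; the function name and the unambiguous flag are hypothetical, and restricting reverses_delta to corrections is an assumption read from the signature's placeholder text:

```python
from typing import Optional

PROMPT_CLASSES = {"correction", "confirmation", "direction_pivot"}


def build_operator_input(
    pid: str,
    prompt_text: str,
    prompt_class: Optional[str],
    triggered_action: str,
    unambiguous: bool,
    reverses_delta: Optional[str] = None,
) -> Optional[dict]:
    """Build the prism_operator_input arguments, or None when nothing is written."""
    if prompt_class is None:
        # Routine ack or chatter: per the spec, no row is written at all.
        return None
    if prompt_class not in PROMPT_CLASSES:
        raise ValueError(f"unknown class {prompt_class!r}")
    if reverses_delta is not None and prompt_class != "correction":
        # Assumption: only a correction can reverse a prior delta.
        raise ValueError("reverses_delta only applies to corrections")
    return {
        "pid": pid,
        "class": prompt_class,
        "prompt_text": prompt_text,  # stored verbatim
        "triggered_action": triggered_action,
        "reverses_delta": reverses_delta,
        # 'low' flags the row for operator review.
        "confidence": "high" if unambiguous else "low",
    }
```

Classification is left to the agent rather than a token-count heuristic because a one-word prompt such as “perfect” is a valid confirmation while “ok” is a routine ack.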
Schema
CREATE TABLE operator_inputs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
project_id UUID NOT NULL REFERENCES projects(id),
captured_via TEXT NOT NULL, -- agent persona that captured
captured_at TIMESTAMPTZ NOT NULL DEFAULT now(),
class TEXT NOT NULL CHECK (class IN ('correction','confirmation','direction_pivot')),
prompt_text TEXT NOT NULL,
triggered_action TEXT,
reverses_delta UUID REFERENCES session_deltas(id),
confidence TEXT NOT NULL CHECK (confidence IN ('high','low')),
operator_review TEXT CHECK (operator_review IN ('accepted','rejected') OR operator_review IS NULL),
reviewed_at TIMESTAMPTZ
);
CREATE INDEX ix_operator_inputs_class_created ON operator_inputs(class, captured_at DESC);
CREATE INDEX ix_operator_inputs_unreviewed ON operator_inputs(captured_at DESC) WHERE operator_review IS NULL;
Trigraph projection: (:OperatorInput {class})-[:LABELS]->(:Entity) edges from the input row to the delta or signal it triggered/reversed. SPEC-059’s graph leg surfaces them automatically when adjacent entities are queried.
Operator review loop
Dashboard panels (extending the SPEC-060 §3.5 set):
| Panel | Source | Action |
|---|---|---|
| Unreviewed operator inputs | operator_inputs WHERE operator_review IS NULL, ordered by captured_at DESC | One-click accept/reject; rejected rows tag the agent that captured them as a misclassification source |
| Misclassification rate by capturing agent | operator_review='rejected' rate per captured_via | Shows which agent surfaces over-tag or mis-tag; informs which agents need refinement |
| Correction frequency by triggered_action | class='correction' grouped by triggered_action substring patterns | Identifies repeating agent failure modes: Frank’s most-stopped trajectories |
Frank’s review burden: skim weekly, accept/reject in batches. Misclassifications become anti-examples agents can read on next bootstrap.
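The misclassification panel reduces to a per-agent ratio. A minimal sketch, assuming the denominator is reviewed rows only (the spec does not say whether unreviewed rows count); misclassification_rates is a hypothetical name:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def misclassification_rates(
    rows: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, float]:
    """Rejected share of reviewed rows, keyed by captured_via.

    Each row is (captured_via, operator_review). Rows with
    operator_review None are still in the review queue and are skipped.
    """
    reviewed: Dict[str, int] = defaultdict(int)
    rejected: Dict[str, int] = defaultdict(int)
    for captured_via, operator_review in rows:
        if operator_review is None:
            continue
        reviewed[captured_via] += 1
        if operator_review == "rejected":
            rejected[captured_via] += 1
    return {via: rejected[via] / total for via, total in reviewed.items()}
```

An agent with no reviewed rows is absent from the result rather than reported as 0.0, so the panel does not imply a clean record where there is no data.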
What this solves
- Training-grade labels without operator friction at capture time.
- Repeated-mistake detection (e.g., this session’s “Donna kept going git-first; Frank corrected 5 times” pattern would be a single SQL query in the dashboard).
- prism_vs_native data — confirmation count vs correction count per agent surface is the cleanest single quality metric we’d have.
Invariants
- Routine acks NEVER captured. Agent’s job to filter at write time.
- Operator can manually fire the verb if they want — useful for retroactive labels (“that thing yesterday — that was a correction, btw”).
- confidence='low' rows always require review before counting toward agent quality metrics.
- Rejected misclassifications are NOT deleted — they become part of the training corpus showing what NOT to tag.
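These invariants determine which rows may feed the confirmation-versus-correction metric. One possible reading as code, assuming rejected rows are kept in storage but excluded from the metric (the spec says only that they are not deleted); both function names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def counts_toward_metrics(confidence: str, operator_review: Optional[str]) -> bool:
    """Decide whether a row may feed agent quality metrics."""
    if operator_review == "rejected":
        return False  # assumption: kept as an anti-example, not counted
    if confidence == "low":
        return operator_review == "accepted"  # low confidence needs review first
    return True  # high-confidence rows count even while unreviewed


def quality_counts(rows: Iterable[dict]) -> Dict[str, Counter]:
    """Confirmation and correction counts per captured_via, eligible rows only."""
    totals: Dict[str, Counter] = {}
    for row in rows:
        if row["class"] not in ("confirmation", "correction"):
            continue  # direction_pivot feeds scope analysis, not this metric
        if not counts_toward_metrics(row["confidence"], row.get("operator_review")):
            continue
        totals.setdefault(row["captured_via"], Counter())[row["class"]] += 1
    return totals
```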
Files changed
| File | Change | Why |
|---|---|---|
| backend/alembic/versions/027_spec060_operator_capture.py | New | operator_confirmations + operator_inputs tables, session_deltas.operator_visible_effect, delta_kind enum extension, tried_and_ruled_out JSONB |
| backend/app/models/operator_confirmation.py | New | ORM model |
| backend/app/models/operator_input.py | New | ORM model |
| backend/app/services/operator_capture_service.py | New | Combined service for confirmations + inputs; trigraph projection hooks |
| backend/app/routers/operator.py | New | POST /api/v1/{pid}/confirm + POST /api/v1/{pid}/operator_input |
| mcp-node/src/verbs/coordination.ts | Modified | New prism_confirm + prism_operator_input verbs |
| backend/app/services/spec_service.py etc. | Modified | Wrap warning emitter for missing operator_visible_effect on surface-allowlist wraps |
| dashboard/web/src/pages/MemoryPage.tsx | Modified | Add confirmation-rate + unreviewed-inputs + misclassification-rate + correction-frequency panels |
| tests/test_spec060_confirmations.py | New | Verb persists row; trigraph edge written |
| tests/test_spec060_inputs.py | New | Auto-tag verb persists row; review loop works; trigraph edge written |
| tests/test_spec060_visible_effect_warning.py | New | Allowlist wrap with empty effect emits warning |
No SPEC-058 or SPEC-059 collision. Builds on SPEC-059’s graph + temporal infra.
Acceptance criteria
- prism_confirm callable; row persists; semantic_recall returns confirmation when target is queried.
- prism_operator_input callable; row persists; trigraph edge written; review loop accept/reject works.
- Wrap touching bin/coder.sh (or any allowlisted file) with empty operator_visible_effect emits a non-blocking warning.
- At least 5 historical “working state” moments retroactively captured via prism_confirm in the first week post-deploy.
- The tried_and_ruled_out field is populated on all wraps in the first month touching launcher or signal-pipeline surfaces.
- Dashboard shows non-zero confirmation rate, unreviewed-inputs queue, and at least one captured correction within 24h of merge.
- Headline test: semantic_recall("Codex inline statusline working state Frank confirmed") returns at least one confirmation row in top-5, where today the same query returns engineering-vocabulary deltas only.
- Headline test §3.6: semantic_recall("Frank corrected Donna for going to git instead of memory") returns the relevant operator_input rows with class='correction'. Today this query returns nothing because the labels don’t exist.
Phased rollout
Phase 1 — prism_confirm + prism_operator_input verbs + tables. Lowest risk; new mechanisms, no existing code changes. Frank can start using them the day they land. Auto-capture starts producing data immediately as agents fire prism_operator_input on state-change prompts.
Phase 2 — operator_visible_effect field + surface-allowlist warning. Touches every wrap path; warning is non-blocking so risk is low.
Phase 3 — Integration-narrative kind + negative-findings discipline + dashboard panels. Builds on Phase 1+2 data. The unreviewed-inputs queue and misclassification dashboard land in this phase.
Out of scope
- Backfilling historical deltas with operator_visible_effect or retroactive operator_inputs.
- Auto-generating operator_visible_effect via LLM. Keep it human-authored to preserve fidelity.
- Multi-operator support (multiple Franks). Single-operator assumption is fine for v1.
- Verbatim conversation transcript capture. State-change prompts only.
Lessons learned (this session, embedded as Frank’s standing convention)
- Engineering vocabulary doesn’t retrieve by visual outcome. The mechanism Texi shipped was in memory, but searchable only via engineering terms (turn/steer, app_server_inject). Operator-visible effect captured at write time would have ranked it #1 on query #1.
- The operator’s affirmation IS the test result. No CI test, no smoke output, no internal verification matters as much as “Frank looked at it and said it works.” That datum is the most valuable single piece of memory the system can hold.
- Memory hygiene is a writer-side problem, not a reader-side problem. I kept “improving” my queries. The actual fix is improving what gets written.
- Operator corrections are training data the system already produces but throws away (NEW in v0.2). Five+ corrections this session, zero captured. The agent has full context at the moment the correction lands; the cost of prism_operator_input is one tool call per state-change prompt; the value is a permanent, queryable, training-grade label corpus.
References
- Specs: SPEC-020 (tri-graph), SPEC-052 (per-identity signal cache), SPEC-057 (prism_start opening), SPEC-058 (signal delivery SSoT), SPEC-059 (tri-graph activation + memory tokenomics).
- Triggering deltas: 2901d7ce (the working turn/steer + turn/start mechanism, accurate but UX-blind), 8b148959 (Texi’s reverse-engineering this week), 1e030e17 (regression diagnosis), journal #7 (terminology cleanup today).
- Memories filed this session:
  - feedback_memory_first_for_historical: query memory before git/SQL
  - feedback_use_existing_verbs_not_manual: use verbs, not manual lane
  - feedback_quoting_across_shell_layers: shell-layer quoting hygiene
- SPEC-060 is the structural follow-up that addresses the writer-side cause of the same retrieval failures these feedback memories tried to fix on the reader side.
Authorship
Donna (Claude Code, mini3, session 3f36b796). 2026-04-30. v0.1 filed at Frank’s direction immediately after the dogfood-failure diagnosis. v0.2 adds §3.6 per Frank’s “fold into SPEC” directive after he asked how labels are created — the operator-driven label capture is a critical part of writer-side memory hygiene that v0.1 didn’t address.