SPEC-099 v0.1 — Dewey (Recallgate)

Status: draft (Donna engineering proposal; routed to Texi for architecture review)
Author: Donna
Reviewer (architecture): Texi
Reviewer (governance): Candi
Final operator approver: Frank
Origin: Plan 003 Phase 3 Step 3.1 (signal 94ccc816 Candi-relayed Frank go-signal 8ac87b0d)

Summary

Recallgate is a recall enforcement gate. It operationalizes Element 5 of method.wall-break.persistence (“recall-on-task-start”) as a runtime check: at task-start of a non-trivial assignment, the agent must invoke semantic_recall against TriGraph for prior wall-breaks, similar work, and known answers before producing material output. The phasing pattern is taken verbatim from SPEC-096 v0.2: a mode trinary off | advisory | enforce with default advisory; Tier-1 / Tier-2 watched task-classes; an audited force-override; and the lifecycle invariant that a refusal keeps the session active. Where Recallgate semantics diverge from SPEC-096 (different trigger surface, different evidence model), the divergence is called out below.

Why this exists

Plan 003 Phase 3. The activation-vs-substrate gap that drove SPEC-094 v0.3 + SPEC-096 v0.2 + method.wall-break.persistence reappears at the task-start boundary: even when behavioral memory exists in TriGraph (substrate), agents accept tasks without consulting it. Recallgate makes recall a precondition for accepting non-trivial work.

Watched task-classes

Tier 1 — eligible for enforce mode:

Spec implementation work (touching files in backend/app/services/ linked to a SPEC entity)
BIOS edits (CLAUDE.md, AGENTS.md, templates/CLAUDE.md, templates/AGENTS.md)
Governance changes (rules, ADRs, method-fragments, anything under docs/adrs/, docs/specs/, docs/method-fragments/)
Cross-lane work (assignment_routing changes, persona-binding changes)

Tier 2 — advisory only even in enforce mode:

Docs surface tweaks (Mintlify nav, copy edits)
Dashboard / UI work that doesn’t touch governance state
Telemetry surface changes

Deferred to v0.2:

Per-specialization watched lists (e.g. install lane has different Tier-1 than engineering)
Per-surface lists (Codex vs Claude Code task-classes may differ)
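
A minimal sketch of path-based task-class matching for the lists above. It is an illustration under assumptions, not the SPEC's matcher: the names TaskClass and classify_paths, the spec_linked_services lookup, and the caller-supplied tier2_prefixes are all hypothetical. Cross-lane work (assignment_routing and persona-binding changes) is not path-based and is left out.

```python
from enum import Enum


class TaskClass(Enum):
    TIER1 = "tier1"
    TIER2 = "tier2"
    UNMATCHED = "unmatched"


# Paths taken from the Tier-1 list in this SPEC.
BIOS_FILES = {"CLAUDE.md", "AGENTS.md", "templates/CLAUDE.md", "templates/AGENTS.md"}
GOVERNANCE_PREFIXES = ("docs/adrs/", "docs/specs/", "docs/method-fragments/")
SERVICES_PREFIX = "backend/app/services/"


def classify_paths(paths, spec_linked_services=frozenset(), tier2_prefixes=()):
    """Return the highest tier matched by any touched path.

    spec_linked_services: hypothetical set of service files known to be
    linked to a SPEC entity (how that link is resolved is out of scope here).
    tier2_prefixes: hypothetical caller-supplied prefixes for Tier-2 surfaces.
    """
    tier = TaskClass.UNMATCHED
    for raw in paths:
        path = raw.removeprefix("./")
        if path in BIOS_FILES or path.startswith(GOVERNANCE_PREFIXES):
            return TaskClass.TIER1
        if path.startswith(SERVICES_PREFIX) and path in spec_linked_services:
            return TaskClass.TIER1
        if path.startswith(tuple(tier2_prefixes)):
            tier = TaskClass.TIER2
    return tier
```

Returning on the first Tier-1 hit keeps the matcher narrow: a service file that is not linked to a SPEC entity stays unmatched, in line with the "err narrow" guidance under Risk.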

Trigger semantics

The check fires when ALL of:

(a) the agent is accepting a new assignment (signal TaskAssigned, operator chat assignment, or self-initiated prism_todo add) AND
(b) the assignment matches a Tier-1 or Tier-2 task-class AND
(c) the session has NOT issued a semantic_recall call within the recent window scoped to the current assignment context.

“Recent window” v0.1: 60 seconds before the task-acceptance moment. Sliding window keyed on session_id; reset when assignment changes.

“Scoped to the current assignment context” v0.1: any semantic_recall invocation since the assignment-acceptance event, regardless of query string. v0.2 may tighten to require a query that overlaps with the assignment’s surface / specialization.
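
The window check in condition (c) could be sketched as follows. This is one possible reading, labeled as an assumption: a recall counts if it belongs to the same session and was issued no earlier than 60 seconds before the acceptance moment, which also admits recalls issued after the acceptance event (the retry case). RecallInvocation here is a reduced, hypothetical shape, and the caller is assumed to pass only invocations recorded for the current assignment context.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from uuid import UUID, uuid4

RECENT_WINDOW = timedelta(seconds=60)  # v0.1 value from this SPEC


@dataclass(frozen=True)
class RecallInvocation:
    # Hypothetical minimal shape; the real type may carry more fields.
    session_id: UUID
    invoked_at: datetime


def has_recent_recall(invocations, *, session_id, accepted_at, window=RECENT_WINDOW):
    """True if this session issued a recall inside the window or afterwards."""
    earliest = accepted_at - window
    return any(
        inv.session_id == session_id and inv.invoked_at >= earliest
        for inv in invocations
    )
```

The query string is deliberately ignored, matching the v0.1 rule that any semantic_recall invocation satisfies the check.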

Behavior — by mode

Env flag: PRISM_RECALLGATE_MODE = off | advisory | enforce. Default: advisory.
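
As an illustration, the flag could be read like this. The names RecallgateMode and load_mode are hypothetical, and the fallback to advisory for unrecognized values is an assumption; the SPEC only states the default for an unset flag.

```python
import os
from enum import Enum


class RecallgateMode(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    ENFORCE = "enforce"


def load_mode(environ=None):
    """Read PRISM_RECALLGATE_MODE, defaulting to advisory when unset."""
    env = os.environ if environ is None else environ
    raw = env.get("PRISM_RECALLGATE_MODE", "").strip().lower()
    if not raw:
        return RecallgateMode.ADVISORY
    try:
        return RecallgateMode(raw)
    except ValueError:
        # Assumption: an unknown value neither disables nor enforces the gate.
        return RecallgateMode.ADVISORY
```

Passing the environment as an argument keeps the parser testable without mutating process state.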

off

Check does not run. Reserved for emergency disable.

advisory (v0.1 default)

The assignment-acceptance proceeds. The verb returns a warnings: [...] list:

{
  "ok": true,
  "warnings": [
    {
      "kind": "missing_recall_on_task_start",
      "task_class": "tier1",
      "assignment_id": "...",
      "remediation": "Issue semantic_recall against TriGraph for prior work on this task-class before producing material output."
    }
  ]
}

enforce (v0.2-eligible)

For Tier-1 task-classes ONLY, the verb refuses with:

{
  "ok": false,
  "error": "missing_recall_on_task_start",
  "stage": "recallgate_preflight",
  "task_class": "tier1",
  "assignment_id": "...",
  "remediation": "Issue semantic_recall before re-attempting task acceptance, or pass force=true with reason."
}

Lifecycle invariant (mirrors SPEC-096 §Behavior): Recallgate refusal does NOT call deregister / heartbeat-stop / stream-shutdown logic. The session stays fully active. The agent invokes semantic_recall, then retries assignment-acceptance. Tier-2 task-classes stay advisory in enforce mode.
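
From the agent's side, the refuse-recall-retry loop described above might look like the sketch below. Everything here is hypothetical scaffolding: accept_with_recall, the accept and recall callables, and FakeGate (a stand-in for the real verb) are illustration only. The response keys are the ones shown in the refusal payload.

```python
def accept_with_recall(accept, recall, *, max_attempts=2):
    """Attempt acceptance; on a Recallgate refusal, recall and retry.

    accept: callable returning the verb's response dict.
    recall: callable that issues semantic_recall for this assignment.
    """
    response = accept()
    attempts = 1
    while (
        not response.get("ok")
        and response.get("error") == "missing_recall_on_task_start"
        and attempts < max_attempts
    ):
        recall()
        response = accept()
        attempts += 1
    return response


class FakeGate:
    """Stand-in gate: refuses until a recall has been seen."""

    def __init__(self):
        self.recalled = False
        self.session_active = True  # a refusal never flips this

    def recall(self):
        self.recalled = True

    def accept(self):
        if self.recalled:
            return {"ok": True, "warnings": []}
        return {
            "ok": False,
            "error": "missing_recall_on_task_start",
            "stage": "recallgate_preflight",
        }


gate = FakeGate()
result = accept_with_recall(gate.accept, gate.recall)
```

The loop only retries on the Recallgate error, so unrelated failures surface to the caller unchanged.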

Force-override (audited)

force=true + force_reason (≥10 chars) lets the agent or operator bypass the gate. Emits a recallgate_force audit event in TriGraph attributable to session_id. Audit-event fields:

session_id, agent_identity, force_reason
assignment_id, task_class
would_have_required_recall: bool

This is the Watson surface for Recallgate: every force-override is auditable proof that the gate was bypassed and why.
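
A hedged sketch of the override validation and the audit-event shape listed above. RecallgateForceEvent and build_force_event are hypothetical names, and stripping whitespace before counting characters is an assumption. Writing the event to TriGraph is not shown.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

MIN_FORCE_REASON_CHARS = 10  # from the force_reason rule in this SPEC


@dataclass(frozen=True)
class RecallgateForceEvent:
    session_id: UUID
    agent_identity: str
    force_reason: str
    assignment_id: UUID
    task_class: str
    would_have_required_recall: bool
    kind: str = "recallgate_force"


def build_force_event(*, session_id, agent_identity, force_reason,
                      assignment_id, task_class, would_have_required_recall):
    """Validate force_reason and build the audit event (no I/O)."""
    reason = force_reason.strip()  # assumption: padding does not count
    if len(reason) < MIN_FORCE_REASON_CHARS:
        raise ValueError(
            f"force_reason must be at least {MIN_FORCE_REASON_CHARS} characters"
        )
    return RecallgateForceEvent(
        session_id=session_id,
        agent_identity=agent_identity,
        force_reason=reason,
        assignment_id=assignment_id,
        task_class=task_class,
        would_have_required_recall=would_have_required_recall,
    )
```

Rejecting a short reason before building the event means a bypass cannot happen without an attributable explanation.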

Watson hook on every recall

Beyond force-override events, every semantic_recall invocation by an agent in a watched task-class emits a recall_invoked audit event:

session_id, agent_identity
assignment_id (current active assignment, if any)
query, source_types, top_k (request shape)
results_returned (count, not bodies)
invoked_at

This builds the dataset that the future Watson SPEC will consume for proof-of-recall surfaces. v0.1 only emits these events; consumption is deferred to Watson.
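
The event shape above could be modeled as in this sketch. RecallInvokedEvent and build_recall_invoked are hypothetical names, and the field types are assumptions. The point it illustrates is that only the result count is stored, never the result bodies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class RecallInvokedEvent:
    session_id: UUID
    agent_identity: str
    assignment_id: Optional[UUID]  # current active assignment, if any
    query: str
    source_types: tuple[str, ...]
    top_k: int
    results_returned: int  # count only
    invoked_at: datetime


def build_recall_invoked(*, session_id, agent_identity, assignment_id,
                         query, source_types, top_k, results, now=None):
    """Build the audit event from a recall request and its results."""
    return RecallInvokedEvent(
        session_id=session_id,
        agent_identity=agent_identity,
        assignment_id=assignment_id,
        query=query,
        source_types=tuple(source_types),
        top_k=top_k,
        results_returned=len(results),
        invoked_at=now or datetime.now(timezone.utc),
    )
```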

Backend

New helper in backend/app/services/recallgate.py:

def detect_missing_recall_at_task_start(
    *,
    session_id: UUID,
    assignment_id: UUID,
    task_class: TaskClass,
    recent_recall_invocations: list[RecallInvocation],
    mode: RecallgateMode,
) -> RecallgateOutcome: ...

Pure function over inputs; no I/O. Called from the assignment-acceptance handler (location TBD — likely an extension to prism_signal_ack for TaskAssigned ACKs and to prism_todo add).
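
One way the helper's body could look, as a sketch only. The signature is the one given above; the type definitions (TaskClass, RecallgateMode, RecallInvocation, RecallgateOutcome, Decision) are hypothetical placeholders, and the function assumes the caller has already filtered recent_recall_invocations down to the recent window.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from uuid import UUID, uuid4


class TaskClass(Enum):
    TIER1 = "tier1"
    TIER2 = "tier2"
    UNMATCHED = "unmatched"


class RecallgateMode(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    ENFORCE = "enforce"


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"
    REFUSE = "refuse"


@dataclass(frozen=True)
class RecallInvocation:
    session_id: UUID
    invoked_at: datetime


@dataclass(frozen=True)
class RecallgateOutcome:
    decision: Decision
    assignment_id: UUID
    task_class: TaskClass


def detect_missing_recall_at_task_start(
    *,
    session_id: UUID,
    assignment_id: UUID,
    task_class: TaskClass,
    recent_recall_invocations: list[RecallInvocation],
    mode: RecallgateMode,
) -> RecallgateOutcome:
    def outcome(decision: Decision) -> RecallgateOutcome:
        return RecallgateOutcome(decision, assignment_id, task_class)

    # Gate disabled, or the assignment is not in a watched task-class.
    if mode is RecallgateMode.OFF or task_class is TaskClass.UNMATCHED:
        return outcome(Decision.ALLOW)
    # Any in-window recall from this session clears the gate.
    if any(inv.session_id == session_id for inv in recent_recall_invocations):
        return outcome(Decision.ALLOW)
    # Only Tier 1 is refused; Tier 2 stays advisory even in enforce mode.
    if mode is RecallgateMode.ENFORCE and task_class is TaskClass.TIER1:
        return outcome(Decision.REFUSE)
    return outcome(Decision.WARN)
```

Because the function takes the invocation list as input rather than querying for it, the mode matrix in the Tests section can be exercised without any fixtures.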

Phasing

v0.1 (this SPEC): Phase 0 advisory mode behind PRISM_RECALLGATE_MODE=advisory default. Telemetry: recallgate_warning_count per session, recall_invoked audit events flowing.
v0.2: flip default to enforce for Tier-1 task-classes after (a) a 7-day advisory observation period, (b) Texi + Candi review of telemetry, and (c) test coverage in place. No automatic flip on zero fires (mirrors SPEC-096 nit 3).
v0.3: per-specialization / per-surface task-class lists.

Tests (mandatory before flag flip)

Backend unit tests in backend/tests/test_recallgate.py:

Task-class matching (Tier 1 / Tier 2 / unmatched)
Mode behavior (off / advisory warns / enforce-Tier1 refuses / enforce-Tier2 warns)
Recent-window logic (recall just before task-accept clears the gate)
Force-override audit event emission (required force_reason ≥10 chars)
Lifecycle invariant — enforced refusal does NOT trigger session shutdown calls
recall_invoked audit event shape on every semantic_recall in a watched class

Integration test for assignment-acceptance flow:

Tier-1 task-acceptance without prior recall → advisory warning surfaces
Same flow in enforce mode → refusal returned, session still active
Force-override path → success + audit event written

Out of scope (v0.1)

Recall quality assessment. v0.1 only checks that a recall happened, not whether the query was useful. Quality scoring is downstream.
Cross-session recall coverage. “Donna already ran this recall yesterday in another session” doesn’t satisfy v0.1’s window — v0.1 is per-session.
Auto-recall. Agents must explicitly invoke semantic_recall; the gate does not run recall on their behalf.
Memorule-specific recall enforcement. v0.1 accepts any semantic_recall; SPEC-094 v0.4 (Memorules) interaction comes in v0.2 once Memorules surface lands.

Cross-references

method.wall-break.persistence v0.2 — Element 5 (recall-on-task-start) is the methodology this SPEC operationalizes
SPEC-096 v0.2 — phasing pattern, mode-trinary, Tier-1/Tier-2 split, force-override semantics, lifecycle invariant — all mirrored here
SPEC-094 v0.3 + v0.4 — substrate + Memorules; Recallgate v0.2 will integrate Memorule-aware recall
(future) Watson SPEC — consumes recall_invoked and recallgate_force audit events
Plan 003 Phase 3 — this SPEC’s home phase

Risk

Adoption: R2 — methodology change affecting how every Tier-1 task-acceptance proceeds. Backward-compat preserved via advisory default + force-override.
Highest-risk surface: the trigger surface (where assignment-acceptance is detected). Mis-firing on assignments that are NOT real task-starts (e.g. routine prism_todo adds for housekeeping) creates noise. Tier-1 watched-task list must err narrow at v0.1.

Acceptance criteria for v0.1 → v0.2 promotion

Phase 0 advisory shipped + observed for at least 7 days of active multi-agent sessions.
Telemetry: recallgate_warning_count non-zero AND recall_invoked audit events flowing in observable volume.
Texi architecture review of v0.1 → v0.2 deltas.
Candi governance ratification of v0.2.
Backend test coverage per §Tests.
Lifecycle invariant test passes (no session-shutdown calls on enforced refusal).