SPEC-099 v0.1 — Dewey (Recallgate)

Status: draft (Donna engineering proposal; routed to Texi for architecture review).
Author: Donna
Reviewer (architecture): Texi
Reviewer (governance): Candi
Final operator approver: Frank
Origin: Plan 003 Phase 3 Step 3.1 (signal 94ccc816, Candi-relayed Frank go-signal 8ac87b0d).

Summary

Recall enforcement gate. Recallgate operationalizes Element 5 of method.wall-break.persistence (“recall-on-task-start”) as a runtime check: at task-start of a non-trivial assignment, the agent must invoke semantic_recall against TriGraph for prior wall-breaks, similar work, or known answers before producing material output. The phasing pattern is taken verbatim from SPEC-096 v0.2: a three-way mode (off | advisory | enforce, default advisory); Tier-1 / Tier-2 watched task-classes; audited force-override; and a lifecycle invariant under which refusal keeps the session active. Where Recallgate semantics diverge from SPEC-096 (different trigger surface, different evidence model), the divergence is called out below.

Why this exists

Plan 003 Phase 3. The activation-vs-substrate gap that drove SPEC-094 v0.3 + SPEC-096 v0.2 + method.wall-break.persistence reappears at the task-start boundary: even when behavioral memory exists in TriGraph (substrate), agents accept tasks without consulting it. Recallgate makes recall a precondition for accepting non-trivial work.

Watched task-classes

Tier 1 — eligible for enforce mode:
  • Spec implementation work (touching files in backend/app/services/ linked to a SPEC entity)
  • BIOS edits (CLAUDE.md, AGENTS.md, templates/CLAUDE.md, templates/AGENTS.md)
  • Governance changes (rules, ADRs, method-fragments, anything under docs/adrs/, docs/specs/, docs/method-fragments/)
  • Cross-lane work (assignment_routing changes, persona-binding changes)
Tier 2 — advisory only even in enforce mode:
  • Docs surface tweaks (Mintlify nav, copy edits)
  • Dashboard / UI work that doesn’t touch governance state
  • Telemetry surface changes
Deferred to v0.2:
  • Per-specialization watched lists (e.g. install lane has different Tier-1 than engineering)
  • Per-surface lists (Codex vs Claude Code task-classes may differ)
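The Tier-1 path rules above can be expressed as a small classifier. This is an illustrative sketch only: TaskClass, classify and classify_path are hypothetical names, the spec_linked flag stands in for the “linked to a SPEC entity” check (applied to every path in the call, for simplicity), and DEFAULT_TIER2_PREFIXES is a placeholder because the Tier-2 classes are not defined by path. Rules and cross-lane work (assignment_routing, persona-binding) are omitted because the list gives them no path form. When an assignment touches several paths, the highest tier wins.

```python
from enum import Enum
from pathlib import PurePosixPath


class TaskClass(Enum):
    TIER1 = "tier1"
    TIER2 = "tier2"
    UNMATCHED = "unmatched"


# Tier-1 path rules taken from the watched list.
BIOS_FILES = {"CLAUDE.md", "AGENTS.md", "templates/CLAUDE.md", "templates/AGENTS.md"}
GOVERNANCE_PREFIXES = ("docs/adrs/", "docs/specs/", "docs/method-fragments/")
SERVICES_PREFIX = "backend/app/services/"

# Hypothetical placeholders: the SPEC does not map Tier-2 classes to paths.
DEFAULT_TIER2_PREFIXES = ("docs/", "frontend/")


def classify_path(path: str, *, spec_linked: bool, tier2_prefixes: tuple) -> TaskClass:
    """Classify one touched path; Tier-1 rules are checked before Tier-2."""
    p = str(PurePosixPath(path))  # normalizes e.g. "./CLAUDE.md" -> "CLAUDE.md"
    if p in BIOS_FILES or p.startswith(GOVERNANCE_PREFIXES):
        return TaskClass.TIER1
    if p.startswith(SERVICES_PREFIX) and spec_linked:
        return TaskClass.TIER1
    if p.startswith(tier2_prefixes):
        return TaskClass.TIER2
    return TaskClass.UNMATCHED


def classify(
    paths: list[str],
    *,
    spec_linked: bool = False,
    tier2_prefixes: tuple = DEFAULT_TIER2_PREFIXES,
) -> TaskClass:
    """Return the highest tier matched by any touched path."""
    classes = {
        classify_path(p, spec_linked=spec_linked, tier2_prefixes=tier2_prefixes)
        for p in paths
    }
    if TaskClass.TIER1 in classes:
        return TaskClass.TIER1
    if TaskClass.TIER2 in classes:
        return TaskClass.TIER2
    return TaskClass.UNMATCHED
```

Checking the Tier-1 governance prefixes before any Tier-2 prefix matters here: docs/specs/ would otherwise be swallowed by a broad docs/ placeholder.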

Trigger semantics

The check fires when ALL of:
  • (a) the agent is accepting a new assignment (signal TaskAssigned, operator chat assignment, or self-initiated prism_todo add) AND
  • (b) the assignment matches a Tier-1 or Tier-2 task-class AND
  • (c) the session has NOT issued a semantic_recall call within the recent window scoped to the current assignment context.
“Recent window” in v0.1: the 60 seconds before the task-acceptance moment, as a sliding window keyed on session_id and reset when the assignment changes. “Scoped to the current assignment context” in v0.1: any semantic_recall invocation since the assignment-acceptance event, regardless of query string. v0.2 may tighten this to require a query that overlaps with the assignment’s surface / specialization.
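The trigger conjunction and the v0.1 window can be sketched as two pure predicates. RecallCall, recall_in_window and gate_fires are hypothetical names, and modelling “reset when assignment changes” as a caller-supplied reset_at timestamp is an interpretation of the text, not a confirmed design. RecallCall deliberately carries no query string, mirroring the query-agnostic v0.1 scoping.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional
from uuid import UUID

# v0.1 window length.
RECENT_WINDOW = timedelta(seconds=60)


@dataclass(frozen=True)
class RecallCall:
    """One semantic_recall invocation as seen by the gate (hypothetical shape)."""
    session_id: UUID
    invoked_at: datetime


def recall_in_window(
    calls: Iterable[RecallCall],
    *,
    session_id: UUID,
    accepted_at: datetime,
    reset_at: Optional[datetime] = None,
) -> bool:
    """True if this session issued a recall inside the sliding window.

    The window is the 60 s before accepted_at, truncated at reset_at
    (assumed to be advanced by the caller whenever the assignment changes).
    """
    start = accepted_at - RECENT_WINDOW
    if reset_at is not None and reset_at > start:
        start = reset_at
    return any(
        c.session_id == session_id and start <= c.invoked_at <= accepted_at
        for c in calls
    )


def gate_fires(*, accepting_new_assignment: bool, watched_class: bool, recall_seen: bool) -> bool:
    """Conditions (a), (b) and (c) must all hold for the check to fire."""
    return accepting_new_assignment and watched_class and not recall_seen
```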

Behavior — by mode

Env flag: PRISM_RECALLGATE_MODE = off | advisory | enforce. Default: advisory.
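Reading the flag could look like the following minimal sketch. RecallgateMode and read_mode are hypothetical names; falling back to advisory on an unrecognized value, rather than failing or disabling the gate, is an assumption chosen to match the stated default.

```python
import os
from enum import Enum
from typing import Mapping


class RecallgateMode(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    ENFORCE = "enforce"


def read_mode(environ: Mapping[str, str] = os.environ) -> RecallgateMode:
    """Parse PRISM_RECALLGATE_MODE, defaulting to advisory when unset."""
    raw = environ.get("PRISM_RECALLGATE_MODE", "advisory").strip().lower()
    try:
        return RecallgateMode(raw)
    except ValueError:
        # Assumption: unknown values degrade to the default, not to off or enforce.
        return RecallgateMode.ADVISORY
```

Passing the environment in as a parameter keeps the function testable without mutating os.environ.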

off

Check does not run. Reserved for emergency disable.

advisory (v0.1 default)

Assignment acceptance proceeds, and the verb returns a warnings: [...] list:
{
  "ok": true,
  "warnings": [
    {
      "kind": "missing_recall_on_task_start",
      "task_class": "tier1",
      "assignment_id": "...",
      "remediation": "Issue semantic_recall against TriGraph for prior work on this task-class before producing material output."
    }
  ]
}

enforce (v0.2-eligible)

For Tier-1 task-classes ONLY, the verb refuses with:
{
  "ok": false,
  "error": "missing_recall_on_task_start",
  "stage": "recallgate_preflight",
  "task_class": "tier1",
  "assignment_id": "...",
  "remediation": "Issue semantic_recall before re-attempting task acceptance, or pass force=true with reason."
}
Lifecycle invariant (mirrors SPEC-096 §Behavior): Recallgate refusal does NOT call deregister / heartbeat-stop / stream-shutdown logic. The session stays fully active. The agent invokes semantic_recall, then retries assignment-acceptance. Tier-2 task-classes stay advisory in enforce mode.
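The retry protocol implied by the lifecycle invariant can be sketched as follows, restricted to a Tier-1 assignment in enforce mode. Session, its recalled flag, the deregister hook and accept_tier1_enforced are hypothetical stand-ins for the real session and acceptance handler, and the success payload shape is assumed. What the sketch illustrates is that the refusal branch returns the error payload above and never reaches a teardown call, so the agent can recall and retry on the same live session.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical session stand-in that records any teardown calls."""
    session_id: str
    active: bool = True
    recalled: bool = False  # set once semantic_recall has been issued in scope
    teardown_calls: List[str] = field(default_factory=list)

    def deregister(self) -> None:
        # Teardown hook that a Recallgate refusal must never trigger.
        self.teardown_calls.append("deregister")
        self.active = False


def accept_tier1_enforced(session: Session, assignment_id: str) -> dict:
    """Tier-1 acceptance in enforce mode; refusal leaves the session untouched."""
    if not session.recalled:
        return {
            "ok": False,
            "error": "missing_recall_on_task_start",
            "stage": "recallgate_preflight",
            "task_class": "tier1",
            "assignment_id": assignment_id,
            "remediation": (
                "Issue semantic_recall before re-attempting task acceptance, "
                "or pass force=true with reason."
            ),
        }
    return {"ok": True, "warnings": []}  # success shape assumed


# Agent-side flow: refused, recall, retry on the same live session.
session = Session("s-1")
first = accept_tier1_enforced(session, "a-1")
session.recalled = True  # the agent invokes semantic_recall here
second = accept_tier1_enforced(session, "a-1")
```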

Force-override (audited)

Passing force=true with a force_reason of at least 10 characters lets the agent or operator bypass the gate. Each override emits a recallgate_force audit event in TriGraph, attributable to the session_id. Audit-event fields:
  • session_id, agent_identity, force_reason
  • assignment_id, task_class
  • would_have_required_recall: bool
This is the Watson surface for Recallgate: every force-override is auditable proof that the gate was bypassed and why.
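A minimal sketch of override validation and audit-payload construction follows. RecallgateForceEvent and build_force_event are hypothetical names; the kind discriminator field, not counting surrounding whitespace toward the 10-character minimum, and raising ValueError on a short reason are assumptions. Writing the payload to TriGraph is left to the caller.

```python
from dataclasses import asdict, dataclass
from typing import Optional

MIN_FORCE_REASON_CHARS = 10  # from the force-override rule


@dataclass(frozen=True)
class RecallgateForceEvent:
    """Fields listed for the recallgate_force audit event."""
    session_id: str
    agent_identity: str
    force_reason: str
    assignment_id: str
    task_class: str
    would_have_required_recall: bool
    kind: str = "recallgate_force"


def build_force_event(
    *,
    session_id: str,
    agent_identity: str,
    force_reason: Optional[str],
    assignment_id: str,
    task_class: str,
    would_have_required_recall: bool,
) -> dict:
    """Validate force_reason and return the audit payload to write to TriGraph."""
    reason = (force_reason or "").strip()
    if len(reason) < MIN_FORCE_REASON_CHARS:
        raise ValueError(
            f"force_reason must be at least {MIN_FORCE_REASON_CHARS} characters"
        )
    return asdict(
        RecallgateForceEvent(
            session_id=session_id,
            agent_identity=agent_identity,
            force_reason=reason,
            assignment_id=assignment_id,
            task_class=task_class,
            would_have_required_recall=would_have_required_recall,
        )
    )
```

Recording would_have_required_recall lets an auditor separate overrides that actually bypassed the gate from overrides issued when a recall was already in the window.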

Watson hook on every recall

Beyond force-override events, every semantic_recall invocation by an agent in a watched task-class emits a recall_invoked audit event:
  • session_id, agent_identity
  • assignment_id (current active assignment, if any)
  • query, source_types, top_k (request shape)
  • results_returned (count, not bodies)
  • invoked_at
This builds the dataset that the Watson SPEC will consume for proof-of-recall surfaces. v0.1 emits these events; consumption is deferred to Watson.
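A sketch of the recall_invoked payload follows. build_recall_invoked_event, its watched_class parameter and the kind field are hypothetical. The function records the request shape and only the count of results, and returns None when the invoking agent is not in a watched task-class, so that no event is emitted.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def build_recall_invoked_event(
    *,
    watched_class: bool,
    session_id: str,
    agent_identity: str,
    assignment_id: Optional[str],
    query: str,
    source_types: Sequence[str],
    top_k: int,
    results: Sequence[object],
    invoked_at: Optional[datetime] = None,
) -> Optional[dict]:
    """Return a recall_invoked audit payload, or None outside watched classes."""
    if not watched_class:
        return None
    return {
        "kind": "recall_invoked",
        "session_id": session_id,
        "agent_identity": agent_identity,
        "assignment_id": assignment_id,  # None when no assignment is active
        "query": query,
        "source_types": list(source_types),
        "top_k": top_k,
        "results_returned": len(results),  # count only, never result bodies
        "invoked_at": (invoked_at or datetime.now(timezone.utc)).isoformat(),
    }
```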

Backend

New helper in backend/app/services/recallgate.py:
def detect_missing_recall_at_task_start(
    *,
    session_id: UUID,
    assignment_id: UUID,
    task_class: TaskClass,
    recent_recall_invocations: list[RecallInvocation],
    mode: RecallgateMode,
) -> RecallgateOutcome: ...
Pure function over inputs; no I/O. Called from the assignment-acceptance handler (location TBD — likely an extension to prism_signal_ack for TaskAssigned ACKs and to prism_todo add).
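One way the pure helper could map its inputs to an outcome is sketched below, keeping the signature above. The definitions of TaskClass, RecallgateMode, RecallInvocation, Action and RecallgateOutcome are hypothetical, and the sketch assumes the caller has already filtered recent_recall_invocations to the v0.1 window, since the signature carries no timestamps. The decision order follows the mode rules: off never acts, unmatched classes and sessions with a recall pass, enforce refuses Tier-1 only, and everything else warns.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from uuid import UUID


class TaskClass(Enum):
    TIER1 = "tier1"
    TIER2 = "tier2"
    UNMATCHED = "unmatched"


class RecallgateMode(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    ENFORCE = "enforce"


@dataclass(frozen=True)
class RecallInvocation:
    session_id: UUID
    invoked_at: datetime


class Action(Enum):
    ALLOW = "allow"    # proceed with no warning
    WARN = "warn"      # proceed, attach missing_recall_on_task_start warning
    REFUSE = "refuse"  # return the recallgate_preflight error


@dataclass(frozen=True)
class RecallgateOutcome:
    action: Action
    task_class: TaskClass
    assignment_id: UUID


def detect_missing_recall_at_task_start(
    *,
    session_id: UUID,
    assignment_id: UUID,
    task_class: TaskClass,
    recent_recall_invocations: list[RecallInvocation],
    mode: RecallgateMode,
) -> RecallgateOutcome:
    # Invocations are assumed pre-filtered to the window; only the session is checked.
    recall_seen = any(
        inv.session_id == session_id for inv in recent_recall_invocations
    )
    if mode is RecallgateMode.OFF or task_class is TaskClass.UNMATCHED or recall_seen:
        action = Action.ALLOW
    elif mode is RecallgateMode.ENFORCE and task_class is TaskClass.TIER1:
        action = Action.REFUSE
    else:
        # Advisory mode, or Tier-2 under enforce.
        action = Action.WARN
    return RecallgateOutcome(action, task_class, assignment_id)
```

Keeping the helper free of I/O means the mode/tier matrix in §Tests can be covered with plain unit tests, while audit writes stay in the handler.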

Phasing

  • v0.1 (this SPEC): Phase 0 advisory mode behind PRISM_RECALLGATE_MODE=advisory default. Telemetry: recallgate_warning_count per session, recall_invoked audit events flowing.
  • v0.2: flip default to enforce for Tier-1 task-classes after — (a) 7-day advisory observation period, (b) Texi + Candi review of telemetry, (c) test coverage in place. No automatic flip on zero fires (mirrors SPEC-096 nit 3).
  • v0.3: per-specialization / per-surface task-class lists.

Tests (mandatory before flag flip)

Backend unit tests in backend/tests/test_recallgate.py:
  • Task-class matching (Tier 1 / Tier 2 / unmatched)
  • Mode behavior (off / advisory warns / enforce-Tier1 refuses / enforce-Tier2 warns)
  • Recent-window logic (recall just before task-accept clears the gate)
  • Force-override audit event emission (required force_reason ≥10 chars)
  • Lifecycle invariant — enforced refusal does NOT trigger session shutdown calls
  • recall_invoked audit event shape on every semantic_recall in a watched class
Integration test for assignment-acceptance flow:
  • Tier-1 task-acceptance without prior recall → advisory warning surfaces
  • Same flow in enforce mode → refusal returned, session still active
  • Force-override path → success + audit event written

Out of scope (v0.1)

  • Recall quality assessment. v0.1 only checks recall happened, not whether the query was useful. Quality scoring is downstream.
  • Cross-session recall coverage. “Donna already ran this recall yesterday in another session” doesn’t satisfy v0.1’s window — v0.1 is per-session.
  • Auto-recall. Agents must explicitly invoke semantic_recall; the gate does not run recall on their behalf.
  • Memorule-specific recall enforcement. v0.1 accepts any semantic_recall; interaction with SPEC-094 v0.4 (Memorules) comes in v0.2, once the Memorules surface lands.

Cross-references

  • method.wall-break.persistence v0.2 — Element 5 (recall-on-task-start) is the methodology this SPEC operationalizes
  • SPEC-096 v0.2 — phasing pattern, mode-trinary, Tier-1/Tier-2 split, force-override semantics, lifecycle invariant — all mirrored here
  • SPEC-094 v0.3 + v0.4 — substrate + Memorules; Recallgate v0.2 will integrate Memorule-aware recall
  • (future) Watson SPEC — consumes recall_invoked and recallgate_force audit events
  • Plan 003 Phase 3 — this SPEC’s home phase

Risk

  • Adoption: R2 — methodology change affecting how every Tier-1 task-acceptance proceeds. Backward-compat preserved via advisory default + force-override.
  • Highest-risk surface: the trigger surface (where assignment-acceptance is detected). Mis-firing on assignments that are NOT real task-starts (e.g. routine prism_todo adds for housekeeping) creates noise, so the Tier-1 watched-task list must err on the narrow side in v0.1.

Acceptance criteria for v0.1 → v0.2 promotion

  1. Phase 0 advisory shipped + observed for at least 7 days of active multi-agent sessions.
  2. Telemetry: recallgate_warning_count non-zero AND recall_invoked audit events flowing in observable volume.
  3. Texi architecture review of v0.1 → v0.2 deltas.
  4. Candi governance ratification of v0.2.
  5. Backend test coverage per §Tests.
  6. Lifecycle invariant test passes (no session-shutdown calls on enforced refusal).
Last modified on June 7, 2026