SPEC-099 v0.1 — Dewey (Recallgate)
Status: draft — Donna engineering proposal; routed to Texi for architecture review.
Author: Donna
Reviewer (architecture): Texi
Reviewer (governance): Candi
Final operator approver: Frank
Origin: Plan 003 Phase 3 Step 3.1 (signal 94ccc816 Candi-relayed Frank go-signal 8ac87b0d).
Summary
Recall enforcement gate. Element 5 of method.wall-break.persistence (“recall-on-task-start”) operationalized as a runtime check: at task-start of a non-trivial assignment, the agent is required to invoke semantic_recall against TriGraph for prior wall-breaks / similar work / known answers before producing material output.
Phasing pattern verbatim from SPEC-096 v0.2: mode trinary off | advisory | enforce, default advisory; Tier-1 / Tier-2 watched task-classes; force-override audited; lifecycle invariant — refusal keeps session active. Where Recallgate semantics diverge from SPEC-096 (different trigger surface, different evidence model), the divergence is called out below.
Why this exists
Plan 003 Phase 3. The activation-vs-substrate gap that drove SPEC-094 v0.3 + SPEC-096 v0.2 + method.wall-break.persistence reappears at the task-start boundary: even when behavioral memory exists in TriGraph (substrate), agents accept tasks without consulting it. Recallgate makes recall a precondition for accepting non-trivial work.
Watched task-classes
Tier 1 — eligible for enforce mode:
- Spec implementation work (touching files in backend/app/services/ linked to a SPEC entity)
- BIOS edits (CLAUDE.md, AGENTS.md, templates/CLAUDE.md, templates/AGENTS.md)
- Governance changes (rules, ADRs, method-fragments, anything under docs/adrs/, docs/specs/, docs/method-fragments/)
- Cross-lane work (assignment_routing changes, persona-binding changes)
Tier 2 — advisory only even in enforce mode:
- Docs surface tweaks (Mintlify nav, copy edits)
- Dashboard / UI work that doesn’t touch governance state
- Telemetry surface changes
Deferred to v0.2:
- Per-specialization watched lists (e.g. install lane has different Tier-1 than engineering)
- Per-surface lists (Codex vs Claude Code task-classes may differ)
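The path-based Tier-1 rules above could be checked with something like the following. This is a minimal sketch: the function name `is_tier1_by_path` and the `spec_linked` flag are hypothetical, and it assumes the caller already knows whether the touched service file is linked to a SPEC entity. Cross-lane work and the Tier-2 classes are not derivable from paths alone, so they are left out.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Path rules copied from the Tier-1 list; the constant names are illustrative.
BIOS_FILES = {"CLAUDE.md", "AGENTS.md", "templates/CLAUDE.md", "templates/AGENTS.md"}
GOVERNANCE_DIRS = ("docs/adrs/", "docs/specs/", "docs/method-fragments/")
SPEC_IMPL_DIR = "backend/app/services/"


def is_tier1_by_path(paths: Iterable[str], spec_linked: bool = False) -> bool:
    """Return True if any touched path matches a path-based Tier-1 rule."""
    for raw in paths:
        # Normalize "./CLAUDE.md" style inputs to a plain relative POSIX path.
        path = PurePosixPath(raw).as_posix()
        if path in BIOS_FILES:
            return True
        if path.startswith(GOVERNANCE_DIRS):
            return True
        # Service files count only when linked to a SPEC entity (assumed input).
        if path.startswith(SPEC_IMPL_DIR) and spec_linked:
            return True
    return False
```

Keeping the rule set this small matches the guidance that the Tier-1 list should err narrow at v0.1.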
Trigger semantics
The check fires when ALL of:
- (a) the agent is accepting a new assignment (signal TaskAssigned, operator chat assignment, or self-initiated prism_todo add) AND
- (b) the assignment matches a Tier-1 or Tier-2 task-class AND
- (c) the session has NOT issued a semantic_recall call within the recent window scoped to the current assignment context.
“Recent window” v0.1: 60 seconds before the task-acceptance moment. Sliding window keyed on session_id; reset when assignment changes.
“Scoped to the current assignment context” v0.1: any semantic_recall invocation since the assignment-acceptance event, regardless of query string. v0.2 may tighten to require a query that overlaps with the assignment’s surface / specialization.
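One possible reading of the two v0.1 definitions above, taken together, is that a recall clears condition (c) if it happened no earlier than 60 seconds before acceptance, or at any point after it. The sketch below encodes that reading; the function name and the idea of passing bare timestamps for the current session are assumptions, not part of this SPEC.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

RECENT_WINDOW = timedelta(seconds=60)  # v0.1 value from the text above


def recall_satisfies_gate(
    recall_times: Iterable[datetime],
    accepted_at: datetime,
    window: timedelta = RECENT_WINDOW,
) -> bool:
    """True if any recall in this session falls at or after (accepted_at - window)."""
    earliest = accepted_at - window
    # The query string is deliberately ignored: v0.1 accepts any recall.
    return any(t >= earliest for t in recall_times)


# Usage: a recall 30 seconds before acceptance clears the gate.
accepted = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
cleared = recall_satisfies_gate([accepted - timedelta(seconds=30)], accepted)
```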
Behavior — by mode
Env flag: PRISM_RECALLGATE_MODE = off | advisory | enforce. Default: advisory.
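A minimal sketch of reading the flag, for illustration only. The enum and `load_mode` are hypothetical names. Treating an unset or unrecognized value as advisory is an assumption: the SPEC states the default but does not say how invalid values are handled.

```python
import os
from enum import Enum
from typing import Mapping


class RecallgateMode(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    ENFORCE = "enforce"


def load_mode(env: Mapping[str, str] = os.environ) -> RecallgateMode:
    """Parse PRISM_RECALLGATE_MODE, defaulting to advisory."""
    raw = env.get("PRISM_RECALLGATE_MODE", "").strip().lower()
    try:
        return RecallgateMode(raw)
    except ValueError:
        # Assumption: unset or unknown values fall back to the documented default.
        return RecallgateMode.ADVISORY
```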
off
Check does not run. Reserved for emergency disable.
advisory (v0.1 default)
The assignment-acceptance proceeds. The verb returns a warnings: [...] list:
{
  "ok": true,
  "warnings": [
    {
      "kind": "missing_recall_on_task_start",
      "task_class": "tier1",
      "assignment_id": "...",
      "remediation": "Issue semantic_recall against TriGraph for prior work on this task-class before producing material output."
    }
  ]
}
enforce (v0.2-eligible)
For Tier-1 task-classes ONLY, the verb refuses with:
{
  "ok": false,
  "error": "missing_recall_on_task_start",
  "stage": "recallgate_preflight",
  "task_class": "tier1",
  "assignment_id": "...",
  "remediation": "Issue semantic_recall before re-attempting task acceptance, or pass force=true with reason."
}
Lifecycle invariant (mirrors SPEC-096 §Behavior): Recallgate refusal does NOT call deregister / heartbeat-stop / stream-shutdown logic. The session stays fully active. The agent invokes semantic_recall, then retries assignment-acceptance.
Tier-2 task-classes stay advisory in enforce mode.
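The refuse, recall, retry loop implied by the lifecycle invariant can be illustrated with a toy in-memory session. `SketchSession` is a hypothetical stand-in, not a Prism type, and the refusal payload is abbreviated (no remediation field) to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class SketchSession:
    """Hypothetical stand-in for a session; models only what the invariant needs."""
    active: bool = True
    recalled: bool = False

    def semantic_recall(self, query: str) -> None:
        # Records only that a recall happened; v0.1 does not inspect the query.
        self.recalled = True

    def accept_tier1(self, assignment_id: str) -> dict:
        """Enforce-mode acceptance path for a Tier-1 assignment."""
        if not self.recalled:
            # Refusal only returns a payload; `active` is never touched here.
            return {
                "ok": False,
                "error": "missing_recall_on_task_start",
                "stage": "recallgate_preflight",
                "task_class": "tier1",
                "assignment_id": assignment_id,
            }
        return {"ok": True, "warnings": []}


session = SketchSession()
first = session.accept_tier1("a-1")    # refused: no recall yet
session.semantic_recall("prior wall-breaks for this task-class")
second = session.accept_tier1("a-1")   # accepted on retry
```

The point of the sketch is that the refusal branch has no side effects on session state, which is what the lifecycle-invariant test in §Tests would assert.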
Force-override (audited)
force=true + force_reason (≥10 chars) lets the agent or operator bypass the gate. Emits a recallgate_force audit event in TriGraph attributable to session_id. Audit-event fields:
- session_id, agent_identity, force_reason
- assignment_id, task_class
- would_have_required_recall: bool
This is the Watson surface for Recallgate: every force-override is auditable proof that the gate was bypassed and why.
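A sketch of validating the override and shaping the audit event. The field names follow the list above; the function name, the `kind` key, and stripping whitespace before counting characters are assumptions made for illustration.

```python
from typing import Optional

MIN_FORCE_REASON_CHARS = 10  # from the force_reason rule above


def build_force_audit_event(
    *,
    session_id: str,
    agent_identity: str,
    assignment_id: str,
    task_class: str,
    force_reason: Optional[str],
    would_have_required_recall: bool,
) -> dict:
    """Validate force_reason and return a recallgate_force event payload."""
    reason = (force_reason or "").strip()  # assumption: padding does not count
    if len(reason) < MIN_FORCE_REASON_CHARS:
        raise ValueError("force_reason must be at least 10 characters")
    return {
        "kind": "recallgate_force",  # assumed discriminator key
        "session_id": session_id,
        "agent_identity": agent_identity,
        "assignment_id": assignment_id,
        "task_class": task_class,
        "force_reason": reason,
        "would_have_required_recall": would_have_required_recall,
    }
```

Writing the event to TriGraph is left to the caller, so the builder stays free of I/O.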
Watson hook on every recall
Beyond force-override events, every semantic_recall invocation by an agent in a watched task-class emits a recall_invoked audit event:
- session_id, agent_identity
- assignment_id (current active assignment, if any)
- query, source_types, top_k (request shape)
- results_returned (count, not bodies)
- invoked_at
This builds the dataset Watson SPEC will consume for proof-of-recall surfaces. v0.1 emits; consumption is deferred to Watson.
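The event shape above might be assembled as follows. This is a sketch under assumptions: the builder name, the `kind` key, and the ISO-8601 UTC timestamp format are not specified by this SPEC.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def build_recall_invoked_event(
    *,
    session_id: str,
    agent_identity: str,
    assignment_id: Optional[str],
    query: str,
    source_types: Sequence[str],
    top_k: int,
    results: Sequence[object],
    invoked_at: Optional[datetime] = None,
) -> dict:
    """Return a recall_invoked payload that records result count, never bodies."""
    when = invoked_at or datetime.now(timezone.utc)
    return {
        "kind": "recall_invoked",        # assumed discriminator key
        "session_id": session_id,
        "agent_identity": agent_identity,
        "assignment_id": assignment_id,  # None when no assignment is active
        "query": query,
        "source_types": list(source_types),
        "top_k": top_k,
        "results_returned": len(results),
        "invoked_at": when.isoformat(),  # assumed serialization
    }
```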
Backend
New helper in backend/app/services/recallgate.py:
def detect_missing_recall_at_task_start(
    *,
    session_id: UUID,
    assignment_id: UUID,
    task_class: TaskClass,
    recent_recall_invocations: list[RecallInvocation],
    mode: RecallgateMode,
) -> RecallgateOutcome: ...
Pure function over inputs; no I/O. Called from the assignment-acceptance handler (location TBD — likely an extension to prism_signal_ack for TaskAssigned ACKs and to prism_todo add).
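One way the signature above could be filled in is sketched below. The supporting types (`TaskClass`, `RecallInvocation`, `RecallgateOutcome`, `Decision`) are hypothetical shapes, since the SPEC names them without defining them. Two further assumptions: `task_class` is made optional so an unmatched assignment can pass through, and the caller has already filtered `recent_recall_invocations` to the recent window.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional
from uuid import UUID


class TaskClass(Enum):
    TIER1 = "tier1"
    TIER2 = "tier2"


class RecallgateMode(Enum):
    OFF = "off"
    ADVISORY = "advisory"
    ENFORCE = "enforce"


class Decision(Enum):
    PASS = "pass"
    WARN = "warn"
    REFUSE = "refuse"


@dataclass(frozen=True)
class RecallInvocation:
    session_id: UUID
    invoked_at: datetime


@dataclass(frozen=True)
class RecallgateOutcome:
    decision: Decision
    assignment_id: UUID
    task_class: Optional[TaskClass]


def detect_missing_recall_at_task_start(
    *,
    session_id: UUID,
    assignment_id: UUID,
    task_class: Optional[TaskClass],
    recent_recall_invocations: list[RecallInvocation],
    mode: RecallgateMode,
) -> RecallgateOutcome:
    """Pure decision over inputs: no clock reads, no I/O."""
    if mode is RecallgateMode.OFF or task_class is None:
        decision = Decision.PASS
    elif any(inv.session_id == session_id for inv in recent_recall_invocations):
        decision = Decision.PASS
    elif mode is RecallgateMode.ENFORCE and task_class is TaskClass.TIER1:
        decision = Decision.REFUSE
    else:
        # Advisory mode, or Tier-2 under enforce, only warns.
        decision = Decision.WARN
    return RecallgateOutcome(decision, assignment_id, task_class)
```

Because the function never reads the clock or the environment, the mode and window filtering stay in the handler and the unit tests in §Tests can drive every branch with plain values.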
Phasing
- v0.1 (this SPEC): Phase 0 advisory mode behind PRISM_RECALLGATE_MODE=advisory default. Telemetry: recallgate_warning_count per session, recall_invoked audit events flowing.
- v0.2: flip default to enforce for Tier-1 task-classes after: (a) 7-day advisory observation period, (b) Texi + Candi review of telemetry, (c) test coverage in place. No automatic flip on zero fires (mirrors SPEC-096 nit 3).
- v0.3: per-specialization / per-surface task-class lists.
Tests (mandatory before flag flip)
Backend unit tests in backend/tests/test_recallgate.py:
- Task-class matching (Tier 1 / Tier 2 / unmatched)
- Mode behavior (off / advisory warns / enforce-Tier1 refuses / enforce-Tier2 warns)
- Recent-window logic (recall just before task-accept clears the gate)
- Force-override audit event emission (required force_reason ≥10 chars)
- Lifecycle invariant — enforced refusal does NOT trigger session shutdown calls
- recall_invoked audit event shape on every semantic_recall in a watched class
Integration test for assignment-acceptance flow:
- Tier-1 task-acceptance without prior recall → advisory warning surfaces
- Same flow in enforce mode → refusal returned, session still active
- Force-override path → success + audit event written
Out of scope (v0.1)
- Recall quality assessment. v0.1 only checks recall happened, not whether the query was useful. Quality scoring is downstream.
- Cross-session recall coverage. “Donna already ran this recall yesterday in another session” doesn’t satisfy v0.1’s window — v0.1 is per-session.
- Auto-recall. Agents must explicitly invoke semantic_recall; the gate does not run recall on their behalf.
- Memorule-specific recall enforcement. v0.1 accepts any semantic_recall; SPEC-094 v0.4 (Memorules) interaction comes in v0.2 once the Memorules surface lands.
Cross-references
- method.wall-break.persistence v0.2: Element 5 (recall-on-task-start) is the methodology this SPEC operationalizes
- SPEC-096 v0.2 — phasing pattern, mode-trinary, Tier-1/Tier-2 split, force-override semantics, lifecycle invariant — all mirrored here
- SPEC-094 v0.3 + v0.4 — substrate + Memorules; Recallgate v0.2 will integrate Memorule-aware recall
- (future) Watson SPEC — consumes recall_invoked and recallgate_force audit events
- Plan 003 Phase 3 — this SPEC’s home phase
Risk
- Adoption: R2 — methodology change affecting how every Tier-1 task-acceptance proceeds. Backward-compat preserved via advisory default + force-override.
- Highest-risk surface: the trigger surface (where assignment-acceptance is detected). Mis-firing on assignments that are NOT real task-starts (e.g. routine prism_todo adds for housekeeping) creates noise. Tier-1 watched-task list must err narrow at v0.1.
Promotion criteria (v0.1 → v0.2)
- Phase 0 advisory shipped + observed for at least 7 days of active multi-agent sessions.
- Telemetry: recallgate_warning_count non-zero AND recall_invoked audit events flowing in observable volume.
- Texi architecture review of v0.1 → v0.2 deltas.
- Candi governance ratification of v0.2.
- Backend test coverage per §Tests.
- Lifecycle invariant test passes (no session-shutdown calls on enforced refusal).
Last modified on June 7, 2026