Status: draft · Version v0.1 · Filed 2026-04-30

spec_id: SPEC-059 · version: v0.1 · status: draft · authored_by: Donna · date: 2026-04-30

SPEC-059 — Tri-Graph Memory Activation + Memory-Hit Telemetry

Status

Draft — for Frank’s review. This is an execution spec, not a new design. SPEC-020 (Lola, 2026-04-19) already designed the Canonical / Semantic / Temporal tri-graph. ADR-018 already chose RRF as the fusion strategy. Neither shipped. SPEC-059’s job: ship them, and add the telemetry layer that makes the tri-graph’s value visible on the prism_vs_native dashboard.

Origin

SPEC-058 review on 2026-04-30 (during the duplicate-signals diagnostic) surfaced a structural pattern:
  • Six SPEC-054 port misses in 24h. Cross-cutting Python invariants didn’t survive the Node port because nothing in the planning workflow forced them to surface. The dedup contract from SPEC-037 §3 was load-bearing for SPEC-054 but had no edge “SPEC-054 depends on SPEC-037” anywhere queryable.
  • TODO #1 is still open — “Phase 4 adds Neo4j as the graph leg” — from ADR-018 RRF design. The graph leg has been on the queue since 2026-04-17.
  • SPEC-020 v1.0 has been in DRAFT since 2026-04-19 — the full tri-graph design that nobody implemented.
  • My own workflow miss (Donna, this session): went git-first when reviewing the port misses. Frank stopped me. Once I queried memory, the answer was in the first result. Saved as feedback_memory_first_for_historical.md.
  • Codex connection: Texi’s observed_pending_duplicates: 2 is the same family. Any agent — Codex, Cursor, future surfaces — runs into the same blind spot when the graph leg returns 0 results because edges don’t exist. Activation benefits all surfaces equally.
  • No memory-hit telemetry feeding the dashboard. Frank’s project_prism_vs_native_telemetry standing rule asked for observable proxies; we have qualitative tracking (“memory-saved-me events”) but nothing the dashboard can plot.

Three findings being addressed

  1. Graph leg empty. RRF (ADR-018) gives the graph leg equal weight; an empty leg silently degrades retrieval to two legs (vector + lexical).
  2. Temporal leg never built. SPEC-020 §5.3 designed EntityState versioning, HAS_STATE / SUPERSEDED_BY edges, and as_of queries. Recall today has no time awareness — a 6-month-old memory and a 1-day-old memory rank identically on relevance.
  3. No memory-hit telemetry. No way to measure: hit rate, stale rate, tokens-loaded-vs-tokens-saved, per-agent recall behavior. The prism_vs_native question (“does memory pay for itself?”) cannot be answered with data today.

Goals

  1. Graph leg pays for itself. Spec ↔ spec dependency edges exist in Neo4j and contribute non-zero scores in semantic_recall.
  2. Temporal leg active. as_of queries return point-in-time state; recency-decay is a default-on weight in fused ranking; stale memories surface for re-verification.
  3. Memory tokenomics observable. Every semantic_recall is logged; the dashboard plots hit rate, stale rate, tokens-loaded-vs-saved, per-agent breakdown.
  4. Port-miss prevention. The graph leg surfaces upstream invariants (e.g., SPEC-037 §3) on the first recall when planning a port, so cross-language port misses cannot happen the way they did with SPEC-054.

Non-goals

  • Replacing the current vector + lexical legs. Both stay. SPEC-059 adds graph + temporal, doesn’t subtract.
  • Backfilling all 192+ historical memories into the new Temporal model. SPEC-020 §3 already declared this out-of-scope; SPEC-059 inherits.
  • Solving entity disambiguation across ptypes (SPEC-020 §3 explicit out-of-scope).
  • Cross-tenant fact conflicts (SPEC-056 covers identity isolation; this is its own future spec).
  • Auto-resolving stale memories. SPEC-059 surfaces them; agents/operators decide whether to rewrite.

Architecture

§3.1 Graph leg activation

Implements SPEC-020 §5.1 + §5.2 with one significant addition: auto-extraction of spec-dependency edges from existing markdown sections.

Edge auto-extraction. On every prism_spec create | update, parse the body for these section headings (case-insensitive, level-2 or level-3):

| Section heading | Edge written |
| --- | --- |
| ## References / ## Related / ## Reference: | REFERENCES |
| ## Implements: | IMPLEMENTS |
| ## Depends on / ## Depends-on: | DEPENDS_ON |
| ## Extends: | EXTENDS |
| ## Supersedes: | SUPERSEDES |
| ## Complements / ## Informs: | RELATES_TO |
| ## Relationships (block) | parse each subsection’s verb |

Within each section, find tokens matching SPEC-\d+ / ADR-\d+ / ADR #\d+. For each match, look up the target Entity in Neo4j (created lazily if absent) and write the edge from the current spec’s Entity. Edge properties: source_memory_id = the spec’s row id, extracted_from_section, created_at.

Backfill. Run the same extractor over every existing spec body as a signal_queue-style migration: idempotent, re-runnable, writes new edges and prunes stale ones (where the markdown reference was removed in a later edit). Backfill is gated on a prism_audit --fixup memory-graph invocation so Frank can review the edge set before commit.

Recall integration. semantic_recall already returns a graph_score per result (visible in today’s output). After SPEC-059, the score reflects the count of edges connecting the result entity to entities mentioned in the query (1-hop and 2-hop, with diminishing returns). RRF (ADR-018) rolls graph_score into the final fused score at equal weight with vec_score and lex_score.
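The extraction rule described above can be sketched roughly as follows. This is a minimal illustration, not the graph_extractor.py implementation: the names extract_edges, Edge, and HEADING_TO_EDGE are hypothetical, the Neo4j write is omitted, and normalizing "ADR #18" to "ADR-18" is an assumption about how targets are keyed. Subsections under ## Relationships are handled only insofar as they are themselves level-3 headings from the table.

```python
import re
from typing import NamedTuple

# Heading text (lower-cased, trailing colon stripped) -> edge type, following the table above.
HEADING_TO_EDGE = {
    "references": "REFERENCES",
    "related": "REFERENCES",
    "reference": "REFERENCES",
    "implements": "IMPLEMENTS",
    "depends on": "DEPENDS_ON",
    "depends-on": "DEPENDS_ON",
    "extends": "EXTENDS",
    "supersedes": "SUPERSEDES",
    "complements": "RELATES_TO",
    "informs": "RELATES_TO",
}

# Level-2 or level-3 markdown heading with an optional trailing colon.
HEADING_RE = re.compile(r"^#{2,3}\s+(.+?):?\s*$")
# Tokens of the form SPEC-<n>, ADR-<n>, or ADR #<n>.
TARGET_RE = re.compile(r"\b(?:SPEC-\d+|ADR-\d+|ADR #\d+)")


class Edge(NamedTuple):
    source: str
    edge_type: str
    target: str
    extracted_from_section: str


def extract_edges(spec_id: str, body: str) -> list[Edge]:
    """Return de-duplicated edges found in the mapped sections of a spec body."""
    edges: list[Edge] = []
    seen: set[tuple[str, str]] = set()
    section = None  # (edge_type, heading text) while inside a mapped section
    for line in body.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            name = heading.group(1).strip()
            edge_type = HEADING_TO_EDGE.get(name.lower())
            section = (edge_type, name) if edge_type else None
            continue
        if section is None:
            continue
        for token in TARGET_RE.findall(line):
            target = token.replace("ADR #", "ADR-")  # assumption: one canonical ADR id form
            key = (section[0], target)
            if target != spec_id and key not in seen:  # skip self-references and repeats
                seen.add(key)
                edges.append(Edge(spec_id, section[0], target, section[1]))
    return edges
```

Because the function is pure and de-duplicates per (edge type, target), re-running it over the same body yields the same edge set, which is the property the idempotent backfill relies on. Pruning would then amount to diffing this set against the edges already stored for the spec.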

§3.2 Temporal leg activation

Implements SPEC-020 §5.3 with concrete schema, ingestion path, and recall integration.

EntityState nodes. (:EntityState {uuid, entity_uuid, valid_from, valid_until, props_json, source_memory_id}). Every Entity (Spec, ADR, TODO, Persona, Project, etc.) has at least one EntityState (the current state) with valid_until = NULL. State transitions create a new EntityState and update the prior one’s valid_until.

Supersession edges. (:EntityState)-[:SUPERSEDED_BY {event_type, at, cause_memory_id}]->(:EntityState). event_type is descriptive: "renamed", "version_bumped", "superseded", "retired", "merged". The cause_memory_id points to the memory/ADR/delta that triggered the change; this is what makes event-causality queryable.

Ingestion. Existing verbs that mutate state (prism_spec update, prism_decide for status changes, prism_persona_create, prism_archive) become temporal-leg-aware: instead of UPDATE-in-place on the Postgres row, they write a new EntityState in Neo4j and update the prior one’s valid_until. Postgres remains the durability backstop with full state at any point in time; Neo4j carries the version chain.

Recall integration. Three new behaviors:
  1. as_of parameter on semantic_recall: semantic_recall(query, as_of="2026-04-15T00:00:00Z") filters Temporal-layer reads to states valid at that timestamp. Default as_of=null returns current state (all states with valid_until IS NULL).
  2. Recency decay as a fourth RRF leg: temporal_score = exp(-age_days / half_life), with half_life=180 days as default. With this exponential form the parameter acts as an e-folding constant rather than a strict half-life: memories from the last 6 months score roughly 0.37 or higher, and a year-old memory scores about 0.13. Fused with vec/lex/graph at equal weight by default; tunable per query class.
  3. Stale-memory surfacing: recall results expose a staleness field (last_verified_at, age_days, and a boolean superseded). The agent can choose to re-verify before citing. New verb prism_remember --reverify <memory_id> updates last_verified_at without changing content.
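The recency-decay leg and its fusion with the other legs could look something like the sketch below. The fusion function is a generic weighted Reciprocal Rank Fusion, not ADR-018's actual code: rrf_fuse, the k=60 constant, and the tie-handling rule (equal scores share a rank) are all assumptions made for illustration.

```python
import math


def temporal_score(age_days: float, half_life: float = 180.0) -> float:
    """Recency decay as written in the spec: exp(-age_days / half_life).

    Note: with this formula `half_life` is the e-folding constant, so the
    score at age == half_life is e^-1 (about 0.37), not 0.5.
    """
    return math.exp(-age_days / half_life)


def rrf_fuse(leg_scores, weights=None, k=60):
    """Weighted RRF over per-leg scores.

    leg_scores: {leg_name: {memory_id: score}}
    weights:    {leg_name: weight}; defaults to equal weight per leg.
    k:          rank-smoothing constant (60 is a common choice; an assumption here).
    """
    if weights is None:
        weights = {leg: 1.0 / len(leg_scores) for leg in leg_scores}
    fused = {}
    for leg, scores in leg_scores.items():
        for mem_id, score in scores.items():
            # Tied scores share a rank, so a leg that cannot tell two
            # memories apart (e.g. an empty graph leg) does not reorder them.
            rank = 1 + sum(other > score for other in scores.values())
            fused[mem_id] = fused.get(mem_id, 0.0) + weights[leg] / (k + rank)
    return fused


# Two session deltas with identical vec + lex strength, differing only in age.
legs = {
    "vec": {"recent_delta": 0.8, "old_delta": 0.8},
    "lex": {"recent_delta": 0.5, "old_delta": 0.5},
    "graph": {"recent_delta": 0.0, "old_delta": 0.0},
    "temporal": {
        "recent_delta": temporal_score(30),
        "old_delta": temporal_score(180),
    },
}
fused = rrf_fuse(legs)
ranking = sorted(fused, key=fused.get, reverse=True)
```

In this toy case the temporal leg is the only one that separates the two memories, so the 30-day-old delta ranks first, which matches the intent of the recency-decay acceptance criterion.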

§3.3 Memory-hit telemetry

New observability layer; not in SPEC-020. Hooks into Frank’s project_prism_vs_native_telemetry standing rule.

New tables: memory_recall_events and memory_citation_events
CREATE TABLE memory_recall_events (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id       UUID NOT NULL REFERENCES tenants(id),
    project_id      UUID NOT NULL REFERENCES projects(id),
    agent_id        UUID REFERENCES agents(agent_id),
    agent_session_id UUID,
    query_text      TEXT NOT NULL,
    query_class     TEXT,         -- 'historical' | 'architectural' | 'current_state' | 'decision' | 'other'
    result_count    INT NOT NULL,
    result_ids      UUID[] NOT NULL,
    top_n_selected  INT,          -- how many results the agent actually used
    payload_bytes   INT NOT NULL, -- input tokens cost proxy
    rrf_weights     JSONB,        -- {vec: 0.25, lex: 0.25, graph: 0.25, temporal: 0.25}
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE memory_citation_events (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    recall_event_id UUID NOT NULL REFERENCES memory_recall_events(id),
    memory_id       UUID NOT NULL,
    citation_kind   TEXT NOT NULL,  -- 'cited' | 'dismissed' | 'flagged_stale' | 'rewrote' | 'saved_rework'
    notes           TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
Capture points.
  • semantic_recall writes a memory_recall_events row on every call.
  • prism_remember update | rewrite writes a memory_citation_events row with kind='rewrote' linking back to the recall that surfaced the now-rewritten memory (when the recall id is in scope).
  • New verb prism_memory_cite lets the agent explicitly tag a recall hit: prism_memory_cite(recall_event_id, memory_id, kind, notes). Agents call this at most once per relevant memory in a session — light enough to not be a tax.
  • prism_checkpoint and prism_wrap auto-aggregate citation events into the wrap delta (the existing tool_use_summary standing rule).
Query-class auto-tagger. A small classifier on the query_text (regex + keyword list, no LLM) tags each recall as historical / architectural / current_state / decision, falling back to other when nothing matches. This lets the dashboard slice hit rate by query class. The four-class taxonomy comes from the workflow heuristic in feedback_memory_first_for_historical.md.
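A regex-and-keyword tagger of the kind described could be as small as the sketch below. The keyword lists and the first-match-wins ordering are placeholders invented for illustration; the real lists would come from feedback_memory_first_for_historical.md, and classify_query is a hypothetical name.

```python
import re

# Ordered rules: the first pattern that matches decides the class.
# All keyword lists here are illustrative assumptions, not the spec's lists.
RULES = [
    ("historical", re.compile(
        r"\b(before|previously|used to|history|originally|why did)\b", re.I)),
    ("architectural", re.compile(
        r"\b(architecture|design|invariant|contract|depends? on|dependenc(?:y|ies))\b", re.I)),
    ("current_state", re.compile(
        r"\b(current(?:ly)?|now|status|latest|open)\b", re.I)),
    ("decision", re.compile(
        r"\b(decided?|decision|chose|adr-\d+|rationale)\b", re.I)),
]


def classify_query(query_text: str) -> str:
    """Return one of the query_class values stored in memory_recall_events."""
    for label, pattern in RULES:
        if pattern.search(query_text):
            return label
    return "other"
```

Keeping the rules in an ordered list makes the precedence explicit, so a query that mentions both history and status is tagged deterministically rather than depending on dictionary order.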

§3.4 Dashboard panels

Adds to the existing prism_server_dashboard (already in dashboard/):
| Panel | Source | Question it answers |
| --- | --- | --- |
| Recall hit rate by query class | memory_recall_events + memory_citation_events (kind='cited') | Where is memory pulling weight? |
| Stale rate trend | memory_citation_events (kind='flagged_stale' / 'rewrote') | Is memory getting more or less reliable over time? |
| Tokens loaded vs tokens saved | memory_recall_events.payload_bytes summed; kind='saved_rework' count × estimated rework cost | The Frank question: does memory pay for itself? |
| Per-agent recall behavior | group by agent_id | Donna vs Lola vs Codex vs Candi: which surfaces use memory well? |
| Top-cited memories | memory_citation_events (kind='cited') group by memory_id | Which memories are load-bearing? |
| Never-cited memories | LEFT JOIN result | Which memories are dead weight, candidates for archive? |
| Graph-leg lift | memory_recall_events.rrf_weights × result rank delta | How much value does the graph leg add vs vec+lex alone? |
| Temporal-leg lift | Same with temporal weight | How much value does recency-decay add? |
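As an illustration of the first panel, the sketch below computes hit rate by query class against an in-memory SQLite stand-in for the two telemetry tables. It assumes that a "hit" means a recall with at least one citation of kind 'cited'; that definition, the trimmed-down columns, and the names HIT_RATE_SQL, hit_rate_by_class, and demo_connection are all hypothetical. The production tables live in Postgres, so the query may need adapting.

```python
import sqlite3

# A recall counts as a hit if any citation of kind 'cited' points back at it.
HIT_RATE_SQL = """
SELECT r.query_class,
       COUNT(*) AS recalls,
       SUM(CASE WHEN EXISTS (
               SELECT 1 FROM memory_citation_events c
               WHERE c.recall_event_id = r.id
                 AND c.citation_kind = 'cited'
           ) THEN 1 ELSE 0 END) AS hits
FROM memory_recall_events r
GROUP BY r.query_class
ORDER BY r.query_class
"""


def hit_rate_by_class(conn):
    """Return {query_class: hits / recalls}."""
    return {qc: hits / recalls for qc, recalls, hits in conn.execute(HIT_RATE_SQL)}


def demo_connection():
    """Build a tiny in-memory dataset with only the columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memory_recall_events (
            id TEXT PRIMARY KEY,
            query_class TEXT
        );
        CREATE TABLE memory_citation_events (
            id INTEGER PRIMARY KEY,
            recall_event_id TEXT REFERENCES memory_recall_events(id),
            memory_id TEXT,
            citation_kind TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO memory_recall_events VALUES (?, ?)",
        [("r1", "historical"), ("r2", "historical"), ("r3", "architectural")],
    )
    conn.executemany(
        "INSERT INTO memory_citation_events (recall_event_id, memory_id, citation_kind) "
        "VALUES (?, ?, ?)",
        [("r1", "m1", "cited"), ("r2", "m2", "dismissed"),
         ("r3", "m3", "cited"), ("r3", "m4", "cited")],
    )
    return conn


rates = hit_rate_by_class(demo_connection())
```

Using EXISTS rather than a plain join keeps a recall with several cited memories from being counted more than once in the numerator.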

Files changed

| File | Change | Why |
| --- | --- | --- |
| backend/alembic/versions/026_spec059_memory_telemetry.py | New | memory_recall_events + memory_citation_events tables; Neo4j has its own schema bootstrap |
| backend/app/services/memory/graph_extractor.py | New | Markdown reference extractor; turns ## References blocks into Neo4j edges |
| backend/app/services/memory/temporal.py | New | EntityState write path; SUPERSEDED_BY edge writer; valid_from/valid_until invariants |
| backend/app/services/memory/recall.py | Modified | Add as_of parameter; add temporal_score leg; record memory_recall_events row; emit query_class |
| backend/app/services/memory/rrf.py | Modified | Add temporal_score weight; tune-per-query-class hook |
| backend/app/routers/spec.py | Modified | After `prism_spec create \| update`, fire graph_extractor.extract_and_write_edges |
| backend/app/routers/memory.py | Modified | Add prism_memory_cite endpoint |
| mcp-node/src/verbs/memory.ts | Modified | New prism_memory_cite verb |
| dashboard/web/src/pages/MemoryPage.tsx | New | Eight panels listed in §3.4 |
| dashboard/src/metrics/memory.ts | New | Scrapers / aggregators for the panels |
| tests/test_spec059_graph_extraction.py | New | Backfill correctness on the existing 47 specs |
| tests/test_spec059_temporal_as_of.py | New | Point-in-time query returns historical state |
| tests/test_spec059_recall_telemetry.py | New | Every recall persists a row; cite events link correctly |
| tests/test_spec059_port_miss_guard.py | New | Reproduces the SPEC-054 scenario: query “what does SPEC-054 depend on” → returns SPEC-037 in top-3 by graph leg alone |
No mcp-node WebSocket-plane changes. No SPEC-058 collision.

Test plan — port-miss prevention as the headline acceptance test

The test that would have prevented the 24h of port misses:
TEST: graph_leg_surfaces_upstream_invariants_on_port_planning

GIVEN  All 47 existing specs have been backfilled into the graph leg
       per SPEC-059 §3.1 extractor.

WHEN   An agent (Donna / Lola / Codex / Cursor) runs:
         semantic_recall("SPEC-054 dependencies port plan TypeScript")

THEN   The top-5 results MUST include SPEC-037 (the dedup contract spec)
       with a graph_score > 0 due to the SPEC-037 → SPEC-054 inverse
       edge written by the extractor (SPEC-054 ## References mentions
       SPEC-037, so the bidirectional traversal returns it).

AND    SPEC-034 must appear (signal delivery foundation).
AND    SPEC-044 must appear (channel push, depends on SPEC-034).
AND    SPEC-052 must appear (signal cache, depends on SPEC-034).

  This is the test that, had it existed before PR #21, would have
  forced the port author to acknowledge SPEC-037 §3 in the port plan.
  Six port misses in 24h are the receipt of its absence.
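To make the "bidirectional traversal" in this test concrete, here is a toy sketch of a graph_score that counts 1-hop and 2-hop connections between a candidate and the entities named in the query. The edge list is invented for illustration (it is not the real corpus), and the 2-hop weight of 0.5 is an assumed stand-in for "diminishing returns"; build_adjacency and graph_score are hypothetical names.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Treat directed (source, target) edges as undirected for traversal."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def graph_score(adj, query_entities, candidate, two_hop_weight=0.5):
    """Score a candidate by its 1-hop and 2-hop links to the query entities."""
    score = 0.0
    cand_neighbours = adj.get(candidate, set())
    for q in query_entities:
        if q == candidate:
            continue
        q_neighbours = adj.get(q, set())
        if candidate in q_neighbours:
            score += 1.0  # direct edge in either direction
        # Each shared neighbour is one distinct 2-hop path q - x - candidate.
        shared = (q_neighbours & cand_neighbours) - {q, candidate}
        score += two_hop_weight * len(shared)
    return score


# Illustrative edges only: (source spec, referenced spec).
EDGES = [
    ("SPEC-054", "SPEC-037"),
    ("SPEC-054", "SPEC-034"),
    ("SPEC-044", "SPEC-034"),
    ("SPEC-052", "SPEC-034"),
]
adj = build_adjacency(EDGES)
scores = {
    spec: graph_score(adj, ["SPEC-054"], spec)
    for spec in ["SPEC-037", "SPEC-034", "SPEC-044", "SPEC-052", "SPEC-099"]
}
```

With an empty edge list every score is 0, which is the situation the test is designed to catch. With the toy edges above, the directly referenced specs score highest and the 2-hop neighbours still receive a non-zero score.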
Plus the standard SPEC-020 invariant tests:
  • Canonical layer: every Entity has exactly one INSTANCE_OF
  • Temporal layer: at most one EntityState with valid_until IS NULL per Entity at any time
  • Supersession chain acyclic
  • valid_from of superseder == valid_until of superseded
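The Temporal-layer invariants and the as_of lookup can be expressed over plain in-memory records, as in the sketch below. It is not the temporal.py implementation: EntityState is reduced to the fields the checks need, superseded_by stands in for the SUPERSEDED_BY edge, and treating validity as a half-open interval [valid_from, valid_until) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EntityState:
    uuid: str
    entity_uuid: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None marks the current state
    superseded_by: Optional[str] = None     # uuid of the successor state, if any


def state_as_of(states, entity_uuid, as_of=None):
    """Return the state valid at as_of, or the current state when as_of is None."""
    for s in states:
        if s.entity_uuid != entity_uuid:
            continue
        if as_of is None:
            if s.valid_until is None:
                return s
        elif s.valid_from <= as_of and (s.valid_until is None or as_of < s.valid_until):
            return s
    return None


def check_invariants(states):
    """Return a list of human-readable invariant violations (empty if clean)."""
    by_uuid = {s.uuid: s for s in states}
    problems = []
    open_states = {}
    for s in states:
        if s.valid_until is None:
            open_states.setdefault(s.entity_uuid, []).append(s.uuid)
        if s.superseded_by is not None:
            successor = by_uuid.get(s.superseded_by)
            if successor is None or successor.valid_from != s.valid_until:
                problems.append(f"{s.uuid}: valid_until != successor valid_from")
    for entity, uuids in open_states.items():
        if len(uuids) > 1:
            problems.append(f"{entity}: {len(uuids)} open states")
    for s in states:
        # Walk the supersession chain from each state; revisiting a uuid means a cycle.
        seen, cur = set(), s
        while cur is not None:
            if cur.uuid in seen:
                problems.append(f"{s.uuid}: supersession cycle")
                break
            seen.add(cur.uuid)
            cur = by_uuid.get(cur.superseded_by) if cur.superseded_by else None
    return problems
```

A usage example under the same assumptions: a spec state valid from 2026-04-01 and superseded on 2026-04-20 is returned for as_of=2026-04-15, while a call without as_of returns the successor whose valid_until is None.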
Plus telemetry tests:
  • Every semantic_recall writes exactly one memory_recall_events row
  • prism_memory_cite writes exactly one memory_citation_events row
  • Dashboard SQL aggregates roll up correctly

Acceptance criteria

  1. All 47 existing specs have at least one outbound graph edge after backfill.
  2. semantic_recall("SPEC-054 dependencies") returns SPEC-037 in top-5 — driven by graph_score, not just text similarity.
  3. semantic_recall(query, as_of="<past timestamp>") returns state valid at that timestamp; the same query without as_of returns current state.
  4. Recency decay default-on: a 30-day-old session_delta scores higher than a 6-month-old session_delta when both have the same vec+lex match strength.
  5. memory_recall_events populated on every recall; memory_citation_events populated on every prism_memory_cite.
  6. Dashboard /memory page renders all eight panels with non-zero data within 24h of merge.
  7. The port-miss-prevention test (§test plan headline) passes against a fresh backend with the backfilled spec corpus.
  8. Codex (Texi) running the same semantic_recall("SPEC-054 dependencies") query gets the same top-5 answer — the graph leg is surface-agnostic.

Phased rollout

Phase 1: Graph leg. Backfill spec edges, wire auto-extractor on prism_spec create | update. RRF starts using graph_score on the existing data. Lowest-risk, highest-immediate-value: solves the SPEC-054 port-miss class on the next port.

Phase 2: Temporal leg. EntityState versioning, SUPERSEDED_BY edges, as_of parameter, recency decay. Higher-risk because state-mutating verbs change shape.

Phase 3: Telemetry + dashboard. memory_recall_events table, prism_memory_cite verb, eight dashboard panels. Lowest-risk addition; doesn’t change behavior, only observes.

Each phase ships independently. Phase 1 gating PR can land within days; Phase 2 is the longer pole; Phase 3 stacks on top of Phase 2 once telemetry has data to show.

Lessons learned (embedded — Frank asked for them in this spec)

The SPEC-058 diagnostic plus the SPEC-054 port-miss family teach four durable lessons. Saved as memory references where applicable:
  1. Memory-first for historical context. When the question is “how did this work before?” / “what’s the upstream invariant?” — query Prism memory first, fall back to git/SQL only when memory has been confirmed empty. SQL/SSH is right for live operational state, never for historical/architectural state. Saved as feedback_memory_first_for_historical.md (this session, 2026-04-30). Frank caught this on me directly.
  2. Cross-cutting decorators don’t survive language ports unless the dependency graph forces them up. SPEC-037 §3’s dedup at the merge boundary was load-bearing for SPEC-054 but had no graph edge connecting the two specs. The port translated file-by-file and lost the invariant. Six SPEC-054 port misses in 24h are the receipt. Memory family: project_spec_054_port_miss_* already documents the symptoms; SPEC-059’s graph leg is the structural prevention.
  3. A tri-graph without graph edges is a two-graph. ADR-018 RRF gives equal weight to vector + lexical + graph; an empty graph leg silently degrades retrieval to two-graph and nobody noticed for ~2 weeks. Surface-level metrics (“recall returns results”) didn’t expose the degradation. Telemetry honesty (§3.3) is the prevention — without observability into per-leg lift, future degradations stay hidden the same way.
  4. The Codex finding generalizes. Texi’s observed_pending_duplicates: 2 is the same family — a contract lost in port. The fix isn’t Codex-specific because the root cause isn’t Codex-specific; it’s “graph leg returns 0 because edges don’t exist, so no surface can find the upstream invariant.” All AI surfaces — Codex, Claude Desktop, Claude Code, Cursor, future ones — benefit equally from edge population. This is also why this spec is graph-first, surface-agnostic.

Out of scope

  • Re-implementing memory ingestion from scratch. SPEC-020 §7 already designed it; SPEC-059 just adds extractor + telemetry hooks at the existing pipeline boundaries.
  • Multi-tenant memory federation. SPEC-056 governs identity isolation; cross-tenant memory is a future spec.
  • Auto-rewriting stale memories. SPEC-059 surfaces them; rewrite remains a deliberate agent/operator action.
  • Token-cost normalization across surfaces (Codex tokens vs Claude tokens). The dashboard shows raw bytes; surface-relative pricing is dashboard-side calculation, deferred.

References

  • Specs: SPEC-020 (Tri-Graph Knowledge Representation — Canonical/Semantic/Temporal, Lola 2026-04-19) · SPEC-058 (Signal Delivery Single-Source-of-Truth, this session) · SPEC-019 (env resolution) · SPEC-052 (per-identity signal cache) · SPEC-054 (Node MCP shim port — the regression class this spec prevents repeating).
  • ADRs: ADR-018 (Reciprocal Rank Fusion), ADR-019 (memory candidates G5–G8), ADR-020/21/22 (proposed alongside SPEC-020).
  • TODOs: #1 (Phase 4 adds Neo4j as the graph leg) — this spec closes it. #100 (SPEC-057 backend, blocked on SPEC-056 merge) — independent. #101 (graph-leg backfill — filed this session, this spec subsumes it).
  • Memories:
    • feedback_memory_first_for_historical — the workflow lesson from this session
    • project_prism_vs_native_telemetry — the standing rule that motivates §3.3 telemetry
    • project_spec_054_port_miss_project_id + project_spec_054_port_miss_coalescing_reset — the regression family this spec prevents
    • feedback_document_port_misses — every cross-language port regression gets a memory; this spec means the graph leg surfaces the invariants before the port
    • feedback_no_hardcoded_machine_values + feedback_commercial_grade_naming — memory hygiene practices that the dashboard’s stale-rate panel can plot for compliance
  • Live evidence (this session):
    • SPEC-058’s diagnostic: 22 piggyback / 1 channels_push in last 24h proved a structural gap was hiding; memory wasn’t surfacing the upstream invariant
    • The semantic_recall query in §test plan headline reproduces the gap on demand: today the graph_score for SPEC-054→SPEC-037 is 0; SPEC-059 makes it non-zero

Authorship

Donna (Claude Code, mini3, session 3f36b796). 2026-04-30. Authored after Frank’s directive to:
  1. File the workflow miss (memory-first) — done as feedback_memory_first_for_historical.md
  2. File the graph-edge backfill TODO — done as TODO #101
  3. Write the spec covering graph leg + temporal leg + memory tokenomics — this spec
  4. Embed lessons learned, including the Codex generalization — see Lessons learned section above
Frank’s framing: “we are reinventing the wheel — wouldn’t that be a total miss on using our trigraph.” The miss isn’t the tri-graph design (SPEC-020 is sound). The miss is execution + observability. SPEC-059 closes both gaps.
Last modified on May 18, 2026