Why Advanced Hybrid Search
Most AI memory systems use a single retrieval method, usually vector similarity search. That’s fast and cheap, but it misses things. A vector embedding of “database selection rationale” will find semantically similar text, but it won’t reliably find an ADR titled “ADR-007: Postgres over MySQL” because the terms don’t overlap enough for cosine similarity to rank it highly. And neither vector nor lexical search can answer “what was true about the database choice as of March 15?” because neither has a concept of time. And none of them know that a six-month-old memory is more likely to be obsolete than a six-day-old one. Prism’s Hybrid RAG uses four retrieval legs precisely because each one catches what the others miss. This page explains how the system works on ingestion, how it works on retrieval, what the performance cost is, and why that cost pays for itself many times over.
The four retrieval legs
| Leg | Technology | What it catches | What it misses |
|---|---|---|---|
| Dense vector | pgvector (MiniLM 384-dim) | Paraphrases, synonyms, conceptual near-misses | Exact proper nouns, spec IDs, error codes |
| BM25 lexical | Postgres tsvector generated columns | Exact terms, technical jargon, identifiers | Semantic similarity when wording differs |
| Tri-graph traversal | Neo4j (SPEC-020 knowledge graph, typed semantic edges) | Identity, typed references, structural relationships | Free-text content not yet projected as entities |
| Temporal recency | exp(-age_days / half_life), half_life=180d | Time-relevance: recent memories outrank stale ones at the same semantic score | Domain-specific staleness (an old fact may still be load-bearing) |
How ingestion works
When an agent files a typed artifact (a decision, a spec, a plan, a retrospective, anything), the backend automatically fans out to three searchable representations and records a creation timestamp that the temporal leg reads at query time. The agent does nothing special. It calls one MCP verb. The backend does the rest.
Dense vector embedding
A MiniLM model (baked into the Docker image: 88 MB, zero first-request latency) generates a 384-dimensional vector from the artifact’s body text. The vector is stored in pgvector alongside the artifact row. At query time, cosine similarity finds semantically related content even when the exact words don’t match.
What this catches: an agent searching for “why did we pick that database” finds an ADR whose body says “after evaluating PostgreSQL, MySQL, and MongoDB, we selected Postgres for its pgvector extension”, even though the query shares almost no exact words with the result.
BM25 lexical index
Postgres tsvector generated columns auto-index every artifact body on write. Zero additional code, zero extra writes: the database does it as a side effect of the INSERT. At query time, term-frequency ranking finds content that contains the exact words the agent used.
What this catches: an agent searching for “SPEC-020” or “MissingGreenlet” or “RRF_K=60” gets exact matches on those identifiers. Vector embeddings blur technical terms into semantic neighborhoods; BM25 finds the needle.
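To make the exact-term behaviour concrete, here is a minimal BM25 scorer in plain Python. It illustrates the ranking idea only and is not how Postgres scores tsvector matches; the sample corpus, the whitespace tokenizer, and the `k1` / `b` values are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer (an assumption); keeps IDs like "SPEC-020" whole.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical artifact bodies.
docs = [
    "SPEC-020 defines the knowledge graph schema",
    "We chose a graph database for relationship queries",
    "Fixing MissingGreenlet raised during async session flush",
]
scores = bm25_scores("SPEC-020", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the first document contains the identifier, so it is the only one with a non-zero score. A document that merely talks about graphs gets nothing, which is the opposite of what a dense embedding would do.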
Tri-graph projection
The SPEC-025 live-write hook creates an :Entity and :EntityState in Neo4j for every typed artifact at create time. The entity carries canonical identity (UUID, type, immutable props). The state carries domain properties and a valid_from timestamp. Typed references link entities to each other through a granular taxonomy that distinguishes how entities relate, not just that they relate: :IMPLEMENTS, :EXTENDS, :RELATES_TO, :SUPERSEDES, :DEPENDS_ON, :SPECIFIES, :FORMALIZES, :RESOLVES, :DOCUMENTS, plus the generic :REFERENCES fallback. The extractor reads section headings (## Implements, ## Extends, ## Supersedes, ## Related, ## See also, ## Complements, ## Informs) on every spec / ADR / plan / retro / journal write and writes the typed edge directly. Aliases map surface forms to canonical identity ("PrismGR" → Prism).
What this catches: structural relationships, identity across renames, and — critically — temporal state. “What was true about SPEC-020 as of March 15?” is one graph traversal. No other retrieval leg can answer time-scoped questions structurally.
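The as-of lookup can be sketched without a graph database. The snippet below models one entity's states as a list sorted by `valid_from` and uses bisection to find the state in effect on a given date. Everything except the `valid_from` field is hypothetical: the `props` field, the sample dates and statuses, and the `state_as_of` helper are illustrative, and the real query runs as a traversal in Neo4j.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EntityState:
    valid_from: date
    props: dict  # hypothetical container for domain properties


def state_as_of(states: list[EntityState], when: date) -> Optional[EntityState]:
    """Return the state in effect on `when`; `states` must be sorted by valid_from."""
    starts = [s.valid_from for s in states]
    idx = bisect_right(starts, when)
    return states[idx - 1] if idx > 0 else None


# Hypothetical state history for a single entity.
history = [
    EntityState(date(2025, 1, 10), {"status": "draft"}),
    EntityState(date(2025, 3, 1), {"status": "ratified"}),
    EntityState(date(2025, 4, 20), {"status": "superseded"}),
]

print(state_as_of(history, date(2025, 3, 15)).props["status"])  # ratified
```

A query dated before the first state returns `None`, which corresponds to "the entity did not exist yet" rather than to an error.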
Temporal recency
Every artifact row carries a created_at timestamp. At query time the temporal leg computes score = exp(-age_days / half_life) per candidate (half_life=180d by default, so a 6-month-old memory scores ≈ 0.37, a 1-year-old ≈ 0.13, and today’s memories ≈ 1.0). The leg re-weights existing fused candidates from vector + lexical + graph; it doesn’t add new candidates, so the result-set composition is unchanged but ordering favors recent-but-relevant memories.
What this catches: historical-context queries where two semantically similar memories sit at almost the same vector / lexical / graph score but one is current and the other is six months stale. The temporal leg breaks the tie toward the recent one without requiring the agent to inspect timestamps manually.
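The decay formula is small enough to check by hand. The sketch below computes it from a `created_at` timestamp; the function name, the fixed `now` value, and the clamping of future timestamps to an age of zero are assumptions made for the example.

```python
import math
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 180.0  # default quoted in the text


def recency_score(created_at: datetime, now: datetime,
                  half_life_days: float = HALF_LIFE_DAYS) -> float:
    # Assumption: timestamps in the future are treated as age 0.
    age_days = max((now - created_at).total_seconds() / 86400.0, 0.0)
    return math.exp(-age_days / half_life_days)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)  # fixed for reproducibility
for days in (0, 180, 365):
    score = recency_score(now - timedelta(days=days), now)
    print(f"{days:>3} days old -> {score:.2f}")
```

One detail worth knowing when tuning the parameter: with this formula the score falls to 1/e (about 0.37) at `half_life` days, and reaches exactly one half earlier, at `half_life * ln 2`, roughly 125 days for the 180-day default.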
The ingestion performance tax
Ingestion adds approximately 50ms of overhead per artifact: embedding generation (~30ms on the baked-in MiniLM) plus Neo4j projection (~20ms). The BM25 index is free (a Postgres tsvector generated column, computed as part of the INSERT with no separate write). The temporal leg is free at ingest (it reads created_at at query time, no separate index).
For a system where artifacts are filed a few times per session, not thousands of times per second, this is invisible. A session that files 10 artifacts pays 500ms total, about half a second spread across the whole session.
The payoff: every future query against that artifact hits four independent retrieval surfaces instead of one. The 50ms tax per write produces years of faster, more accurate reads.
How retrieval works
When an agent calls semantic_recall, the vector / lexical / graph legs fire simultaneously, the temporal leg re-weights the fused candidate set, and the tri-graph annotates the output with temporal-state context before returning.
Reciprocal Rank Fusion (RRF K=60)
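Each retrieval leg returns a ranked list of candidates. RRF assigns each candidate a score based on its rank position in each list: score = Σ 1 / (K + rank), summed over the legs that returned the candidate, with K = 60. The temporal leg then re-weights the fused score using exp(-age_days / 180d) so recent-but-relevant memories rank ahead of stale ones at the same fused score.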
Why this matters: a result doesn’t need to be the best match in any single leg; it needs to be a good match across multiple legs. That cross-validation is what makes hybrid retrieval more accurate than any single method. A false positive in one leg is unlikely to also be a false positive in the others.
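A minimal sketch of Reciprocal Rank Fusion with K = 60 shows how a candidate that is merely good in several legs overtakes one that is best in a single leg. The leg names match the text, but the candidate IDs and the `rrf_fuse` helper are hypothetical, and the temporal re-weighting step is deliberately left out because the text does not say how it is combined with the fused score.

```python
from collections import defaultdict

RRF_K = 60  # constant quoted in the text


def rrf_fuse(ranked_lists: dict[str, list[str]], k: int = RRF_K) -> list[tuple[str, float]]:
    """Fuse per-leg rankings: each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for candidates in ranked_lists.values():
        for rank, doc_id in enumerate(candidates, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings returned by the three candidate-producing legs.
legs = {
    "vector": ["adr-007", "spec-020", "retro-3"],
    "lexical": ["spec-020", "adr-007"],
    "graph": ["spec-020", "plan-10"],
}

fused = rrf_fuse(legs)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Here `spec-020` ranks first overall even though the vector leg placed it second, because all three legs returned it. `retro-3` and `plan-10`, each seen by only one leg, fall to the bottom.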
Temporal annotation
After fusion, the tri-graph layer checks each result against its entity-state history:
- If the result references a current state → tagged [current]
- If the result references a superseded state → tagged [historical]
- If the result has no tri-graph entity (pre-SPEC-025 legacy) → no tag, treated as unversioned
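The three tagging rules reduce to a small decision function. The sketch below assumes each result carries the ID of the entity state it references (or nothing, for legacy artifacts) and that the current state ID is known; both parameter names and the `temporal_tag` helper are hypothetical.

```python
from typing import Optional


def temporal_tag(entity_state_id: Optional[str],
                 current_state_id: Optional[str]) -> Optional[str]:
    """Map a result's referenced entity state to the tag described above."""
    if entity_state_id is None:
        return None  # no tri-graph entity: unversioned legacy artifact, no tag
    if entity_state_id == current_state_id:
        return "[current]"
    return "[historical]"  # references a superseded state


print(temporal_tag("state-3", "state-3"))
print(temporal_tag("state-1", "state-3"))
print(temporal_tag(None, None))
```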
Retrieval performance
| Environment | p50 latency | Notes |
|---|---|---|
| Local Docker stack | ~80ms | All four legs on localhost |
| LAN server | 185–310ms | Includes network round-trip |
The temporal leg computes exp(-age_days / half_life) per surviving candidate from the row’s created_at timestamp, with no separate fetch.
The tokenomics payoff
This is where the performance tax on ingestion pays back many times over.
The comparison
| Approach | Tokens consumed | Accuracy | Temporal awareness |
|---|---|---|---|
| Full-history context load | 20,000+ tokens | Low — agent scans irrelevant material | None — agent reasons about freshness from text |
| Vector-only retrieval | ~400 tokens | Medium — semantic matches, no exact-term or temporal signal | None |
| Prism Hybrid RAG | ~400 tokens | High — three-leg cross-validated, temporally annotated | Structural — [current] vs [historical] tags |
Compound savings
Multiply the per-query savings across a real workload:
- A typical agent session makes 5–15 context-retrieval queries
- A multi-agent team runs 4–8 sessions per day
- A project lifetime spans weeks to months
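As a rough worked example, the figures above can be multiplied out per day. The calculation compares a full-history context load against hybrid retrieval using only numbers from this page; treating "20,000+" as exactly 20,000 is a simplifying assumption, so the result is a lower bound.

```python
FULL_HISTORY_TOKENS = 20_000  # lower bound from the comparison table ("20,000+")
HYBRID_TOKENS = 400           # approximate figure from the comparison table


def daily_tokens_saved(queries_per_session: int, sessions_per_day: int) -> int:
    per_query = FULL_HISTORY_TOKENS - HYBRID_TOKENS
    return per_query * queries_per_session * sessions_per_day


low = daily_tokens_saved(queries_per_session=5, sessions_per_day=4)
high = daily_tokens_saved(queries_per_session=15, sessions_per_day=8)
print(f"{low:,} to {high:,} tokens saved per day")
# 392,000 to 2,352,000 tokens saved per day
```

Over a project lifetime of weeks to months, that daily range is multiplied again by the number of working days.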
The real cost of inaccurate retrieval
The tokenomics comparison above shows the per-query savings. But the deeper cost of bad retrieval isn’t the tokens wasted on the query itself; it’s the cascade of damage that follows when an agent acts on the wrong answer.
Incorrect code generation. An agent that retrieves a superseded architecture decision as its top result generates code against assumptions that are no longer true. The code compiles. It may even pass unit tests. But it’s structurally wrong, built on a foundation the team already moved away from. By the time a human catches it, the agent has generated dozens of files, and the correction isn’t a patch. It’s a delete-and-rebuild. Every line of incorrect code costs the original generation tokens plus the review tokens plus the deletion plus the regeneration against the correct context.
Incorrect spec and artifact generation. Worse than wrong code is a wrong spec, because a wrong spec compounds into more wrong code by other agents in future sessions. An agent that retrieves stale project state and drafts a spec from it produces an artifact that looks authoritative, gets filed into the project record, and misleads every subsequent session that retrieves it. The error propagates forward through the temporal record until someone notices the contradiction. The remediation isn’t just deleting the bad spec; it’s tracing every downstream artifact and session that may have consumed it.
Accidental deletion of prior work. An agent that can’t accurately recall what already exists will sometimes rebuild something that was already built, overwriting or conflicting with prior work in the process. This is especially common in multi-agent setups where two agents independently retrieve different (or incomplete) views of project state. One agent’s “new implementation” is another agent’s “destroyed three days of work.” With accurate hybrid retrieval, the agent knows what exists before it acts.
Introduction of bugs and unexpected behavior. When an agent’s context is stale or wrong, the bugs it introduces are the hardest to diagnose, because the agent’s reasoning was internally consistent, just grounded in outdated facts. The resulting behavior looks like a regression but isn’t traceable to a code change; it’s traceable to a context change that happened silently between sessions. These bugs eat hours of debugging time because the human is looking in the code for a cause that lives in the memory layer.
The multiplier: double the tokens, double the time. Every failure mode above has the same cost shape: the original work (wasted), the diagnosis (additional), the correction (additional), and the rebuild (additional). A conservative estimate is 2–3× the token cost and wall-clock time of doing it right the first time. In a multi-agent team running 4–8 sessions per day, even a 10% bad-retrieval rate produces a compounding drag that erases most of the productivity gain the agents were supposed to deliver.
This is the argument for paying the 50ms ingestion tax and the 80ms retrieval latency. The performance cost of hybrid search is measured in milliseconds. The cost of not having it is measured in hours, deleted work, propagated errors, and tokens burned on rework that should never have been necessary.
Less toil from context management
In systems without structured retrieval, the human operator becomes the context manager: re-explaining decisions, pasting prior artifacts into the chat, correcting stale references, verifying that the agent’s understanding of current state is actually current. That’s the invisible toil tax that doesn’t show up in any dashboard but eats hours per week. With Prism’s Hybrid RAG, the agent retrieves its own context, verifies its own temporal freshness via the tri-graph’s [current] / [historical] tags, and asks the human only when the retrieval surface genuinely has no answer. The human’s job shifts from re-supplying context to directing work, which is the job they were hired for.
What most people don’t know is happening
The agent experience is deceptively simple: call semantic_recall, get a grounded answer. What’s invisible is the machinery behind that simplicity:
- Every artifact write fans out to three indexes automatically, and the temporal leg reads its created_at at query time. The agent never thinks about indexing. It just files artifacts through normal MCP verbs and the backend handles the rest.
- Every query runs four parallel scoring legs and fuses the results. The agent doesn’t choose which retrieval method to use. It asks one question and gets the cross-validated answer.
- Temporal freshness is structural, not heuristic. The agent doesn’t need to reason about whether a result is current. The tri-graph tells it, as metadata, before reasoning begins.
- The performance cost is front-loaded on writes, not reads. 50ms per write, 80ms per read. The write tax is paid once per artifact. The read savings compound across every future query by every agent on the project.
- Accuracy compounds with corpus size. Unlike context-window approaches that degrade as history grows (more material to scan, more irrelevant tokens displacing reasoning), hybrid retrieval gets better with more data — more candidates for RRF to cross-validate, more entity-state history for temporal annotation, more aliases and references for the tri-graph to traverse.
Beyond the four legs: structured memory contracts
The four-leg fusion is the retrieval engine. The work shipped through Plan #10 added a contract layer above it: the part that decides what an agent is allowed to read or write, what counts as evidence, and how procedural knowledge becomes a first-class artifact instead of free-text scratch. Three SPECs land together to make this real, all default-off behind feature flags so they can roll out without disturbing the existing recall surface.
Memory domain contracts (SPEC-079)
Per-agent memory historically mixed long-term facts (architecture decisions, postmortems, ratified specs) with transient session scratch (debugging notes, work-in-progress sketches, intermediate handoffs). The agent treated both as equally authoritative, which is the structural failure mode that drove the “context bleed” complaint on every shipped memory system. SPEC-079 v0.2 codifies twelve memory domains with explicit read/write contracts. Each session delta and stored memory carries a domain tag that names which contract governs it (governance, methodology, runtime, retrospective, signal-trace, install, and so on). The CI loop shipped in PR #148 tags every write at the source and surfaces missing-domain advisories so the boundary is observable rather than implicit. The intent is that semantic recall reads from a typed surface, not a flat soup: “give me governance evidence for this decision” and “give me debugging context for this regression” become structurally distinct queries against the same Hybrid RAG fusion.
The domains are also the substrate the Capability Library and Evidence Graph (described in Vision) read from. A capability’s evidence chain is a domain-typed traversal across the same artifact corpus the four-leg fusion already indexes — no separate store, just typed edges and read contracts on top.
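The idea of domain-tagged writes and missing-domain advisories can be illustrated in a few lines. This is a sketch under stated assumptions, not the SPEC-079 implementation: the `MemoryWrite` record, both helper functions, and the sample artifact IDs are hypothetical, and the domain set lists only the six domains named on this page out of the twelve the spec defines.

```python
from dataclasses import dataclass
from typing import Optional

# Only the domains named in the text; SPEC-079 defines twelve in total.
NAMED_DOMAINS = {"governance", "methodology", "runtime",
                 "retrospective", "signal-trace", "install"}


@dataclass
class MemoryWrite:  # hypothetical record shape
    artifact_id: str
    domain: Optional[str] = None


def missing_domain_advisories(writes: list[MemoryWrite]) -> list[str]:
    """Surface writes that arrived without a domain tag."""
    return [f"{w.artifact_id}: missing domain tag" for w in writes if not w.domain]


def by_domain(writes: list[MemoryWrite], domain: str) -> list[MemoryWrite]:
    """Restrict a candidate set to one domain before recall."""
    return [w for w in writes if w.domain == domain]


writes = [
    MemoryWrite("adr-007", "governance"),
    MemoryWrite("debug-note-12", "runtime"),
    MemoryWrite("scratch-3"),  # untagged
]

print(missing_domain_advisories(writes))
print([w.artifact_id for w in by_domain(writes, "governance")])
```

Filtering by domain before fusion is what makes "governance evidence" and "debugging context" distinct queries over the same corpus.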
Method fragments — procedural knowledge as typed artifacts (SPEC-078)
DFW carried procedural knowledge in CLAUDE.md and prose retros. Prism inherited the same shape until SPEC-078 landed. Method fragments are typed artifacts (small, focused, evidence-bound) that capture how to do a recurring thing the right way: a review pattern, a deploy ritual, a regression-test discipline. They live in the same store the spec / ADR / plan surface uses, with the same lifecycle (proposed → experimental → active → deprecated → superseded → retired) and the same evidence-binding requirements.
Two seed fragments ship in v0.2:
- method.completion.done-definition: the rule that “completion” means merged + deployed + tested, not merged alone. Proven in production by the Donna deploy lane discipline that closes every PR through the full ship sequence.
- method.parallel.ownership-contract: the consensus-first parallelism contract that lets multiple agents work without colliding (explicit write ownership, single-driver-per-domain, signal-mediated handoffs).
Agents retrieve fragments through prism_method_fragment_recall, a typed surface on top of the Hybrid RAG fusion that returns the fragment body, its evidence count, its supersession history, and its applicability to the current task context. The verb is one of three new verbs (alongside prism_signal_ack and prism_signal_trace) that landed with the SPEC-078 wave.
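The fragment lifecycle named earlier in this section (proposed → experimental → active → deprecated → superseded → retired) can be modelled as an ordered enum. The stage names come from the text; the rule that a fragment may only move forward through that order is an assumption made for this sketch, and `can_transition` is a hypothetical helper.

```python
from enum import IntEnum


class Lifecycle(IntEnum):
    PROPOSED = 0
    EXPERIMENTAL = 1
    ACTIVE = 2
    DEPRECATED = 3
    SUPERSEDED = 4
    RETIRED = 5


def can_transition(current: Lifecycle, target: Lifecycle) -> bool:
    # Assumption: fragments only move forward through the listed order.
    return target > current


print(can_transition(Lifecycle.EXPERIMENTAL, Lifecycle.ACTIVE))
print(can_transition(Lifecycle.ACTIVE, Lifecycle.PROPOSED))
```

If the real lifecycle permits other moves, such as reviving a deprecated fragment, the comparison would be replaced by an explicit table of allowed transitions.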
Governance lookup — graph-backed rule and capability recall (SPEC-080)
The third retrieval leg (tri-graph traversal) was always capable of structural lookup, but the queries an agent actually wanted to ask (“what governance applies to this risk tier on this surface?”, “which capabilities are in scope for this project?”, “is this rule still in force or has it been superseded?”) needed a typed verb on top. prism_governance_lookup is that verb. Given a pid and a query class (applicable_rules, surface_capabilities, authority_lookup, method_fragments, memory_domains, review_gates, validation_context, supersession_check, conflict_lookup), it returns the governance state with source, citation, freshness, and supersession reporting baked into the result. The result is advisory only: it never overrides Ring authority or live prism_start state, but it gives agents a structured window into the governance graph that didn’t exist before.
The shipped Phase 2 governance_backfill CLI seeded 196 :Artifact / 48 :Decision / 12 :SourceDocument+:InstructionSurface entities for PID-PGR01 behind the default-off PRISM_GOVERNANCE_LOOKUP_ENABLED flag. Phase 3 v1.A and v1.B added strong-evidence enforcement at checkpoint and wrap so governance recall doesn’t silently degrade as the corpus grows.
Active recall, not passive
Together, these three SPECs move memory from passive retrieval (“agent asks, agent receives, agent decides”) toward active recall: the system noticing when a recall result is stale, when a method fragment applies to current work, when a governance rule has been superseded. The agent still asks. The substrate now has the typed contracts and the evidence graph to answer with confidence and a citation. The four-leg fusion is the engine. SPEC-078 + SPEC-079 + SPEC-080 are the contracts, the typed artifacts, and the citation discipline that turn the engine into a system an operator can trust.
Where to go next
- Overview: Back to what Prism is and what it solves
- Tri-Graph Architecture: The three-layer knowledge graph in depth
- Vision: Where the market is going and how retrieval compounds
- Installation: Two commands from a fresh clone to a working install

