Status: accepted · Version 1 · Filed 2026-04-30
spec_id: SPEC-056
version: 1
status: accepted
authored_by: Donna
date: 2026-04-30
SPEC-056 — Multi-tenant Routing, Identity, and Signal Isolation
Status
Accepted 2026-04-30. Authored by Donna; reviewed by Frank.
Problem
The signal-routing layer was designed for single-tenant use and conflates several distinct concepts that must be separable for multi-tenant operation:
- Routing target = process binding. Today’s prism:events:session:{session_id} channel is keyed on a process-lifetime ID. When an agent wraps and restarts, the session_id changes; pending signals for the old session deliver via piggyback (HTTP poll) but the per-identity cache writer (signalCache.record()) on the WebSocket subscriber path never fires. Statusline bell+count is always 0.
- Persona name treated as identity. controller_registrations.agent_identity is a string; uniqueness is (tenant_id, project_id, agent_identity). Persona renames break identity continuity. Cross-tenant collaborators with name collisions cannot coexist.
- No Org level between Tenant and Project. Tenant and Project are the only routing-scope levels. Departments / business units inside a tenant cannot be addressed as a routing scope.
- No User/Agent-Owner concept. Two humans collaborating on the same Project cannot each have a persona with the same display name.
- No User Work Session concept. No daily-work boundary; analytics, retros, and cross-machine continuity have no natural grouping.
- Latent bug discovered 2026-04-30. The SPEC-054 Phase 3 port flattened response.registration.project_id to response.project_id, dropped Python’s defensive guard, and sent an empty X-Prism-Project-Id header. Failure chain: backend WS rejects the upgrade (close 4002 → uvicorn 403) → Redis pubsub has 0 subscribers → publish_session_event publishes into the void → signalCache.record() never fires → bell+count permanently dark. Documented in the project_spec_054_port_miss_project_id memory.
Goals
- Hard tenant isolation: zero cross-tenant signal bleed under any scenario.
- Hard project isolation: zero cross-project signal bleed within a tenant.
- Hard session isolation: DMs to one agent never reach another agent.
- Routing continuity across restarts: same agent_id, same channel, signals delivered on next subscribe.
- Same code path for all deployment modes (Personal, Multi-tenant LAN, LAN-public-opt-in, Cloud) — no local-mode shortcuts.
- Schema and routing layer must reserve seams for future capabilities (project_memberships, Enterprise Agent Services catalogue, GitHub-invite cross-network collab, deployment-mode TLS posture) without re-plumbing later.
Non-Goals
- Federation across separate Prism installations. Explicitly out of scope.
- Implementing the future capabilities themselves (cross-tenant memberships, agent catalogue, GitHub-invite) — only reserving the seams.
- Performance optimization beyond what falls naturally out of correctness.
Architecture
Hierarchy (4 levels, all distinct)
Enterprise / Tenant (top — billing/contract entity)
└── Org (department or logical sub-grouping)
    └── Project (working scope)
        └── Agent (active worker / persona)
Each parent has many children. Each level is addressable as a routing scope.
Identity model
Agent is the stable routing referent. agent_id (UUID) is immutable and survives:
- Persona display-name changes
- prism_start / prism_wrap cycles (agent_session churn)
- Machine moves
- Cross-tenant project-membership grants (future)
Display name is presentation only. Server resolves display name → agent_id at signal-send time within the sender’s project scope.
User (= Agent Owner) is the human who created the persona. Today’s Personal Install has one User (Frank). Multi-user-per-tenant scenarios are reserved for the future, but the schema carries user_id from day one.
Two-session model
Work Session (User-level):
- One row per User per day’s-work unit.
- Auto-created on first agent_session of the day for a given user.
- Ends explicitly (prism_work_session_end) or rolls over at midnight.
- Can span multiple machines.
- Used for: daily roll-up analytics, retros, signal history grouping.
- Not a routing filter.
Agent Session (Process-level):
- One row per prism_start → prism_wrap cycle for one agent on one machine.
- Bound to a specific machine + process.
- Heartbeat-tracked for liveness.
- Not a routing filter: channel naming uses agent_id, not agent_session_id.
agent_sessions.work_session_id FK ties each process-level session to the user’s day-of-work.
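The relationship between the two session types can be sketched as follows. This is a minimal in-memory illustration, not the backend's implementation: WorkSession, attach_agent_session, and the list-based store are hypothetical names, and the sketch assumes that "midnight" is evaluated on the UTC calendar date, which the spec does not state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional
import uuid


@dataclass
class WorkSession:
    user_id: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    work_session_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def attach_agent_session(store: List[WorkSession], user_id: str, now: datetime) -> WorkSession:
    """Return the user's open work session for today, creating one if needed.

    Assumption: both timestamps are UTC, so .date() compares UTC calendar days.
    """
    for ws in store:
        if ws.user_id != user_id or ws.ended_at is not None:
            continue
        if ws.started_at.date() == now.date():
            return ws  # same day: reuse, whichever machine the agent runs on
        ws.ended_at = now  # left open from an earlier day: close it (rollover)
    new_ws = WorkSession(user_id=user_id, started_at=now)
    store.append(new_ws)
    return new_ws
```

Every agent session started on the same day by the same user gets the same work_session_id, which is what makes the daily roll-ups possible; nothing here feeds into channel selection.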
Channel naming
prism:events:tenant:{tenant_id} # tenant-wide broadcasts
prism:events:org:{tenant_id}:{org_id} # org-wide broadcasts
prism:events:project:{tenant_id}:{org_id}:{project_id} # project-wide broadcasts
prism:events:agent:{agent_id} # DMs to a specific agent
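Every parent identifier is qualified into the child’s channel name for the scope channels (tenant, org, project). The agent channel is the exception: it is keyed on agent_id alone, which is already a unique UUID.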
The agent channel is the critical change from today: keying on agent_id (stable) instead of session_id (transient) means signals queued during an agent’s wrap window deliver on the next subscribe.
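A small sketch of channel-name construction under the scheme above. The helper names (tenant_channel, org_channel, project_channel, agent_channel, _require) are illustrative and not taken from the Prism codebase; the empty-ID check is an added assumption in the spirit of the SPEC-054 lesson.

```python
def _require(**ids: str) -> None:
    """Reject empty identifiers so a blank ID never ends up in a channel name."""
    for name, value in ids.items():
        if not value:
            raise ValueError(f"{name} must be non-empty")


def tenant_channel(tenant_id: str) -> str:
    _require(tenant_id=tenant_id)
    return f"prism:events:tenant:{tenant_id}"


def org_channel(tenant_id: str, org_id: str) -> str:
    _require(tenant_id=tenant_id, org_id=org_id)
    return f"prism:events:org:{tenant_id}:{org_id}"


def project_channel(tenant_id: str, org_id: str, project_id: str) -> str:
    _require(tenant_id=tenant_id, org_id=org_id, project_id=project_id)
    return f"prism:events:project:{tenant_id}:{org_id}:{project_id}"


def agent_channel(agent_id: str) -> str:
    # Keyed on the stable agent_id; the agent_session_id is deliberately not an input.
    _require(agent_id=agent_id)
    return f"prism:events:agent:{agent_id}"
```

Because agent_channel takes no session argument, two successive agent sessions for the same agent necessarily compute the same channel name.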
Routing rules
Subscription authorization — a session may subscribe to a channel iff:
| Channel | Authorization rule |
|---|---|
| tenant:{T} | bearer’s tenant == T |
| org:{T}:{O} | bearer’s tenant == T AND user is member of org O |
| project:{T}:{O}:{P} | bearer’s tenant == T AND user has access to project P (today: same tenant; future: cross-tenant via memberships) |
| agent:{A} | session’s agent_id == A (always self-only) |
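The authorization table can be read as a deny-by-default predicate. The sketch below assumes the session's org memberships and accessible projects have already been loaded into sets; Session and may_subscribe are hypothetical names, and the project rule reflects today's same-tenant behavior only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    tenant_id: str
    agent_id: str
    org_ids: frozenset = frozenset()      # orgs the user is a member of
    project_ids: frozenset = frozenset()  # projects the user can access


def may_subscribe(session: Session, channel: str) -> bool:
    """Return True only if the channel matches one of the four table rows."""
    parts = channel.split(":")
    if len(parts) < 4 or parts[:2] != ["prism", "events"]:
        return False
    kind, ids = parts[2], parts[3:]
    if kind == "tenant" and len(ids) == 1:
        return ids[0] == session.tenant_id
    if kind == "org" and len(ids) == 2:
        return ids[0] == session.tenant_id and ids[1] in session.org_ids
    if kind == "project" and len(ids) == 3:
        return ids[0] == session.tenant_id and ids[2] in session.project_ids
    if kind == "agent" and len(ids) == 1:
        return ids[0] == session.agent_id  # always self-only
    return False  # unknown channel shapes are rejected
```

Splitting on ":" is safe here because UUIDs contain hyphens but no colons.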
Publish-side rules (sender’s tenant is sticky):
| Reach | Publishes to | Trust property |
|---|---|---|
| scope=tenant | tenant:{sender_tenant} | Bounded by sender’s home tenant |
| scope=org | org:{sender_tenant}:{sender_org} | Bounded by sender’s home org |
| scope=project | project:{tenant}:{org}:{project} | Reaches all project members regardless of home tenant (future memberships) |
| scope=direct (to_agent=X) | agent:{resolved_agent_id} | Resolved within sender’s project scope |
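One way to make "sender’s tenant is sticky" concrete is to derive the publish channel only from the sender's authenticated context, so the caller never supplies a tenant or org ID. This is a sketch under that assumption; Sender and publish_channel are hypothetical names, and for scope=project it assumes today's case where the project's tenant and org are the sender's own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sender:
    tenant_id: str
    org_id: str
    project_id: str


def publish_channel(sender: Sender, scope: str, resolved_agent_id: Optional[str] = None) -> str:
    """Map a requested reach to a channel using only the sender's own context."""
    if scope == "tenant":
        return f"prism:events:tenant:{sender.tenant_id}"
    if scope == "org":
        return f"prism:events:org:{sender.tenant_id}:{sender.org_id}"
    if scope == "project":
        return f"prism:events:project:{sender.tenant_id}:{sender.org_id}:{sender.project_id}"
    if scope == "direct":
        if not resolved_agent_id:
            raise ValueError("scope=direct requires a resolved agent_id")
        return f"prism:events:agent:{resolved_agent_id}"
    raise ValueError(f"unknown scope: {scope!r}")
```

Since there is no parameter through which a different tenant could be named, a tenant or org broadcast cannot leave the sender's home tenant.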
Address resolution for to_agent="display_name":
- Look up agents WHERE project_id = sender.project_id AND display_name = name.
- If exactly one match → that’s the target.
- If multiple matches (different user_id) → prefer sender’s own user_id; else error.
- Cross-user disambiguation syntax: to_agent="display_name@user_login" (TBD).
Project-scoped routing rule (the hard rule): an agent can only address other agents within the same project. Cross-project DMs are not supported.
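The resolution steps and the project-scoped rule can be sketched over an in-memory list of agents. Agent, ResolutionError, and resolve_to_agent are illustrative names; treating zero matches as an error is an assumption consistent with the negative tests later in this spec, and the display_name@user_login syntax is left out because it is still TBD.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Agent:
    agent_id: str
    user_id: str
    project_id: str
    display_name: str


class ResolutionError(LookupError):
    """Raised when to_agent cannot be resolved to exactly one agent."""


def resolve_to_agent(agents: Iterable[Agent], sender: Agent, display_name: str) -> str:
    # Only agents in the sender's own project are candidates (the hard rule).
    matches = [
        a for a in agents
        if a.project_id == sender.project_id and a.display_name == display_name
    ]
    if not matches:
        raise ResolutionError(f"no agent named {display_name!r} in sender's project")
    if len(matches) == 1:
        return matches[0].agent_id
    # Several users own a persona with this name: prefer the sender's own.
    own = [a for a in matches if a.user_id == sender.user_id]
    if len(own) == 1:
        return own[0].agent_id
    raise ResolutionError(f"{display_name!r} is ambiguous in sender's project")
```

An agent with the same display name in another project or tenant is never a candidate, so a name collision outside the project cannot redirect a DM.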
WebSocket handshake header set
Authorization: Bearer {api_key}
X-Prism-Tenant-Id: {tenant_id}
X-Prism-Org-Id: {org_id}
X-Prism-Project-Id: {project_id}
X-Prism-User-Id: {user_id}
X-Prism-Agent-Id: {agent_id}
X-Prism-Agent-Session-Id: {agent_session_id}
X-Prism-Work-Session-Id: {work_session_id}
X-Prism-Last-Event-Id: {event_id}
Defensive guard (lesson from SPEC-054 port miss): the bootstrap layer must skip stream spawn entirely if any required ID is missing. Empty headers cause a backend close 4002 followed by a reconnect storm; a missing ID is a registration bug to surface, not a transient failure to retry.
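A sketch of the guard on the client side: build the header set only when every ID is present, and raise instead of spawning the stream otherwise. The spec does not enumerate which IDs are required, so this assumes all of them except X-Prism-Last-Event-Id; HEADER_FOR_ID, RegistrationError, and build_handshake_headers are hypothetical names.

```python
HEADER_FOR_ID = {
    "tenant_id": "X-Prism-Tenant-Id",
    "org_id": "X-Prism-Org-Id",
    "project_id": "X-Prism-Project-Id",
    "user_id": "X-Prism-User-Id",
    "agent_id": "X-Prism-Agent-Id",
    "agent_session_id": "X-Prism-Agent-Session-Id",
    "work_session_id": "X-Prism-Work-Session-Id",
}


class RegistrationError(RuntimeError):
    """A required ID is missing; surface it rather than retrying."""


def build_handshake_headers(api_key: str, ids: dict, last_event_id: str = "") -> dict:
    # None, "" and whitespace-only values all count as missing.
    missing = [name for name in HEADER_FOR_ID if not str(ids.get(name) or "").strip()]
    if not api_key:
        missing.insert(0, "api_key")
    if missing:
        raise RegistrationError("refusing to open stream; missing: " + ", ".join(missing))
    headers = {"Authorization": f"Bearer {api_key}"}
    for name, header in HEADER_FOR_ID.items():
        headers[header] = str(ids[name])
    if last_event_id:  # assumed optional: absent on a first connect
        headers["X-Prism-Last-Event-Id"] = last_event_id
    return headers
```

The caller is expected to treat RegistrationError as fatal for the stream and log it, not to feed it into a reconnect loop.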
Schema
New tables
users — humans (or external collaborators) who own agents.
CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    email TEXT NOT NULL,
    github_login TEXT,
    display_name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(tenant_id, email)
);
orgs — departmental sub-grouping inside a tenant.
CREATE TABLE orgs (
    org_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name TEXT NOT NULL,
    slug TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(tenant_id, slug)
);
agents — stable persona identity.
CREATE TABLE agents (
    agent_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    org_id UUID NOT NULL REFERENCES orgs(org_id),
    project_id UUID NOT NULL REFERENCES projects(id),
    display_name TEXT NOT NULL,
    agent_service_id UUID,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(user_id, project_id, display_name)
);
work_sessions — User-level day-of-work unit.
CREATE TABLE work_sessions (
    work_session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    ended_at TIMESTAMPTZ,
    trigger TEXT NOT NULL,
    notes TEXT
);
Renamed table
controller_registrations → agent_sessions:
agent_sessions (
    agent_session_id UUID PRIMARY KEY,
    agent_id UUID NOT NULL REFERENCES agents(agent_id),
    work_session_id UUID NOT NULL REFERENCES work_sessions(work_session_id),
    machine_id TEXT,
    process_pid INT,
    agent_surface TEXT NOT NULL,
    is_master BOOLEAN NOT NULL DEFAULT false,
    registered_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_heartbeat TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_verb_at TIMESTAMPTZ,
    released_at TIMESTAMPTZ,
    release_reason TEXT,
    grpc_stream_id TEXT
);
Updated table
signal_queue:
ALTER TABLE signal_queue
    ADD COLUMN org_id UUID REFERENCES orgs(org_id),
    ADD COLUMN from_agent_id UUID REFERENCES agents(agent_id),
    ADD COLUMN from_user_id UUID REFERENCES users(user_id),
    ADD COLUMN from_agent_session UUID,
    ADD COLUMN to_agent_id UUID REFERENCES agents(agent_id);
projects gains org_id:
ALTER TABLE projects ADD COLUMN org_id UUID REFERENCES orgs(org_id);
Reserved seams
- agents.agent_service_id: future Enterprise Agent Services FK.
- users.github_login: future GitHub-invite collab seam.
- project_memberships table: NOT created; future spec when scoped.
Migration plan
Single Alembic migration, idempotent if re-run:
- Create users, orgs, agents, work_sessions tables.
- For each existing tenant, insert one default orgs row: (tenant_id=tenants.id, name='Personal', slug='personal').
- For each existing tenant, insert one default users row from API key holder.
- Add org_id to projects; backfill all rows with the default org for their tenant.
- Populate agents table: one row per DISTINCT (project_id, agent_identity) from existing controller_registrations.
- Add new columns to signal_queue; backfill from_agent_id/to_agent_id.
- Auto-create one work_sessions row per user per existing distinct date observed.
- Rename controller_registrations → agent_sessions; add agent_id FK; backfill.
- Drop legacy columns AFTER deploy verification (phased — keep through cutover, drop in follow-up).
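The agents backfill step, and what "idempotent if re-run" means for it, can be illustrated without a database. This is only a sketch of the dedupe logic, not the Alembic migration itself: backfill_agents is a hypothetical helper, registrations stand in for legacy controller_registrations rows, and the agents dict stands in for the new table.

```python
import uuid
from typing import Dict, Iterable, Tuple


def backfill_agents(
    registrations: Iterable[dict],
    agents: Dict[Tuple[str, str], str],
) -> int:
    """Ensure one agent per distinct (project_id, agent_identity).

    agents maps (project_id, display_name) -> agent_id and is mutated in place.
    Returns how many agents this run created; a re-run over the same data returns 0.
    """
    created = 0
    for reg in registrations:
        key = (reg["project_id"], reg["agent_identity"])
        if key not in agents:
            agents[key] = str(uuid.uuid4())
            created += 1
    return created
```

The resulting mapping is also what the later rename step would need in order to backfill agent_sessions.agent_id from each row's (project_id, agent_identity) pair.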
Threat model
Hard isolation guarantees
- No cross-tenant signal bleed. Enforced at WS subscribe authorization.
- No cross-project signal bleed within a tenant. Enforced at WS subscribe authorization.
- No cross-agent signal bleed within a project. Enforced by channel naming.
- Membership is the only authorization gate for cross-tenant project access (future).
- Org-level broadcasts respect sender’s home tenant.
Soft properties (designed in)
- Audit trail completeness — every signal records sender’s home tenant, project’s tenant, agent_session.
- Revocation tear-down (future, with memberships).
- Defense in depth on the WS upgrade — explicit close codes per failure mode.
Deployment modes
| Mode | TLS required? | Cross-tenant supported? |
|---|---|---|
| Single-tenant LAN | Optional | N/A |
| Multi-tenant LAN, internal-only | Required | Yes |
| LAN-public-opt-in | Required, operator cert | Yes (operator-accepted risk) |
| Prism Cloud | Required, platform cert | Yes (canonical) |
Test plan
Unit / smoke
- WS upgrade succeeds with all required headers; fails 4002 with any missing.
- WS upgrade fails 4401 with invalid bearer; 4404 for inaccessible project.
- agents UNIQUE constraint enforced.
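The close codes exercised by these tests can be summarized as a single server-side decision function. The order of the checks is an assumption (the spec lists the codes but not their precedence), and upgrade_close_code and its boolean inputs are hypothetical stand-ins for the real bearer and project-access checks.

```python
from typing import Mapping, Optional

REQUIRED_HEADERS = (
    "Authorization",
    "X-Prism-Tenant-Id",
    "X-Prism-Org-Id",
    "X-Prism-Project-Id",
    "X-Prism-User-Id",
    "X-Prism-Agent-Id",
    "X-Prism-Agent-Session-Id",
    "X-Prism-Work-Session-Id",
)


def upgrade_close_code(
    headers: Mapping[str, str],
    bearer_valid: bool,
    project_accessible: bool,
) -> Optional[int]:
    """Return the close code for a rejected upgrade, or None to accept it."""
    if any(not headers.get(name, "").strip() for name in REQUIRED_HEADERS):
        return 4002  # missing or empty required header
    if not bearer_valid:
        return 4401  # invalid bearer
    if not project_accessible:
        return 4404  # project not accessible to this bearer
    return None
```

A table-driven unit test can then cover each failure mode by removing one header at a time and flipping each flag.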
Integration
- Send DM to self → ring file materializes, count shows unread BEFORE drain.
- prism_signals_pending drains, markRead flips entries, count → 0.
- Wrap agent during signal storm, restart, observe new agent_session subscribes to same agent:{agent_id} channel and pending signals deliver on next push.
Negative (multi-tenant isolation)
- Provision second tenant + user + project on test backend.
- Tenant A’s bearer attempts subscribe to Tenant B project channel → 4404.
- Tenant A’s to_agent=Donna where Donna exists only in Tenant B → resolution fails.
- Tenant A’s scope=org broadcast → Tenant B subscribers do NOT receive.
Postgres-level audit
- After tests, signal_queue.delivery_method = 'ws_push' for delivered signals.
- Redis PUBSUB NUMSUB confirms expected counts; 0 for forbidden channels.
Phased rollout
Single Phase. Migration is one Alembic step (with internal phases). Backend and mcp-node ship together.
Cutover sequence:
- Run migration on server1 Postgres.
- Build new backend; deploy via rsync; restart container.
- Build new mcp-node.
- Frank quits (Cmd+Q) and reopens all 4 Claude Code windows.
- Run integration test suite.
- Verify Postgres delivery_method='ws_push'.
- After 24h soak, drop legacy columns via follow-up migration.
Out of scope
- Federation across separate Prism installations.
- Implementation of project_memberships, GitHub-invite, Enterprise Agent Services catalogue, LAN-public install UX, Prism Cloud commercial install flow.
- Performance tuning beyond correctness.
References
- ADRs filed alongside this SPEC: hierarchy choice, agent_id normalization, agent-channel keying, two-session model, agent_service_id seam.
- Memories: project_signal_isolation_multitenant, project_future_github_invite_collab, project_future_enterprise_agent_services, project_future_cross_internet_deployment_modes, project_spec_054_port_miss_project_id.
- Feedback memories: feedback_routing_granularity, feedback_same_code_everywhere, feedback_optimize_later, feedback_document_port_misses.