Status: accepted · Version 1 · Filed 2026-04-30

spec_id: SPEC-056
version: 1
status: accepted
authored_by: Donna
date: 2026-04-30

SPEC-056 — Multi-tenant Routing, Identity, and Signal Isolation

Status

Accepted 2026-04-30. Authored by Donna; reviewed by Frank.

Problem

The signal-routing layer was designed for single-tenant use and conflates several distinct concepts that must be separable for multi-tenant operation:
  1. Routing target is bound to the process. Today's prism:events:session:{session_id} channel is keyed on a process-lifetime ID. When an agent wraps and restarts, the session_id changes. Pending signals for the old session are still delivered via piggyback (HTTP poll), but the per-identity cache writer on the WebSocket subscriber path (signalCache.record()) never fires, so the statusline bell and unread count always read 0.
  2. Persona name is treated as identity. controller_registrations.agent_identity is a plain string, and uniqueness is (tenant_id, project_id, agent_identity). Renaming a persona breaks identity continuity, and cross-tenant collaborators whose persona names collide cannot coexist.
  3. No Org level between Tenant and Project. Tenant and Project are the only routing-scope levels. Departments / business units inside a tenant cannot be addressed as a routing scope.
  4. No User/Agent-Owner concept. Two humans collaborating on the same Project cannot each have a persona with the same display name.
  5. No User Work Session concept. No daily-work boundary; analytics, retros, and cross-machine continuity have no natural grouping.
  6. Latent bug, discovered 2026-04-30. The SPEC-054 Phase 3 port flattened response.registration.project_id to response.project_id, dropped the Python implementation's defensive guard, and sent an empty X-Prism-Project-Id header → backend WS rejects (close 4002 → uvicorn 403) → Redis pub/sub has 0 subscribers → publish_session_event publishes into the void → signalCache.record() never fires → bell+count permanently dark. Documented in the project_spec_054_port_miss_project_id memory.

Goals

  • Hard tenant isolation: zero cross-tenant signal bleed under any scenario.
  • Hard project isolation: zero cross-project signal bleed within a tenant.
  • Hard session isolation: DMs to one agent never reach another agent.
  • Routing continuity across restarts: same agent_id, same channel, signals delivered on next subscribe.
  • Same code path for all deployment modes (Personal, Multi-tenant LAN, LAN-public-opt-in, Cloud) — no local-mode shortcuts.
  • Schema and routing layer must reserve seams for future capabilities (project_memberships, Enterprise Agent Services catalogue, GitHub-invite cross-network collab, deployment-mode TLS posture) without re-plumbing later.

Non-Goals

  • Federation across separate Prism installations. Explicitly out of scope.
  • Implementing the future capabilities themselves (cross-tenant memberships, agent catalogue, GitHub-invite) — only reserving the seams.
  • Performance optimization beyond what falls naturally out of correctness.

Architecture

Hierarchy (4 levels, all distinct)

Enterprise / Tenant      (top — billing/contract entity)
    └── Org              (department or logical sub-grouping)
        └── Project      (working scope)
            └── Agent    (active worker / persona)
Each parent has many children. Each level is addressable as a routing scope.

Identity model

Agent is the stable routing referent. agent_id (UUID) is immutable and survives:
  • Persona display-name changes
  • prism_start / prism_wrap cycles (agent_session churn)
  • Machine moves
  • Cross-tenant project-membership grants (future)
Display name is presentation only: the server resolves display name → agent_id at signal-send time, within the sender's project scope.
User (= Agent Owner) is the human who created the persona. Today's Personal Install has one User (Frank). Multi-user-per-tenant scenarios are reserved for the future, but the schema carries user_id from day one.

Two-session model

Work Session (User-level):
  • One row per User per day of work.
  • Auto-created on first agent_session of the day for a given user.
  • Ends explicitly (prism_work_session_end) or rolls over at midnight.
  • Can span multiple machines.
  • Used for: daily roll-up analytics, retros, signal history grouping.
  • Not a routing filter.
Agent Session (Process-level):
  • One row per prism_start → prism_wrap cycle for one agent on one machine.
  • Bound to a specific machine + process.
  • Heartbeat-tracked for liveness.
  • Not a routing filter — channel naming uses agent_id, not agent_session_id.
agent_sessions.work_session_id FK ties each process-level session to the user’s day-of-work.
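The Work Session lifecycle rules (auto-create on a user's first agent_session of the day, roll over at midnight, end explicitly) can be sketched as a single lookup-or-create function. This is a minimal sketch under stated assumptions: "midnight" is evaluated in UTC, datetimes are timezone-aware, an in-memory list stands in for the work_sessions table, and `WorkSession`, `work_session_for`, and the trigger value `"auto_first_agent_session"` are hypothetical names, not part of the spec.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone


@dataclass
class WorkSession:
    # Hypothetical in-memory stand-in for a work_sessions row.
    user_id: str
    started_at: datetime
    trigger: str
    ended_at: datetime | None = None
    work_session_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def work_session_for(user_id: str, now: datetime,
                     sessions: list[WorkSession]) -> WorkSession:
    """Return the user's open work session for today, creating one if needed.

    Assumption: 'midnight' is UTC midnight and `now` is timezone-aware.
    An open session that started on an earlier day is rolled over (closed
    at the midnight following its start) before a new one is created.
    """
    today = now.astimezone(timezone.utc).date()
    for ws in sessions:
        if ws.user_id != user_id or ws.ended_at is not None:
            continue
        start_day = ws.started_at.astimezone(timezone.utc).date()
        if start_day == today:
            return ws  # same user, same day, still open: reuse it
        # Rollover: close the stale session at the midnight after it started.
        ws.ended_at = datetime.combine(start_day + timedelta(days=1), time.min,
                                       tzinfo=timezone.utc)
    new_ws = WorkSession(user_id=user_id, started_at=now,
                         trigger="auto_first_agent_session")  # hypothetical value
    sessions.append(new_ws)
    return new_ws
```

Because the lookup is keyed on user_id alone, an agent_session started on a second machine the same day joins the same work session. An explicit prism_work_session_end would simply set ended_at, so the next agent_session that day opens a fresh row.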

Channel naming

prism:events:tenant:{tenant_id}                          # tenant-wide broadcasts
prism:events:org:{tenant_id}:{org_id}                    # org-wide broadcasts
prism:events:project:{tenant_id}:{org_id}:{project_id}   # project-wide broadcasts
prism:events:agent:{agent_id}                            # DMs to a specific agent
Every parent identifier is qualified into the child’s channel name. The agent channel is the critical change from today: keying on agent_id (stable) instead of session_id (transient) means signals queued during an agent’s wrap window deliver on the next subscribe.
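Centralizing the naming scheme in a handful of builders keeps every publisher and subscriber deriving byte-identical channel strings. The following is a minimal sketch: the function names are hypothetical, and rejecting empty IDs is an added assumption in the spirit of the SPEC-054 port-miss lesson (an empty ID is a bug, not a value).

```python
PREFIX = "prism:events"


def _require(*ids: str) -> None:
    # An empty identifier would silently produce a channel nobody else uses.
    for value in ids:
        if not value:
            raise ValueError("channel identifiers must be non-empty")


def tenant_channel(tenant_id: str) -> str:
    _require(tenant_id)
    return f"{PREFIX}:tenant:{tenant_id}"


def org_channel(tenant_id: str, org_id: str) -> str:
    _require(tenant_id, org_id)
    return f"{PREFIX}:org:{tenant_id}:{org_id}"


def project_channel(tenant_id: str, org_id: str, project_id: str) -> str:
    _require(tenant_id, org_id, project_id)
    return f"{PREFIX}:project:{tenant_id}:{org_id}:{project_id}"


def agent_channel(agent_id: str) -> str:
    # Keyed on the stable agent_id, never on the transient agent_session_id.
    _require(agent_id)
    return f"{PREFIX}:agent:{agent_id}"
```

Since every parent ID is embedded, the same org_id or project_id value under two different tenants still maps to two distinct channels.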

Routing rules

Subscription authorization — a session may subscribe to a channel iff:
Channel               Authorization rule
tenant:{T}            bearer's tenant == T
org:{T}:{O}           bearer's tenant == T AND user is member of org O
project:{T}:{O}:{P}   bearer's tenant == T AND user has access to project P (today: same tenant; future: cross-tenant via memberships)
agent:{A}             session's agent_id == A (always self-only)
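The subscription table reduces to a default-deny check over the parsed channel name. A minimal sketch, assuming the server has already validated the bearer and loaded the caller's tenant, org memberships, and accessible projects into a context object; `SubscriberContext` and `may_subscribe` are hypothetical names. As in the table, the project rule checks tenant and project access and does not separately check the org segment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriberContext:
    # Hypothetical view of what the server knows after validating the bearer.
    tenant_id: str
    user_id: str
    agent_id: str
    org_ids: frozenset[str] = frozenset()      # orgs the user is a member of
    project_ids: frozenset[str] = frozenset()  # projects the user can access


PREFIX = "prism:events:"


def may_subscribe(channel: str, ctx: SubscriberContext) -> bool:
    """Apply the subscription-authorization table; deny anything unrecognized."""
    if not channel.startswith(PREFIX):
        return False
    kind, _, rest = channel[len(PREFIX):].partition(":")
    parts = rest.split(":") if rest else []
    if kind == "tenant" and len(parts) == 1:
        return parts[0] == ctx.tenant_id
    if kind == "org" and len(parts) == 2:
        tenant, org = parts
        return tenant == ctx.tenant_id and org in ctx.org_ids
    if kind == "project" and len(parts) == 3:
        tenant, _org, project = parts
        # Today: same tenant only. Cross-tenant memberships are a future seam.
        return tenant == ctx.tenant_id and project in ctx.project_ids
    if kind == "agent" and len(parts) == 1:
        return parts[0] == ctx.agent_id  # always self-only
    return False  # unknown kinds (including legacy session:*) are denied
```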
Publish-side rules (sender’s tenant is sticky):
Reach                       Publishes to                      Trust property
scope=tenant                tenant:{sender_tenant}            Bounded by sender's home tenant
scope=org                   org:{sender_tenant}:{sender_org}  Bounded by sender's home org
scope=project               project:{tenant}:{org}:{project}  Reaches all project members regardless of home tenant (future memberships)
scope=direct (to_agent=X)   agent:{resolved_agent_id}         Resolved within sender's project scope
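"Sender's tenant is sticky" can be enforced structurally: the publish helper never accepts a tenant or org argument for tenant- and org-scoped sends, so a caller cannot aim a broadcast at someone else's tenant. A minimal sketch; `Sender`, `publish_channel`, and the `project_home` parameter (the project's own tenant and org, which today equal the sender's) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sender:
    # Hypothetical sender context; tenant_id and org_id are the sender's home values.
    tenant_id: str
    org_id: str
    project_id: str


def publish_channel(scope: str, sender: Sender, *,
                    resolved_agent_id: str | None = None,
                    project_home: tuple[str, str] | None = None) -> str:
    """Map a publish scope to its channel using only server-side sender context."""
    if scope == "tenant":
        return f"prism:events:tenant:{sender.tenant_id}"
    if scope == "org":
        return f"prism:events:org:{sender.tenant_id}:{sender.org_id}"
    if scope == "project":
        # The project's own tenant/org; identical to the sender's until memberships exist.
        tenant_id, org_id = project_home or (sender.tenant_id, sender.org_id)
        return f"prism:events:project:{tenant_id}:{org_id}:{sender.project_id}"
    if scope == "direct":
        if not resolved_agent_id:
            raise ValueError("direct scope requires a resolved agent_id")
        return f"prism:events:agent:{resolved_agent_id}"
    raise ValueError(f"unknown scope: {scope!r}")
```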
Address resolution for to_agent="display_name":
  1. Look up agents WHERE project_id = sender.project_id AND display_name = name.
  2. If exactly one match → that’s the target.
  3. If multiple matches (different user_id) → prefer sender’s own user_id; else error.
  4. Cross-user disambiguation syntax: to_agent="display_name@user_login" (TBD).
Project-scoped routing rule (the hard rule): an agent can only address other agents within the same project. Cross-project DMs are not supported.
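Steps 1 to 3 of address resolution translate directly into a filter plus a tie-break. A minimal sketch over an in-memory list of agent rows: `AgentRow`, `resolve_to_agent`, and `ResolutionError` are hypothetical names; the zero-match error is inferred from the negative test ("resolution fails"), and the step 4 `@user_login` syntax is left unimplemented because it is still TBD.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRow:
    # Hypothetical projection of an agents row.
    agent_id: str
    user_id: str
    project_id: str
    display_name: str


class ResolutionError(LookupError):
    """Raised when to_agent cannot be resolved to exactly one agent_id."""


def resolve_to_agent(name: str, sender_project_id: str, sender_user_id: str,
                     agents: list[AgentRow]) -> str:
    # Step 1: only agents in the sender's own project are candidates, so
    # names that exist only in another project or tenant never resolve.
    matches = [a for a in agents
               if a.project_id == sender_project_id and a.display_name == name]
    if not matches:
        raise ResolutionError(f"no agent named {name!r} in sender's project")
    # Step 2: a unique match is the target.
    if len(matches) == 1:
        return matches[0].agent_id
    # Step 3: several users own a persona with this name; prefer the sender's own.
    # UNIQUE(user_id, project_id, display_name) guarantees at most one such row.
    own = [a for a in matches if a.user_id == sender_user_id]
    if len(own) == 1:
        return own[0].agent_id
    raise ResolutionError(f"{name!r} is ambiguous; disambiguation syntax is TBD")
```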

WebSocket handshake header set

Authorization: Bearer {api_key}
X-Prism-Tenant-Id: {tenant_id}
X-Prism-Org-Id: {org_id}
X-Prism-Project-Id: {project_id}
X-Prism-User-Id: {user_id}
X-Prism-Agent-Id: {agent_id}
X-Prism-Agent-Session-Id: {agent_session_id}
X-Prism-Work-Session-Id: {work_session_id}
X-Prism-Last-Event-Id: {event_id}
Defensive guard (lesson from the SPEC-054 port miss): the bootstrap layer must skip stream spawn entirely if any required ID is missing. An empty header causes a backend close 4002 followed by a reconnect storm; a missing ID is a registration bug to surface, not a transient failure to retry.
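The guard belongs before the socket is opened, and it should fail loudly rather than return quietly. A minimal sketch under stated assumptions: the registration keys (`tenant_id`, `org_id`, and so on) are assumed names for wherever the client reads its IDs, X-Prism-Last-Event-Id is treated as optional (a first connect has no event to resume from), and `build_handshake_headers` and `MissingRegistrationIdError` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping


class MissingRegistrationIdError(RuntimeError):
    """A required routing ID is absent: a registration bug, not a transient."""


# Header name -> key in the registration record (key names are assumptions).
REQUIRED_HEADERS = {
    "X-Prism-Tenant-Id": "tenant_id",
    "X-Prism-Org-Id": "org_id",
    "X-Prism-Project-Id": "project_id",
    "X-Prism-User-Id": "user_id",
    "X-Prism-Agent-Id": "agent_id",
    "X-Prism-Agent-Session-Id": "agent_session_id",
    "X-Prism-Work-Session-Id": "work_session_id",
}


def build_handshake_headers(api_key: str, registration: Mapping[str, str | None],
                            last_event_id: str | None = None) -> dict[str, str]:
    """Return WS upgrade headers, or raise so the caller never spawns the stream."""
    missing = [key for key in REQUIRED_HEADERS.values()
               if not (registration.get(key) or "").strip()]
    if not api_key:
        missing.insert(0, "api_key")
    if missing:
        # Do not connect: an empty header only earns close 4002 and a reconnect loop.
        raise MissingRegistrationIdError("missing: " + ", ".join(missing))
    headers = {"Authorization": f"Bearer {api_key}"}
    for header, key in REQUIRED_HEADERS.items():
        headers[header] = registration[key]
    if last_event_id:
        headers["X-Prism-Last-Event-Id"] = last_event_id
    return headers
```

The caller is expected to log the exception and stop; wrapping it in a retry loop would recreate the reconnect storm the guard exists to prevent.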

Schema

New tables

users — humans (or external collaborators) who own agents.
CREATE TABLE users (
    user_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id      UUID NOT NULL REFERENCES tenants(id),
    email          TEXT NOT NULL,
    github_login   TEXT,
    display_name   TEXT NOT NULL,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(tenant_id, email)
);
orgs — departmental sub-grouping inside a tenant.
CREATE TABLE orgs (
    org_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id      UUID NOT NULL REFERENCES tenants(id),
    name           TEXT NOT NULL,
    slug           TEXT NOT NULL,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(tenant_id, slug)
);
agents — stable persona identity.
CREATE TABLE agents (
    agent_id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id            UUID NOT NULL REFERENCES users(user_id),
    tenant_id          UUID NOT NULL REFERENCES tenants(id),
    org_id             UUID NOT NULL REFERENCES orgs(org_id),
    project_id         UUID NOT NULL REFERENCES projects(id),
    display_name       TEXT NOT NULL,
    agent_service_id   UUID,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(user_id, project_id, display_name)
);
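The agents UNIQUE(user_id, project_id, display_name) constraint is what lets two humans in the same project each keep a persona with the same display name (problem 4), while still blocking duplicates within one user. The following is a minimal demonstration using SQLite from the Python standard library as a stand-in for Postgres: IDs are TEXT instead of UUID, foreign keys are omitted, and `add_agent` is a hypothetical helper.

```python
import sqlite3

# Simplified SQLite stand-in for the agents table; only the uniqueness rule is exercised.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agents (
        agent_id     TEXT PRIMARY KEY,
        user_id      TEXT NOT NULL,
        project_id   TEXT NOT NULL,
        display_name TEXT NOT NULL,
        UNIQUE(user_id, project_id, display_name)
    )
""")


def add_agent(agent_id, user_id, project_id, display_name):
    """Insert an agent; return False if the uniqueness rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO agents VALUES (?, ?, ?, ?)",
                         (agent_id, user_id, project_id, display_name))
        return True
    except sqlite3.IntegrityError:
        return False


# Two different users may each own a "Donna" in the same project...
assert add_agent("a1", "u1", "p1", "Donna")
assert add_agent("a2", "u2", "p1", "Donna")
# ...but one user cannot own two "Donna" personas in that project.
assert not add_agent("a3", "u1", "p1", "Donna")
```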
work_sessions — User-level day-of-work unit.
CREATE TABLE work_sessions (
    work_session_id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id            UUID NOT NULL REFERENCES users(user_id),
    tenant_id          UUID NOT NULL REFERENCES tenants(id),
    started_at         TIMESTAMPTZ NOT NULL DEFAULT now(),
    ended_at           TIMESTAMPTZ,
    trigger            TEXT NOT NULL,
    notes              TEXT
);

Renamed table

controller_registrations → agent_sessions:
agent_sessions (
    agent_session_id   UUID PRIMARY KEY,
    agent_id           UUID NOT NULL REFERENCES agents(agent_id),
    work_session_id    UUID NOT NULL REFERENCES work_sessions(work_session_id),
    machine_id         TEXT,
    process_pid        INT,
    agent_surface      TEXT NOT NULL,
    is_master          BOOLEAN NOT NULL DEFAULT false,
    registered_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_heartbeat     TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_verb_at       TIMESTAMPTZ,
    released_at        TIMESTAMPTZ,
    release_reason     TEXT,
    grpc_stream_id     TEXT
);

Updated table

signal_queue:
ALTER TABLE signal_queue
    ADD COLUMN org_id            UUID REFERENCES orgs(org_id),
    ADD COLUMN from_agent_id     UUID REFERENCES agents(agent_id),
    ADD COLUMN from_user_id      UUID REFERENCES users(user_id),
    ADD COLUMN from_agent_session UUID,
    ADD COLUMN to_agent_id       UUID REFERENCES agents(agent_id);
projects gains org_id:
ALTER TABLE projects ADD COLUMN org_id UUID REFERENCES orgs(org_id);

Reserved seams

  • agents.agent_service_id — future Enterprise Agent Services FK.
  • users.github_login — future GitHub-invite collab seam.
  • project_memberships table — NOT created; future spec when scoped.

Migration plan

Single Alembic migration, idempotent if re-run:
  1. Create users, orgs, agents, work_sessions tables.
  2. For each existing tenant, insert one default orgs row: (tenant_id=tenants.id, name='Personal', slug='personal').
  3. For each existing tenant, insert one default users row from the API key holder.
  4. Add org_id to projects; backfill all rows with the default org for their tenant.
  5. Populate agents table: one row per DISTINCT (project_id, agent_identity) from existing controller_registrations.
  6. Add new columns to signal_queue; backfill from_agent_id/to_agent_id.
  7. Auto-create one work_sessions row per user per existing distinct date observed.
  8. Rename controller_registrations → agent_sessions; add agent_id FK; backfill.
  9. Drop legacy columns AFTER deploy verification (phased — keep through cutover, drop in follow-up).
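"Idempotent if re-run" means each step must be a no-op the second time. Steps 2 and 4 show the pattern: let a unique key absorb duplicate inserts, and only backfill rows that are still NULL. The following is a minimal sketch using SQLite from the Python standard library in place of Postgres/Alembic; the tables are trimmed to the columns involved, the org_id column is assumed to have already been added, and `backfill_default_orgs` is a hypothetical name. In Postgres the equivalent of INSERT OR IGNORE is INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants  (id TEXT PRIMARY KEY);
    CREATE TABLE projects (id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, org_id TEXT);
    CREATE TABLE orgs (
        org_id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL,
        name TEXT NOT NULL, slug TEXT NOT NULL,
        UNIQUE(tenant_id, slug)
    );
    INSERT INTO tenants VALUES ('t1'), ('t2');
    INSERT INTO projects (id, tenant_id) VALUES ('p1', 't1'), ('p2', 't1'), ('p3', 't2');
""")


def backfill_default_orgs(conn):
    with conn:
        # Step 2: one 'personal' org per tenant; UNIQUE(tenant_id, slug) makes a re-run a no-op.
        for (tenant_id,) in conn.execute("SELECT id FROM tenants").fetchall():
            conn.execute(
                "INSERT OR IGNORE INTO orgs VALUES (?, ?, 'Personal', 'personal')",
                (str(uuid.uuid4()), tenant_id),
            )
        # Step 4: point projects at their tenant's default org; only NULL rows are touched.
        conn.execute("""
            UPDATE projects SET org_id = (
                SELECT o.org_id FROM orgs o
                WHERE o.tenant_id = projects.tenant_id AND o.slug = 'personal')
            WHERE org_id IS NULL
        """)


backfill_default_orgs(conn)
snapshot = conn.execute("SELECT id, org_id FROM projects ORDER BY id").fetchall()
backfill_default_orgs(conn)  # re-run: no new orgs, no changed projects
assert conn.execute("SELECT id, org_id FROM projects ORDER BY id").fetchall() == snapshot
assert conn.execute("SELECT COUNT(*) FROM orgs").fetchone()[0] == 2
```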

Threat model

Hard isolation guarantees

  1. No cross-tenant signal bleed. Enforced at WS subscribe authorization.
  2. No cross-project signal bleed within a tenant. Enforced at WS subscribe authorization.
  3. No cross-agent signal bleed within a project. Enforced by channel naming.
  4. Membership is the only authorization gate for cross-tenant project access (future).
  5. Org-level broadcasts respect sender’s home tenant.

Soft properties (designed in)

  1. Audit trail completeness: every signal records the sender's home tenant, the project's tenant, and the sending agent_session.
  2. Revocation tear-down (future, with memberships).
  3. Defense in depth on the WS upgrade — explicit close codes per failure mode.

Deployment modes

Mode                             TLS required?            Cross-tenant supported?
Single-tenant LAN                Optional                 N/A
Multi-tenant LAN, internal-only  Required                 Yes
LAN-public-opt-in                Required, operator cert  Yes (operator-accepted risk)
Prism Cloud                      Required, platform cert  Yes (canonical)
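Because the goals require one code path for every deployment mode, the mode table is best treated as data consulted at startup rather than as mode-specific branches. A minimal sketch: the mode keys, `ModePosture`, and `check_tls_posture` are hypothetical, and `cert_source` is None wherever the table does not name a certificate source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModePosture:
    tls_required: bool
    cert_source: str | None    # certificate source named in the table, if any
    cross_tenant: bool | None  # None = not applicable (single tenant)


# Transcription of the deployment-mode table; the keys are hypothetical config values.
POSTURES = {
    "single_tenant_lan": ModePosture(tls_required=False, cert_source=None, cross_tenant=None),
    "multi_tenant_lan": ModePosture(tls_required=True, cert_source=None, cross_tenant=True),
    "lan_public_opt_in": ModePosture(tls_required=True, cert_source="operator", cross_tenant=True),
    "prism_cloud": ModePosture(tls_required=True, cert_source="platform", cross_tenant=True),
}


def check_tls_posture(mode: str, tls_enabled: bool) -> ModePosture:
    """Refuse to start when the mode demands TLS and none is configured."""
    posture = POSTURES[mode]  # an unknown mode raises KeyError on purpose
    if posture.tls_required and not tls_enabled:
        raise RuntimeError(f"deployment mode {mode!r} requires TLS")
    return posture
```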

Test plan

Unit / smoke

  • WS upgrade succeeds with all required headers; fails 4002 with any missing.
  • WS upgrade fails 4401 with invalid bearer; 4404 for inaccessible project.
  • agents UNIQUE constraint enforced.

Integration

  • Send DM to self → ring file materializes, count shows unread BEFORE drain.
  • prism_signals_pending drains, markRead flips entries, count → 0.
  • Wrap agent during signal storm, restart, observe new agent_session subscribes to same agent:{agent_id} channel and pending signals deliver on next push.
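The wrap-during-signal-storm case hinges on one property: the channel a restarted process subscribes to is the same one senders used while it was down. A toy in-memory model makes the contrast with session-keyed channels concrete. It deliberately differs from Redis, whose pub/sub does not hold messages for absent subscribers; in the real system, catch-up presumably comes from signal_queue (for example, via X-Prism-Last-Event-Id), and that replay path is an assumption here. `FakeBroker` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class FakeBroker:
    """Toy broker that holds signals per channel until someone subscribes."""

    def __init__(self) -> None:
        self.pending: defaultdict[str, list[str]] = defaultdict(list)
        self.live: dict[str, list[str]] = {}  # channel -> current subscriber's inbox

    def publish(self, channel: str, signal: str) -> None:
        inbox = self.live.get(channel)
        if inbox is None:
            self.pending[channel].append(signal)  # nobody listening: hold it
        else:
            inbox.append(signal)

    def subscribe(self, channel: str) -> list[str]:
        inbox = self.pending.pop(channel, [])  # catch up on held signals first
        self.live[channel] = inbox
        return inbox

    def unsubscribe(self, channel: str) -> None:
        self.live.pop(channel, None)


broker = FakeBroker()
agent_ch = "prism:events:agent:a1"
first_session = broker.subscribe(agent_ch)
broker.unsubscribe(agent_ch)                # prism_wrap
broker.publish(agent_ch, "s1")              # sent during the wrap window
second_session = broker.subscribe(agent_ch)  # new agent_session, same agent_id
```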

Negative (multi-tenant isolation)

  • Provision second tenant + user + project on test backend.
  • Tenant A’s bearer attempts subscribe to Tenant B project channel → 4404.
  • Tenant A's to_agent=Donna, where Donna exists only in Tenant B → resolution fails.
  • Tenant A’s scope=org broadcast → Tenant B subscribers do NOT receive.

Postgres-level audit

  • After tests, signal_queue.delivery_method = 'ws_push' for delivered signals.
  • Redis PUBSUB NUMSUB confirms expected counts; 0 for forbidden channels.

Phased rollout

Single phase. The migration is one Alembic step (with internal phases), and the backend and mcp-node ship together. Cutover sequence:
  1. Run migration on server1 Postgres.
  2. Build new backend; deploy via rsync; restart container.
  3. Build new mcp-node.
  4. Frank quits (Cmd+Q) and reopens all 4 Claude Code windows.
  5. Run integration test suite.
  6. Verify Postgres delivery_method='ws_push'.
  7. After 24h soak, drop legacy columns via follow-up migration.

Out of scope

  • Federation across separate Prism installations.
  • Implementation of project_memberships, GitHub-invite, Enterprise Agent Services catalogue, LAN-public install UX, Prism Cloud commercial install flow.
  • Performance tuning beyond correctness.

References

  • ADRs filed alongside this SPEC: hierarchy choice, agent_id normalization, agent-channel keying, two-session model, agent_service_id seam.
  • Memories: project_signal_isolation_multitenant, project_future_github_invite_collab, project_future_enterprise_agent_services, project_future_cross_internet_deployment_modes, project_spec_054_port_miss_project_id.
  • Feedback memories: feedback_routing_granularity, feedback_same_code_everywhere, feedback_optimize_later, feedback_document_port_misses.
Last modified on May 3, 2026