Skip to content

Multi-Tenant Architecture

Design for tenant-isolated data with cross-database joins.

Current State

Tenant Structure (from cbtenant)

/home/bisenbek/projects/nominate/{slug}/
├── db/
│   ├── pocket.db         # Tenant-specific OLTP (contacts, segments, actions)
│   └── i360.db → shared  # Symlink to shared voter data
├── .env                  # Tenant config
└── ...

Active Tenants: | Slug | Description | pocket.db Size | |------|-------------|----------------| | ky04 | Kentucky campaign | 5.4 GB | | mi20-clevenger | Michigan campaign | 15 MB |

Shared Resources: | Resource | Location | Size | |----------|----------|------| | i360 Voter Data | /opt/campaignbrain/shared/data/i360.db | 2.1 GB |

Current cbmodels Data

Campaign data in ./data/rotated.db: - 788K cells from 46 files - NOT tenant-partitioned (all mi20 data currently)

Proposed Architecture

1. Tenant-Partitioned Storage

Add tenant_id to rotated_data table:

CREATE TABLE rotated_data (
    tenant_id VARCHAR NOT NULL,     -- NEW: UUID or slug
    source_path VARCHAR,
    source_id VARCHAR,
    row_idx INTEGER,
    col_idx INTEGER,
    field_raw VARCHAR,
    field_name VARCHAR,
    value VARCHAR
);

-- Tenant isolation index (CRITICAL for query performance)
CREATE INDEX idx_rotated_tenant ON rotated_data (tenant_id);
CREATE INDEX idx_rotated_tenant_field ON rotated_data (tenant_id, field_name);
CREATE INDEX idx_rotated_tenant_value ON rotated_data (tenant_id, field_name, value);

2. Database Locations

/opt/campaignbrain/shared/data/
├── i360.db                    # Shared voter data (existing)
├── cc_kpi.db                  # Shared CC analytics (future)
└── cbmodels/
    ├── campaign.db            # Multi-tenant campaign data
    └── models/
        ├── mi20.json          # Pre-computed model for mi20
        └── ky04.json          # Pre-computed model for ky04

Per-tenant symlink:

/home/bisenbek/projects/nominate/{slug}/db/
├── pocket.db                  # Tenant-owned
├── i360.db → shared           # Existing symlink
└── campaign.db → shared       # NEW: symlink to shared campaign.db

3. Cross-Database Query Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      DuckDB Connection                           │
├─────────────────────────────────────────────────────────────────┤
│  ATTACH 'pocket.db' AS pocket                                    │
│  ATTACH '/opt/.../i360.db' AS i360 (READ_ONLY)                  │
│  ATTACH '/opt/.../campaign.db' AS campaign (READ_ONLY)          │
└─────────────────────────────────────────────────────────────────┘
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
   ┌──────────┐        ┌──────────┐        ┌──────────────┐
   │  pocket  │        │  i360    │        │  campaign    │
   │(tenant)  │        │(shared)  │        │(multi-tenant)│
   ├──────────┤        ├──────────┤        ├──────────────┤
   │ person   │        │i360_voters│       │ rotated_data │
   │ segment  │        │          │        │ (tenant_id)  │
   │ tag      │        │          │        │              │
   │ action   │        │          │        │              │
   └──────────┘        └──────────┘        └──────────────┘

4. Query Patterns

All queries MUST include tenant_id filter:

-- Find campaign engagement for a tenant's contact
SELECT DISTINCT source_path, source_id, row_idx
FROM campaign.rotated_data
WHERE tenant_id = 'mi20'                    -- REQUIRED
  AND field_name = 'email'
  AND LOWER(value) = LOWER('john@example.com');

-- Cross-database join: contacts with campaign history
SELECT
    p.first_name, p.last_name, p.email,
    COUNT(DISTINCT r.source_path) as campaign_sources
FROM pocket.person p
JOIN campaign.rotated_data r
    ON r.tenant_id = 'mi20'                 -- REQUIRED
    AND r.field_name = 'email'
    AND LOWER(p.email) = LOWER(r.value)
GROUP BY p.id, p.first_name, p.last_name, p.email;

-- Enrich with i360 voter data (shared, no tenant filter needed)
SELECT
    p.first_name, p.last_name,
    v.party, v.turnout_score, v.trump_support_score
FROM pocket.person p
LEFT JOIN i360.i360_voters v
    ON LOWER(p.email) = LOWER(v.email_mydata)
WHERE p.email IS NOT NULL;

5. API Design

All endpoints require tenant context:

# Option A: Path parameter
GET /api/cbmodels/{tenant_id}/lookup?email=john@example.com

# Option B: Header
GET /api/cbmodels/lookup?email=john@example.com
X-Tenant-ID: mi20

# Option C: From authentication (preferred)
# JWT contains tenant_id, extracted by middleware
GET /api/cbmodels/lookup?email=john@example.com
Authorization: Bearer <jwt_with_tenant_claim>

FastAPI middleware example:

from fastapi import Request, HTTPException

async def tenant_middleware(request: Request, call_next):
    # Extract from JWT or header
    tenant_id = request.headers.get("X-Tenant-ID")
    if not tenant_id:
        tenant_id = get_tenant_from_jwt(request)

    if not tenant_id:
        raise HTTPException(401, "Tenant context required")

    # Attach to request state
    request.state.tenant_id = tenant_id
    return await call_next(request)

6. Data Ingestion with Tenant

def ingest_for_tenant(
    tenant_id: str,
    source_dir: Path,
    campaign_db: Path = Path("/opt/campaignbrain/shared/data/cbmodels/campaign.db")
):
    """Ingest campaign files for a specific tenant."""
    conn = duckdb.connect(str(campaign_db))

    # Ensure table exists with tenant_id
    create_rotated_table_v2(conn)  # Includes tenant_id column

    for file in source_dir.glob("**/*.csv"):
        for row in rotate_file(file):
            row["tenant_id"] = tenant_id  # Tag with tenant
            # ... insert

CLI:

# Ingest for specific tenant
cbmodels rotate ingest ./data/sources/ --tenant mi20 --output-db /opt/.../campaign.db

# Query for tenant
cbmodels rotate lookup /opt/.../campaign.db --tenant mi20 --email john@example.com

7. Shared vs Tenant-Specific Data

Data Type Isolation Location Access Pattern
i360 Voters Shared (read-only) /opt/.../i360.db Direct query, no tenant filter
CC Analytics Shared (read-only) /opt/.../cc_kpi.db Direct query, no tenant filter
Campaign Data Tenant-partitioned /opt/.../campaign.db Always filter by tenant_id
Contacts Tenant-owned {slug}/db/pocket.db Implicit tenant via database file
Segments Tenant-owned {slug}/db/pocket.db Implicit tenant via database file

8. Migration Path

Phase 1: Add tenant_id to existing data

-- Add column to existing table
ALTER TABLE rotated_data ADD COLUMN tenant_id VARCHAR;

-- Backfill existing data (all currently mi20)
UPDATE rotated_data SET tenant_id = 'mi20' WHERE tenant_id IS NULL;

-- Make column NOT NULL
-- (DuckDB doesn't support ALTER COLUMN, may need table recreation)

Phase 2: Move database to shared location

# Move to shared location
mv ./data/rotated.db /opt/campaignbrain/shared/data/cbmodels/campaign.db

# Create symlink in each tenant
ln -s /opt/campaignbrain/shared/data/cbmodels/campaign.db \
      /home/bisenbek/projects/nominate/mi20-clevenger/db/campaign.db

Phase 3: Update cbapp to use ATTACH

# In cbapp startup
conn.execute("ATTACH '/opt/.../campaign.db' AS campaign (READ_ONLY)")

9. Security Considerations

Row-Level Security via Application: - DuckDB doesn't have native RLS - All queries MUST include WHERE tenant_id = ? - Enforce at API layer, not database layer

Validation:

def validate_tenant_access(user_tenant: str, query_tenant: str):
    """Ensure user can only access their tenant's data."""
    if user_tenant != query_tenant:
        raise PermissionError(f"Access denied to tenant {query_tenant}")

Audit Trail:

CREATE TABLE audit_log (
    id VARCHAR PRIMARY KEY,
    tenant_id VARCHAR NOT NULL,
    user_id VARCHAR,
    action VARCHAR,
    resource VARCHAR,
    query VARCHAR,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

10. Chat Integration Update

Update CBAPP-INTEGRATION.md endpoints to include tenant:

# Lookup with tenant context
@router.get("/lookup")
async def lookup_contact(
    request: Request,
    email: str = None,
    phone: str = None,
):
    tenant_id = request.state.tenant_id  # From middleware

    # Query with tenant isolation
    matches = find_by_email(conn, tenant_id, email)
    return {"matches": matches}

QueryIR Extension:

{
  "type": "campaign_lookup",
  "tenant_id": "mi20",          // From request context
  "email": "john@example.com"
}

Implementation Checklist

  • Add tenant_id column to rotated_data schema
  • Update storage.py functions to require tenant_id
  • Update cli.py with --tenant flag
  • Migrate existing data to tenant 'mi20'
  • Move database to shared location
  • Create symlinks for active tenants
  • Add tenant middleware to cbapp
  • Update chat integration to pass tenant context
  • Add audit logging for tenant access
  • Update documentation

Decisions

  1. Tenant ID format: Slug (e.g., mi20, ky04)
  2. Human-readable in logs and queries
  3. Consistent with file paths (/home/.../mi20-clevenger/)
  4. Consistent with systemd services (mi20-api, ky04-frontend)
  5. Easy to debug: WHERE tenant_id = 'mi20'

  6. Database file naming: campaign.db vs tenant-specific files?

  7. Single multi-tenant file: Simpler, one ATTACH
  8. Per-tenant files: Better isolation, but more complexity

  9. Shared data updates: How to refresh i360/cc data?

  10. Currently manual
  11. Could add Snowflake sync job

  12. Cache invalidation: When campaign data changes, how to notify chat?

  13. Currently no caching
  14. May need pub/sub for real-time updates