Multi-Tenant Architecture¶
Design for tenant-isolated data with cross-database joins.
Current State¶
Tenant Structure (from cbtenant)¶
/home/bisenbek/projects/nominate/{slug}/
├── db/
│ ├── pocket.db # Tenant-specific OLTP (contacts, segments, actions)
│ └── i360.db → shared # Symlink to shared voter data
├── .env # Tenant config
└── ...
Active Tenants: | Slug | Description | pocket.db Size | |------|-------------|----------------| | ky04 | Kentucky campaign | 5.4 GB | | mi20-clevenger | Michigan campaign | 15 MB |
Shared Resources:
| Resource | Location | Size |
|----------|----------|------|
| i360 Voter Data | /opt/campaignbrain/shared/data/i360.db | 2.1 GB |
Current cbmodels Data¶
Campaign data in ./data/rotated.db:
- 788K cells from 46 files
- NOT tenant-partitioned (all mi20 data currently)
Proposed Architecture¶
1. Tenant-Partitioned Storage¶
Add tenant_id to rotated_data table:
CREATE TABLE rotated_data (
tenant_id VARCHAR NOT NULL, -- NEW: UUID or slug
source_path VARCHAR,
source_id VARCHAR,
row_idx INTEGER,
col_idx INTEGER,
field_raw VARCHAR,
field_name VARCHAR,
value VARCHAR
);
-- Tenant isolation index (CRITICAL for query performance)
CREATE INDEX idx_rotated_tenant ON rotated_data (tenant_id);
CREATE INDEX idx_rotated_tenant_field ON rotated_data (tenant_id, field_name);
CREATE INDEX idx_rotated_tenant_value ON rotated_data (tenant_id, field_name, value);
2. Database Locations¶
/opt/campaignbrain/shared/data/
├── i360.db # Shared voter data (existing)
├── cc_kpi.db # Shared CC analytics (future)
└── cbmodels/
├── campaign.db # Multi-tenant campaign data
└── models/
├── mi20.json # Pre-computed model for mi20
└── ky04.json # Pre-computed model for ky04
Per-tenant symlink:
/home/bisenbek/projects/nominate/{slug}/db/
├── pocket.db # Tenant-owned
├── i360.db → shared # Existing symlink
└── campaign.db → shared # NEW: symlink to shared campaign.db
3. Cross-Database Query Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│ DuckDB Connection │
├─────────────────────────────────────────────────────────────────┤
│ ATTACH 'pocket.db' AS pocket │
│ ATTACH '/opt/.../i360.db' AS i360 (READ_ONLY) │
│ ATTACH '/opt/.../campaign.db' AS campaign (READ_ONLY) │
└─────────────────────────────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│ pocket │ │ i360 │ │ campaign │
│(tenant) │ │(shared) │ │(multi-tenant)│
├──────────┤ ├──────────┤ ├──────────────┤
│ person │ │i360_voters│ │ rotated_data │
│ segment │ │ │ │ (tenant_id) │
│ tag │ │ │ │ │
│ action │ │ │ │ │
└──────────┘ └──────────┘ └──────────────┘
4. Query Patterns¶
All queries MUST include tenant_id filter:
-- Find campaign engagement for a tenant's contact
SELECT DISTINCT source_path, source_id, row_idx
FROM campaign.rotated_data
WHERE tenant_id = 'mi20' -- REQUIRED
AND field_name = 'email'
AND LOWER(value) = LOWER('john@example.com');
-- Cross-database join: contacts with campaign history
SELECT
p.first_name, p.last_name, p.email,
COUNT(DISTINCT r.source_path) as campaign_sources
FROM pocket.person p
JOIN campaign.rotated_data r
ON r.tenant_id = 'mi20' -- REQUIRED
AND r.field_name = 'email'
AND LOWER(p.email) = LOWER(r.value)
GROUP BY p.id, p.first_name, p.last_name, p.email;
-- Enrich with i360 voter data (shared, no tenant filter needed)
SELECT
p.first_name, p.last_name,
v.party, v.turnout_score, v.trump_support_score
FROM pocket.person p
LEFT JOIN i360.i360_voters v
ON LOWER(p.email) = LOWER(v.email_mydata)
WHERE p.email IS NOT NULL;
5. API Design¶
All endpoints require tenant context:
# Option A: Path parameter
GET /api/cbmodels/{tenant_id}/lookup?email=john@example.com
# Option B: Header
GET /api/cbmodels/lookup?email=john@example.com
X-Tenant-ID: mi20
# Option C: From authentication (preferred)
# JWT contains tenant_id, extracted by middleware
GET /api/cbmodels/lookup?email=john@example.com
Authorization: Bearer <jwt_with_tenant_claim>
FastAPI middleware example:
from fastapi import Request, HTTPException
async def tenant_middleware(request: Request, call_next):
# Extract from JWT or header
tenant_id = request.headers.get("X-Tenant-ID")
if not tenant_id:
tenant_id = get_tenant_from_jwt(request)
if not tenant_id:
raise HTTPException(401, "Tenant context required")
# Attach to request state
request.state.tenant_id = tenant_id
return await call_next(request)
6. Data Ingestion with Tenant¶
def ingest_for_tenant(
tenant_id: str,
source_dir: Path,
campaign_db: Path = Path("/opt/campaignbrain/shared/data/cbmodels/campaign.db")
):
"""Ingest campaign files for a specific tenant."""
conn = duckdb.connect(str(campaign_db))
# Ensure table exists with tenant_id
create_rotated_table_v2(conn) # Includes tenant_id column
for file in source_dir.glob("**/*.csv"):
for row in rotate_file(file):
row["tenant_id"] = tenant_id # Tag with tenant
# ... insert
CLI:
# Ingest for specific tenant
cbmodels rotate ingest ./data/sources/ --tenant mi20 --output-db /opt/.../campaign.db
# Query for tenant
cbmodels rotate lookup /opt/.../campaign.db --tenant mi20 --email john@example.com
7. Shared vs Tenant-Specific Data¶
| Data Type | Isolation | Location | Access Pattern |
|---|---|---|---|
| i360 Voters | Shared (read-only) | /opt/.../i360.db |
Direct query, no tenant filter |
| CC Analytics | Shared (read-only) | /opt/.../cc_kpi.db |
Direct query, no tenant filter |
| Campaign Data | Tenant-partitioned | /opt/.../campaign.db |
Always filter by tenant_id |
| Contacts | Tenant-owned | {slug}/db/pocket.db |
Implicit tenant via database file |
| Segments | Tenant-owned | {slug}/db/pocket.db |
Implicit tenant via database file |
8. Migration Path¶
Phase 1: Add tenant_id to existing data
-- Add column to existing table
ALTER TABLE rotated_data ADD COLUMN tenant_id VARCHAR;
-- Backfill existing data (all currently mi20)
UPDATE rotated_data SET tenant_id = 'mi20' WHERE tenant_id IS NULL;
-- Make column NOT NULL
-- (DuckDB doesn't support ALTER COLUMN, may need table recreation)
Phase 2: Move database to shared location
# Move to shared location
mv ./data/rotated.db /opt/campaignbrain/shared/data/cbmodels/campaign.db
# Create symlink in each tenant
ln -s /opt/campaignbrain/shared/data/cbmodels/campaign.db \
/home/bisenbek/projects/nominate/mi20-clevenger/db/campaign.db
Phase 3: Update cbapp to use ATTACH
9. Security Considerations¶
Row-Level Security via Application:
- DuckDB doesn't have native RLS
- All queries MUST include WHERE tenant_id = ?
- Enforce at API layer, not database layer
Validation:
def validate_tenant_access(user_tenant: str, query_tenant: str):
"""Ensure user can only access their tenant's data."""
if user_tenant != query_tenant:
raise PermissionError(f"Access denied to tenant {query_tenant}")
Audit Trail:
CREATE TABLE audit_log (
id VARCHAR PRIMARY KEY,
tenant_id VARCHAR NOT NULL,
user_id VARCHAR,
action VARCHAR,
resource VARCHAR,
query VARCHAR,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
10. Chat Integration Update¶
Update CBAPP-INTEGRATION.md endpoints to include tenant:
# Lookup with tenant context
@router.get("/lookup")
async def lookup_contact(
request: Request,
email: str = None,
phone: str = None,
):
tenant_id = request.state.tenant_id # From middleware
# Query with tenant isolation
matches = find_by_email(conn, tenant_id, email)
return {"matches": matches}
QueryIR Extension:
{
"type": "campaign_lookup",
"tenant_id": "mi20", // From request context
"email": "john@example.com"
}
Implementation Checklist¶
- Add
tenant_idcolumn torotated_dataschema - Update
storage.pyfunctions to require tenant_id - Update
cli.pywith--tenantflag - Migrate existing data to tenant 'mi20'
- Move database to shared location
- Create symlinks for active tenants
- Add tenant middleware to cbapp
- Update chat integration to pass tenant context
- Add audit logging for tenant access
- Update documentation
Decisions¶
- Tenant ID format: Slug (e.g.,
mi20,ky04) - Human-readable in logs and queries
- Consistent with file paths (
/home/.../mi20-clevenger/) - Consistent with systemd services (
mi20-api,ky04-frontend) -
Easy to debug:
WHERE tenant_id = 'mi20' -
Database file naming:
campaign.dbvs tenant-specific files? - Single multi-tenant file: Simpler, one ATTACH
-
Per-tenant files: Better isolation, but more complexity
-
Shared data updates: How to refresh i360/cc data?
- Currently manual
-
Could add Snowflake sync job
-
Cache invalidation: When campaign data changes, how to notify chat?
- Currently no caching
- May need pub/sub for real-time updates