API Architecture & Consolidation Plan¶
A comprehensive analysis of data sources, pipelines, endpoints, and a proposal for a unified, tiered API.
Executive Summary¶
cbmodels currently has three distinct endpoint categories serving different data sources: 1. i360 Statistical Analysis - Voter segment stats from i360 data 2. CC Behavioral Enrichment - Email/donation engagement from Snowflake CC data 3. Tenant Campaign Enrichment - Client-uploaded campaign data (EAV rotated)
This document proposes a unified API with clear tiering for business/pricing purposes.
Current Data Sources¶
Source Databases¶
graph TB
subgraph "Shared Data (Multi-Tenant)"
I360[(i360.db<br/>2.1GB<br/>Voter Intelligence)]
CC[(CC Data<br/>Snowflake → Parquet<br/>Email/Donations)]
end
subgraph "Tenant Data (Siloed)"
POCKET[(pocket.db<br/>Per-Tenant<br/>Contacts/Segments)]
CAMPAIGN[(campaign.db<br/>EAV Rotated<br/>Donors/Volunteers)]
end
subgraph "External Sources"
SF[Snowflake<br/>KPI Schema]
FILES[CBFiles<br/>Imports/Exports]
end
SF -->|Pipeline ETL| CC
FILES -->|List Loader| POCKET
FILES -->|Rotate Ingest| CAMPAIGN
Data Source Details¶
| Source | Type | Size | Scope | Cost Tier |
|---|---|---|---|---|
| i360.db | DuckDB | 2.1GB | Shared (symlinked) | Premium (per-state) |
| CC Data (Snowflake) | Parquet | ~500MB | Shared | Premium |
| pocket.db | DuckDB | Varies | Per-tenant | Base |
| campaign.db | DuckDB (EAV) | Varies | Per-tenant (partitioned) | Base+ |
Transformation Pipelines¶
Pipeline 1: Snowflake → CC Behavioral Data¶
flowchart LR
subgraph Snowflake
PE[phone_email_map<br/>182M rows]
EC[email_clickers<br/>41M rows]
DN[donations<br/>3.2M rows]
CM[campaigns<br/>72K rows]
end
subgraph "Pipeline ETL"
EX[Extract<br/>Chunked 1M rows]
CO[Consolidate<br/>Merge Parquet]
LD[Load<br/>DuckDB Memory]
end
subgraph "cbmodels API"
BE[behavioral enrich]
AF[behavioral affinity]
PR[behavioral propensity]
end
PE --> EX
EC --> EX
DN --> EX
CM --> EX
EX --> CO --> LD --> BE & AF & PR
CLI Commands:
# Extract from Snowflake
cbmodels pipeline run KPI.PHONE_EMAIL_MAP --db cc.db
# Or individual extraction
cbmodels extract snowflake -t KPI.EMAIL_CLICKERS -o ./data/cc/
Pipeline 2: i360 Voter Data → Stats¶
flowchart LR
subgraph "i360 Source"
VF[State Voter Files<br/>Michigan: 7.5M voters]
end
subgraph "Shared Storage"
I360[(i360.db<br/>shared storage)]
end
subgraph "Per-Tenant"
SYM[Symlink<br/>tenant db i360.db]
end
subgraph "cbmodels API"
SA[segment analyze]
BS[baseline summary]
BT[baseline table]
end
VF -->|Manual Load| I360
I360 -->|Symlink| SYM
SYM --> SA & BS & BT
Current State: - Michigan i360 data loaded - Other states available but not loaded - Premium feature (per-state pricing)
Pipeline 3: Campaign Files → EAV Rotation¶
flowchart LR
subgraph "Client Files"
CSV[CSV Files]
XLS[Excel Files]
end
subgraph "Rotation"
ROT[Rotate<br/>Wide to Tall]
NORM[Normalize<br/>Field Names]
end
subgraph "Storage"
EAV[(campaign.db<br/>tenant_id partitioned)]
end
subgraph "cbmodels API"
LU[lookup]
DU[duplicates]
EN[enrich]
end
CSV & XLS --> ROT --> NORM --> EAV
EAV --> LU & DU & EN
CLI Commands:
# Ingest campaign files for tenant
cbmodels rotate ingest ./data/sources/ -t mi20 -o campaign.db
# Lookup across sources
cbmodels rotate lookup campaign.db -t mi20 -e john@example.com
Current Endpoint Inventory¶
cbmodels API Endpoints¶
graph TB
subgraph "Segment Analysis - i360"
SA[POST segment analyze]
BS[GET baseline summary]
BT[GET baseline table]
BTL[GET baseline tables]
end
subgraph "Behavioral Enrichment - CC"
BE[POST behavioral enrich]
BA[POST behavioral affinity]
BP[POST behavioral propensity]
BST[GET behavioral status]
end
subgraph "Health"
H[GET health]
end
SA & BS & BT & BTL --> I360[(i360.db)]
BE & BA & BP & BST --> CC[(CC Parquet)]
cbapp API Endpoints (Enrichment-Related)¶
graph TB
subgraph "i360 Search - cbapp"
IS[GET i360 search]
IF[GET i360 filters]
IV[GET i360 voter by svid]
IST[GET i360 stats]
IG[GET i360 geojson]
II[POST i360 import]
ISS[POST i360 save-as-segment]
end
subgraph "AI Chat - cbapp"
CM[POST cb-chat message]
CH[GET cb-chat health]
CS[GET cb-chat suggestions]
end
subgraph "List Import - cbapp"
LU[POST list-loader upload]
LA[POST list-loader analyze]
LE[POST list-loader execute]
end
IS & IF & IV & IST & IG --> I360[(i360.db)]
CM --> I360
LE --> POCKET[(pocket.db)]
Endpoint Classification by Tier¶
Tier Matrix¶
| Tier | Name | Data Source | Cost | Features |
|---|---|---|---|---|
| 0 | Base | pocket.db only | Free | Contacts, segments, tags, basic import |
| 1 | Campaign+ | + campaign.db (rotated) | Low | EAV rotation, dedup, cross-source lookup |
| 2 | Analytics | + i360 stats | Medium | Segment analysis, correlations, patterns |
| 3 | CC Enrichment | + CC behavioral | High | Email engagement, donations, affinity |
| 4 | i360 Enrichment | + i360 voter data | High | Voter scores, demographics, voting history |
| 5 | Full Suite | All above | Premium | Complete platform |
Endpoint → Tier Mapping¶
graph LR
subgraph "Tier 0: Base"
T0A[persons]
T0B[segments]
T0C[tags]
T0D[list-loader]
end
subgraph "Tier 1: Campaign+"
T1A[campaign lookup]
T1B[campaign duplicates]
T1C[campaign sources]
T1D[campaign enrich]
end
subgraph "Tier 2: Analytics"
T2A[stats segment]
T2B[stats baseline]
T2C[stats correlations]
T2D[stats patterns]
end
subgraph "Tier 3: CC Enrichment"
T3A[enrich behavioral]
T3B[enrich affinity]
T3C[enrich propensity]
end
subgraph "Tier 4: i360"
T4A[voters search]
T4B[voters by svid]
T4C[voters import]
T4D[voters stats]
end
Proposed Unified API¶
Design Principles¶
- Consistent Naming:
/api/v1/{domain}/{action} - Tenant Context: All requests include tenant (header or JWT)
- Tier Enforcement: Middleware checks subscription tier
- Unified Response Format: Standard envelope with metadata
Proposed Endpoint Structure¶
/api/v1/
├── health # Health check (all tiers)
│
├── contacts/ # Tier 0: Base
│ ├── search
│ ├── {id}
│ └── batch
│
├── segments/ # Tier 0: Base
│ ├── list
│ ├── {id}
│ ├── {id}/execute
│ └── {id}/export
│
├── campaign/ # Tier 1: Campaign+
│ ├── sources # List ingested sources
│ ├── lookup # Find by email/phone
│ ├── duplicates # Cross-source duplicates
│ └── enrich # Add campaign data to contacts
│
├── stats/ # Tier 2: Analytics
│ ├── segment # Segment vs baseline analysis
│ ├── baseline # Baseline model info
│ ├── correlations # Column correlations
│ ├── patterns # Data patterns
│ └── outliers # Outlier detection
│
├── enrich/ # Tier 3: CC Enrichment
│ ├── behavioral # Email/donation enrichment
│ ├── affinity # Client affinity scores
│ └── propensity # Donor propensity scores
│
├── voters/ # Tier 4: i360
│ ├── search # Advanced voter search
│ ├── {svid} # Voter detail
│ ├── filters # Dynamic filter options
│ ├── import # Import to contacts
│ └── stats # Voter database stats
│
└── chat/ # Tier 4: i360 (requires voter data)
├── message # Natural language query
└── suggestions # Query suggestions
Request/Response Standards¶
Request Headers:
X-Tenant-ID: mi20 # Tenant slug (or from JWT)
X-Subscription-Tier: 3 # Tier level (or from JWT)
Authorization: Bearer <jwt> # Authentication
Response Envelope:
{
"success": true,
"data": { ... },
"meta": {
"tenant_id": "mi20",
"tier": 3,
"request_id": "uuid",
"timing_ms": 123
},
"errors": []
}
Tier Enforcement Response (403):
{
"success": false,
"data": null,
"meta": { "tenant_id": "mi20", "tier": 1 },
"errors": [{
"code": "TIER_REQUIRED",
"message": "This endpoint requires Tier 3 (CC Enrichment)",
"required_tier": 3,
"current_tier": 1
}]
}
Data Flow: Unified Architecture¶
flowchart TB
subgraph "Client Request"
REQ[API Request<br/>+ Tenant + Tier]
end
subgraph "API Gateway"
AUTH[Auth Middleware<br/>JWT Validation]
TIER[Tier Middleware<br/>Check Subscription]
TENANT[Tenant Middleware<br/>Set Context]
end
subgraph "Route Handlers"
CAMP[Campaign Routes<br/>Tier 1+]
STAT[Stats Routes<br/>Tier 2+]
ENRI[Enrich Routes<br/>Tier 3+]
VOTE[Voter Routes<br/>Tier 4+]
end
subgraph "Data Layer"
subgraph "Tenant-Scoped"
POCKET[(pocket.db)]
CAMPAIGN[(campaign.db<br/>WHERE tenant_id=?)]
end
subgraph "Shared (Read-Only)"
I360[(i360.db)]
CC[(CC Data)]
end
end
REQ --> AUTH --> TIER --> TENANT
TENANT --> CAMP & STAT & ENRI & VOTE
CAMP --> CAMPAIGN
STAT --> I360
ENRI --> CC
VOTE --> I360
CAMP & STAT & ENRI & VOTE --> POCKET
Cross-Database Query Architecture¶
DuckDB ATTACH Pattern¶
flowchart LR
subgraph "Connection"
CONN[DuckDB Connection<br/>In-Memory]
end
subgraph "Attached Databases"
A1[ATTACH pocket.db<br/>AS pocket]
A2[ATTACH campaign.db<br/>AS campaign<br/>READ_ONLY]
A3[ATTACH i360.db<br/>AS i360<br/>READ_ONLY]
end
subgraph "Cross-DB Query"
Q[SELECT p.*, c.sources, v.score<br/>FROM pocket.person p<br/>LEFT JOIN campaign.rotated c<br/> ON c.tenant_id='mi20'<br/> AND c.field='email'<br/> AND c.value=p.email<br/>LEFT JOIN i360.voters v<br/> ON v.email=p.email]
end
CONN --> A1 & A2 & A3 --> Q
Example: Unified Enrichment Query¶
-- Attach all databases
ATTACH 'pocket.db' AS pocket;
ATTACH '/opt/.../campaign.db' AS campaign (READ_ONLY);
ATTACH '/opt/.../i360.db' AS i360 (READ_ONLY);
ATTACH '/opt/.../cc.db' AS cc (READ_ONLY);
-- Unified enrichment for a segment
SELECT
p.id,
p.first_name,
p.last_name,
p.email,
-- Campaign enrichment (Tier 1)
(SELECT COUNT(DISTINCT source_id)
FROM campaign.rotated_data
WHERE tenant_id = 'mi20'
AND field_name = 'email'
AND LOWER(value) = LOWER(p.email)) as campaign_sources,
-- i360 enrichment (Tier 4)
v.party,
v.turnout_score,
v.trump_support_score,
-- CC enrichment (Tier 3)
b.click_count,
b.donation_total
FROM pocket.person p
LEFT JOIN i360.i360_voters v
ON LOWER(v.email_mydata) = LOWER(p.email)
LEFT JOIN cc.behavioral_summary b
ON LOWER(b.email) = LOWER(p.email)
WHERE p.id IN (SELECT person_id FROM pocket.segment_person WHERE segment_id = ?)
Implementation Plan¶
Phase 1: Foundation (Current Sprint)¶
gantt
title Phase 1: Foundation
dateFormat YYYY-MM-DD
section Schema
Add tenant_id to campaign.db :done, 2024-01-01, 1d
Backfill existing data :done, 2024-01-01, 1d
section Docs
Architecture document :active, 2024-01-02, 1d
section API
Tier middleware design :2024-01-03, 2d
Deliverables: - [x] Multi-tenant support (tenant_id = slug) - [x] Backfill existing data as 'mi20' - [x] Documentation (this document) - [x] Tier enforcement middleware design
Phase 2: API Consolidation ✅ COMPLETED¶
Tasks: 1. [x] Create unified route structure in cbmodels 2. [x] Add i360 voters endpoints to cbmodels 3. [x] Implement tier middleware 4. [x] Standardize response format
Implemented Route Files:
src/cbmodels/api/
├── campaign.py # Tier 1: Campaign data (4 endpoints)
├── voters.py # Tier 4: i360 voter data (5 endpoints)
├── routes.py # Tier 2-3: Stats, Behavioral (existing)
├── middleware.py # Tier enforcement middleware
├── response.py # Unified response envelope
├── metering.py # Usage tracking for billing
└── schemas.py # Pydantic models
Environment Variables:
CBMODELS_ENFORCE_TIERS=true # Enable tier access control
CBMODELS_DEFAULT_TIER=5 # Default tier if header missing
CBMODELS_UNIFIED_RESPONSE=true # Wrap responses in envelope
CBMODELS_METERING=true # Enable usage logging
CBMODELS_I360_DB=/path/to/i360.db # i360 database path
Phase 3: cbapp Integration¶
Options:
| Option | Pros | Cons |
|---|---|---|
| A: Proxy | cbapp calls cbmodels API | Extra hop, latency |
| B: Direct | cbapp uses cbmodels as library | Tight coupling |
| C: Merge | Move all enrichment to cbmodels | Larger cbmodels scope |
Recommendation: Option A (Proxy) for clean separation - cbapp handles UI, auth, tenant management - cbmodels handles data, enrichment, analytics - Clear API boundary for pricing/metering
Phase 4: Metering & Billing ✅ COMPLETED¶
Implementation: src/cbmodels/api/metering.py
- Logs all requests to JSONL file and DuckDB
- Tracks: tenant_id, endpoint, tier, timing, status
- New endpoint:
GET /api/v1/usagefor statistics
# Enable via environment variable
CBMODELS_METERING=true
# Optional custom paths
CBMODELS_METERING_DB=./data/metering.db
CBMODELS_METERING_LOG=./data/usage_logs.jsonl
Pricing Model Alignment¶
Tier Features & Pricing¶
| Tier | Monthly | Features |
|---|---|---|
| Base (0) | $0 | Contacts, segments, basic import |
| Campaign+ (1) | $49 | + EAV rotation, dedup, campaign lookup |
| Analytics (2) | $149 | + Segment stats, correlations, patterns |
| CC Enrichment (3) | $299 | + Email engagement, donations, affinity |
| i360 (4) | $499/state | + Voter scores, demographics, history |
| Full Suite (5) | $999 | All features, priority support |
Per-State i360 Pricing¶
| State | Voters | Monthly |
|---|---|---|
| Michigan | 7.5M | $499 |
| Ohio | 8.0M | $499 |
| Pennsylvania | 9.0M | $599 |
| Florida | 14.5M | $799 |
| Additional | Per request | Contact sales |
Security Considerations¶
Data Isolation¶
flowchart TB
subgraph "Request Processing"
REQ[Request] --> AUTH[Authenticate]
AUTH --> TENANT[Extract Tenant]
TENANT --> TIER[Check Tier]
end
subgraph "Query Execution"
TIER --> QUERY[Build Query]
QUERY --> INJECT[Inject tenant_id<br/>in WHERE clause]
INJECT --> EXEC[Execute]
end
subgraph "Validation"
EXEC --> VALIDATE[Validate Response<br/>No Cross-Tenant Data]
VALIDATE --> RESP[Response]
end
Query Injection Prevention¶
def build_campaign_query(tenant_id: str, email: str):
"""All queries MUST include tenant_id filter."""
return conn.execute(
"""
SELECT * FROM campaign.rotated_data
WHERE tenant_id = ? -- ALWAYS first
AND field_name = 'email'
AND LOWER(value) = LOWER(?)
""",
[tenant_id, email] # Parameterized
)
Migration Checklist¶
Immediate (Before Next Sprint) ✅¶
- Document current architecture
- Add tenant_id to campaign data
- Backfill existing data
- Review with team
- Approve tier structure
Short-Term (Next 2 Sprints) ✅¶
- Implement tier middleware (
middleware.py) - Create unified route structure (
campaign.py,voters.py) - Standardize response format (
response.py) - Add request metering (
metering.py) - Update cbapp to use new endpoints
Medium-Term (Next Quarter)¶
- Add more i360 states
- Implement usage dashboard (endpoint exists:
/api/v1/usage) - Add billing integration
- Performance optimization
- Rate limiting per tier
Open Questions¶
- Tier Storage: Where to store tenant subscription tier?
- cbtenant manager.db?
- JWT claims?
-
cbapp database?
-
i360 State Management: How to handle multi-state subscriptions?
- Separate tiers per state?
-
State list in subscription?
-
CC Data Freshness: How often to sync from Snowflake?
- Daily batch?
-
On-demand?
-
Chat Integration: Should chat require Tier 4 (i360)?
- Yes: Chat queries voter data
-
No: Could work with campaign data only
-
Endpoint Versioning: How to handle breaking changes?
- URL versioning (/api/v2/)
- Header versioning (Accept-Version)
Appendix: Current vs Proposed Endpoint Mapping¶
| Current (cbmodels) | Current (cbapp) | Proposed | Tier |
|---|---|---|---|
| POST /segment/analyze | - | POST /stats/segment | 2 |
| GET /baseline/* | - | GET /stats/baseline/* | 2 |
| POST /behavioral/enrich | - | POST /enrich/behavioral | 3 |
| POST /behavioral/affinity | - | POST /enrich/affinity | 3 |
| POST /behavioral/propensity | - | POST /enrich/propensity | 3 |
| - | GET /i360/search | GET /voters/search | 4 |
| - | GET /i360/voter/{svid} | GET /voters/{svid} | 4 |
| - | POST /i360/import | POST /voters/import | 4 |
| - | GET /i360/stats | GET /voters/stats | 4 |
| - | POST /cb-chat/message | POST /chat/message | 4 |
| (CLI only) | - | GET /campaign/lookup | 1 |
| (CLI only) | - | GET /campaign/duplicates | 1 |
| (CLI only) | - | GET /campaign/sources | 1 |
| (CLI only) | - | POST /campaign/enrich | 1 |