Skip to content

API Architecture & Consolidation Plan

A comprehensive analysis of data sources, pipelines, endpoints, and a proposal for a unified, tiered API.

Executive Summary

cbmodels currently has three distinct endpoint categories serving different data sources: 1. i360 Statistical Analysis - Voter segment stats from i360 data 2. CC Behavioral Enrichment - Email/donation engagement from Snowflake CC data 3. Tenant Campaign Enrichment - Client-uploaded campaign data (EAV rotated)

This document proposes a unified API with clear tiering for business/pricing purposes.


Current Data Sources

Source Databases

graph TB
    subgraph "Shared Data (Multi-Tenant)"
        I360[(i360.db<br/>2.1GB<br/>Voter Intelligence)]
        CC[(CC Data<br/>Snowflake → Parquet<br/>Email/Donations)]
    end

    subgraph "Tenant Data (Siloed)"
        POCKET[(pocket.db<br/>Per-Tenant<br/>Contacts/Segments)]
        CAMPAIGN[(campaign.db<br/>EAV Rotated<br/>Donors/Volunteers)]
    end

    subgraph "External Sources"
        SF[Snowflake<br/>KPI Schema]
        FILES[CBFiles<br/>Imports/Exports]
    end

    SF -->|Pipeline ETL| CC
    FILES -->|List Loader| POCKET
    FILES -->|Rotate Ingest| CAMPAIGN

Data Source Details

Source Type Size Scope Cost Tier
i360.db DuckDB 2.1GB Shared (symlinked) Premium (per-state)
CC Data (Snowflake) Parquet ~500MB Shared Premium
pocket.db DuckDB Varies Per-tenant Base
campaign.db DuckDB (EAV) Varies Per-tenant (partitioned) Base+

Transformation Pipelines

Pipeline 1: Snowflake → CC Behavioral Data

flowchart LR
    subgraph Snowflake
        PE[phone_email_map<br/>182M rows]
        EC[email_clickers<br/>41M rows]
        DN[donations<br/>3.2M rows]
        CM[campaigns<br/>72K rows]
    end

    subgraph "Pipeline ETL"
        EX[Extract<br/>Chunked 1M rows]
        CO[Consolidate<br/>Merge Parquet]
        LD[Load<br/>DuckDB Memory]
    end

    subgraph "cbmodels API"
        BE[behavioral enrich]
        AF[behavioral affinity]
        PR[behavioral propensity]
    end

    PE --> EX
    EC --> EX
    DN --> EX
    CM --> EX
    EX --> CO --> LD --> BE & AF & PR

CLI Commands:

# Extract from Snowflake
cbmodels pipeline run KPI.PHONE_EMAIL_MAP --db cc.db

# Or individual extraction
cbmodels extract snowflake -t KPI.EMAIL_CLICKERS -o ./data/cc/

Pipeline 2: i360 Voter Data → Stats

flowchart LR
    subgraph "i360 Source"
        VF[State Voter Files<br/>Michigan: 7.5M voters]
    end

    subgraph "Shared Storage"
        I360[(i360.db<br/>shared storage)]
    end

    subgraph "Per-Tenant"
        SYM[Symlink<br/>tenant db i360.db]
    end

    subgraph "cbmodels API"
        SA[segment analyze]
        BS[baseline summary]
        BT[baseline table]
    end

    VF -->|Manual Load| I360
    I360 -->|Symlink| SYM
    SYM --> SA & BS & BT

Current State: - Michigan i360 data loaded - Other states available but not loaded - Premium feature (per-state pricing)

Pipeline 3: Campaign Files → EAV Rotation

flowchart LR
    subgraph "Client Files"
        CSV[CSV Files]
        XLS[Excel Files]
    end

    subgraph "Rotation"
        ROT[Rotate<br/>Wide to Tall]
        NORM[Normalize<br/>Field Names]
    end

    subgraph "Storage"
        EAV[(campaign.db<br/>tenant_id partitioned)]
    end

    subgraph "cbmodels API"
        LU[lookup]
        DU[duplicates]
        EN[enrich]
    end

    CSV & XLS --> ROT --> NORM --> EAV
    EAV --> LU & DU & EN

CLI Commands:

# Ingest campaign files for tenant
cbmodels rotate ingest ./data/sources/ -t mi20 -o campaign.db

# Lookup across sources
cbmodels rotate lookup campaign.db -t mi20 -e john@example.com


Current Endpoint Inventory

cbmodels API Endpoints

graph TB
    subgraph "Segment Analysis - i360"
        SA[POST segment analyze]
        BS[GET baseline summary]
        BT[GET baseline table]
        BTL[GET baseline tables]
    end

    subgraph "Behavioral Enrichment - CC"
        BE[POST behavioral enrich]
        BA[POST behavioral affinity]
        BP[POST behavioral propensity]
        BST[GET behavioral status]
    end

    subgraph "Health"
        H[GET health]
    end

    SA & BS & BT & BTL --> I360[(i360.db)]
    BE & BA & BP & BST --> CC[(CC Parquet)]
graph TB
    subgraph "i360 Search - cbapp"
        IS[GET i360 search]
        IF[GET i360 filters]
        IV[GET i360 voter by svid]
        IST[GET i360 stats]
        IG[GET i360 geojson]
        II[POST i360 import]
        ISS[POST i360 save-as-segment]
    end

    subgraph "AI Chat - cbapp"
        CM[POST cb-chat message]
        CH[GET cb-chat health]
        CS[GET cb-chat suggestions]
    end

    subgraph "List Import - cbapp"
        LU[POST list-loader upload]
        LA[POST list-loader analyze]
        LE[POST list-loader execute]
    end

    IS & IF & IV & IST & IG --> I360[(i360.db)]
    CM --> I360
    LE --> POCKET[(pocket.db)]

Endpoint Classification by Tier

Tier Matrix

Tier Name Data Source Cost Features
0 Base pocket.db only Free Contacts, segments, tags, basic import
1 Campaign+ + campaign.db (rotated) Low EAV rotation, dedup, cross-source lookup
2 Analytics + i360 stats Medium Segment analysis, correlations, patterns
3 CC Enrichment + CC behavioral High Email engagement, donations, affinity
4 i360 Enrichment + i360 voter data High Voter scores, demographics, voting history
5 Full Suite All above Premium Complete platform

Endpoint → Tier Mapping

graph LR
    subgraph "Tier 0: Base"
        T0A[persons]
        T0B[segments]
        T0C[tags]
        T0D[list-loader]
    end

    subgraph "Tier 1: Campaign+"
        T1A[campaign lookup]
        T1B[campaign duplicates]
        T1C[campaign sources]
        T1D[campaign enrich]
    end

    subgraph "Tier 2: Analytics"
        T2A[stats segment]
        T2B[stats baseline]
        T2C[stats correlations]
        T2D[stats patterns]
    end

    subgraph "Tier 3: CC Enrichment"
        T3A[enrich behavioral]
        T3B[enrich affinity]
        T3C[enrich propensity]
    end

    subgraph "Tier 4: i360"
        T4A[voters search]
        T4B[voters by svid]
        T4C[voters import]
        T4D[voters stats]
    end

Proposed Unified API

Design Principles

  1. Consistent Naming: /api/v1/{domain}/{action}
  2. Tenant Context: All requests include tenant (header or JWT)
  3. Tier Enforcement: Middleware checks subscription tier
  4. Unified Response Format: Standard envelope with metadata

Proposed Endpoint Structure

/api/v1/
├── health                    # Health check (all tiers)
├── contacts/                 # Tier 0: Base
│   ├── search
│   ├── {id}
│   └── batch
├── segments/                 # Tier 0: Base
│   ├── list
│   ├── {id}
│   ├── {id}/execute
│   └── {id}/export
├── campaign/                 # Tier 1: Campaign+
│   ├── sources              # List ingested sources
│   ├── lookup               # Find by email/phone
│   ├── duplicates           # Cross-source duplicates
│   └── enrich               # Add campaign data to contacts
├── stats/                    # Tier 2: Analytics
│   ├── segment              # Segment vs baseline analysis
│   ├── baseline             # Baseline model info
│   ├── correlations         # Column correlations
│   ├── patterns             # Data patterns
│   └── outliers             # Outlier detection
├── enrich/                   # Tier 3: CC Enrichment
│   ├── behavioral           # Email/donation enrichment
│   ├── affinity             # Client affinity scores
│   └── propensity           # Donor propensity scores
├── voters/                   # Tier 4: i360
│   ├── search               # Advanced voter search
│   ├── {svid}               # Voter detail
│   ├── filters              # Dynamic filter options
│   ├── import               # Import to contacts
│   └── stats                # Voter database stats
└── chat/                     # Tier 4: i360 (requires voter data)
    ├── message              # Natural language query
    └── suggestions          # Query suggestions

Request/Response Standards

Request Headers:

X-Tenant-ID: mi20              # Tenant slug (or from JWT)
X-Subscription-Tier: 3         # Tier level (or from JWT)
Authorization: Bearer <jwt>    # Authentication

Response Envelope:

{
  "success": true,
  "data": { ... },
  "meta": {
    "tenant_id": "mi20",
    "tier": 3,
    "request_id": "uuid",
    "timing_ms": 123
  },
  "errors": []
}

Tier Enforcement Response (403):

{
  "success": false,
  "data": null,
  "meta": { "tenant_id": "mi20", "tier": 1 },
  "errors": [{
    "code": "TIER_REQUIRED",
    "message": "This endpoint requires Tier 3 (CC Enrichment)",
    "required_tier": 3,
    "current_tier": 1
  }]
}


Data Flow: Unified Architecture

flowchart TB
    subgraph "Client Request"
        REQ[API Request<br/>+ Tenant + Tier]
    end

    subgraph "API Gateway"
        AUTH[Auth Middleware<br/>JWT Validation]
        TIER[Tier Middleware<br/>Check Subscription]
        TENANT[Tenant Middleware<br/>Set Context]
    end

    subgraph "Route Handlers"
        CAMP[Campaign Routes<br/>Tier 1+]
        STAT[Stats Routes<br/>Tier 2+]
        ENRI[Enrich Routes<br/>Tier 3+]
        VOTE[Voter Routes<br/>Tier 4+]
    end

    subgraph "Data Layer"
        subgraph "Tenant-Scoped"
            POCKET[(pocket.db)]
            CAMPAIGN[(campaign.db<br/>WHERE tenant_id=?)]
        end
        subgraph "Shared (Read-Only)"
            I360[(i360.db)]
            CC[(CC Data)]
        end
    end

    REQ --> AUTH --> TIER --> TENANT
    TENANT --> CAMP & STAT & ENRI & VOTE
    CAMP --> CAMPAIGN
    STAT --> I360
    ENRI --> CC
    VOTE --> I360
    CAMP & STAT & ENRI & VOTE --> POCKET

Cross-Database Query Architecture

DuckDB ATTACH Pattern

flowchart LR
    subgraph "Connection"
        CONN[DuckDB Connection<br/>In-Memory]
    end

    subgraph "Attached Databases"
        A1[ATTACH pocket.db<br/>AS pocket]
        A2[ATTACH campaign.db<br/>AS campaign<br/>READ_ONLY]
        A3[ATTACH i360.db<br/>AS i360<br/>READ_ONLY]
    end

    subgraph "Cross-DB Query"
        Q[SELECT p.*, c.sources, v.score<br/>FROM pocket.person p<br/>LEFT JOIN campaign.rotated c<br/>  ON c.tenant_id='mi20'<br/>  AND c.field='email'<br/>  AND c.value=p.email<br/>LEFT JOIN i360.voters v<br/>  ON v.email=p.email]
    end

    CONN --> A1 & A2 & A3 --> Q

Example: Unified Enrichment Query

-- Attach all databases
ATTACH 'pocket.db' AS pocket;
ATTACH '/opt/.../campaign.db' AS campaign (READ_ONLY);
ATTACH '/opt/.../i360.db' AS i360 (READ_ONLY);
ATTACH '/opt/.../cc.db' AS cc (READ_ONLY);

-- Unified enrichment for a segment
SELECT
    p.id,
    p.first_name,
    p.last_name,
    p.email,

    -- Campaign enrichment (Tier 1)
    (SELECT COUNT(DISTINCT source_id)
     FROM campaign.rotated_data
     WHERE tenant_id = 'mi20'
       AND field_name = 'email'
       AND LOWER(value) = LOWER(p.email)) as campaign_sources,

    -- i360 enrichment (Tier 4)
    v.party,
    v.turnout_score,
    v.trump_support_score,

    -- CC enrichment (Tier 3)
    b.click_count,
    b.donation_total

FROM pocket.person p
LEFT JOIN i360.i360_voters v
    ON LOWER(v.email_mydata) = LOWER(p.email)
LEFT JOIN cc.behavioral_summary b
    ON LOWER(b.email) = LOWER(p.email)
WHERE p.id IN (SELECT person_id FROM pocket.segment_person WHERE segment_id = ?)

Implementation Plan

Phase 1: Foundation (Current Sprint)

gantt
    title Phase 1: Foundation
    dateFormat  YYYY-MM-DD
    section Schema
    Add tenant_id to campaign.db    :done, 2024-01-01, 1d
    Backfill existing data          :done, 2024-01-01, 1d
    section Docs
    Architecture document           :active, 2024-01-02, 1d
    section API
    Tier middleware design          :2024-01-03, 2d

Deliverables: - [x] Multi-tenant support (tenant_id = slug) - [x] Backfill existing data as 'mi20' - [x] Documentation (this document) - [x] Tier enforcement middleware design

Phase 2: API Consolidation ✅ COMPLETED

Tasks: 1. [x] Create unified route structure in cbmodels 2. [x] Add i360 voters endpoints to cbmodels 3. [x] Implement tier middleware 4. [x] Standardize response format

Implemented Route Files:

src/cbmodels/api/
├── campaign.py     # Tier 1: Campaign data (4 endpoints)
├── voters.py       # Tier 4: i360 voter data (5 endpoints)
├── routes.py       # Tier 2-3: Stats, Behavioral (existing)
├── middleware.py   # Tier enforcement middleware
├── response.py     # Unified response envelope
├── metering.py     # Usage tracking for billing
└── schemas.py      # Pydantic models

Environment Variables:

CBMODELS_ENFORCE_TIERS=true     # Enable tier access control
CBMODELS_DEFAULT_TIER=5         # Default tier if header missing
CBMODELS_UNIFIED_RESPONSE=true  # Wrap responses in envelope
CBMODELS_METERING=true          # Enable usage logging
CBMODELS_I360_DB=/path/to/i360.db  # i360 database path

Phase 3: cbapp Integration

Options:

Option Pros Cons
A: Proxy cbapp calls cbmodels API Extra hop, latency
B: Direct cbapp uses cbmodels as library Tight coupling
C: Merge Move all enrichment to cbmodels Larger cbmodels scope

Recommendation: Option A (Proxy) for clean separation - cbapp handles UI, auth, tenant management - cbmodels handles data, enrichment, analytics - Clear API boundary for pricing/metering

Phase 4: Metering & Billing ✅ COMPLETED

Implementation: src/cbmodels/api/metering.py

  • Logs all requests to JSONL file and DuckDB
  • Tracks: tenant_id, endpoint, tier, timing, status
  • New endpoint: GET /api/v1/usage for statistics
# Enable via environment variable
CBMODELS_METERING=true

# Optional custom paths
CBMODELS_METERING_DB=./data/metering.db
CBMODELS_METERING_LOG=./data/usage_logs.jsonl

Pricing Model Alignment

Tier Features & Pricing

Tier Monthly Features
Base (0) $0 Contacts, segments, basic import
Campaign+ (1) $49 + EAV rotation, dedup, campaign lookup
Analytics (2) $149 + Segment stats, correlations, patterns
CC Enrichment (3) $299 + Email engagement, donations, affinity
i360 (4) $499/state + Voter scores, demographics, history
Full Suite (5) $999 All features, priority support

Per-State i360 Pricing

State Voters Monthly
Michigan 7.5M $499
Ohio 8.0M $499
Pennsylvania 9.0M $599
Florida 14.5M $799
Additional Per request Contact sales

Security Considerations

Data Isolation

flowchart TB
    subgraph "Request Processing"
        REQ[Request] --> AUTH[Authenticate]
        AUTH --> TENANT[Extract Tenant]
        TENANT --> TIER[Check Tier]
    end

    subgraph "Query Execution"
        TIER --> QUERY[Build Query]
        QUERY --> INJECT[Inject tenant_id<br/>in WHERE clause]
        INJECT --> EXEC[Execute]
    end

    subgraph "Validation"
        EXEC --> VALIDATE[Validate Response<br/>No Cross-Tenant Data]
        VALIDATE --> RESP[Response]
    end

Query Injection Prevention

def build_campaign_query(tenant_id: str, email: str):
    """All queries MUST include tenant_id filter."""
    return conn.execute(
        """
        SELECT * FROM campaign.rotated_data
        WHERE tenant_id = ?          -- ALWAYS first
          AND field_name = 'email'
          AND LOWER(value) = LOWER(?)
        """,
        [tenant_id, email]  # Parameterized
    )

Migration Checklist

Immediate (Before Next Sprint) ✅

  • Document current architecture
  • Add tenant_id to campaign data
  • Backfill existing data
  • Review with team
  • Approve tier structure

Short-Term (Next 2 Sprints) ✅

  • Implement tier middleware (middleware.py)
  • Create unified route structure (campaign.py, voters.py)
  • Standardize response format (response.py)
  • Add request metering (metering.py)
  • Update cbapp to use new endpoints

Medium-Term (Next Quarter)

  • Add more i360 states
  • Implement usage dashboard (endpoint exists: /api/v1/usage)
  • Add billing integration
  • Performance optimization
  • Rate limiting per tier

Open Questions

  1. Tier Storage: Where to store tenant subscription tier?
  2. cbtenant manager.db?
  3. JWT claims?
  4. cbapp database?

  5. i360 State Management: How to handle multi-state subscriptions?

  6. Separate tiers per state?
  7. State list in subscription?

  8. CC Data Freshness: How often to sync from Snowflake?

  9. Daily batch?
  10. On-demand?

  11. Chat Integration: Should chat require Tier 4 (i360)?

  12. Yes: Chat queries voter data
  13. No: Could work with campaign data only

  14. Endpoint Versioning: How to handle breaking changes?

  15. URL versioning (/api/v2/)
  16. Header versioning (Accept-Version)

Appendix: Current vs Proposed Endpoint Mapping

Current (cbmodels) Current (cbapp) Proposed Tier
POST /segment/analyze - POST /stats/segment 2
GET /baseline/* - GET /stats/baseline/* 2
POST /behavioral/enrich - POST /enrich/behavioral 3
POST /behavioral/affinity - POST /enrich/affinity 3
POST /behavioral/propensity - POST /enrich/propensity 3
- GET /i360/search GET /voters/search 4
- GET /i360/voter/{svid} GET /voters/{svid} 4
- POST /i360/import POST /voters/import 4
- GET /i360/stats GET /voters/stats 4
- POST /cb-chat/message POST /chat/message 4
(CLI only) - GET /campaign/lookup 1
(CLI only) - GET /campaign/duplicates 1
(CLI only) - GET /campaign/sources 1
(CLI only) - POST /campaign/enrich 1