
Contact Data Architecture

Created: 2026-01-01
Status: Documentation of current state + recommendations

Implementation Plan: See CONTACT-DATA-IMPLEMENTATION-PLAN.md for detailed implementation steps.

Key Decisions (2026-01-01)

  1. i360 IS the list for clients that have it - bootstrap from voter file
  2. User lists become header for clients without i360 - iteratively updated
  3. Human-readable PID - 6-character code using serial.py approach
  4. Simple matching for MVP - Name + Address + ZIP (0.95) or Name + City (0.70)
  5. Configurable threshold - Default 75%, stored in integration_setting

Overview

This document describes the three types of contact data in the CampaignBrain ecosystem, how they are stored, and how they connect (or should connect) together.

The Three Data Sources

1. i360 Voter File (Baseline)

Location: /opt/campaignbrain/shared/data/i360.db (symlinked per tenant)

Structure:

| Field | Type | Description |
|-------|------|-------------|
| svid | BIGINT | State Voter ID - Primary key, unique per state |
| first_name | VARCHAR | |
| last_name | VARCHAR | |
| city | VARCHAR | |
| county | VARCHAR | |
| state | VARCHAR | |
| zip_code | BIGINT | |
| cell_phone | VARCHAR | |
| email_mydata | VARCHAR | From i360's MyData enrichment |
| party | VARCHAR | Registered party |
| gender | VARCHAR | |
| birth_year | INTEGER | |
| ethnicity | VARCHAR | |
| 48 total columns | | Scores, voting history, demographics |

Volume: 8.2M records (Florida state file)

Characteristics:

- Pre-cleaned, standardized data
- SVID is the authoritative identifier within a state
- Contains voter scores, turnout predictions, party affinity
- Updated quarterly from i360

2. User-Loaded Lists (Campaign Data)

Location: pocket.db (person table + person_custom_field)

Import Flow:

CSV Upload → cbfiles → List Loader → Field Mapping → Duplicate Detection → person table

Structure (person table):

| Field | Type | Description |
|-------|------|-------------|
| id | VARCHAR | UUID, randomly generated |
| first_name | VARCHAR | |
| last_name | VARCHAR | |
| address1 | VARCHAR | |
| city | VARCHAR | |
| state | VARCHAR | |
| zip | VARCHAR | |
| cell_phone | VARCHAR | |
| email | VARCHAR | |
| import_source | VARCHAR | Origin identifier |
| import_date | TIMESTAMP | When imported |
| import_batch_id | VARCHAR | Groups related imports |
| original_data | JSON | Full source record (audit) |

Custom Fields (person_custom_field):

- State Voter ID (SVID) → stored here, not in person table
- Voter Key
- Party information
- Legislative districts
- 32 custom fields defined, 2M values

Duplicate Detection:

-- Current matching logic (list_loader.py:595-617)
WHERE first_name = ? AND last_name = ?
AND (
  (cell_phone IS NOT NULL AND cell_phone = ?)
  OR (email IS NOT NULL AND email = ?)
  OR (address1 IS NOT NULL AND address1 = ? AND zip = ?)
)
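The matching clause above can be exercised end to end with a small sketch. This is illustrative only: it uses Python's built-in sqlite3 as a stand-in for pocket.db (whose engine is not stated here), a cut-down person table, and a hypothetical helper name `find_duplicates`. It shows that a record matches only when the names are identical and at least one of phone, email, or address + ZIP is an exact string match.

```python
import sqlite3

# Same predicate as the current matching logic, wrapped in a SELECT.
DUPLICATE_SQL = """
SELECT id FROM person
WHERE first_name = ? AND last_name = ?
AND (
  (cell_phone IS NOT NULL AND cell_phone = ?)
  OR (email IS NOT NULL AND email = ?)
  OR (address1 IS NOT NULL AND address1 = ? AND zip = ?)
)
"""


def find_duplicates(conn, rec):
    """Return ids of existing persons that the incoming record duplicates."""
    params = (
        rec.get("first_name"), rec.get("last_name"),
        rec.get("cell_phone"), rec.get("email"),
        rec.get("address1"), rec.get("zip"),
    )
    return [row[0] for row in conn.execute(DUPLICATE_SQL, params)]


# Cut-down person table for demonstration (assumption: real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " address1 TEXT, zip TEXT, cell_phone TEXT, email TEXT)"
)
conn.execute(
    "INSERT INTO person VALUES"
    " ('p1', 'John', 'Smith', '123 Main St', '33101', '5551234567', NULL)"
)

same_phone = {"first_name": "John", "last_name": "Smith", "cell_phone": "5551234567"}
formatted_phone = {"first_name": "John", "last_name": "Smith", "cell_phone": "555-123-4567"}

print(find_duplicates(conn, same_phone))       # existing person found
print(find_duplicates(conn, formatted_phone))  # missed: phone strings differ
```

The second lookup returns nothing even though it is the same phone number, which is the normalization gap discussed under Data Quality Considerations.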

3. cbmodels Campaign Data (Behavioral)

Location: cbmodels/data/ directory per tenant

Structure:

- sources/ - Ingested CSV files (donors, volunteers, events, etc.)
- rotated.db - Normalized campaign data storage
- providers/ - Email engagement from external providers

Key Concept: Uses email and phone as match keys, not person ID.

API Contract:

# Lookup by email/phone
POST /campaign/lookup
{
  "email": "john@example.com"
}

# Returns records from multiple sources
{
  "records": [
    {"source_name": "2024_Donors.csv", "fields": {...}},
    {"source_name": "Rally_Signups.csv", "fields": {...}}
  ]
}

Current ID Strategy

┌─────────────────────────────────────────────────────────────────────┐
│                         CURRENT STATE                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   i360.db                        pocket.db                          │
│   ┌──────────────┐              ┌──────────────────────┐            │
│   │ i360_voters  │              │ person               │            │
│   │              │   name+city  │                      │            │
│   │ SVID (PK) ───┼─────────────►│ id (UUID)            │            │
│   │ first_name   │   matching   │ first_name           │            │
│   │ last_name    │   (lossy)    │ last_name            │            │
│   │ city         │              │ city                 │            │
│   │ ...48 cols   │              │ ...                  │            │
│   └──────────────┘              │ import_source        │            │
│                                 │ original_data (JSON) │            │
│                                 └──────────┬───────────┘            │
│                                            │                        │
│                                            ▼                        │
│                                 ┌──────────────────────┐            │
│                                 │ person_custom_field  │            │
│                                 │ ┌──────────────────┐ │            │
│                                 │ │ State Voter ID   │ │ ◄── SVID  │
│                                 │ │ (as text value)  │ │    stored │
│                                 │ └──────────────────┘ │    here   │
│                                 └──────────────────────┘            │
│                                                                      │
│   cbmodels                                                          │
│   ┌──────────────┐              No direct link!                     │
│   │ sources/     │              Matches on email/phone              │
│   │ rotated.db   │──────────────────────────────────────►           │
│   │ providers/   │                                                  │
│   └──────────────┘                                                  │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Problems with Current State

  1. No direct SVID link in person table
     - SVID stored as custom field (text), not indexed foreign key
     - Cannot efficiently join person ↔ i360_voters

  2. Name+city matching is lossy
     - "John Smith" in "Miami" could match multiple voters
     - No fuzzy matching, soundex, or date-of-birth verification

  3. No global Person ID (PID) across tenants
     - Each tenant has isolated UUIDs
     - Cross-tenant deduplication impossible

  4. cbmodels uses email/phone, not SVID
     - Behavioral data disconnected from voter file
     - Cannot segment by i360 scores + behavioral data together

  5. Segments query person table, not i360 directly
     - Loses access to 48 columns of voter data
     - cbmodels segment analysis requires SVID → custom field lookup

Option A: SVID as First-Class Citizen

Add svid column to person table:

ALTER TABLE person ADD COLUMN svid BIGINT;
CREATE INDEX idx_person_svid ON person(svid);

Pros:

- Direct join to i360_voters
- Fast segment analysis
- cbmodels can work directly with person table

Cons:

- Not all contacts have SVID (non-voters, out-of-state)
- Schema change requires migration
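As a rough illustration of what Option A enables, the sketch below joins person to i360_voters on the new svid column. It is a simplification under stated assumptions: both tables live in one in-memory sqlite3 database (in practice i360.db is a separate file and would need to be attached or queried across databases), and only a handful of columns and sample rows are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i360_voters (
    svid BIGINT PRIMARY KEY, first_name VARCHAR, last_name VARCHAR,
    county VARCHAR, party VARCHAR
);
CREATE TABLE person (
    id VARCHAR PRIMARY KEY, first_name VARCHAR, last_name VARCHAR, svid BIGINT
);
CREATE INDEX idx_person_svid ON person(svid);

INSERT INTO i360_voters VALUES
    (1001, 'Ana', 'Diaz', 'Miami-Dade', 'R'),
    (1002, 'John', 'Smith', 'Orange', 'D');
-- p2 has no SVID (for example a non-voter), so svid stays NULL.
INSERT INTO person VALUES
    ('p1', 'Ana', 'Diaz', 1001),
    ('p2', 'Pat', 'Lee', NULL);
""")

# LEFT JOIN keeps contacts without an SVID; their voter columns come back as NULL.
rows = conn.execute("""
    SELECT p.id, v.county, v.party
    FROM person p
    LEFT JOIN i360_voters v ON v.svid = p.svid
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Using LEFT JOIN rather than an inner join keeps unmatched contacts visible, which addresses the first drawback listed above at query time.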

Option B: Dedicated Link Table

Create a dedicated link table:

CREATE TABLE person_voter (
    person_id VARCHAR PRIMARY KEY,
    svid BIGINT NOT NULL,
    match_type VARCHAR,  -- 'exact', 'fuzzy', 'manual'
    match_score FLOAT,   -- confidence 0-1
    matched_at TIMESTAMP,
    matched_by VARCHAR,  -- 'import', 'user:{id}', 'system'
    UNIQUE(svid)
);

CREATE INDEX idx_person_voter_svid ON person_voter(svid);

Matching types:

| Type | Description | Score |
|------|-------------|-------|
| exact | SVID provided in import | 1.0 |
| name_address | first + last + address + zip | 0.95 |
| name_city | first + last + city (current) | 0.7 |
| fuzzy | Soundex + DOB + address | 0.5-0.8 |
| manual | User-verified match | 1.0 |

Pros:

- Preserves unmatched persons (non-voters)
- Audit trail for match quality
- Multiple match strategies can coexist
- No person table schema change

Cons:

- Extra join required
- Link table must be maintained
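To make the "extra join" trade-off concrete, here is a minimal sketch of querying through the link table while filtering on match quality. It assumes sqlite3 as a stand-in engine, a reduced person table, made-up sample rows, and a hypothetical helper `linked_voters`; the person_voter columns follow the definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id VARCHAR PRIMARY KEY, first_name VARCHAR, last_name VARCHAR);
CREATE TABLE person_voter (
    person_id VARCHAR PRIMARY KEY,
    svid BIGINT NOT NULL,
    match_type VARCHAR,
    match_score FLOAT,
    matched_at TIMESTAMP,
    matched_by VARCHAR,
    UNIQUE(svid)
);
INSERT INTO person VALUES
    ('p1', 'Ana', 'Diaz'), ('p2', 'John', 'Smith'), ('p3', 'Pat', 'Lee');
-- p3 is deliberately left without a link row (unmatched person).
INSERT INTO person_voter (person_id, svid, match_type, match_score, matched_by) VALUES
    ('p1', 1001, 'exact', 1.0, 'import'),
    ('p2', 1002, 'name_city', 0.7, 'system');
""")


def linked_voters(conn, min_score):
    """Return (person_id, svid, match_type, match_score) for links at or above min_score."""
    return conn.execute(
        """
        SELECT p.id, pv.svid, pv.match_type, pv.match_score
        FROM person p
        JOIN person_voter pv ON pv.person_id = p.id
        WHERE pv.match_score >= ?
        ORDER BY p.id
        """,
        (min_score,),
    ).fetchall()


print(linked_voters(conn, 0.75))  # only the high-confidence link
print(linked_voters(conn, 0.0))   # every linked person; p3 never appears
```

Because match_score lives on the link row, callers can tighten or relax the confidence cut-off per query without touching the person table.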

Option C: i360 as Header File

For clients WITH i360 data, use i360_voters as the base:

┌──────────────────────────────────────────────────────────────────────┐
│                    i360 AS HEADER FILE                               │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│   ┌───────────────────┐                                              │
│   │   i360_voters     │  ◄── Source of truth for voter data         │
│   │   (8.2M records)  │                                              │
│   │   SVID = PK       │                                              │
│   └─────────┬─────────┘                                              │
│             │                                                         │
│             │ SVID                                                    │
│             ▼                                                         │
│   ┌───────────────────┐                                              │
│   │   person          │  ◄── Campaign interactions layer             │
│   │   id = UUID       │      (events, whip, communications)          │
│   │   svid = FK       │  ◄── Direct link to i360                     │
│   └─────────┬─────────┘                                              │
│             │                                                         │
│             ├──────────────────────────┐                             │
│             ▼                          ▼                             │
│   ┌─────────────────┐        ┌─────────────────┐                     │
│   │ person_tag      │        │ communication   │                     │
│   │ event_reg       │        │ whip_status     │                     │
│   └─────────────────┘        └─────────────────┘                     │
│                                                                       │
│   For clients WITHOUT i360:                                          │
│   ┌───────────────────┐                                              │
│   │   User-provided   │  ──►  person (svid = NULL)                   │
│   │   list import     │       Still works, just no i360 join         │
│   └───────────────────┘                                              │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘

Implementation Roadmap

Phase 1: Schema Enhancement

  1. Add svid BIGINT column to person table
  2. Create index on svid
  3. Migrate existing custom field values to new column
  4. Keep custom field for backward compatibility

Phase 2: Import Enhancement

  1. When importing from i360, set svid directly
  2. When importing user lists with SVID column, set directly
  3. When importing without SVID, attempt match to i360

Phase 3: Matching Service

  1. Create matching endpoint to link persons to i360
  2. Support batch matching for existing records
  3. Add match quality scores and audit trail

Phase 4: Query Integration

  1. Update segment queries to join i360_voters
  2. Update cbmodels to use svid column directly
  3. Enable cross-database queries (person + i360 + cbmodels)

cbmodels Integration

cbmodels currently uses SVID for segment analysis:

# Current flow (cbmodels/CLAUDE.md)
1. Load model.json at startup (baseline)
2. Resolve SVIDs to person_ids via person_custom_field  # <-- Slow
3. Compute segment stats with filtered queries
4. Compare against baseline

With svid column in person table:

# Improved flow
1. Load model.json at startup (baseline)
2. Join directly: person.svid = segment.svid  # <-- Fast
3. Compute segment stats
4. Compare against baseline

Data Quality Considerations

Phone Normalization

Current state: No normalization before matching

"555-123-4567" ≠ "5551234567" ≠ "+15551234567"

Recommendation: Normalize all phones to E.164 format
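A minimal sketch of such a normalizer follows. It assumes US (+1) numbers only and uses a hypothetical function name `normalize_phone`; it is not a full E.164 validator, and anything that does not look like a 10-digit national number (optionally prefixed with the country code) is returned as None rather than guessed at.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Return the phone number in E.164 form (e.g. +15551234567), or None."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    if len(digits) == 10:
        # National number without country code.
        return f"+{default_country_code}{digits}"
    if len(digits) == 11 and digits.startswith(default_country_code):
        # Country code already present.
        return f"+{digits}"
    return None  # ambiguous or malformed; leave for manual review


for raw in ("555-123-4567", "5551234567", "+15551234567"):
    print(raw, "->", normalize_phone(raw))
```

All three spellings from the example above collapse to the same value, so an equality comparison on the normalized column would treat them as one phone number.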

Name Matching

Current state: Exact match only

"Robert" ≠ "Bob" ≠ "Rob"
"McDonald" ≠ "Mcdonald"

Recommendation: Add nickname dictionary + case-insensitive matching
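One way this could look is sketched below. The nickname table is a tiny hand-written sample for illustration, not a real dictionary, and the helper names `canonical_first_name` and `names_match` are hypothetical.

```python
# Sample nickname -> canonical first name mapping (illustrative, far from complete).
NICKNAMES = {
    "bob": "robert",
    "rob": "robert",
    "bobby": "robert",
    "bill": "william",
    "liz": "elizabeth",
}


def canonical_first_name(name):
    """Lower-case the name and map known nicknames to their canonical form."""
    key = name.strip().casefold()
    return NICKNAMES.get(key, key)


def names_match(first_a, last_a, first_b, last_b):
    """Case-insensitive last-name match plus nickname-aware first-name match."""
    same_first = canonical_first_name(first_a) == canonical_first_name(first_b)
    same_last = last_a.strip().casefold() == last_b.strip().casefold()
    return same_first and same_last


print(names_match("Bob", "McDonald", "Robert", "Mcdonald"))
print(names_match("Ana", "Smith", "Anna", "Smith"))
```

Mapping both sides to a canonical form (rather than checking pairs) means "Rob" and "Bob" also match each other. Spelling variants such as "Ana" and "Anna" are not covered and would still need fuzzy matching.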

Address Matching

Current state: Exact string match

"123 Main St" ≠ "123 Main Street"
"Apt 1" ≠ "Unit 1" ≠ "#1"

Recommendation: Use USPS address standardization
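Until a real standardization service is wired in, a lightweight normalizer can already remove the mismatches shown above. The sketch below is a simplified stand-in, not USPS standardization: the abbreviation tables are small samples, the function name `normalize_address` is hypothetical, and it intentionally collapses every unit designator ("Apt", "Unit", "#") to a single token purely so that comparisons succeed.

```python
import re

# Sample street-suffix abbreviations (illustrative subset).
SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd", "drive": "dr", "road": "rd"}
# Assumption for matching only: treat all unit designators as equivalent.
UNIT_WORDS = {"apartment": "apt", "unit": "apt", "#": "apt"}


def normalize_address(raw):
    """Return a lower-cased, abbreviation-normalized form used only for comparison."""
    text = raw.casefold().replace("#", " # ")  # make '#1' tokenize as '#', '1'
    text = re.sub(r"[.,]", " ", text)          # drop periods and commas
    tokens = text.split()
    return " ".join(SUFFIXES.get(t, UNIT_WORDS.get(t, t)) for t in tokens)


print(normalize_address("123 Main Street"))
print(normalize_address("123 Main St., #1"))
```

The normalized string should be stored alongside the original address rather than replacing it, so the source value remains available for mailing and audit.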

Summary

| Data Source | ID | Purpose | Connected? |
|-------------|----|---------|------------|
| i360 | SVID | Voter baseline | Via custom field (slow) |
| User Lists | UUID | Campaign contacts | Primary |
| cbmodels | email/phone | Behavioral | No direct link |

Key Recommendation: Add svid column to person table to unify data sources.

Terminology Glossary

| Term | Definition |
|------|------------|
| Audience | UI term for the full contact list (person table) |
| Segment | A filtered subset of persons, created via SQL query or manual selection |
| SVID | State Voter ID - unique identifier from i360/state voter file |
| PID | Person ID - our internal UUID (currently not globally unique across tenants) |
| Tag | A label attached to persons for categorization |
| Custom Field | Dynamic field for storing additional person attributes |
| i360 | Voter file data provider (8.2M FL records) |
| cbmodels | Behavioral data analysis service |

Segment vs Audience

┌─────────────────────────────────────────────────┐
│                  AUDIENCE                        │
│        (All persons in person table)             │
│                                                  │
│    ┌────────────────────────────────────────┐   │
│    │           SEGMENT A                     │   │
│    │  (Filtered by county='Miami-Dade')      │   │
│    │                                         │   │
│    │    ┌───────────────────────────────┐   │   │
│    │    │      SEGMENT B                 │   │   │
│    │    │ (Further filtered by whip=yes) │   │   │
│    │    └───────────────────────────────┘   │   │
│    │                                         │   │
│    └────────────────────────────────────────┘   │
│                                                  │
└─────────────────────────────────────────────────┘

Segment Types:

| Type | Description | Use Case |
|------|-------------|----------|
| Dynamic | SQL query re-executed on demand | "All Miami-Dade voters" |
| Static | Fixed list of person IDs | "GOTV Team 1" |
| i360 Segment | Created from i360 query | "High turnout Republicans" |
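The practical difference between dynamic and static segments can be shown with a short sketch. Everything here is an assumption for illustration: segments are modeled as plain dicts, the helper `segment_members` is hypothetical, sqlite3 stands in for the real database, and the person table is given a county column so the Miami-Dade filter has something to query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, first_name TEXT, county TEXT);
INSERT INTO person VALUES ('p1', 'Ana', 'Miami-Dade'), ('p2', 'John', 'Orange');
""")

# Dynamic: stores the query. Static: stores the resolved person IDs.
dynamic_segment = {"type": "dynamic", "sql": "SELECT id FROM person WHERE county = 'Miami-Dade'"}
static_segment = {"type": "static", "person_ids": ["p2"]}


def segment_members(conn, segment):
    """Resolve a segment to a sorted list of person IDs."""
    if segment["type"] == "dynamic":
        return sorted(row[0] for row in conn.execute(segment["sql"]))
    return sorted(segment["person_ids"])


before = segment_members(conn, dynamic_segment)
conn.execute("INSERT INTO person VALUES ('p3', 'Luz', 'Miami-Dade')")
after = segment_members(conn, dynamic_segment)

print(before, after)                          # dynamic segment picks up the new person
print(segment_members(conn, static_segment))  # static membership is unchanged
```

A dynamic segment reflects new imports automatically, while a static segment stays fixed until someone edits its ID list.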

Step 1: Add SVID to Person Table

-- Migration script
ALTER TABLE person ADD COLUMN svid BIGINT;
CREATE INDEX idx_person_svid ON person(svid);

-- Migrate existing custom field values
UPDATE person p
SET svid = TRY_CAST(pcf.value AS BIGINT)  -- TRY_CAST so a non-numeric value can never abort the migration
FROM person_custom_field pcf
JOIN custom_field cf ON pcf.custom_field_id = cf.id
WHERE cf.name = 'State Voter ID'
  AND pcf.person_id = p.id
  AND pcf.value IS NOT NULL
  AND TRY_CAST(pcf.value AS BIGINT) IS NOT NULL;

Step 2: Create Match Audit Table

CREATE TABLE person_voter_match (
    id VARCHAR PRIMARY KEY,
    person_id VARCHAR NOT NULL REFERENCES person(id),
    svid BIGINT NOT NULL,
    match_type VARCHAR NOT NULL,  -- 'exact', 'name_address', 'fuzzy', 'manual'
    match_score FLOAT NOT NULL,   -- 0.0 to 1.0
    match_details JSON,           -- Fields used for matching
    matched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    matched_by VARCHAR,           -- 'system', 'import', user_id
    verified_by VARCHAR,          -- User who verified the match
    verified_at TIMESTAMP,
    UNIQUE(person_id),
    UNIQUE(svid)
);

CREATE INDEX idx_pvm_svid ON person_voter_match(svid);
CREATE INDEX idx_pvm_match_type ON person_voter_match(match_type);

Step 3: Matching Algorithm

def match_person_to_voter(person: Person) -> tuple[int | None, float, str]:
    """
    Attempt to match a person record to i360 voter file.

    Returns: (svid, score, match_type)
    """
    # Priority 1: SVID already provided
    if person.svid:
        return (person.svid, 1.0, 'exact')

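    # Priority 2: Name + Address + ZIP
    # Skipped when address or ZIP is missing, so an empty filter
    # cannot produce a high-confidence match.
    if person.address1 and person.zip:
        voters = query_i360(
            first_name=person.first_name,
            last_name=person.last_name,
            address=person.address1,
            zip=person.zip
        )
        if len(voters) == 1:
            return (voters[0].svid, 0.95, 'name_address')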

    # Priority 3: Name + City (current behavior)
    voters = query_i360(
        first_name=person.first_name,
        last_name=person.last_name,
        city=person.city
    )
    if len(voters) == 1:
        return (voters[0].svid, 0.7, 'name_city')

    # Priority 4: Fuzzy matching
    candidates = fuzzy_match_voters(person)
    if candidates:
        best = max(candidates, key=lambda x: x.score)
        if best.score >= 0.8:
            return (best.svid, best.score, 'fuzzy')

    # No match
    return (None, 0.0, 'none')
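The score returned by the matcher still has to be compared against the configurable threshold from the key decisions (default 75%, stored in integration_setting). The sketch below shows one possible gate. The `settings` dict is a stand-in for however integration_setting is actually read, and both the key name `match_threshold` and the function name `should_auto_link` are hypothetical.

```python
DEFAULT_MATCH_THRESHOLD = 0.75  # default from the key decisions (75%)


def should_auto_link(score, settings=None):
    """Return True when a match score is high enough to link without review."""
    # Settings values are often stored as text, so coerce to float.
    threshold = float((settings or {}).get("match_threshold", DEFAULT_MATCH_THRESHOLD))
    return score >= threshold


print(should_auto_link(0.95))                             # name_address match
print(should_auto_link(0.70))                             # name_city match, default threshold
print(should_auto_link(0.70, {"match_threshold": "0.6"}))  # tenant lowered the threshold
```

Under the 75% default, a name_city match (0.70) would be held for review while a name_address match (0.95) would be linked automatically, so the threshold setting directly controls how much of the current name+city behavior survives.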

Step 4: Import-Time Matching

async def import_person(data: PersonCreate, user_id: str):
    # Create person record
    person = create_person(data, user_id)

    # Attempt voter match
    svid, score, match_type = match_person_to_voter(person)

    if svid:
        # Update person with SVID
        update_person_svid(person.id, svid)

        # Log match for audit
        create_match_record(
            person_id=person.id,
            svid=svid,
            match_type=match_type,
            match_score=score,
            matched_by='import'
        )

    return person

Related Files

| File | Purpose |
|------|---------|
| src/api/routes/i360.py | i360 import and query endpoints |
| src/api/routes/list_loader.py | User list import with duplicate detection |
| src/api/routes/segments.py | Segment management |
| cbmodels/src/cbmodels/api/segment.py | Segment analysis using SVID |
| scripts/create_schema.py | Database schema definition |

Open Questions

  1. Multi-state support: Current i360 data is FL only. How to handle multi-state campaigns?
  2. SVID format: FL uses numeric SVIDs. Other states may use different formats.
  3. Match confidence threshold: What score is "good enough" for auto-matching?
  4. Manual override UI: How should users verify/correct matches?
  5. Unmatched contacts: How to handle persons who can't be matched to i360?