Contact Data Architecture¶
Created: 2026-01-01
Status: Documentation of current state + recommendations
Implementation Plan: See CONTACT-DATA-IMPLEMENTATION-PLAN.md for detailed implementation steps.
Key Decisions (2026-01-01)¶
- i360 IS the list for clients that have it - bootstrap from voter file
- User lists become header for clients without i360 - iteratively updated
- Human-readable PID - 6-character code using serial.py approach
- Simple matching for MVP - Name + Address + ZIP (0.95) or Name + City (0.70)
- Configurable threshold - Default 75%, stored in integration_setting
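The human-readable PID decision can be illustrated with a small sketch. serial.py is not reproduced in this document, so the following is only an assumed approach: a serial integer is encoded into six characters drawn from a 32-symbol alphabet that omits easily confused letters (the Crockford Base32 alphabet). The names `encode_pid` and `decode_pid` are hypothetical.

```python
# Hypothetical sketch: serial.py may work differently.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32: no I, L, O, U
PID_LENGTH = 6
MAX_SERIAL = len(ALPHABET) ** PID_LENGTH - 1  # 32**6 - 1 = 1,073,741,823


def encode_pid(serial: int) -> str:
    """Encode a non-negative serial number as a fixed-width 6-character code."""
    if not 0 <= serial <= MAX_SERIAL:
        raise ValueError(f"serial must be between 0 and {MAX_SERIAL}")
    chars = []
    for _ in range(PID_LENGTH):
        serial, digit = divmod(serial, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))  # most significant character first


def decode_pid(pid: str) -> int:
    """Inverse of encode_pid; raises ValueError on characters outside ALPHABET."""
    value = 0
    for ch in pid.upper():
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value
```

Six characters over 32 symbols give roughly 1.07 billion codes, which comfortably covers an 8.2M-record voter file.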
Overview¶
This document describes the three types of contact data in the CampaignBrain ecosystem, how each is stored, and how they connect today or should connect.
The Three Data Sources¶
1. i360 Voter File (Baseline)¶
Location: /opt/campaignbrain/shared/data/i360.db (symlinked per tenant)
Structure:
| Field | Type | Description |
|-------|------|-------------|
| svid | BIGINT | State Voter ID - Primary key, unique per state |
| first_name | VARCHAR | |
| last_name | VARCHAR | |
| city | VARCHAR | |
| county | VARCHAR | |
| state | VARCHAR | |
| zip_code | BIGINT | |
| cell_phone | VARCHAR | |
| email_mydata | VARCHAR | From i360's MyData enrichment |
| party | VARCHAR | Registered party |
| gender | VARCHAR | |
| birth_year | INTEGER | |
| ethnicity | VARCHAR | |
| 48 total columns | | Scores, voting history, demographics |
Volume: 8.2M records (Florida state file)
Characteristics:
- Pre-cleaned, standardized data
- SVID is the authoritative identifier within a state
- Contains voter scores, turnout predictions, party affinity
- Updated quarterly from i360
2. User-Loaded Lists (Campaign Data)¶
Location: pocket.db → person table + person_custom_field
Import Flow:
Structure (person table):
| Field | Type | Description |
|-------|------|-------------|
| id | VARCHAR | UUID, randomly generated |
| first_name | VARCHAR | |
| last_name | VARCHAR | |
| address1 | VARCHAR | |
| city | VARCHAR | |
| state | VARCHAR | |
| zip | VARCHAR | |
| cell_phone | VARCHAR | |
| email | VARCHAR | |
| import_source | VARCHAR | Origin identifier |
| import_date | TIMESTAMP | When imported |
| import_batch_id | VARCHAR | Groups related imports |
| original_data | JSON | Full source record (audit) |
Custom Fields (person_custom_field):
- State Voter ID (SVID) → stored here, not in person table
- Voter Key
- Party information
- Legislative districts
- 32 custom fields defined, 2M values
Duplicate Detection:
-- Current matching logic (list_loader.py:595-617)
WHERE first_name = ? AND last_name = ?
  AND (
       (cell_phone IS NOT NULL AND cell_phone = ?)
    OR (email IS NOT NULL AND email = ?)
    OR (address1 IS NOT NULL AND address1 = ? AND zip = ?)
  )
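To make the behavior of this predicate concrete, here is a minimal sketch that runs it against an in-memory SQLite database. SQLite, the `SELECT id FROM person` prefix, and the `find_duplicates` helper are assumptions for illustration only; the real query in list_loader.py may differ.

```python
import sqlite3

# The WHERE clause is taken from the fragment above; the SELECT prefix is assumed.
DUPLICATE_SQL = """
SELECT id FROM person
WHERE first_name = ? AND last_name = ?
  AND (
       (cell_phone IS NOT NULL AND cell_phone = ?)
    OR (email IS NOT NULL AND email = ?)
    OR (address1 IS NOT NULL AND address1 = ? AND zip = ?)
  )
"""


def find_duplicates(conn: sqlite3.Connection, rec: dict) -> list:
    """Return ids of existing persons that the predicate treats as duplicates."""
    params = (
        rec["first_name"], rec["last_name"],
        rec.get("cell_phone"), rec.get("email"),
        rec.get("address1"), rec.get("zip"),
    )
    return [row[0] for row in conn.execute(DUPLICATE_SQL, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " address1 TEXT, zip TEXT, cell_phone TEXT, email TEXT)"
)
conn.execute(
    "INSERT INTO person VALUES ('p1', 'John', 'Smith', '12 Oak St', '33101',"
    " '3055550100', NULL)"
)

incoming = {"first_name": "John", "last_name": "Smith"}
same_phone = find_duplicates(conn, {**incoming, "cell_phone": "3055550100"})
formatted_phone = find_duplicates(conn, {**incoming, "cell_phone": "(305) 555-0100"})
```

Here `same_phone` contains `'p1'` while `formatted_phone` is empty: because the comparison is an exact string match, the same phone number written in a different format is not detected as a duplicate.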
3. cbmodels Campaign Data (Behavioral)¶
Location: cbmodels/data/ directory per tenant
Structure:
- sources/ - Ingested CSV files (donors, volunteers, events, etc.)
- rotated.db - Normalized campaign data storage
- providers/ - Email engagement from external providers
Key Concept: Uses email and phone as match keys, not person ID.
API Contract:
# Lookup by email/phone
POST /campaign/lookup
{
"email": "john@example.com"
}
# Returns records from multiple sources
{
"records": [
{"source_name": "2024_Donors.csv", "fields": {...}},
{"source_name": "Rally_Signups.csv", "fields": {...}}
]
}
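A client for this contract could be sketched as follows, using only the standard library. The base URL, the `phone` payload key, and both helper names are assumptions; only the `/campaign/lookup` path, the `email` key, and the `records` / `source_name` response fields come from the contract above.

```python
import json
from urllib import request


def build_lookup_request(base_url, email=None, phone=None):
    """Build (but do not send) a POST /campaign/lookup request."""
    if not email and not phone:
        raise ValueError("lookup needs an email or a phone number")
    # "phone" as a payload key is an assumption; the contract only shows "email".
    payload = {k: v for k, v in (("email", email), ("phone", phone)) if v}
    return request.Request(
        url=f"{base_url.rstrip('/')}/campaign/lookup",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def source_names(response_body):
    """List the source files that contributed records to a lookup response."""
    return [rec["source_name"] for rec in json.loads(response_body).get("records", [])]


# Hypothetical host; sending would be request.urlopen(req).read()
req = build_lookup_request("http://localhost:8000", email="john@example.com")
sample_response = json.dumps({
    "records": [
        {"source_name": "2024_Donors.csv", "fields": {}},
        {"source_name": "Rally_Signups.csv", "fields": {}},
    ]
})
names = source_names(sample_response)
```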
Current ID Strategy¶
┌─────────────────────────────────────────────────────────────────────┐
│ CURRENT STATE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ i360.db pocket.db │
│ ┌──────────────┐ ┌──────────────────────┐ │
│ │ i360_voters │ │ person │ │
│ │ │ name+city │ │ │
│ │ SVID (PK) ───┼─────────────►│ id (UUID) │ │
│ │ first_name │ matching │ first_name │ │
│ │ last_name │ (lossy) │ last_name │ │
│ │ city │ │ city │ │
│ │ ...48 cols │ │ ... │ │
│ └──────────────┘ │ import_source │ │
│ │ original_data (JSON) │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ person_custom_field │ │
│ │ ┌──────────────────┐ │ │
│ │ │ State Voter ID │ │ ◄── SVID │
│ │ │ (as text value) │ │ stored │
│ │ └──────────────────┘ │ here │
│ └──────────────────────┘ │
│ │
│ cbmodels │
│ ┌──────────────┐ No direct link! │
│ │ sources/ │ Matches on email/phone │
│ │ rotated.db │──────────────────────────────────────► │
│ │ providers/ │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Problems with Current State¶
- No direct SVID link in person table
  - SVID stored as custom field (text), not indexed foreign key
  - Cannot efficiently join person ↔ i360_voters
- Name+city matching is lossy
  - "John Smith" in "Miami" could match multiple voters
  - No fuzzy matching, soundex, or date-of-birth verification
- No global Person ID (PID) across tenants
  - Each tenant has isolated UUIDs
  - Cross-tenant deduplication impossible
- cbmodels uses email/phone, not SVID
  - Behavioral data disconnected from voter file
  - Cannot segment by i360 scores + behavioral data together
- Segments query person table, not i360 directly
  - Loses access to 48 columns of voter data
  - cbmodels segment analysis requires SVID → custom field lookup
Recommended Architecture¶
Option A: SVID as First-Class Citizen¶
Add svid column to person table:
Pros:
- Direct join to i360_voters
- Fast segment analysis
- cbmodels can work directly with person table
Cons:
- Not all contacts have SVID (non-voters, out-of-state)
- Schema change requires migration
Option B: Person-i360 Link Table (Recommended)¶
Create a dedicated link table:
CREATE TABLE person_voter (
    person_id   VARCHAR PRIMARY KEY,
    svid        BIGINT NOT NULL,
    match_type  VARCHAR,    -- 'exact', 'name_address', 'name_city', 'fuzzy', 'manual'
    match_score FLOAT,      -- confidence 0-1
    matched_at  TIMESTAMP,
    matched_by  VARCHAR,    -- 'import', 'user:{id}', 'system'
    UNIQUE(svid)
);
CREATE INDEX idx_person_voter_svid ON person_voter(svid);
Matching types:
| Type | Description | Score |
|------|-------------|-------|
| exact | SVID provided in import | 1.0 |
| name_address | first + last + address + zip | 0.95 |
| name_city | first + last + city (current) | 0.7 |
| fuzzy | Soundex + DOB + address | 0.5-0.8 |
| manual | User-verified match | 1.0 |
Pros:
- Preserves unmatched persons (non-voters)
- Audit trail for match quality
- Multiple match strategies can coexist
- No person table schema change
Cons:
- Extra join required
- Link table must be maintained
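The trade-offs of the link table can be seen in a minimal sketch, assuming SQLite as a stand-in engine and a trimmed-down version of the `person_voter` schema. It shows the extra join, the preservation of unmatched persons, and the effect of `UNIQUE(svid)`. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE person_voter (
    person_id   TEXT PRIMARY KEY,
    svid        INTEGER NOT NULL,
    match_type  TEXT,
    match_score REAL,
    UNIQUE (svid)
);
INSERT INTO person VALUES ('p1', 'Ana', 'Diaz'), ('p2', 'Sam', 'Lee');
INSERT INTO person_voter VALUES ('p1', 1001, 'name_address', 0.95);
""")

# LEFT JOIN keeps persons with no voter match (non-voters, out-of-state).
rows = conn.execute("""
    SELECT p.id, pv.svid, pv.match_type
    FROM person p
    LEFT JOIN person_voter pv ON pv.person_id = p.id
    ORDER BY p.id
""").fetchall()

# UNIQUE(svid) rejects linking a second person to the same voter.
try:
    conn.execute("INSERT INTO person_voter VALUES ('p2', 1001, 'manual', 1.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

One consequence worth noting: with `UNIQUE(svid)`, two person rows that are really the same voter cannot both be linked, so duplicates in the person table would need to be resolved before or during matching.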
Option C: i360 as Header File¶
For clients WITH i360 data, use i360_voters as the base:
┌──────────────────────────────────────────────────────────────────────┐
│ i360 AS HEADER FILE │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────┐ │
│ │ i360_voters │ ◄── Source of truth for voter data │
│ │ (8.2M records) │ │
│ │ SVID = PK │ │
│ └─────────┬─────────┘ │
│ │ │
│ │ SVID │
│ ▼ │
│ ┌───────────────────┐ │
│ │ person │ ◄── Campaign interactions layer │
│ │ id = UUID │ (events, whip, communications) │
│ │ svid = FK │ ◄── Direct link to i360 │
│ └─────────┬─────────┘ │
│ │ │
│ ├──────────────────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ person_tag │ │ communication │ │
│ │ event_reg │ │ whip_status │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ For clients WITHOUT i360: │
│ ┌───────────────────┐ │
│ │ User-provided │ ──► person (svid = NULL) │
│ │ list import │ Still works, just no i360 join │
│ └───────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
Implementation Roadmap¶
Phase 1: Schema Enhancement¶
- Add svid BIGINT column to person table
- Create index on svid
- Migrate existing custom field values to new column
- Keep custom field for backward compatibility
Phase 2: Import Enhancement¶
- When importing from i360, set svid directly
- When importing user lists with SVID column, set directly
- When importing without SVID, attempt match to i360
Phase 3: Matching Service¶
- Create matching endpoint to link persons to i360
- Support batch matching for existing records
- Add match quality scores and audit trail
Phase 4: Query Integration¶
- Update segment queries to join i360_voters
- Update cbmodels to use svid column directly
- Enable cross-database queries (person + i360 + cbmodels)
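A cross-database segment query could look roughly like this. The sketch assumes SQLite's `ATTACH DATABASE` as a stand-in for however the production engine exposes i360.db alongside pocket.db, and it uses two in-memory databases with invented rows and a reduced column set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                   # stands in for pocket.db
conn.execute("ATTACH DATABASE ':memory:' AS i360")   # stands in for i360.db

conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, first_name TEXT, svid INTEGER);
CREATE TABLE i360.i360_voters (svid INTEGER PRIMARY KEY, county TEXT, party TEXT);
INSERT INTO person VALUES ('p1', 'Ana', 1001), ('p2', 'Sam', NULL);
INSERT INTO i360.i360_voters VALUES (1001, 'Miami-Dade', 'REP'), (1002, 'Broward', 'DEM');
""")

# With person.svid in place, a segment can filter on voter-file columns directly.
segment = conn.execute("""
    SELECT p.id, v.county, v.party
    FROM person p
    JOIN i360.i360_voters v ON v.svid = p.svid
    WHERE v.county = 'Miami-Dade'
""").fetchall()
```

Persons without an SVID (such as `p2` here) simply drop out of the inner join, which matches the "still works, just no i360 join" behavior described for clients without i360.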
cbmodels Integration¶
cbmodels currently uses SVID for segment analysis:
# Current flow (cbmodels/CLAUDE.md)
1. Load model.json at startup (baseline)
2. Resolve SVIDs to person_ids via person_custom_field # <-- Slow
3. Compute segment stats with filtered queries
4. Compare against baseline
With svid column in person table:
# Improved flow
1. Load model.json at startup (baseline)
2. Join directly: person.svid = segment.svid # <-- Fast
3. Compute segment stats
4. Compare against baseline
Data Quality Considerations¶
Phone Normalization¶
Current state: No normalization before matching
Recommendation: Normalize all phones to E.164 format
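A minimal normalization sketch, assuming that numbers without a country code are US numbers (10 digits, country code 1). The function name `to_e164` is hypothetical, and a production system would more likely rely on a dedicated phone-parsing library.

```python
import re


def to_e164(raw, default_country_code="1"):
    """Return the number in E.164 form (e.g. +13055550100), or None if unusable."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        # Already international: E.164 allows at most 15 digits.
        return f"+{digits}" if 0 < len(digits) <= 15 else None
    if len(digits) == 10:
        # Assumed national (US) number without country code.
        return f"+{default_country_code}{digits}"
    if len(digits) == 11 and digits.startswith(default_country_code):
        return f"+{digits}"
    return None
```

For example, `to_e164("(305) 555-0100")` and `to_e164("1-305-555-0100")` both yield `+13055550100`, so the two spellings compare equal during matching, while a 7-digit local number returns `None` rather than a guess.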
Name Matching¶
Current state: Exact match only
Recommendation: Add nickname dictionary + case-insensitive matching
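The recommendation could be sketched like this. The nickname table is a tiny illustrative sample rather than a real dictionary, and the helper names are hypothetical.

```python
# Illustrative sample only; a real nickname dictionary would be far larger.
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "bob": "robert",
    "rob": "robert",
    "liz": "elizabeth",
    "beth": "elizabeth",
    "mike": "michael",
}


def canonical_first_name(name: str) -> str:
    """Case-fold the name and map known nicknames to a canonical form."""
    key = name.strip().casefold()
    return NICKNAMES.get(key, key)


def first_names_match(a: str, b: str) -> bool:
    """True when two first names agree after case folding and nickname mapping."""
    return canonical_first_name(a) == canonical_first_name(b)
```

With this, `first_names_match("Bill", "WILLIAM")` is true, whereas the current exact comparison would treat the two as different people.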
Address Matching¶
Current state: Exact string match
Recommendation: Use USPS address standardization
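Full USPS standardization normally requires an address-validation service, which is outside the scope of a short example. As a rough illustration of the idea only, the sketch below upper-cases the address, strips punctuation, and applies a handful of the standard USPS abbreviations. The lookup tables are deliberately incomplete and `normalize_address` is a hypothetical helper.

```python
import re

# Small sample of standard USPS abbreviations; not a complete table.
SUFFIXES = {"street": "ST", "avenue": "AVE", "boulevard": "BLVD",
            "drive": "DR", "road": "RD", "lane": "LN", "court": "CT"}
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}
UNITS = {"apartment": "APT", "suite": "STE"}
ABBREVIATIONS = {**SUFFIXES, **DIRECTIONS, **UNITS}


def normalize_address(address: str) -> str:
    """Naive normalization: strip punctuation, abbreviate known words, upper-case."""
    cleaned = re.sub(r"[.,#]", " ", address).lower()
    return " ".join(ABBREVIATIONS.get(tok, tok.upper()) for tok in cleaned.split())
```

`normalize_address("123 North Main Street, Apt. 4")` and `normalize_address("123 n. main st apt 4")` both produce `123 N MAIN ST APT 4`. A token-by-token mapping like this will mishandle names such as "North Street", which is one reason a real standardization service is the better long-term choice.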
Summary¶
| Data Source | ID | Purpose | Connected? |
|---|---|---|---|
| i360 | SVID | Voter baseline | Via custom field (slow) |
| User Lists | UUID | Campaign contacts | Primary |
| cbmodels | email/phone | Behavioral | No direct link |
Key Recommendation: Add svid column to person table to unify data sources.
Terminology Glossary¶
| Term | Definition |
|---|---|
| Audience | UI term for the full contact list (person table) |
| Segment | A filtered subset of persons, created via SQL query or manual selection |
| SVID | State Voter ID - unique identifier from i360/state voter file |
| PID | Person ID - our internal UUID (currently not globally unique across tenants) |
| Tag | A label attached to persons for categorization |
| Custom Field | Dynamic field for storing additional person attributes |
| i360 | Voter file data provider (8.2M FL records) |
| cbmodels | Behavioral data analysis service |
Segment vs Audience¶
┌─────────────────────────────────────────────────┐
│ AUDIENCE │
│ (All persons in person table) │
│ │
│ ┌────────────────────────────────────────┐ │
│ │ SEGMENT A │ │
│ │ (Filtered by county='Miami-Dade') │ │
│ │ │ │
│ │ ┌───────────────────────────────┐ │ │
│ │ │ SEGMENT B │ │ │
│ │ │ (Further filtered by whip=yes) │ │ │
│ │ └───────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────┘
Segment Types:
| Type | Description | Use Case |
|------|-------------|----------|
| Dynamic | SQL query re-executed on demand | "All Miami-Dade voters" |
| Static | Fixed list of person IDs | "GOTV Team 1" |
| i360 Segment | Created from i360 query | "High turnout Republicans" |
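The difference between dynamic and static segments can be sketched as follows. The `Segment` class and its fields are hypothetical and do not reflect the actual model in segments.py; SQLite is used as a stand-in database.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    name: str
    sql: Optional[str] = None                             # set for dynamic segments
    person_ids: List[str] = field(default_factory=list)   # set for static segments

    def resolve(self, conn: sqlite3.Connection) -> List[str]:
        """Dynamic segments re-run their query; static ones return the saved ids."""
        if self.sql is not None:
            return [row[0] for row in conn.execute(self.sql)]
        return list(self.person_ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, county TEXT)")
conn.execute("INSERT INTO person VALUES ('p1', 'Miami-Dade')")

dynamic = Segment(
    "All Miami-Dade voters",
    sql="SELECT id FROM person WHERE county = 'Miami-Dade' ORDER BY id",
)
static = Segment("GOTV Team 1", person_ids=dynamic.resolve(conn))  # snapshot

conn.execute("INSERT INTO person VALUES ('p2', 'Miami-Dade')")
```

After the second insert, the dynamic segment resolves to both persons while the static segment still contains only the person captured at snapshot time.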
ID Linking Strategy (Recommended)¶
Step 1: Add SVID to Person Table¶
-- Migration script
ALTER TABLE person ADD COLUMN svid BIGINT;
CREATE INDEX idx_person_svid ON person(svid);

-- Migrate existing custom field values.
-- TRY_CAST is used in SET as well as in WHERE so that a single
-- non-numeric value cannot abort the whole migration.
UPDATE person p
SET svid = TRY_CAST(pcf.value AS BIGINT)
FROM person_custom_field pcf
JOIN custom_field cf ON pcf.custom_field_id = cf.id
WHERE cf.name = 'State Voter ID'
  AND pcf.person_id = p.id
  AND pcf.value IS NOT NULL
  AND TRY_CAST(pcf.value AS BIGINT) IS NOT NULL;
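Before running the migration it may be useful to see which custom field values would be skipped. The following sketch approximates the cast check in Python; it is not guaranteed to agree with the database's `TRY_CAST` in every edge case, and both function names are hypothetical.

```python
import re

BIGINT_MIN, BIGINT_MAX = -(2 ** 63), 2 ** 63 - 1


def try_parse_svid(value):
    """Return the value as an int if it looks like a BIGINT, else None."""
    if value is None:
        return None
    text = value.strip()
    if not re.fullmatch(r"-?[0-9]+", text):
        return None
    number = int(text)
    return number if BIGINT_MIN <= number <= BIGINT_MAX else None


def split_migratable(values):
    """Split raw custom field values into (parsed SVIDs, rejected raw values)."""
    parsed, rejected = [], []
    for raw in values:
        svid = try_parse_svid(raw)
        if svid is None:
            rejected.append(raw)
        else:
            parsed.append(svid)
    return parsed, rejected
```

Reviewing the rejected list first gives an early signal about non-numeric identifiers, which also bears on the open question about SVID formats in other states.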
Step 2: Link Table for Match Tracking¶
CREATE TABLE person_voter_match (
    id            VARCHAR PRIMARY KEY,
    person_id     VARCHAR NOT NULL REFERENCES person(id),
    svid          BIGINT NOT NULL,
    match_type    VARCHAR NOT NULL,  -- 'exact', 'name_address', 'name_city', 'fuzzy', 'manual'
    match_score   FLOAT NOT NULL,    -- 0.0 to 1.0
    match_details JSON,              -- Fields used for matching
    matched_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    matched_by    VARCHAR,           -- 'system', 'import', user_id
    verified_by   VARCHAR,           -- User who verified the match
    verified_at   TIMESTAMP,
    UNIQUE(person_id),
    UNIQUE(svid)
);
CREATE INDEX idx_pvm_svid ON person_voter_match(svid);
CREATE INDEX idx_pvm_match_type ON person_voter_match(match_type);
Step 3: Matching Algorithm¶
def match_person_to_voter(person: Person) -> tuple[int | None, float, str]:
    """
    Attempt to match a person record to i360 voter file.

    Returns: (svid, score, match_type)
    """
    # Priority 1: SVID already provided
    if person.svid:
        return (person.svid, 1.0, 'exact')

    # Priority 2: Name + Address + ZIP (accept only an unambiguous hit)
    voters = query_i360(
        first_name=person.first_name,
        last_name=person.last_name,
        address=person.address1,
        zip=person.zip
    )
    if len(voters) == 1:
        return (voters[0].svid, 0.95, 'name_address')

    # Priority 3: Name + City (current behavior)
    voters = query_i360(
        first_name=person.first_name,
        last_name=person.last_name,
        city=person.city
    )
    if len(voters) == 1:
        return (voters[0].svid, 0.7, 'name_city')

    # Priority 4: Fuzzy matching
    candidates = fuzzy_match_voters(person)
    if candidates:
        best = max(candidates, key=lambda x: x.score)
        if best.score >= 0.8:
            return (best.svid, best.score, 'fuzzy')

    # No match
    return (None, 0.0, 'none')
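`fuzzy_match_voters` is not defined in this document. One possible shape is sketched below, with two loudly stated substitutions: it uses `difflib.SequenceMatcher` string similarity instead of the Soundex approach listed in the matching types table, and it takes the candidate voters as an explicit second argument and works on plain dicts rather than the `Person` model. The birth-year penalty of 0.5 is an arbitrary illustrative weight.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    svid: int
    score: float


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def score_candidate(person: dict, voter: dict) -> float:
    """Average first/last name similarity, halved when birth years disagree."""
    name_score = (
        similarity(person["first_name"], voter["first_name"])
        + similarity(person["last_name"], voter["last_name"])
    ) / 2
    both_known = person.get("birth_year") and voter.get("birth_year")
    if both_known and person["birth_year"] != voter["birth_year"]:
        return name_score * 0.5  # arbitrary penalty, for illustration only
    return name_score


def fuzzy_match_voters(person: dict, voters: list) -> list:
    """Score every candidate voter against the person."""
    return [Candidate(v["svid"], score_candidate(person, v)) for v in voters]


person = {"first_name": "john", "last_name": "SMITH", "birth_year": 1980}
voters = [
    {"svid": 1, "first_name": "John", "last_name": "Smith", "birth_year": 1980},
    {"svid": 2, "first_name": "John", "last_name": "Smith", "birth_year": 1975},
]
best = max(fuzzy_match_voters(person, voters), key=lambda c: c.score)
```

In this example the two voters share a name, and the birth year is what separates them: the first scores 1.0 and the second 0.5, so only the first would clear the 0.8 cutoff used in the algorithm above.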
Step 4: Import-Time Matching¶
async def import_person(data: PersonCreate, user_id: str):
    # Create person record
    person = create_person(data, user_id)

    # Attempt voter match
    svid, score, match_type = match_person_to_voter(person)

    if svid:
        # Update person with SVID
        update_person_svid(person.id, svid)

        # Log match for audit
        create_match_record(
            person_id=person.id,
            svid=svid,
            match_type=match_type,
            match_score=score,
            matched_by='import'
        )

    return person
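The import flow above links every match it finds, whereas the key decisions call for a configurable threshold (default 75%) stored in integration_setting. A gate along those lines might look like the sketch below. The setting key `match_threshold`, the dict standing in for integration_setting rows, and the function names are all assumptions.

```python
DEFAULT_MATCH_THRESHOLD = 0.75  # default from the key decisions (75%)


def get_match_threshold(settings: dict) -> float:
    """Read the threshold (a fraction between 0 and 1), falling back to the default."""
    raw = settings.get("match_threshold")  # hypothetical setting key
    if raw is None:
        return DEFAULT_MATCH_THRESHOLD
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError("match_threshold must be between 0 and 1")
    return value


def should_auto_link(score: float, settings: dict) -> bool:
    """True when a match is confident enough to be linked without review."""
    return score >= get_match_threshold(settings)
```

With the default threshold, a `name_address` match (0.95) would be linked automatically, while a `name_city` match (0.70) would fall below the bar unless a tenant lowers the setting, for example `should_auto_link(0.70, {"match_threshold": "0.6"})`.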
Related Files¶
| File | Purpose |
|---|---|
| src/api/routes/i360.py | i360 import and query endpoints |
| src/api/routes/list_loader.py | User list import with duplicate detection |
| src/api/routes/segments.py | Segment management |
| cbmodels/src/cbmodels/api/segment.py | Segment analysis using SVID |
| scripts/create_schema.py | Database schema definition |
Open Questions¶
- Multi-state support: Current i360 data is FL only. How to handle multi-state campaigns?
- SVID format: FL uses numeric SVIDs. Other states may use different formats.
- Match confidence threshold: What score is "good enough" for auto-matching?
- Manual override UI: How should users verify/correct matches?
- Unmatched contacts: How to handle persons who can't be matched to i360?