Behavioral Analysis & Enrichment Architecture¶
Strategic framework for on-the-fly segment analysis using i360 census data enriched with behavioral signals from email engagement, donations, and contact mapping.
Data Assets Overview¶
Core Identity Data (i360 / pocket.db)¶
| Table | Rows | Key Fields |
|---|---|---|
person |
81K | id, email, cell_phone, home_phone, county, state |
person_custom_field |
2M+ | person_id, custom_field_id, value |
custom_field |
50+ | state_voter_id (SVID), voter_key, party, demographics |
Behavioral Data (New Extracts)¶
| Table | Rows | Key Fields |
|---|---|---|
list_link_phone_email_map |
182M | CONTACT_NUMBER, EMAIL, cohort timestamps |
email_clickers |
41M | EMAIL, CAMPAIGN_ID, CLIENT_SHORTNAME, engagement flags |
campaigns |
73K | CAMPAIGN_ID, CLIENT_SHORTNAME, LIST_ID, CHANNEL |
donations |
3.2M | EMAIL, CONTACT_NUMBER, DONATION_DATE, STATE |
Linkage Keys¶
SVID (State Voter ID)
└── person_custom_field.value (where custom_field_id = 'state_voter_id')
└── person.id
├── person.email ──────────────┬─→ email_clickers.EMAIL
│ ├─→ donations.EMAIL
│ └─→ phone_email_map.EMAIL
└── person.cell_phone ─────────→ phone_email_map.CONTACT_NUMBER
└─→ donations.CONTACT_NUMBER
Analysis Scenarios¶
1. Segment Enrichment (Primary Use Case)¶
Input: List of SVIDs (State Voter IDs) Output: Enriched profile with behavioral scores
-- Given SVIDs, enrich with behavioral data
WITH segment AS (
SELECT pcf.person_id, pcf.value as svid
FROM person_custom_field pcf
JOIN custom_field cf ON pcf.custom_field_id = cf.id
WHERE cf.name = 'state_voter_id'
AND pcf.value IN ({{svid_list}})
),
person_emails AS (
SELECT s.svid, s.person_id, p.email
FROM segment s
JOIN person p ON s.person_id = p.id
WHERE p.email IS NOT NULL
),
engagement AS (
SELECT
pe.svid,
COUNT(DISTINCT ec.CAMPAIGN_ID) as campaigns_engaged,
SUM(ec.CLICK) as total_clicks,
SUM(ec.OPEN) as total_opens,
COUNT(DISTINCT ec.CLIENT_SHORTNAME) as clients_engaged,
MAX(ec.EVENT_DATE) as last_engagement
FROM person_emails pe
JOIN email_clickers ec ON LOWER(pe.email) = LOWER(ec.EMAIL)
GROUP BY pe.svid
),
donations AS (
SELECT
pe.svid,
COUNT(*) as donation_count,
MAX(d.DONATION_DATE) as last_donation
FROM person_emails pe
JOIN donations d ON LOWER(pe.email) = LOWER(d.EMAIL)
GROUP BY pe.svid
)
SELECT
s.svid,
COALESCE(e.campaigns_engaged, 0) as campaigns_engaged,
COALESCE(e.total_clicks, 0) as total_clicks,
COALESCE(e.total_opens, 0) as total_opens,
COALESCE(e.clients_engaged, 0) as clients_engaged,
COALESCE(d.donation_count, 0) as donation_count,
e.last_engagement,
d.last_donation,
CASE
WHEN d.donation_count > 0 THEN true
ELSE false
END as is_donor,
CASE
WHEN e.total_clicks > 0 THEN true
ELSE false
END as is_clicker
FROM segment s
LEFT JOIN engagement e ON s.svid = e.svid
LEFT JOIN donations d ON s.svid = d.svid;
Output Fields:
| Field | Description |
|-------|-------------|
| campaigns_engaged | Number of unique campaigns interacted with |
| total_clicks | Total email clicks across all campaigns |
| total_opens | Total email opens |
| clients_engaged | Number of distinct clients whose content they engaged with |
| donation_count | Number of donations on record |
| is_donor | Boolean flag for donor status |
| is_clicker | Boolean flag for email engagement |
| last_engagement | Most recent email interaction date |
| last_donation | Most recent donation date |
2. Client Affinity Scoring¶
Goal: Determine which clients' content a segment prefers.
-- Client affinity scores for a segment
WITH segment_emails AS (
-- Get emails for segment
SELECT DISTINCT LOWER(p.email) as email
FROM person_custom_field pcf
JOIN custom_field cf ON pcf.custom_field_id = cf.id
JOIN person p ON pcf.person_id = p.id
WHERE cf.name = 'state_voter_id'
AND pcf.value IN ({{svid_list}})
AND p.email IS NOT NULL
),
client_engagement AS (
SELECT
ec.CLIENT_SHORTNAME,
COUNT(*) as total_events,
SUM(ec.SEND) as sends,
SUM(ec.OPEN) as opens,
SUM(ec.CLICK) as clicks,
SUM(ec.UNSUBSCRIBE) as unsubs,
COUNT(DISTINCT ec.EMAIL) as unique_contacts
FROM email_clickers ec
JOIN segment_emails se ON LOWER(ec.EMAIL) = se.email
GROUP BY ec.CLIENT_SHORTNAME
)
SELECT
CLIENT_SHORTNAME,
unique_contacts,
total_events,
ROUND(100.0 * opens / NULLIF(sends, 0), 2) as open_rate,
ROUND(100.0 * clicks / NULLIF(opens, 0), 2) as click_through_rate,
ROUND(100.0 * unsubs / NULLIF(sends, 0), 4) as unsub_rate,
-- Affinity score: weighted engagement
ROUND(
(clicks * 3.0 + opens * 1.0 - unsubs * 5.0) / NULLIF(sends, 0) * 100,
2
) as affinity_score
FROM client_engagement
ORDER BY affinity_score DESC;
Use Cases: - Identify which conservative orgs resonate with a segment - Optimize send lists by client - Avoid fatiguing audiences with low-affinity content
3. Donor Propensity Model¶
Goal: Score contacts for donation likelihood based on behavioral signals.
-- Donor propensity features
WITH contact_features AS (
SELECT
p.id as person_id,
p.email,
-- Demographics from i360
p.state,
p.county,
-- Engagement metrics
COALESCE(ec_agg.total_clicks, 0) as total_clicks,
COALESCE(ec_agg.total_opens, 0) as total_opens,
COALESCE(ec_agg.campaigns_count, 0) as campaigns_count,
COALESCE(ec_agg.clients_count, 0) as clients_count,
COALESCE(ec_agg.days_since_last_click, 9999) as days_since_last_click,
-- Donation history
COALESCE(d_agg.donation_count, 0) as past_donations,
COALESCE(d_agg.days_since_last_donation, 9999) as days_since_last_donation
FROM person p
LEFT JOIN (
SELECT
EMAIL,
SUM(CLICK) as total_clicks,
SUM(OPEN) as total_opens,
COUNT(DISTINCT CAMPAIGN_ID) as campaigns_count,
COUNT(DISTINCT CLIENT_SHORTNAME) as clients_count,
DATEDIFF('day', MAX(EVENT_DATE)::DATE, CURRENT_DATE) as days_since_last_click
FROM email_clickers
GROUP BY EMAIL
) ec_agg ON LOWER(p.email) = LOWER(ec_agg.EMAIL)
LEFT JOIN (
SELECT
EMAIL,
COUNT(*) as donation_count,
DATEDIFF('day', MAX(DONATION_DATE)::DATE, CURRENT_DATE) as days_since_last_donation
FROM donations
GROUP BY EMAIL
) d_agg ON LOWER(p.email) = LOWER(d_agg.EMAIL)
)
SELECT
*,
-- Simple propensity score (can be replaced with ML model)
ROUND(
CASE
WHEN past_donations > 0 THEN 0.7 -- Previous donors have high base
ELSE 0.1
END
+ (LEAST(total_clicks, 50) / 50.0) * 0.15 -- Click engagement boost
+ (LEAST(campaigns_count, 20) / 20.0) * 0.10 -- Campaign diversity boost
- (LEAST(days_since_last_click, 365) / 365.0) * 0.05 -- Recency decay
, 3) as donor_propensity_score
FROM contact_features;
Propensity Score Components: | Factor | Weight | Rationale | |--------|--------|-----------| | Previous donor | +0.70 | Historical donors are most likely to donate again | | Click engagement | +0.15 | Clickers show investment in content | | Campaign diversity | +0.10 | Engaged across multiple campaigns = committed | | Recency decay | -0.05 | Older engagement is less predictive |
4. Engagement Health Dashboard¶
Goal: Monitor segment health metrics over time.
-- Engagement health for a segment
WITH segment_activity AS (
SELECT
DATE_TRUNC('week', ec.EVENT_DATE::DATE) as week,
COUNT(DISTINCT ec.EMAIL) as active_contacts,
SUM(ec.SEND) as sends,
SUM(ec.OPEN) as opens,
SUM(ec.CLICK) as clicks,
SUM(ec.BOUNCE) as bounces,
SUM(ec.COMPLAINT) as complaints,
SUM(ec.UNSUBSCRIBE) as unsubs
FROM email_clickers ec
WHERE ec.EMAIL IN (
SELECT LOWER(p.email)
FROM person_custom_field pcf
JOIN custom_field cf ON pcf.custom_field_id = cf.id
JOIN person p ON pcf.person_id = p.id
WHERE cf.name = 'state_voter_id'
AND pcf.value IN ({{svid_list}})
)
GROUP BY DATE_TRUNC('week', ec.EVENT_DATE::DATE)
)
SELECT
week,
active_contacts,
sends,
ROUND(100.0 * opens / NULLIF(sends, 0), 2) as open_rate,
ROUND(100.0 * clicks / NULLIF(opens, 0), 2) as ctr,
ROUND(100.0 * bounces / NULLIF(sends, 0), 2) as bounce_rate,
ROUND(100.0 * unsubs / NULLIF(sends, 0), 4) as unsub_rate,
ROUND(100.0 * complaints / NULLIF(sends, 0), 4) as complaint_rate
FROM segment_activity
ORDER BY week DESC;
Health Indicators: | Metric | Healthy | Warning | Critical | |--------|---------|---------|----------| | Open Rate | >20% | 10-20% | <10% | | Click-Through Rate | >3% | 1-3% | <1% | | Bounce Rate | <2% | 2-5% | >5% | | Unsubscribe Rate | <0.1% | 0.1-0.5% | >0.5% | | Complaint Rate | <0.01% | 0.01-0.05% | >0.05% |
5. Lookalike Expansion¶
Goal: Find contacts similar to a seed segment based on behavior.
-- Find lookalikes based on client affinity pattern
WITH seed_pattern AS (
-- Get client engagement pattern for seed segment
SELECT
ec.CLIENT_SHORTNAME,
COUNT(*) as engagement_count,
COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () as engagement_share
FROM email_clickers ec
WHERE ec.EMAIL IN (
SELECT LOWER(p.email) FROM person p
JOIN person_custom_field pcf ON pcf.person_id = p.id
JOIN custom_field cf ON pcf.custom_field_id = cf.id
WHERE cf.name = 'state_voter_id'
AND pcf.value IN ({{seed_svid_list}})
)
GROUP BY ec.CLIENT_SHORTNAME
HAVING COUNT(*) > 10 -- Minimum engagement threshold
),
candidate_patterns AS (
-- Get patterns for all other contacts
SELECT
ec.EMAIL,
ec.CLIENT_SHORTNAME,
COUNT(*) as engagement_count
FROM email_clickers ec
WHERE ec.EMAIL NOT IN (
SELECT LOWER(p.email) FROM person p
JOIN person_custom_field pcf ON pcf.person_id = p.id
WHERE pcf.value IN ({{seed_svid_list}})
)
GROUP BY ec.EMAIL, ec.CLIENT_SHORTNAME
),
similarity_scores AS (
SELECT
cp.EMAIL,
SUM(
LEAST(cp.engagement_count, sp.engagement_count * 2) * sp.engagement_share
) as similarity_score
FROM candidate_patterns cp
JOIN seed_pattern sp ON cp.CLIENT_SHORTNAME = sp.CLIENT_SHORTNAME
GROUP BY cp.EMAIL
)
SELECT
ss.EMAIL,
ss.similarity_score,
p.id as person_id,
pcf.value as svid
FROM similarity_scores ss
JOIN person p ON LOWER(p.email) = LOWER(ss.EMAIL)
JOIN person_custom_field pcf ON pcf.person_id = p.id
JOIN custom_field cf ON pcf.custom_field_id = cf.id AND cf.name = 'state_voter_id'
ORDER BY ss.similarity_score DESC
LIMIT 1000;
6. Campaign Attribution¶
Goal: Track which campaigns drive donations.
-- Campaign to donation attribution
WITH campaign_contacts AS (
SELECT
ec.CAMPAIGN_ID,
c.CAMPAIGN_NAME,
c.CLIENT_SHORTNAME,
c.START_DATE,
ec.EMAIL,
ec.EVENT_DATE,
ec.CLICK
FROM email_clickers ec
JOIN campaigns c ON ec.CAMPAIGN_ID = c.CAMPAIGN_ID
WHERE ec.CLICK = 1
),
attributed_donations AS (
SELECT
cc.CAMPAIGN_ID,
cc.CAMPAIGN_NAME,
cc.CLIENT_SHORTNAME,
d.EMAIL,
d.DONATION_DATE,
cc.EVENT_DATE as click_date,
DATEDIFF('day', cc.EVENT_DATE::DATE, d.DONATION_DATE::DATE) as days_to_donation
FROM campaign_contacts cc
JOIN donations d ON LOWER(cc.EMAIL) = LOWER(d.EMAIL)
WHERE d.DONATION_DATE >= cc.EVENT_DATE -- Donation after click
AND DATEDIFF('day', cc.EVENT_DATE::DATE, d.DONATION_DATE::DATE) <= 30 -- 30-day window
)
SELECT
CLIENT_SHORTNAME,
CAMPAIGN_NAME,
COUNT(DISTINCT EMAIL) as donors_attributed,
AVG(days_to_donation) as avg_days_to_donate,
MIN(days_to_donation) as min_days,
MAX(days_to_donation) as max_days
FROM attributed_donations
GROUP BY CLIENT_SHORTNAME, CAMPAIGN_NAME
ORDER BY donors_attributed DESC
LIMIT 50;
API Endpoints (Proposed)¶
Segment Analysis¶
POST /api/v1/segment/analyze
{
"svids": ["2712345", "2712463", ...],
"include": {
"demographics": true,
"engagement": true,
"donations": true,
"client_affinity": true
},
"deviation_threshold": 0.1
}
Donor Propensity¶
POST /api/v1/segment/propensity
{
"svids": ["2712345", "2712463", ...],
"model": "donor",
"threshold": 0.5
}
Lookalike Expansion¶
POST /api/v1/segment/lookalike
{
"seed_svids": ["2712345", "2712463", ...],
"expansion_size": 1000,
"min_similarity": 0.3
}
Client Affinity¶
Implementation Priorities¶
Phase 1: Core Enrichment (MVP)¶
- Load behavioral data into DuckDB
- Create
cc_email_clickers,cc_donations,cc_campaignstables -
Build indexes on EMAIL, CAMPAIGN_ID
-
Extend segment analysis endpoint
- Add
is_donor,is_clickerflags - Add
total_clicks,total_opens,campaigns_engaged -
Add
last_engagement_date,last_donation_date -
Add client affinity endpoint
- Return top-N clients by engagement for segment
- Include open_rate, ctr, affinity_score per client
Phase 2: Propensity & Attribution¶
- Donor propensity scoring
- Feature engineering from behavioral data
-
Simple rule-based scoring (upgradeable to ML)
-
Campaign attribution
- Click-to-donation attribution
- Attribution window configuration
Phase 3: Advanced Analytics¶
- Lookalike modeling
- Client affinity pattern matching
-
Behavioral similarity scoring
-
Engagement health monitoring
- Time-series metrics
- Alerting on degradation
Additional Data Opportunities¶
Available for Future Extraction¶
| Table | Description | Potential Use |
|---|---|---|
EMAIL_OPENERS |
All email opens (not just clickers) | More complete engagement picture |
SMS_RESPONSES |
P2P text responses | Multi-channel engagement |
PHONE_CONTACTS |
Call center interactions | Voice channel data |
SURVEY_RESPONSES |
Poll/survey data | Sentiment and opinion signals |
EVENT_ATTENDANCE |
Rally/event participation | Offline engagement |
PETITION_SIGNERS |
Petition activity | Issue-level interest |
WEB_VISITS |
Website behavior | Digital engagement depth |
Cross-Channel Insights¶
EMAIL engagement + SMS response + Donation =
→ Multi-channel engagement score
→ Channel preference indicator
→ Best time/channel to reach
Temporal Patterns¶
Historical engagement patterns →
→ Seasonal responsiveness (election cycles)
→ Day-of-week preferences
→ Time-of-day optimization
Success Metrics¶
| Metric | Target | Measurement |
|---|---|---|
| Enrichment coverage | >80% | % of SVIDs with behavioral data |
| Donor prediction accuracy | >70% | Precision@K for donation targeting |
| Lookalike quality | >2x | Engagement lift vs random selection |
| Query latency | <2s | P95 response time for segment analysis |
Next Steps¶
- Load extracted parquet files into pocket.db (or separate analysis.db)
- Create linking views between person table and behavioral data
- Extend existing segment analysis API with behavioral enrichment
- Build propensity scoring function as SQL or Python UDF
- Test with known segments to validate linkage rates and insights