Skip to content

Behavioral Analysis & Enrichment Architecture

Strategic framework for on-the-fly segment analysis using i360 census data enriched with behavioral signals from email engagement, donations, and contact mapping.


Data Assets Overview

Core Identity Data (i360 / pocket.db)

Table Rows Key Fields
person 81K id, email, cell_phone, home_phone, county, state
person_custom_field 2M+ person_id, custom_field_id, value
custom_field 50+ state_voter_id (SVID), voter_key, party, demographics

Behavioral Data (New Extracts)

Table Rows Key Fields
list_link_phone_email_map 182M CONTACT_NUMBER, EMAIL, cohort timestamps
email_clickers 41M EMAIL, CAMPAIGN_ID, CLIENT_SHORTNAME, engagement flags
campaigns 73K CAMPAIGN_ID, CLIENT_SHORTNAME, LIST_ID, CHANNEL
donations 3.2M EMAIL, CONTACT_NUMBER, DONATION_DATE, STATE

Linkage Keys

SVID (State Voter ID)
    └── person_custom_field.value (where custom_field_id = 'state_voter_id')
            └── person.id
                    ├── person.email ──────────────┬─→ email_clickers.EMAIL
                    │                              ├─→ donations.EMAIL
                    │                              └─→ phone_email_map.EMAIL
                    └── person.cell_phone ─────────→ phone_email_map.CONTACT_NUMBER
                                                   └─→ donations.CONTACT_NUMBER

Analysis Scenarios

1. Segment Enrichment (Primary Use Case)

Input: List of SVIDs (State Voter IDs) Output: Enriched profile with behavioral scores

-- Given SVIDs, enrich with behavioral data
WITH segment AS (
    SELECT pcf.person_id, pcf.value as svid
    FROM person_custom_field pcf
    JOIN custom_field cf ON pcf.custom_field_id = cf.id
    WHERE cf.name = 'state_voter_id'
    AND pcf.value IN ({{svid_list}})
),
person_emails AS (
    SELECT s.svid, s.person_id, p.email
    FROM segment s
    JOIN person p ON s.person_id = p.id
    WHERE p.email IS NOT NULL
),
engagement AS (
    SELECT
        pe.svid,
        COUNT(DISTINCT ec.CAMPAIGN_ID) as campaigns_engaged,
        SUM(ec.CLICK) as total_clicks,
        SUM(ec.OPEN) as total_opens,
        COUNT(DISTINCT ec.CLIENT_SHORTNAME) as clients_engaged,
        MAX(ec.EVENT_DATE) as last_engagement
    FROM person_emails pe
    JOIN email_clickers ec ON LOWER(pe.email) = LOWER(ec.EMAIL)
    GROUP BY pe.svid
),
donations AS (
    SELECT
        pe.svid,
        COUNT(*) as donation_count,
        MAX(d.DONATION_DATE) as last_donation
    FROM person_emails pe
    JOIN donations d ON LOWER(pe.email) = LOWER(d.EMAIL)
    GROUP BY pe.svid
)
SELECT
    s.svid,
    COALESCE(e.campaigns_engaged, 0) as campaigns_engaged,
    COALESCE(e.total_clicks, 0) as total_clicks,
    COALESCE(e.total_opens, 0) as total_opens,
    COALESCE(e.clients_engaged, 0) as clients_engaged,
    COALESCE(d.donation_count, 0) as donation_count,
    e.last_engagement,
    d.last_donation,
    CASE
        WHEN d.donation_count > 0 THEN true
        ELSE false
    END as is_donor,
    CASE
        WHEN e.total_clicks > 0 THEN true
        ELSE false
    END as is_clicker
FROM segment s
LEFT JOIN engagement e ON s.svid = e.svid
LEFT JOIN donations d ON s.svid = d.svid;

Output Fields: | Field | Description | |-------|-------------| | campaigns_engaged | Number of unique campaigns interacted with | | total_clicks | Total email clicks across all campaigns | | total_opens | Total email opens | | clients_engaged | Number of distinct clients whose content they engaged with | | donation_count | Number of donations on record | | is_donor | Boolean flag for donor status | | is_clicker | Boolean flag for email engagement | | last_engagement | Most recent email interaction date | | last_donation | Most recent donation date |


2. Client Affinity Scoring

Goal: Determine which clients' content a segment prefers.

-- Client affinity scores for a segment
WITH segment_emails AS (
    -- Get emails for segment
    SELECT DISTINCT LOWER(p.email) as email
    FROM person_custom_field pcf
    JOIN custom_field cf ON pcf.custom_field_id = cf.id
    JOIN person p ON pcf.person_id = p.id
    WHERE cf.name = 'state_voter_id'
    AND pcf.value IN ({{svid_list}})
    AND p.email IS NOT NULL
),
client_engagement AS (
    SELECT
        ec.CLIENT_SHORTNAME,
        COUNT(*) as total_events,
        SUM(ec.SEND) as sends,
        SUM(ec.OPEN) as opens,
        SUM(ec.CLICK) as clicks,
        SUM(ec.UNSUBSCRIBE) as unsubs,
        COUNT(DISTINCT ec.EMAIL) as unique_contacts
    FROM email_clickers ec
    JOIN segment_emails se ON LOWER(ec.EMAIL) = se.email
    GROUP BY ec.CLIENT_SHORTNAME
)
SELECT
    CLIENT_SHORTNAME,
    unique_contacts,
    total_events,
    ROUND(100.0 * opens / NULLIF(sends, 0), 2) as open_rate,
    ROUND(100.0 * clicks / NULLIF(opens, 0), 2) as click_through_rate,
    ROUND(100.0 * unsubs / NULLIF(sends, 0), 4) as unsub_rate,
    -- Affinity score: weighted engagement
    ROUND(
        (clicks * 3.0 + opens * 1.0 - unsubs * 5.0) / NULLIF(sends, 0) * 100,
        2
    ) as affinity_score
FROM client_engagement
ORDER BY affinity_score DESC;

Use Cases: - Identify which conservative orgs resonate with a segment - Optimize send lists by client - Avoid fatiguing audiences with low-affinity content


3. Donor Propensity Model

Goal: Score contacts for donation likelihood based on behavioral signals.

-- Donor propensity features
WITH contact_features AS (
    SELECT
        p.id as person_id,
        p.email,
        -- Demographics from i360
        p.state,
        p.county,
        -- Engagement metrics
        COALESCE(ec_agg.total_clicks, 0) as total_clicks,
        COALESCE(ec_agg.total_opens, 0) as total_opens,
        COALESCE(ec_agg.campaigns_count, 0) as campaigns_count,
        COALESCE(ec_agg.clients_count, 0) as clients_count,
        COALESCE(ec_agg.days_since_last_click, 9999) as days_since_last_click,
        -- Donation history
        COALESCE(d_agg.donation_count, 0) as past_donations,
        COALESCE(d_agg.days_since_last_donation, 9999) as days_since_last_donation
    FROM person p
    LEFT JOIN (
        SELECT
            EMAIL,
            SUM(CLICK) as total_clicks,
            SUM(OPEN) as total_opens,
            COUNT(DISTINCT CAMPAIGN_ID) as campaigns_count,
            COUNT(DISTINCT CLIENT_SHORTNAME) as clients_count,
            DATEDIFF('day', MAX(EVENT_DATE)::DATE, CURRENT_DATE) as days_since_last_click
        FROM email_clickers
        GROUP BY EMAIL
    ) ec_agg ON LOWER(p.email) = LOWER(ec_agg.EMAIL)
    LEFT JOIN (
        SELECT
            EMAIL,
            COUNT(*) as donation_count,
            DATEDIFF('day', MAX(DONATION_DATE)::DATE, CURRENT_DATE) as days_since_last_donation
        FROM donations
        GROUP BY EMAIL
    ) d_agg ON LOWER(p.email) = LOWER(d_agg.EMAIL)
)
SELECT
    *,
    -- Simple propensity score (can be replaced with ML model)
    ROUND(
        CASE
            WHEN past_donations > 0 THEN 0.7  -- Previous donors have high base
            ELSE 0.1
        END
        + (LEAST(total_clicks, 50) / 50.0) * 0.15  -- Click engagement boost
        + (LEAST(campaigns_count, 20) / 20.0) * 0.10  -- Campaign diversity boost
        - (LEAST(days_since_last_click, 365) / 365.0) * 0.05  -- Recency decay
    , 3) as donor_propensity_score
FROM contact_features;

Propensity Score Components: | Factor | Weight | Rationale | |--------|--------|-----------| | Previous donor | +0.70 | Historical donors are most likely to donate again | | Click engagement | +0.15 | Clickers show investment in content | | Campaign diversity | +0.10 | Engaged across multiple campaigns = committed | | Recency decay | -0.05 | Older engagement is less predictive |


4. Engagement Health Dashboard

Goal: Monitor segment health metrics over time.

-- Engagement health for a segment
WITH segment_activity AS (
    SELECT
        DATE_TRUNC('week', ec.EVENT_DATE::DATE) as week,
        COUNT(DISTINCT ec.EMAIL) as active_contacts,
        SUM(ec.SEND) as sends,
        SUM(ec.OPEN) as opens,
        SUM(ec.CLICK) as clicks,
        SUM(ec.BOUNCE) as bounces,
        SUM(ec.COMPLAINT) as complaints,
        SUM(ec.UNSUBSCRIBE) as unsubs
    FROM email_clickers ec
    WHERE ec.EMAIL IN (
        SELECT LOWER(p.email)
        FROM person_custom_field pcf
        JOIN custom_field cf ON pcf.custom_field_id = cf.id
        JOIN person p ON pcf.person_id = p.id
        WHERE cf.name = 'state_voter_id'
        AND pcf.value IN ({{svid_list}})
    )
    GROUP BY DATE_TRUNC('week', ec.EVENT_DATE::DATE)
)
SELECT
    week,
    active_contacts,
    sends,
    ROUND(100.0 * opens / NULLIF(sends, 0), 2) as open_rate,
    ROUND(100.0 * clicks / NULLIF(opens, 0), 2) as ctr,
    ROUND(100.0 * bounces / NULLIF(sends, 0), 2) as bounce_rate,
    ROUND(100.0 * unsubs / NULLIF(sends, 0), 4) as unsub_rate,
    ROUND(100.0 * complaints / NULLIF(sends, 0), 4) as complaint_rate
FROM segment_activity
ORDER BY week DESC;

Health Indicators: | Metric | Healthy | Warning | Critical | |--------|---------|---------|----------| | Open Rate | >20% | 10-20% | <10% | | Click-Through Rate | >3% | 1-3% | <1% | | Bounce Rate | <2% | 2-5% | >5% | | Unsubscribe Rate | <0.1% | 0.1-0.5% | >0.5% | | Complaint Rate | <0.01% | 0.01-0.05% | >0.05% |


5. Lookalike Expansion

Goal: Find contacts similar to a seed segment based on behavior.

-- Find lookalikes based on client affinity pattern
WITH seed_pattern AS (
    -- Get client engagement pattern for seed segment
    SELECT
        ec.CLIENT_SHORTNAME,
        COUNT(*) as engagement_count,
        COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () as engagement_share
    FROM email_clickers ec
    WHERE ec.EMAIL IN (
        SELECT LOWER(p.email) FROM person p
        JOIN person_custom_field pcf ON pcf.person_id = p.id
        JOIN custom_field cf ON pcf.custom_field_id = cf.id
        WHERE cf.name = 'state_voter_id'
        AND pcf.value IN ({{seed_svid_list}})
    )
    GROUP BY ec.CLIENT_SHORTNAME
    HAVING COUNT(*) > 10  -- Minimum engagement threshold
),
candidate_patterns AS (
    -- Get patterns for all other contacts
    SELECT
        ec.EMAIL,
        ec.CLIENT_SHORTNAME,
        COUNT(*) as engagement_count
    FROM email_clickers ec
    WHERE ec.EMAIL NOT IN (
        SELECT LOWER(p.email) FROM person p
        JOIN person_custom_field pcf ON pcf.person_id = p.id
        WHERE pcf.value IN ({{seed_svid_list}})
    )
    GROUP BY ec.EMAIL, ec.CLIENT_SHORTNAME
),
similarity_scores AS (
    SELECT
        cp.EMAIL,
        SUM(
            LEAST(cp.engagement_count, sp.engagement_count * 2) * sp.engagement_share
        ) as similarity_score
    FROM candidate_patterns cp
    JOIN seed_pattern sp ON cp.CLIENT_SHORTNAME = sp.CLIENT_SHORTNAME
    GROUP BY cp.EMAIL
)
SELECT
    ss.EMAIL,
    ss.similarity_score,
    p.id as person_id,
    pcf.value as svid
FROM similarity_scores ss
JOIN person p ON LOWER(p.email) = LOWER(ss.EMAIL)
JOIN person_custom_field pcf ON pcf.person_id = p.id
JOIN custom_field cf ON pcf.custom_field_id = cf.id AND cf.name = 'state_voter_id'
ORDER BY ss.similarity_score DESC
LIMIT 1000;

6. Campaign Attribution

Goal: Track which campaigns drive donations.

-- Campaign to donation attribution
WITH campaign_contacts AS (
    SELECT
        ec.CAMPAIGN_ID,
        c.CAMPAIGN_NAME,
        c.CLIENT_SHORTNAME,
        c.START_DATE,
        ec.EMAIL,
        ec.EVENT_DATE,
        ec.CLICK
    FROM email_clickers ec
    JOIN campaigns c ON ec.CAMPAIGN_ID = c.CAMPAIGN_ID
    WHERE ec.CLICK = 1
),
attributed_donations AS (
    SELECT
        cc.CAMPAIGN_ID,
        cc.CAMPAIGN_NAME,
        cc.CLIENT_SHORTNAME,
        d.EMAIL,
        d.DONATION_DATE,
        cc.EVENT_DATE as click_date,
        DATEDIFF('day', cc.EVENT_DATE::DATE, d.DONATION_DATE::DATE) as days_to_donation
    FROM campaign_contacts cc
    JOIN donations d ON LOWER(cc.EMAIL) = LOWER(d.EMAIL)
    WHERE d.DONATION_DATE >= cc.EVENT_DATE  -- Donation after click
    AND DATEDIFF('day', cc.EVENT_DATE::DATE, d.DONATION_DATE::DATE) <= 30  -- 30-day window
)
SELECT
    CLIENT_SHORTNAME,
    CAMPAIGN_NAME,
    COUNT(DISTINCT EMAIL) as donors_attributed,
    AVG(days_to_donation) as avg_days_to_donate,
    MIN(days_to_donation) as min_days,
    MAX(days_to_donation) as max_days
FROM attributed_donations
GROUP BY CLIENT_SHORTNAME, CAMPAIGN_NAME
ORDER BY donors_attributed DESC
LIMIT 50;

API Endpoints (Proposed)

Segment Analysis

POST /api/v1/segment/analyze
{
    "svids": ["2712345", "2712463", ...],
    "include": {
        "demographics": true,
        "engagement": true,
        "donations": true,
        "client_affinity": true
    },
    "deviation_threshold": 0.1
}

Donor Propensity

POST /api/v1/segment/propensity
{
    "svids": ["2712345", "2712463", ...],
    "model": "donor",
    "threshold": 0.5
}

Lookalike Expansion

POST /api/v1/segment/lookalike
{
    "seed_svids": ["2712345", "2712463", ...],
    "expansion_size": 1000,
    "min_similarity": 0.3
}

Client Affinity

POST /api/v1/segment/affinity
{
    "svids": ["2712345", "2712463", ...],
    "top_n": 10
}

Implementation Priorities

Phase 1: Core Enrichment (MVP)

  1. Load behavioral data into DuckDB
  2. Create cc_email_clickers, cc_donations, cc_campaigns tables
  3. Build indexes on EMAIL, CAMPAIGN_ID

  4. Extend segment analysis endpoint

  5. Add is_donor, is_clicker flags
  6. Add total_clicks, total_opens, campaigns_engaged
  7. Add last_engagement_date, last_donation_date

  8. Add client affinity endpoint

  9. Return top-N clients by engagement for segment
  10. Include open_rate, ctr, affinity_score per client

Phase 2: Propensity & Attribution

  1. Donor propensity scoring
  2. Feature engineering from behavioral data
  3. Simple rule-based scoring (upgradeable to ML)

  4. Campaign attribution

  5. Click-to-donation attribution
  6. Attribution window configuration

Phase 3: Advanced Analytics

  1. Lookalike modeling
  2. Client affinity pattern matching
  3. Behavioral similarity scoring

  4. Engagement health monitoring

  5. Time-series metrics
  6. Alerting on degradation

Additional Data Opportunities

Available for Future Extraction

Table Description Potential Use
EMAIL_OPENERS All email opens (not just clickers) More complete engagement picture
SMS_RESPONSES P2P text responses Multi-channel engagement
PHONE_CONTACTS Call center interactions Voice channel data
SURVEY_RESPONSES Poll/survey data Sentiment and opinion signals
EVENT_ATTENDANCE Rally/event participation Offline engagement
PETITION_SIGNERS Petition activity Issue-level interest
WEB_VISITS Website behavior Digital engagement depth

Cross-Channel Insights

EMAIL engagement + SMS response + Donation =
    → Multi-channel engagement score
    → Channel preference indicator
    → Best time/channel to reach

Temporal Patterns

Historical engagement patterns →
    → Seasonal responsiveness (election cycles)
    → Day-of-week preferences
    → Time-of-day optimization

Success Metrics

Metric Target Measurement
Enrichment coverage >80% % of SVIDs with behavioral data
Donor prediction accuracy >70% Precision@K for donation targeting
Lookalike quality >2x Engagement lift vs random selection
Query latency <2s P95 response time for segment analysis

Next Steps

  1. Load extracted parquet files into pocket.db (or separate analysis.db)
  2. Create linking views between person table and behavioral data
  3. Extend existing segment analysis API with behavioral enrichment
  4. Build propensity scoring function as SQL or Python UDF
  5. Test with known segments to validate linkage rates and insights