Congressional District Data Platform

Data Architecture Analysis & Platform Proposal

December 2025


Executive Summary

This analysis explores the feasibility of building an interactive congressional district data mining platform for Campaign Brain. After examining the U.S. Census Bureau's data offerings for the 119th Congress, I've identified a manageable, well-structured dataset that would work excellently with DuckDB's spatial extension and GeoParquet format.

Key Finding: The core dataset for all 441 congressional districts (geometries plus a full year of comprehensive demographic data) totals approximately 200-250 MB, making a single-database approach viable without per-district partitioning.


Data Coverage & Sources

Geographic Boundaries

The Census Bureau provides two types of boundary files for the 119th Congress (January 2025 – January 2027):

File Type              Size (Compressed)   Coverage              Use Case
TIGER/Line Shapefiles  ~150 MB total       Per-state files       Detailed boundaries
Cartographic Boundary  ~7 MB (national)    Single national file  Visualization (1:500k)

Source URLs:

  • TIGER/Line: https://www2.census.gov/geo/tiger/TIGER2024/CD/
  • Cartographic: https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_us_cd119_500k.zip
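Fetching the national cartographic file needs nothing beyond the standard library; a minimal sketch, where the URL is the cartographic file listed above and the `data/` directory name is an arbitrary choice:

```python
import os
import zipfile
from urllib.request import urlretrieve

CB_URL = "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_us_cd119_500k.zip"

def local_name(url: str) -> str:
    """Derive a local filename from a download URL."""
    return url.rsplit("/", 1)[-1]

def download_and_extract(url: str, dest_dir: str = "data") -> str:
    """Download a Census zip archive and extract its shapefile components."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(zip_path):  # skip re-download on repeat runs
        urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return dest_dir

# download_and_extract(CB_URL)  # fetches the ~7 MB national file
```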

Demographic & Economic Data

The Census Bureau's American Community Survey (ACS) provides comprehensive data via API:

Data Profile         Variables  Coverage
DP02 – Social           616     Education, marital status, language, ancestry, veterans
DP03 – Economic         274     Employment, commuting, income, poverty, health insurance
DP04 – Housing          286     Occupancy, structure, value, rent, utilities
DP05 – Demographics     188     Age, sex, race, Hispanic origin, voting-age population
Detailed Tables      36,722     Granular breakdowns of all topics
Subject Tables       18,645     Topic-specific summaries with percentages

Total: ~56,731 variables available per congressional district


Data Size Estimation

Component                                      Estimated Size
Geometry (GeoParquet, 441 districts)           ~10 MB
Data Profiles (1,364 vars × 441 districts)     ~5 MB
Full ACS Tables (56,731 vars × 441 districts)  ~190 MB
5 Years Historical Data                        ~950 MB
Total (Core + 5 Years)                         ~1.2 GB (manageable)
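A quick back-of-envelope check of the largest line item, assuming roughly 8 uncompressed bytes per stored value (a 64-bit number):

```python
# Sanity-check the "Full ACS Tables" estimate: variables × districts × bytes.
VARIABLES = 56_731        # detailed + subject + profile variables
DISTRICTS = 441           # 435 voting seats + 6 non-voting members
BYTES_PER_VALUE = 8       # assumption: one 64-bit value, pre-compression

total_bytes = VARIABLES * DISTRICTS * BYTES_PER_VALUE
total_mb = total_bytes / 1_000_000
print(f"~{total_mb:.0f} MB per year")  # prints "~200 MB per year"
```

Parquet's columnar compression typically shrinks this further, so ~190 MB on disk is a reasonable (even conservative) figure.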

Architecture Recommendation

Given the manageable data size, I recommend a single DuckDB database rather than per-district partitioning:

  • Simplicity: One database file, easy to deploy and back up
  • Performance: DuckDB handles ~1GB datasets easily; spatial queries are fast with bbox filtering
  • Joins: Cross-district analysis becomes trivial (e.g., "show all districts where median income > $75k")
  • DuckDB Spatial: Reads Shapefiles and GeoPackages directly, exports GeoParquet with bbox for fast filtering

Proposed Schema

-- Core tables
districts        (geoid, geometry, state_fips, district_num, namelsad, aland, awater)
demographics     (geoid, year, dp02_*, dp03_*, dp04_*, dp05_* columns)
detailed_tables  (geoid, year, table_id, variable_id, estimate, moe)

-- Enrichment tables (for Campaign Brain)
district_news    (geoid, date, headline, source_url, sentiment)
rep_info         (geoid, congress_num, rep_name, party, committees)
campaign_events  (geoid, event_date, event_type, description)

Your Tile Server Idea

The tile server would be valuable for interactive visualization! Here's how it fits:

  • Vector Tiles: Generate MVT tiles from the cartographic boundaries (7MB source → fast tiles)
  • Dynamic Styling: Color districts by any metric (turnout, income, party lean)
  • Tippecanoe: Convert GeoJSON/FlatGeobuf → PMTiles for serverless hosting or MBTiles for tile server
  • DuckDB + Tiles: Query DuckDB for data, tile server for rendering—clean separation of concerns
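The Tippecanoe step could be driven from Python as a subprocess. The flags below are standard Tippecanoe options; the input/output file names and layer name are placeholders:

```python
import subprocess

def tippecanoe_cmd(src_geojson: str, out_pmtiles: str, layer: str = "districts") -> list[str]:
    """Build a tippecanoe invocation for district boundaries.

    -zg guesses an appropriate max zoom from the data;
    --drop-densest-as-needed keeps low-zoom tiles under the size limit.
    """
    return [
        "tippecanoe",
        "-o", out_pmtiles,
        "-l", layer,
        "-zg",
        "--drop-densest-as-needed",
        src_geojson,
    ]

cmd = tippecanoe_cmd("cd119.geojson", "cd119.pmtiles")
# subprocess.run(cmd, check=True)  # requires tippecanoe on PATH
```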

Implementation Approach

Phase 1: Data Pipeline

  1. Download: Fetch national cartographic boundary file (7MB) + TIGER/Line if detailed boundaries needed
  2. Census API: Batch-fetch ACS Data Profiles for all 441 districts (free API key recommended; keyless access is rate-limited)
  3. Transform: Convert to GeoParquet + normalize demographic tables
  4. Load: Insert into DuckDB with spatial extension
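Step 2 might look like the following sketch, assuming the `requests` package. The helpers only wrap parameter construction and response parsing; the variable ID in the usage comment (DP05_0001E, total population) comes from the Data Profile tables above:

```python
import requests

ACS_PROFILE_URL = "https://api.census.gov/data/2023/acs/acs1/profile"

def profile_request(variables, state_fips=None, api_key=None):
    """Build query params for an ACS Data Profile request over congressional districts."""
    params = {
        "get": ",".join(["NAME"] + list(variables)),
        "for": "congressional district:*",
    }
    if state_fips:
        params["in"] = f"state:{state_fips}"  # omit to fetch all states at once
    if api_key:
        params["key"] = api_key
    return params

def fetch_profiles(variables, state_fips=None, api_key=None):
    """Fetch profile rows as dicts keyed by column name."""
    resp = requests.get(ACS_PROFILE_URL,
                        params=profile_request(variables, state_fips, api_key))
    resp.raise_for_status()
    header, *rows = resp.json()  # first row of the response is the column names
    return [dict(zip(header, row)) for row in rows]

# fetch_profiles(["DP05_0001E"])  # one row per congressional district
```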

Phase 2: Visualization

  • Generate vector tiles with Tippecanoe or similar
  • Deploy tile server (or use PMTiles for serverless)
  • Build FastHTML frontend with MapLibre GL JS
  • Connect click events to DuckDB queries for district details

Phase 3: Enrichment (Campaign Brain Value-Add)

  • News Aggregation: RSS feeds + Claude summarization for district-level political news
  • Representative Data: Congress.gov API for current rep info, voting records, committees
  • Election History: MIT Election Lab data for historical results
  • Custom Metrics: Combine Census data into campaign-relevant scores (persuadability, turnout potential)
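As a purely hypothetical illustration of such a custom metric, the function below combines three census-derived inputs with invented weights; it is a placeholder for a real model, not a validated score:

```python
def turnout_potential(pct_over_65: float, pct_bachelors: float,
                      median_income: float) -> float:
    """Toy 'turnout potential' score (0-100) from census-derived inputs.

    The inputs and weights are illustrative only, loosely reflecting that
    older, more educated, higher-income districts tend to vote at higher rates.
    """
    income_component = min(median_income / 100_000, 1.0)  # cap at $100k
    score = (0.40 * (pct_over_65 / 100)
             + 0.35 * (pct_bachelors / 100)
             + 0.25 * income_component)
    return round(100 * score, 1)

turnout_potential(pct_over_65=18.0, pct_bachelors=35.0, median_income=72_000)
```

In practice the weights would be fit against the MIT Election Lab historical results mentioned above rather than chosen by hand.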

Technical Details from Analysis

Boundary File Structure

The national cartographic boundary file contains:

  • 441 districts (435 voting seats + 6 non-voting members: 5 delegates and the Puerto Rico resident commissioner)
  • 56 state/territory FIPS codes represented
  • 633,459 total vertices across all geometries
  • Average 1,436 vertices per district

Top States by District Count

State           Districts
California          52
Texas               38
Florida             28
New York            26
Illinois            17
Pennsylvania        17
Ohio                15
Georgia             14
North Carolina      14
Michigan            13

Census API Sample

import requests

# Get demographic data for all California congressional districts
params = {
    'get': 'NAME,DP05_0001E,DP05_0018E',  # Name, Total Pop, Median Age
    'for': 'congressional district:*',
    'in': 'state:06',
}
resp = requests.get(
    'https://api.census.gov/data/2023/acs/acs1/profile',
    params=params,
)
resp.raise_for_status()
header, *rows = resp.json()  # first row is the column names
# rows holds one entry per CA district (52 under the current apportionment)

Summary

The congressional district data platform is highly feasible:

  • Data is available: Census provides comprehensive, well-documented data via free API
  • Size is manageable: ~1-2GB total, easily handled by DuckDB
  • No per-district DBs needed: Single database with spatial indexing is more practical
  • Tile server adds value: For interactive maps, your tile server idea is the right approach
  • Growth path: Start with Census data, add news/election/rep data incrementally

Next Steps: I can help build out any of these components—the data pipeline, DuckDB schema, tile generation workflow, or FastHTML visualization. Let me know which piece you'd like to tackle first!