Map Data Ingestion Guide¶
This document covers the process of ingesting polygon boundary data into the cbdistricts layer system.
Overview¶
The cbdistricts platform supports multiple geographic boundary layers stored as individual DuckDB databases with spatial indexing. Each layer is registered in data/layers/registry.json and served via the /api/v1/layers endpoint.
Data Sources¶
Primary: US Census Bureau¶
The Census Bureau provides authoritative boundary files for US geography.
TIGER/Line Shapefiles¶
- URL: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
- Direct download: https://www2.census.gov/geo/tiger/
- Characteristics: Full resolution, includes all attributes, larger file sizes
- Best for: Detailed analysis, when precision matters
Cartographic Boundary Files (Recommended)¶
- URL: https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html
- Direct download: https://www2.census.gov/geo/tiger/GENZ2023/shp/
- Characteristics: Simplified for display, smaller files, 1:500k or 1:5m scale
- Best for: Web visualization (what we use)
Available Geographic Layers¶
| Layer | File Pattern | Features | Notes |
|---|---|---|---|
| Congressional Districts | cb_YYYY_us_cd118_500k.zip | ~441 | Updates each Congress |
| Counties | cb_YYYY_us_county_500k.zip | ~3,235 | All US counties |
| States | cb_YYYY_us_state_500k.zip | 56 | States + territories |
| State Legislative (Upper) | cb_YYYY_SS_sldu_500k.zip | Varies | State Senate districts |
| State Legislative (Lower) | cb_YYYY_SS_sldl_500k.zip | Varies | State House districts |
| School Districts | cb_YYYY_us_unsd_500k.zip | ~13,000 | Unified school districts |
| County Subdivisions | cb_YYYY_SS_cousub_500k.zip | Varies | Townships, etc. |
| Places | cb_YYYY_SS_place_500k.zip | Varies | Cities, towns, CDPs |
| ZCTAs | cb_YYYY_us_zcta520_500k.zip | ~33,000 | ZIP Code areas |
| Urban Areas | cb_YYYY_us_ua10_500k.zip | ~3,600 | Urban/rural boundaries |
| Native Areas | cb_YYYY_us_aiannh_500k.zip | ~700 | Tribal areas |
| Metropolitan Areas | cb_YYYY_us_cbsa_500k.zip | ~930 | MSAs and CSAs |
Note: YYYY = year (use 2023 or latest), SS = state FIPS code (e.g., 26 for Michigan)
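The file patterns above can be generated rather than typed by hand. The following is a minimal sketch, not part of the existing scripts: the helper names `cb_filename` and `cb_url` are hypothetical, and it assumes the `GENZ{year}/shp` directory layout shown for 2023 also holds for other years.

```python
from typing import Optional

# Base taken from the "Direct download" link above.
CENSUS_BASE = "https://www2.census.gov/geo/tiger"


def cb_filename(layer: str, year: int = 2023,
                state_fips: Optional[str] = None, scale: str = "500k") -> str:
    """Build a cartographic boundary file name, e.g. cb_2023_us_county_500k.zip.

    Pass state_fips (two digits, e.g. "26") for per-state layers;
    omit it for national ("us") layers.
    """
    area = "us" if state_fips is None else state_fips
    if area != "us" and not (len(area) == 2 and area.isdigit()):
        raise ValueError(f"state FIPS code must be two digits, got {area!r}")
    return f"cb_{year}_{area}_{layer}_{scale}.zip"


def cb_url(layer: str, year: int = 2023,
           state_fips: Optional[str] = None, scale: str = "500k") -> str:
    """Build the download URL (assumes the GENZ<year>/shp layout)."""
    name = cb_filename(layer, year, state_fips, scale)
    return f"{CENSUS_BASE}/GENZ{year}/shp/{name}"
```

For example, `cb_filename("sldu", state_fips="26")` yields the Michigan State Senate file name, and `cb_url("county")` yields the county URL used in Step 1.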
Secondary: State GIS Portals¶
Many states maintain their own GIS data portals with additional layers:
| State | Portal | Notable Data |
|---|---|---|
| Michigan | https://gis-michigan.opendata.arcgis.com | Redistricting, townships |
| California | https://gis.data.ca.gov | Fire districts, water |
| Texas | https://tnris.org | Comprehensive GIS |
| New York | https://gis.ny.gov | Many layers |
Tertiary: ArcGIS Open Data¶
- URL: https://hub.arcgis.com/search
- Search for specific boundary types
- Often includes state-specific redistricting data
- Can export as GeoJSON directly
Ingestion Process¶
Step 1: Download Source Data¶
# Create source directory
mkdir -p data/source/counties
# Download Census cartographic boundary file
cd data/source/counties
curl -sLO "https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_us_county_500k.zip"
unzip cb_2023_us_county_500k.zip
Step 2: Convert to GeoJSON¶
The import script accepts GeoJSON. Convert shapefiles using Python:
import shapefile  # provided by the pyshp package
import json
# Read shapefile
sf = shapefile.Reader('cb_2023_us_county_500k')
# Skip the first field, which is the DeletionFlag marker
fields = [f[0] for f in sf.fields[1:]]
# Optional: Filter by state (e.g., Michigan = '26')
statefp_idx = fields.index('STATEFP')
features = []
for sr in sf.shapeRecords():
    rec = sr.record
    # Uncomment to filter: if rec[statefp_idx] != '26': continue
    props = {fields[i]: rec[i] for i in range(len(fields))}
    features.append({
        'type': 'Feature',
        'properties': props,
        'geometry': sr.shape.__geo_interface__
    })
# Save GeoJSON
with open('output.geojson', 'w') as f:
    json.dump({'type': 'FeatureCollection', 'features': features}, f)
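If the national file has already been converted, a state subset can also be cut from the GeoJSON afterwards instead of re-reading the shapefile. This is a sketch under that assumption; `filter_by_state` and `write_state_subset` are hypothetical helpers, and the output name `michigan_counties.geojson` is chosen only to line up with the Step 3 example.

```python
import json


def filter_by_state(collection: dict, statefp: str) -> dict:
    """Return a new FeatureCollection with only features whose STATEFP matches."""
    kept = [
        feature
        for feature in collection["features"]
        if feature["properties"].get("STATEFP") == statefp
    ]
    return {"type": "FeatureCollection", "features": kept}


def write_state_subset(src_path: str, dst_path: str, statefp: str) -> int:
    """Read a national GeoJSON file, write the state subset, return its feature count."""
    with open(src_path) as src:
        national = json.load(src)
    subset = filter_by_state(national, statefp)
    with open(dst_path, "w") as dst:
        json.dump(subset, dst)
    return len(subset["features"])


# Example (Michigan = FIPS "26"):
# write_state_subset("output.geojson", "michigan_counties.geojson", "26")
```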
Step 3: Import as Layer¶
Use the import_layer.py script:
python scripts/import_layer.py \
--source data/source/counties/michigan_counties.geojson \
--layer-id michigan-counties \
--name "Michigan Counties" \
--description "83 Michigan county boundaries (Census Bureau 2023)" \
--scope state \
--state MI \
--id-field GEOID \
--name-field NAME
Import Script Options¶
| Option | Required | Description |
|---|---|---|
| --source | Yes | Path to GeoJSON file |
| --layer-id | Yes | Unique identifier (kebab-case) |
| --name | Yes | Display name |
| --description | No | Layer description |
| --scope | No | federal, state, or national |
| --state | No | Two-letter state code (for state scope) |
| --id-field | Yes | Property to use as feature ID |
| --name-field | Yes | Property to use as feature name |
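For readers who want to see how such an option set maps onto a parser, here is an illustrative `argparse` sketch that mirrors the table. It is not the actual `import_layer.py`; the function name `build_parser`, the empty default for `--description`, and the use of `choices` for `--scope` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Import a GeoJSON file as a layer")
    parser.add_argument("--source", required=True, help="Path to GeoJSON file")
    parser.add_argument("--layer-id", required=True,
                        help="Unique identifier (kebab-case)")
    parser.add_argument("--name", required=True, help="Display name")
    parser.add_argument("--description", default="", help="Layer description")
    # Restricting to the three documented values is an assumption.
    parser.add_argument("--scope", choices=["federal", "state", "national"],
                        help="federal, state, or national")
    parser.add_argument("--state", help="Two-letter state code (for state scope)")
    parser.add_argument("--id-field", required=True,
                        help="Property to use as feature ID")
    parser.add_argument("--name-field", required=True,
                        help="Property to use as feature name")
    return parser
```

Note that `argparse` exposes `--layer-id` as `args.layer_id`, and likewise `args.id_field` and `args.name_field`.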
Step 4: Refresh Services¶
# Clear Redis cache
redis-cli -n 1 DEL "cbdistricts:layers:list_layers"
# Restart services
sudo systemctl restart cbdistricts-api cbdistricts-web
Layer Naming Conventions¶
{state}-{type} # State-specific layers
us-{type} # National layers
{state}-{type}-{year} # Versioned layers
Examples:
- michigan-counties
- michigan-state-house
- us-counties
- us-congressional-118
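The convention can be enforced with a small helper. This is a sketch only: the function `layer_id` is hypothetical, and it treats the trailing component as a generic version (a year, or a Congress number as in us-congressional-118).

```python
import re
from typing import Optional, Union


def layer_id(layer_type: str, state: Optional[str] = None,
             version: Optional[Union[int, str]] = None) -> str:
    """Build a kebab-case layer ID: {state}-{type}, us-{type}, or {state}-{type}-{version}."""
    parts = [state if state else "us", layer_type]
    if version is not None:
        parts.append(str(version))
    slug = "-".join(parts).lower()
    # Collapse spaces, underscores, and other punctuation into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
```

For instance, `layer_id("State House", state="Michigan")` produces `michigan-state-house`, and `layer_id("congressional", version=118)` produces `us-congressional-118`.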
Database Schema¶
Each layer creates a DuckDB database at data/layers/{state|us}/{type}.duckdb:
CREATE TABLE features (
    id VARCHAR PRIMARY KEY,
    name VARCHAR,
    properties JSON,
    geometry GEOMETRY
);
CREATE INDEX idx_features_geometry ON features USING RTREE (geometry);
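To make the column mapping concrete, the sketch below flattens one GeoJSON feature into the four columns of this table. It is an assumption about how an importer could work, not a description of `import_layer.py`; `feature_to_row` is a hypothetical name. The geometry is kept as GeoJSON text, on the assumption that it is converted to `GEOMETRY` at insert time (for example with the DuckDB spatial extension's `ST_GeomFromGeoJSON`).

```python
import json


def feature_to_row(feature: dict, id_field: str, name_field: str) -> tuple:
    """Map a GeoJSON feature to (id, name, properties_json, geometry_geojson)."""
    props = feature.get("properties") or {}
    if id_field not in props:
        raise KeyError(f"feature has no {id_field!r} property to use as id")
    return (
        str(props[id_field]),             # id VARCHAR PRIMARY KEY
        str(props.get(name_field, "")),   # name VARCHAR
        json.dumps(props),                # properties JSON
        json.dumps(feature["geometry"]),  # geometry, still GeoJSON text here
    )
```

The `id_field` and `name_field` arguments correspond to the `--id-field` and `--name-field` options, e.g. `GEOID` and `NAME` for Census files.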
Automation Ideas¶
1. Bulk Download Script¶
Create scripts/download_census_layers.py:
#!/usr/bin/env python3
"""Download all Census cartographic boundary files."""
CENSUS_BASE = "https://www2.census.gov/geo/tiger/GENZ2023/shp"
NATIONAL_LAYERS = [
    ("cb_2023_us_county_500k.zip", "counties"),
    ("cb_2023_us_state_500k.zip", "states"),
    ("cb_2023_us_cd118_500k.zip", "congressional"),
    ("cb_2023_us_cbsa_500k.zip", "metro-areas"),
    ("cb_2023_us_zcta520_500k.zip", "zctas"),
    ("cb_2023_us_ua10_500k.zip", "urban-areas"),
]
STATE_LAYERS = [
    ("cb_2023_{fips}_sldu_500k.zip", "state-senate"),
    ("cb_2023_{fips}_sldl_500k.zip", "state-house"),
    ("cb_2023_{fips}_cousub_500k.zip", "county-subdivisions"),
    ("cb_2023_{fips}_place_500k.zip", "places"),
]
# Download logic here...
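One possible shape for the download logic is sketched below using only the standard library. The function names and the `data/source/<layer>/<fips>` directory layout for state files are assumptions, not existing project code.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def plan_downloads(base, national_layers, state_layers, state_fips,
                   root=Path("data/source")):
    """Return (url, destination_dir) pairs for every requested file."""
    jobs = []
    for filename, subdir in national_layers:
        jobs.append((f"{base}/{filename}", root / subdir))
    for fips in state_fips:
        for pattern, subdir in state_layers:
            # Per-state files go into a FIPS subdirectory (assumed layout).
            jobs.append((f"{base}/{pattern.format(fips=fips)}", root / subdir / fips))
    return jobs


def download_and_extract(url: str, dest_dir: Path) -> Path:
    """Download one zip into dest_dir (skipped if already present) and unpack it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return archive


# Example:
# for url, dest in plan_downloads(CENSUS_BASE, NATIONAL_LAYERS, STATE_LAYERS, ["26"]):
#     download_and_extract(url, dest)
```

Separating planning from downloading keeps the URL construction testable without network access.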
2. Auto-Import Pipeline¶
Create scripts/import_all_layers.py:
#!/usr/bin/env python3
"""Import all downloaded shapefiles as layers."""
LAYER_CONFIGS = [
    {
        "source_pattern": "cb_*_us_county_500k",
        "layer_id": "us-counties",
        "name": "US Counties",
        "id_field": "GEOID",
        "name_field": "NAME",
        "scope": "national",
    },
    {
        "source_pattern": "cb_*_us_state_500k",
        "layer_id": "us-states",
        "name": "US States",
        "id_field": "GEOID",
        "name_field": "NAME",
        "scope": "national",
    },
    # ... more configs
]
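A config entry of this shape could drive the pipeline roughly as follows. This is a sketch with hypothetical helpers (`match_sources`, `import_command`): the first matches `source_pattern` against downloaded shapefile names, the second assembles the `import_layer.py` invocation from Step 3 once the shapefile has been converted to GeoJSON.

```python
import fnmatch


def match_sources(config: dict, shapefile_stems: list) -> list:
    """Return the shapefile base names that match the config's glob pattern."""
    pattern = config["source_pattern"]
    return sorted(s for s in shapefile_stems if fnmatch.fnmatchcase(s, pattern))


def import_command(config: dict, geojson_path: str) -> list:
    """Build the import_layer.py argument list for one config entry."""
    cmd = [
        "python", "scripts/import_layer.py",
        "--source", geojson_path,
        "--layer-id", config["layer_id"],
        "--name", config["name"],
        "--id-field", config["id_field"],
        "--name-field", config["name_field"],
    ]
    if "scope" in config:
        cmd += ["--scope", config["scope"]]
    return cmd


# Example: subprocess.run(import_command(cfg, "us_counties.geojson"), check=True)
```

When several years match the same pattern, the sorted result puts the newest file last, so the caller can pick `matches[-1]`.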
3. State-by-State Generator¶
Generate state-specific layers automatically:
STATES = {
    "01": "AL", "02": "AK", "04": "AZ", "05": "AR", "06": "CA",
    "08": "CO", "09": "CT", "10": "DE", "11": "DC", "12": "FL",
    # ... all 50 states + DC + territories
}
for fips, abbrev in STATES.items():
    # Download state legislative districts
    # Filter county file for this state
    # Import as {abbrev.lower()}-counties, {abbrev.lower()}-state-house, etc.
    pass  # placeholder so the loop is valid Python until the steps are implemented
4. Recommended Pre-Population Layers¶
Priority order for initial population:
Tier 1: High Value (do first)¶
- us-congressional: Federal congressional districts
- us-counties: All 3,235 US counties
- us-states: State boundaries (for reference)
Tier 2: State Legislative (key states)¶
- michigan-state-house: Already imported
- michigan-state-senate: Already imported
- Swing states: PA, WI, AZ, GA, NC, NV
Tier 3: Administrative¶
- us-metro-areas: MSAs for urban analysis
- us-zctas: ZIP code approximations
- State counties for key states
Tier 4: Specialized¶
- School districts (for education campaigns)
- Native American areas
- Urban/rural boundaries
5. Layer Update Automation¶
Create a cron job or scheduled task:
#!/bin/bash
# /etc/cron.monthly/update-census-layers
# The shebang must be the first line of the file to take effect.
cd /home/bisenbek/projects/nominate/cbdistricts || exit 1
python scripts/download_census_layers.py --check-updates
python scripts/import_all_layers.py --update-only
redis-cli -n 1 FLUSHDB
sudo systemctl restart cbdistricts-api cbdistricts-web
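How `--check-updates` decides that a file is stale is not specified here. One option, sketched below, is to compare the HTTP `Last-Modified` header of the remote zip with the modification time of the local copy. The function `needs_update` is hypothetical, and it assumes the server sends a `Last-Modified` header.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_update(remote_last_modified: str, local_mtime: Optional[float]) -> bool:
    """Return True if the remote file is newer than the local copy.

    remote_last_modified: value of the HTTP Last-Modified header.
    local_mtime: os.path.getmtime() of the local archive, or None if absent.
    """
    if local_mtime is None:
        return True  # nothing downloaded yet
    remote = parsedate_to_datetime(remote_last_modified)
    if remote.tzinfo is None:
        # Treat a header without zone information as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local
```

The header itself can be fetched with a `HEAD` request, for example via `urllib.request.Request(url, method="HEAD")`, so no file body is transferred during the check.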
Virtual Layers¶
Some layers don't come from shapefiles but are generated from other data:
- Radio Coverage (radio-coverage): Generated from FCC contour data in main database
- Future: Election results overlays, demographic choropleth layers
These are handled specially in api/services/layer_service.py.
Troubleshooting¶
Common Issues¶
- Layer not appearing in dropdown
  - Check scope matches JS filter (federal, state, or national)
  - Clear Redis cache: redis-cli -n 1 DEL "cbdistricts:layers:list_layers"
  - Restart services
- Import fails with geometry error
  - Ensure GeoJSON is valid: python -m json.tool file.geojson > /dev/null
  - Check coordinate system is WGS84 (EPSG:4326)
- Large file performance
  - Use cartographic (500k) not TIGER/Line for web display
  - Consider simplifying with mapshaper for very large layers
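`python -m json.tool` only confirms that the file is well-formed JSON. A slightly stronger pre-import check is sketched below: it verifies the FeatureCollection structure and flags coordinates outside the longitude/latitude range, which is a rough hint that the data is projected rather than EPSG:4326. The function names are hypothetical, and this is not a full GeoJSON validator.

```python
def iter_positions(coords):
    """Yield [lon, lat, ...] positions from arbitrarily nested coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def check_feature_collection(data: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    if data.get("type") != "FeatureCollection":
        return ["top-level type is not FeatureCollection"]
    problems = []
    for i, feature in enumerate(data.get("features", [])):
        geom = feature.get("geometry")
        # GeometryCollection (no "coordinates" key) is reported as missing here.
        if not geom or "coordinates" not in geom:
            problems.append(f"feature {i}: missing geometry coordinates")
            continue
        for pos in iter_positions(geom["coordinates"]):
            lon, lat = pos[0], pos[1]
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(
                    f"feature {i}: position ({lon}, {lat}) outside lon/lat range"
                )
                break  # one report per feature is enough
    return problems
```

Load the file with `json.load` and pass the result in; any reported out-of-range position suggests reprojecting the source before import.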
Useful Commands¶
# Check registry contents
cat data/layers/registry.json | python -m json.tool
# List layer databases
find data/layers -name "*.duckdb" -ls
# Query layer features
python -c "
import duckdb
conn = duckdb.connect('data/layers/mi/counties.duckdb', read_only=True)
print(conn.execute('SELECT COUNT(*) FROM features').fetchone())
"
# Test API endpoint
curl -s localhost:32406/api/v1/layers | python -m json.tool
References¶
- Census Bureau Geography: https://www.census.gov/programs-surveys/geography.html
- TIGER/Line Documentation: https://www.census.gov/programs-surveys/geography/technical-documentation/complete-technical-documentation/tiger-geo-line.html
- FIPS State Codes: https://www.census.gov/library/reference/code-lists/ansi.html
- GeoJSON Specification: https://datatracker.ietf.org/doc/html/rfc7946