Map Data Ingestion Guide

This guide describes how to ingest polygon boundary data into the cbdistricts layer system.

Overview

The cbdistricts platform supports multiple geographic boundary layers stored as individual DuckDB databases with spatial indexing. Each layer is registered in data/layers/registry.json and served via the /api/v1/layers endpoint.

Data Sources

Primary: US Census Bureau

The Census Bureau provides authoritative boundary files for US geography.

TIGER/Line Shapefiles

  • URL: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
  • Direct download: https://www2.census.gov/geo/tiger/
  • Characteristics: Full resolution, includes all attributes, larger file sizes
  • Best for: Detailed analysis, when precision matters

Cartographic Boundary Files

  • URL: https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html
  • Direct download: https://www2.census.gov/geo/tiger/GENZ2023/shp/
  • Characteristics: Simplified for display, smaller files, 1:500k or 1:5m scale
  • Best for: Web visualization (what we use)

Available Geographic Layers

Layer | File Pattern | Features | Notes
Congressional Districts | cb_YYYY_us_cd118_500k.zip | ~441 | Updates each Congress
Counties | cb_YYYY_us_county_500k.zip | ~3,235 | All US counties
States | cb_YYYY_us_state_500k.zip | 56 | States + territories
State Legislative (Upper) | cb_YYYY_SS_sldu_500k.zip | Varies | State Senate districts
State Legislative (Lower) | cb_YYYY_SS_sldl_500k.zip | Varies | State House districts
School Districts | cb_YYYY_us_unsd_500k.zip | ~13,000 | Unified school districts
County Subdivisions | cb_YYYY_SS_cousub_500k.zip | Varies | Townships, etc.
Places | cb_YYYY_SS_place_500k.zip | Varies | Cities, towns, CDPs
ZCTAs | cb_YYYY_us_zcta520_500k.zip | ~33,000 | ZIP Code areas
Urban Areas | cb_YYYY_us_ua10_500k.zip | ~3,600 | Urban/rural boundaries
Native Areas | cb_YYYY_us_aiannh_500k.zip | ~700 | Tribal areas
Metropolitan Areas | cb_YYYY_us_cbsa_500k.zip | ~930 | Metropolitan and micropolitan statistical areas

Note: YYYY = year (use 2023 or latest), SS = state FIPS code (e.g., 26 for Michigan)
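
The file patterns above can be expanded mechanically. The following is a minimal sketch, assuming the GENZ directory layout shown in the direct-download URL also holds for the year you request; the helper name cb_url is hypothetical and not part of cbdistricts.

```python
CENSUS_ROOT = "https://www2.census.gov/geo/tiger"


def cb_url(layer: str, year: int = 2023, state_fips: str = "us",
           scale: str = "500k") -> str:
    """Build a cartographic boundary download URL.

    layer is the suffix from the File Pattern column, e.g. 'county',
    'cd118', or 'sldu'. state_fips is 'us' for national files or a
    two-digit state FIPS code such as '26' for per-state files.
    """
    if state_fips != "us" and not (len(state_fips) == 2 and state_fips.isdigit()):
        raise ValueError(f"expected 'us' or a two-digit FIPS code, got {state_fips!r}")
    filename = f"cb_{year}_{state_fips}_{layer}_{scale}.zip"
    return f"{CENSUS_ROOT}/GENZ{year}/shp/{filename}"


if __name__ == "__main__":
    print(cb_url("county"))
    print(cb_url("sldl", state_fips="26"))
```

Whether a given file exists for a given year still has to be checked against the Census directory listing; the function only assembles the name.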

Secondary: State GIS Portals

Many states maintain their own GIS data portals with additional layers:

State | Portal | Notable Data
Michigan | https://gis-michigan.opendata.arcgis.com | Redistricting, townships
California | https://gis.data.ca.gov | Fire districts, water
Texas | https://tnris.org | Comprehensive GIS
New York | https://gis.ny.gov | Many layers

Tertiary: ArcGIS Open Data

  • URL: https://hub.arcgis.com/search
  • Search for specific boundary types
  • Often includes state-specific redistricting data
  • Can export as GeoJSON directly

Ingestion Process

Step 1: Download Source Data

# Create source directory
mkdir -p data/source/counties

# Download Census cartographic boundary file
cd data/source/counties
curl -sLO "https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_us_county_500k.zip"
unzip cb_2023_us_county_500k.zip

Step 2: Convert to GeoJSON

The import script accepts GeoJSON. Convert shapefiles using Python:

import json

import shapefile  # provided by the pyshp package

# State FIPS code to keep ('26' = Michigan); set to None to keep every state
STATE_FIPS = '26'

# Read shapefile (pyshp locates the .shp/.dbf/.shx files from the base name)
sf = shapefile.Reader('cb_2023_us_county_500k')
# Skip the first entry in sf.fields, which is the DBF deletion flag
fields = [f[0] for f in sf.fields[1:]]
statefp_idx = fields.index('STATEFP')

features = []
for sr in sf.shapeRecords():
    rec = sr.record
    if STATE_FIPS is not None and rec[statefp_idx] != STATE_FIPS:
        continue
    props = {fields[i]: rec[i] for i in range(len(fields))}
    features.append({
        'type': 'Feature',
        'properties': props,
        'geometry': sr.shape.__geo_interface__
    })

# Save GeoJSON (the file name matches the --source path used in Step 3)
with open('michigan_counties.geojson', 'w') as f:
    json.dump({'type': 'FeatureCollection', 'features': features}, f)

Step 3: Import as Layer

Use the import_layer.py script:

python scripts/import_layer.py \
    --source data/source/counties/michigan_counties.geojson \
    --layer-id michigan-counties \
    --name "Michigan Counties" \
    --description "83 Michigan county boundaries (Census Bureau 2023)" \
    --scope state \
    --state MI \
    --id-field GEOID \
    --name-field NAME

Import Script Options

Option | Required | Description
--source | Yes | Path to GeoJSON file
--layer-id | Yes | Unique identifier (kebab-case)
--name | Yes | Display name
--description | No | Layer description
--scope | No | federal, state, or national
--state | No | Two-letter state code (for state scope)
--id-field | Yes | Property to use as feature ID
--name-field | Yes | Property to use as feature name
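
For reference, the options above map naturally onto an argparse parser. This is a sketch of how such a command line could be declared, not the actual contents of scripts/import_layer.py; in particular, rejecting --scope state without --state is an assumption made here for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options listed in the table (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Import a GeoJSON file as a layer")
    parser.add_argument("--source", required=True, help="Path to GeoJSON file")
    parser.add_argument("--layer-id", required=True, help="Unique identifier (kebab-case)")
    parser.add_argument("--name", required=True, help="Display name")
    parser.add_argument("--description", default="", help="Layer description")
    parser.add_argument("--scope", choices=["federal", "state", "national"])
    parser.add_argument("--state", help="Two-letter state code (for state scope)")
    parser.add_argument("--id-field", required=True, help="Property used as feature ID")
    parser.add_argument("--name-field", required=True, help="Property used as feature name")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a state-scoped layer is only meaningful with a state code.
    if args.scope == "state" and not args.state:
        parser.error("--state is required when --scope is state")
    return args


if __name__ == "__main__":
    print(parse_args())
```

argparse converts dashes to underscores, so --layer-id is read back as args.layer_id.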

Step 4: Refresh Services

# Clear Redis cache
redis-cli -n 1 DEL "cbdistricts:layers:list_layers"

# Restart services
sudo systemctl restart cbdistricts-api cbdistricts-web

Layer Naming Conventions

{state}-{type}           # State-specific layers
us-{type}                # National layers
{state}-{type}-{year}    # Versioned layers

Examples:

  • michigan-counties
  • michigan-state-house
  • us-counties
  • us-congressional-118
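
The naming convention can be enforced with a small helper. The sketch below is illustrative only: make_layer_id is a hypothetical name, and treating the trailing segment as a free-form version (a year, or a Congress number such as 118) is an assumption based on the examples.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens (kebab-case)
_KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def make_layer_id(layer_type, state=None, version=None):
    """Build a layer ID such as 'michigan-counties' or 'us-congressional-118'.

    state=None produces a national ('us-') layer ID.
    """
    parts = [state if state else "us", layer_type]
    if version is not None:
        parts.append(str(version))
    layer_id = "-".join(p.strip().lower().replace(" ", "-") for p in parts)
    if not _KEBAB.match(layer_id):
        raise ValueError(f"not a valid kebab-case layer ID: {layer_id!r}")
    return layer_id


if __name__ == "__main__":
    print(make_layer_id("counties", state="Michigan"))
    print(make_layer_id("congressional", version=118))
```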

Database Schema

Each layer creates a DuckDB database at data/layers/{state|us}/{type}.duckdb:

CREATE TABLE features (
    id VARCHAR PRIMARY KEY,
    name VARCHAR,
    properties JSON,
    geometry GEOMETRY
);
CREATE INDEX idx_features_geometry ON features USING RTREE (geometry);
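
To make the column mapping concrete, the sketch below turns one GeoJSON feature into values for the four columns, using the same id and name properties that --id-field and --name-field select. How import_layer.py actually performs this step is not documented here, so treat feature_to_row as a hypothetical helper. The geometry is returned as GeoJSON text; converting it to a GEOMETRY value is left to DuckDB (for example with the spatial extension's ST_GeomFromGeoJSON), which is an assumption about the import path.

```python
import json


def feature_to_row(feature, id_field, name_field):
    """Map a GeoJSON feature to (id, name, properties, geometry) values.

    properties and geometry are returned as JSON strings.
    """
    props = feature.get("properties") or {}
    if id_field not in props:
        raise KeyError(f"feature has no {id_field!r} property to use as id")
    return (
        str(props[id_field]),
        str(props.get(name_field, "")),
        json.dumps(props),
        json.dumps(feature["geometry"]),
    )


if __name__ == "__main__":
    sample = {
        "type": "Feature",
        "properties": {"GEOID": "26161", "NAME": "Washtenaw", "STATEFP": "26"},
        "geometry": {"type": "Point", "coordinates": [-83.84, 42.25]},
    }
    print(feature_to_row(sample, "GEOID", "NAME"))
```

Because id is the primary key, duplicate values in the chosen id property would cause the insert to fail, so it is worth checking uniqueness before loading.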

Automation Ideas

1. Bulk Download Script

Create scripts/download_census_layers.py:

#!/usr/bin/env python3
"""Download all Census cartographic boundary files."""

CENSUS_BASE = "https://www2.census.gov/geo/tiger/GENZ2023/shp"

NATIONAL_LAYERS = [
    ("cb_2023_us_county_500k.zip", "counties"),
    ("cb_2023_us_state_500k.zip", "states"),
    ("cb_2023_us_cd118_500k.zip", "congressional"),
    ("cb_2023_us_cbsa_500k.zip", "metro-areas"),
    ("cb_2023_us_zcta520_500k.zip", "zctas"),
    ("cb_2023_us_ua10_500k.zip", "urban-areas"),
]

STATE_LAYERS = [
    ("cb_2023_{fips}_sldu_500k.zip", "state-senate"),
    ("cb_2023_{fips}_sldl_500k.zip", "state-house"),
    ("cb_2023_{fips}_cousub_500k.zip", "county-subdivisions"),
    ("cb_2023_{fips}_place_500k.zip", "places"),
]

# Download logic here...
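
One possible shape for the download logic is sketched below. It repeats shortened versions of the lists above so that it runs on its own. The function names, the data/source/{name}/ destination layout (modelled on Step 1), and the skip-if-present behaviour are all assumptions rather than existing cbdistricts code.

```python
import urllib.request
from pathlib import Path

CENSUS_BASE = "https://www2.census.gov/geo/tiger/GENZ2023/shp"

NATIONAL_LAYERS = [
    ("cb_2023_us_county_500k.zip", "counties"),
    ("cb_2023_us_state_500k.zip", "states"),
]

STATE_LAYERS = [
    ("cb_2023_{fips}_sldu_500k.zip", "state-senate"),
    ("cb_2023_{fips}_sldl_500k.zip", "state-house"),
]


def plan_downloads(state_fips=(), dest_root="data/source"):
    """Return (url, destination path) pairs without touching the network."""
    plan = []
    for filename, name in NATIONAL_LAYERS:
        plan.append((f"{CENSUS_BASE}/{filename}", Path(dest_root) / name / filename))
    for fips in state_fips:
        for pattern, name in STATE_LAYERS:
            filename = pattern.format(fips=fips)
            plan.append((f"{CENSUS_BASE}/{filename}", Path(dest_root) / name / filename))
    return plan


def download_all(plan, skip_existing=True):
    """Fetch every planned file, creating destination directories as needed."""
    for url, dest in plan:
        if skip_existing and dest.exists():
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))


if __name__ == "__main__":
    download_all(plan_downloads(state_fips=["26"]))
```

Separating planning from fetching keeps the URL construction checkable without network access.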

2. Auto-Import Pipeline

Create scripts/import_all_layers.py:

#!/usr/bin/env python3
"""Import all downloaded shapefiles as layers."""

LAYER_CONFIGS = [
    {
        "source_pattern": "cb_*_us_county_500k",
        "layer_id": "us-counties",
        "name": "US Counties",
        "id_field": "GEOID",
        "name_field": "NAME",
        "scope": "national",
    },
    {
        "source_pattern": "cb_*_us_state_500k",
        "layer_id": "us-states",
        "name": "US States",
        "id_field": "GEOID",
        "name_field": "NAME",
        "scope": "national",
    },
    # ... more configs
]
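
Each config entry carries the same information as the import_layer.py options from Step 3, so the pipeline can translate a config into a command line. The sketch below assumes the shapefile has already been converted to GeoJSON; import_command is a hypothetical helper, and the GeoJSON path in the example is illustrative.

```python
import shlex


def import_command(config, geojson_path):
    """Build the import_layer.py argument list for one layer config."""
    cmd = [
        "python", "scripts/import_layer.py",
        "--source", geojson_path,
        "--layer-id", config["layer_id"],
        "--name", config["name"],
        "--id-field", config["id_field"],
        "--name-field", config["name_field"],
    ]
    # --scope is optional in the import script, so only pass it when set
    if config.get("scope"):
        cmd += ["--scope", config["scope"]]
    return cmd


if __name__ == "__main__":
    example = {
        "source_pattern": "cb_*_us_county_500k",
        "layer_id": "us-counties",
        "name": "US Counties",
        "id_field": "GEOID",
        "name_field": "NAME",
        "scope": "national",
    }
    # Print the command; subprocess.run(cmd, check=True) would execute it
    print(shlex.join(import_command(example, "data/source/counties/us_counties.geojson")))
```

Passing the arguments as a list rather than a single string avoids quoting problems with display names that contain spaces.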

3. State-by-State Generator

Generate state-specific layers automatically:

STATES = {
    "01": "AL", "02": "AK", "04": "AZ", "05": "AR", "06": "CA",
    "08": "CO", "09": "CT", "10": "DE", "11": "DC", "12": "FL",
    # ... all 50 states + DC + territories
}

for fips, abbrev in STATES.items():
    # Download state legislative districts
    # Filter county file for this state
    # Import as {abbrev.lower()}-counties, {abbrev.lower()}-state-house, etc.
    pass  # placeholder so the loop is valid Python until the steps are implemented
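
The "filter county file for this state" step could look like the sketch below, which selects features from an already-loaded national GeoJSON FeatureCollection by their STATEFP property (the same field used in Step 2). The function name is hypothetical.

```python
def filter_by_state(collection, fips):
    """Return a new FeatureCollection holding only features for one state.

    fips is the two-digit state FIPS code as a string, e.g. '26'.
    """
    features = [
        feature for feature in collection["features"]
        if (feature.get("properties") or {}).get("STATEFP") == fips
    ]
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    national = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"STATEFP": "26", "NAME": "Ingham"}, "geometry": None},
            {"type": "Feature", "properties": {"STATEFP": "06", "NAME": "Alameda"}, "geometry": None},
        ],
    }
    michigan = filter_by_state(national, "26")
    print(len(michigan["features"]))
```

FIPS codes must be compared as strings because leading zeros are significant ("06" for California).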

4. Initial Population Priority

Priority order for initial population:

Tier 1: High Value (do first)

  • us-congressional - Federal congressional districts
  • us-counties - All 3,235 US counties
  • us-states - State boundaries (for reference)

Tier 2: State Legislative (key states)

  • michigan-state-house - Already imported
  • michigan-state-senate - Already imported
  • Swing states: PA, WI, AZ, GA, NC, NV

Tier 3: Administrative

  • us-metro-areas - MSAs for urban analysis
  • us-zctas - ZIP code approximations
  • State counties for key states

Tier 4: Specialized

  • School districts (for education campaigns)
  • Native American areas
  • Urban/rural boundaries

5. Layer Update Automation

Create a cron job or scheduled task:

#!/bin/bash
# /etc/cron.monthly/update-census-layers
# Stop at the first failing step so a failed download never triggers an import or restart
set -euo pipefail
cd /home/bisenbek/projects/nominate/cbdistricts
python scripts/download_census_layers.py --check-updates
python scripts/import_all_layers.py --update-only
redis-cli -n 1 FLUSHDB
sudo systemctl restart cbdistricts-api cbdistricts-web

Virtual Layers

Some layers don't come from shapefiles but are generated from other data:

  • Radio Coverage (radio-coverage): Generated from FCC contour data in the main database
  • Future: Election results overlays, demographic choropleth layers

These are handled specially in api/services/layer_service.py.

Troubleshooting

Common Issues

  1. Layer not appearing in dropdown
       • Check that the layer's scope matches the JS filter (federal, state, or national)
       • Clear the Redis cache: redis-cli -n 1 DEL "cbdistricts:layers:list_layers"
       • Restart services

  2. Import fails with geometry error
       • Confirm the file parses as JSON: python -m json.tool file.geojson > /dev/null
       • Check that the coordinate system is WGS84 (EPSG:4326)

  3. Large file performance
       • Use cartographic (500k) files, not TIGER/Line, for web display
       • Consider simplifying with mapshaper for very large layers
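
For the coordinate system check, a rough heuristic is to confirm that every coordinate falls within valid longitude/latitude ranges. The sketch below does that for an already-loaded FeatureCollection. It catches projected coordinates (metres or feet) but cannot tell one geographic datum from another, and it ignores GeometryCollection geometries. The function names are hypothetical.

```python
def iter_positions(coords):
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_lonlat(collection):
    """Return True if every coordinate is a plausible longitude/latitude pair."""
    for feature in collection["features"]:
        geometry = feature.get("geometry")
        if not geometry:
            continue  # features without geometry are skipped
        for lon, lat in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                return False
    return True


if __name__ == "__main__":
    import json
    import sys

    with open(sys.argv[1]) as f:
        print(looks_like_lonlat(json.load(f)))
```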

Useful Commands

# Check registry contents
cat data/layers/registry.json | python -m json.tool

# List layer databases
find data/layers -name "*.duckdb" -ls

# Query layer features
python -c "
import duckdb
conn = duckdb.connect('data/layers/mi/counties.duckdb', read_only=True)
print(conn.execute('SELECT COUNT(*) FROM features').fetchone())
"

# Test API endpoint
curl -s localhost:32406/api/v1/layers | python -m json.tool

References

  • Census Bureau Geography: https://www.census.gov/programs-surveys/geography.html
  • TIGER/Line Documentation: https://www.census.gov/programs-surveys/geography/technical-documentation/complete-technical-documentation/tiger-geo-line.html
  • FIPS State Codes: https://www.census.gov/library/reference/code-lists/ansi.html
  • GeoJSON Specification: https://datatracker.ietf.org/doc/html/rfc7946