Layer Onboarding Guide

Step-by-step instructions for adding new polygon layers to the cbdistricts platform.

Quick Reference

# Standard import command
python scripts/import_layer.py \
  --source <path-to-geojson> \
  --layer-id <unique-layer-id> \
  --name "<Display Name>" \
  --id-field <ID_COLUMN> \
  --name-field <NAME_COLUMN> \
  --scope state \
  --state <STATE_CODE> \
  --source-url "<data-source-url>"

Data Sources

The Census Bureau publishes authoritative boundary files for congressional districts, state legislative districts, counties, and school districts.

Base URL: https://www2.census.gov/geo/tiger/

| Layer Type | Path Pattern | Example |
|------------|--------------|---------|
| Congressional Districts | GENZ{YEAR}/shp/cb_{YEAR}_us_cd{CONGRESS}_500k.zip | cb_2024_us_cd119_500k.zip |
| State Senate (SLDU) | GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_sldu_500k.zip | cb_2024_26_sldu_500k.zip |
| State House (SLDL) | GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_sldl_500k.zip | cb_2024_26_sldl_500k.zip |
| Counties | GENZ{YEAR}/shp/cb_{YEAR}_us_county_500k.zip | cb_2024_us_county_500k.zip |
| School Districts | GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_unsd_500k.zip | cb_2024_26_unsd_500k.zip |

State FIPS Codes: Full list

  • Michigan: 26
  • Ohio: 39
  • Pennsylvania: 42
  • Iowa: 19
  • Kentucky: 21
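
The path patterns above can be turned into a small URL builder. This is a minimal sketch, not part of the cbdistricts codebase: the function name `census_url` and the short layer keys (`cd`, `sldu`, and so on) are assumptions chosen for illustration, while the base URL and filename patterns are copied from the table.

```python
BASE_URL = "https://www2.census.gov/geo/tiger/"

# Filename patterns from the Data Sources table; the keys are illustrative.
PATTERNS = {
    "cd": "cb_{year}_us_cd{congress}_500k.zip",
    "sldu": "cb_{year}_{fips}_sldu_500k.zip",
    "sldl": "cb_{year}_{fips}_sldl_500k.zip",
    "county": "cb_{year}_us_county_500k.zip",
    "unsd": "cb_{year}_{fips}_unsd_500k.zip",
}


def census_url(layer, year, fips=None, congress=None):
    """Return the download URL for one boundary file (hypothetical helper)."""
    pattern = PATTERNS[layer]
    if "{fips}" in pattern and fips is None:
        raise ValueError(f"layer '{layer}' needs a state FIPS code")
    if "{congress}" in pattern and congress is None:
        raise ValueError("congressional districts need a Congress number")
    filename = pattern.format(year=year, fips=fips, congress=congress)
    return f"{BASE_URL}GENZ{year}/shp/{filename}"


# Michigan State Senate, 2024 vintage
print(census_url("sldu", 2024, fips="26"))
```

Pass the FIPS code as a string so that codes with a leading zero are preserved.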

State GIS Portals

Many states maintain their own GIS data with richer attributes (legislator names, party, etc.).

| State | Portal | Notes |
|-------|--------|-------|
| Michigan | https://gis-michigan.opendata.arcgis.com | Has legislator/party data |
| Ohio | https://geohio.ohio.gov | |
| Pennsylvania | https://www.pasda.psu.edu | |

Other Sources

  • Redistricting Data Hub: https://redistrictingdatahub.org (requires free account)
  • VEST/Harvard Dataverse: Election + precinct data with shapefiles

Supported Data Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| GeoJSON | .geojson | Direct import, preferred |
| Shapefile | .shp | Requires conversion to GeoJSON |
| GeoPackage | .gpkg | Requires conversion |
| KML | .kml | Requires conversion |
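
A pre-flight check based on the format table might look like the following. It is only a sketch: `needs_conversion` is a hypothetical helper, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# True means the file must be converted to GeoJSON before import.
NEEDS_CONVERSION = {".geojson": False, ".shp": True, ".gpkg": True, ".kml": True}


def needs_conversion(path):
    """Return True if the file must be converted before import."""
    suffix = Path(path).suffix.lower()
    if suffix not in NEEDS_CONVERSION:
        raise ValueError(f"unsupported format: {path}")
    return NEEDS_CONVERSION[suffix]


print(needs_conversion("cb_2024_26_sldu_500k.shp"))
```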

Step-by-Step Import Process

Step 1: Identify the Data Source

Determine what layer you need and find the authoritative source.

Example: Michigan State Senate

  • Layer type: State Legislative Districts Upper (SLDU)
  • State FIPS: 26
  • Census URL: https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip

Step 2: Download the Data

# Create the state data directory (named after the state, e.g. michigan)
mkdir -p state-data/michigan
cd state-data/michigan

# Download from Census
wget https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip
unzip cb_2024_26_sldu_500k.zip

Step 3: Inspect the Data

Use DuckDB to examine the shapefile fields:

source ~/.pyenv/versions/nominates/bin/activate
python3 << 'EOF'
import duckdb
conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('cb_2024_26_sldu_500k.shp')")

# Show schema
print("SCHEMA:")
print(conn.execute('DESCRIBE layer').df())

# Show sample data
print("\nSAMPLE DATA:")
print(conn.execute('SELECT * FROM layer LIMIT 5').df())
EOF

Common Census TIGER/Line Fields:

| Field | Description | Use As |
|-------|-------------|--------|
| GEOID | Geographic identifier | id_field |
| NAME | Short name (e.g., "1") | |
| NAMELSAD | Full name (e.g., "State Senate District 1") | name_field |
| SLDUST | Senate district number | id_field alternative |
| SLDLST | House district number | id_field alternative |
| STATEFP | State FIPS code | Property |
| ALAND | Land area (sq meters) | Property |
| AWATER | Water area (sq meters) | Property |

Step 4: Convert to GeoJSON

If your source is a shapefile, convert it to GeoJSON:

import duckdb
import json

conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('your_file.shp')")

# Build GeoJSON
features = []
rows = conn.execute("""
    SELECT SLDUST, NAMELSAD, NAME, STATEFP, GEOID, ALAND, AWATER,
           ST_AsGeoJSON(geom) as geom_json
    FROM layer
""").fetchall()

for row in rows:
    feature = {
        "type": "Feature",
        "properties": {
            "SLDUST": row[0],
            "NAMELSAD": row[1],
            "NAME": row[2],
            "STATEFP": row[3],
            "GEOID": row[4],
            "ALAND": row[5],
            "AWATER": row[6]
        },
        "geometry": json.loads(row[7])
    }
    features.append(feature)

geojson = {"type": "FeatureCollection", "features": features}
with open('output.geojson', 'w') as f:
    json.dump(geojson, f)
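
The conversion above hard-codes the SLDU columns. For layers with different fields, the feature-building loop could be generalized along these lines. This is a sketch under assumptions: `rows_to_feature_collection` is a hypothetical helper, and it expects the column names in the same order as the SELECT list, with one column holding the `ST_AsGeoJSON` output.

```python
import json


def rows_to_feature_collection(columns, rows, geometry_column="geom_json"):
    """Build a GeoJSON FeatureCollection from query rows.

    `columns` lists the column names in SELECT order; the geometry column
    must hold a GeoJSON geometry string.
    """
    geom_index = columns.index(geometry_column)
    features = []
    for row in rows:
        # Every column except the geometry becomes a feature property.
        properties = {
            name: value
            for i, (name, value) in enumerate(zip(columns, row))
            if i != geom_index
        }
        features.append({
            "type": "Feature",
            "properties": properties,
            "geometry": json.loads(row[geom_index]),
        })
    return {"type": "FeatureCollection", "features": features}


# Stand-in rows; real rows would come from the DuckDB query's fetchall().
columns = ["SLDUST", "NAMELSAD", "geom_json"]
rows = [("001", "State Senate District 1",
         '{"type": "Point", "coordinates": [-84.5, 42.7]}')]
collection = rows_to_feature_collection(columns, rows)
print(collection["features"][0]["properties"])
```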

Step 5: Import the Layer

Run the import script from the project root:

cd /path/to/cbdistricts
source ~/.pyenv/versions/nominates/bin/activate

python scripts/import_layer.py \
  --source state-data/michigan/michigan_state_senate.geojson \
  --layer-id michigan-state-senate \
  --name "Michigan State Senate Districts" \
  --description "Michigan State Senate districts (38 total) from 2024 Census TIGER/Line SLDU boundaries" \
  --id-field SLDUST \
  --name-field NAMELSAD \
  --scope state \
  --state MI \
  --source-name "U.S. Census Bureau TIGER/Line" \
  --source-url "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip"

Import Script Parameters:

| Parameter | Required | Description |
|-----------|----------|-------------|
| --source | Yes | Path to GeoJSON file |
| --layer-id | Yes | Unique identifier (e.g., michigan-state-senate) |
| --name | Yes | Human-readable name |
| --description | No | Layer description |
| --id-field | Yes | Property to use as feature ID |
| --name-field | No | Property to use as display name (defaults to id-field) |
| --scope | No | federal, state, or local (default: state) |
| --state | No | State code for state/local layers (e.g., MI) |
| --source-name | No | Data source name |
| --source-url | No | Data source URL |
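
When importing many layers, it can help to assemble the command from a dictionary and catch missing required parameters before the script runs. The sketch below only mirrors the parameter table; `build_import_command` is a hypothetical wrapper and does not reflect how `import_layer.py` parses its arguments internally.

```python
import shlex

# Parameter names taken from the table above, without the leading dashes.
REQUIRED = ("source", "layer-id", "name", "id-field")
OPTIONAL = ("description", "name-field", "scope", "state",
            "source-name", "source-url")


def build_import_command(options):
    """Return the import command as an argument list (hypothetical helper)."""
    missing = [key for key in REQUIRED if not options.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    unknown = sorted(set(options) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unknown parameters: {', '.join(unknown)}")
    command = ["python", "scripts/import_layer.py"]
    for key in REQUIRED + OPTIONAL:
        if options.get(key) is not None:
            command += [f"--{key}", str(options[key])]
    return command


command = build_import_command({
    "source": "state-data/michigan/michigan_state_senate.geojson",
    "layer-id": "michigan-state-senate",
    "name": "Michigan State Senate Districts",
    "id-field": "SLDUST",
    "name-field": "NAMELSAD",
    "state": "MI",
})
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the project root.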

Step 6: Verify the Import

import duckdb
import json

# Check the database
conn = duckdb.connect('data/layers/mi/senate.duckdb', read_only=True)
conn.execute('LOAD spatial')

# Count features
count = conn.execute('SELECT COUNT(*) FROM features').fetchone()[0]
print(f'Total features: {count}')

# Sample features
print('\nSample features:')
rows = conn.execute('SELECT id, name FROM features ORDER BY id LIMIT 10').fetchall()
for row in rows:
    print(f'  {row[0]}: {row[1]}')

# Check registry
with open('data/layers/registry.json') as f:
    registry = json.load(f)
print('\nLayers in registry:')
for layer_id, layer in registry['layers'].items():
    print(f'  {layer_id}: {layer["name"]} ({layer["feature_count"]} features)')

Step 7: Test API Endpoints

# Start the API server
uvicorn api.main:app --reload

# Test endpoints
curl http://localhost:8000/api/v1/layers
curl http://localhost:8000/api/v1/layers/michigan-state-senate
curl http://localhost:8000/api/v1/layers/michigan-state-senate/features
curl http://localhost:8000/api/v1/layers/michigan-state-senate/geojson
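
The same checks can be scripted with the Python standard library. This is a rough sketch: the endpoint paths come from the curl commands above, but the helper names are hypothetical and the assumption that every endpoint returns JSON is not confirmed by this guide.

```python
import json
import urllib.request


def layer_endpoints(layer_id, base_url="http://localhost:8000"):
    """Return the four endpoint URLs exercised by the curl commands."""
    root = f"{base_url}/api/v1/layers"
    return [
        root,
        f"{root}/{layer_id}",
        f"{root}/{layer_id}/features",
        f"{root}/{layer_id}/geojson",
    ]


def smoke_test(layer_id, base_url="http://localhost:8000"):
    """Request each endpoint and return its HTTP status.

    urlopen raises HTTPError on 4xx/5xx responses, and json.load raises
    if the body is not valid JSON (assumed response format).
    """
    results = {}
    for url in layer_endpoints(layer_id, base_url):
        with urllib.request.urlopen(url, timeout=10) as response:
            json.load(response)
            results[url] = response.status
    return results


for url in layer_endpoints("michigan-state-senate"):
    print(url)
```

Call `smoke_test("michigan-state-senate")` once the API server is running.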

Layer ID Naming Convention

Use kebab-case with this pattern:

{state-or-scope}-{layer-type}

Examples:

  • federal-congressional - U.S. Congressional Districts
  • michigan-state-senate - Michigan State Senate
  • michigan-state-house - Michigan State House
  • ohio-state-senate - Ohio State Senate
  • michigan-school-districts - Michigan School Districts
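
The convention can be checked mechanically. The helpers below are a sketch, not part of the import script: `make_layer_id` and `is_valid_layer_id` are hypothetical names, and the regular expression encodes one reading of the pattern (lowercase letters and digits, at least two hyphen-separated parts).

```python
import re

# Lowercase alphanumeric segments joined by single hyphens, two or more segments.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)+$")


def make_layer_id(state_or_scope, layer_type):
    """Join the two parts into a kebab-case layer ID."""
    parts = f"{state_or_scope} {layer_type}".lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", parts).strip("-")


def is_valid_layer_id(layer_id):
    """Return True if the ID follows the kebab-case convention."""
    return bool(KEBAB_CASE.match(layer_id))


print(make_layer_id("Michigan", "State Senate"))
print(is_valid_layer_id("Michigan_State_Senate"))
```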

Output Structure

After import, files are created as follows:

data/layers/
├── registry.json                    # Updated with new layer
├── federal/
│   └── congressional_119.duckdb
└── mi/                              # State code, lowercase
    ├── house.duckdb
    └── senate.duckdb                # New layer database

Database Schema

Each layer database has this structure:

-- Features table
CREATE TABLE features (
    id VARCHAR PRIMARY KEY,     -- From --id-field
    name VARCHAR,               -- From --name-field
    geometry GEOMETRY,          -- Spatial geometry
    properties JSON             -- All other attributes
);

-- Spatial index for fast bbox queries
CREATE INDEX idx_features_geom ON features USING RTREE (geometry);

-- Metadata table
CREATE TABLE _metadata (
    key VARCHAR PRIMARY KEY,
    value JSON
);

Party Normalization

The import script automatically normalizes party values:

| Input | Output |
|-------|--------|
| R | Republican |
| D | Democrat |
| Republican | Republican |
| Democrat | Democrat |
| Democratic | Democrat |
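
To pre-normalize enrichment data before import, the mapping above can be reproduced in a few lines. This sketch is not the import script's implementation: matching case-insensitively and passing unrecognized values through unchanged are assumptions.

```python
# Mapping from the table above, keyed by lowercase input.
PARTY_MAP = {
    "r": "Republican",
    "d": "Democrat",
    "republican": "Republican",
    "democrat": "Democrat",
    "democratic": "Democrat",
}


def normalize_party(value):
    """Map a party value to its normalized form (hypothetical helper)."""
    if value is None:
        return None
    cleaned = str(value).strip()
    # Unknown values are returned as-is (assumed behavior).
    return PARTY_MAP.get(cleaned.lower(), cleaned)


for raw in ["R", "D", "Democratic", "Independent"]:
    print(raw, "->", normalize_party(raw))
```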

Adding Legislator/Party Data

Census TIGER/Line files don't include legislator names or party affiliations. To enrich the data:

Option 1: Use State GIS Portal Data

Many state GIS portals include legislator information. Example from Michigan:

# Michigan State House with legislator data
wget "https://gis-michigan.opendata.arcgis.com/.../Michigan_State_House_Districts.geojson"

Option 2: Manual Enrichment

Create a CSV with district ID and metadata, then join:

district_id,legislator,party
001,John Smith,Democrat
002,Jane Doe,Republican

import pandas as pd
import json

# Load GeoJSON
with open('districts.geojson') as f:
    geojson = json.load(f)

# Load enrichment data; read district_id as text so leading zeros ("001") survive
enrichment = pd.read_csv('enrichment.csv', dtype={'district_id': str})
enrichment_dict = enrichment.set_index('district_id').to_dict('index')

# Merge
for feature in geojson['features']:
    district_id = feature['properties']['SLDUST']
    if district_id in enrichment_dict:
        feature['properties'].update(enrichment_dict[district_id])

# Save
with open('districts_enriched.geojson', 'w') as f:
    json.dump(geojson, f)

Common Issues

Issue: "No features found in GeoJSON"

Cause: Empty or malformed GeoJSON file.

Solution: Verify the file structure:

python3 -c "import json; print(len(json.load(open('file.geojson'))['features']))"
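
The one-liner only counts features and fails with a traceback on malformed input. A slightly more descriptive check could look like the sketch below; `geojson_problems` is a hypothetical helper, and the checks it performs are a reasonable minimum rather than the import script's own validation.

```python
def geojson_problems(data):
    """Return a list of structural problems found in parsed GeoJSON."""
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if data.get("type") != "FeatureCollection":
        problems.append(f"type is {data.get('type')!r}, expected 'FeatureCollection'")
    features = data.get("features")
    if not isinstance(features, list):
        problems.append("'features' is missing or not a list")
    elif not features:
        problems.append("'features' is empty")
    else:
        for i, feature in enumerate(features):
            if not isinstance(feature, dict) or feature.get("geometry") is None:
                problems.append(f"feature {i} has no geometry")
    return problems


print(geojson_problems({"type": "FeatureCollection", "features": []}))
```

Load the file with `json.load` first and pass the parsed result to the function.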

Issue: Duplicate ID error

Cause: The --id-field has duplicate values.

Solution: Use a unique field or combine fields:

# In the conversion script, where each row is a tuple in SELECT order
# (row[3] is STATEFP, row[0] is SLDUST)
feature_id = f"{row[3]}-{row[0]}"
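
Before importing, candidate ID fields can be checked for duplicates directly in the GeoJSON. A minimal sketch, with `duplicate_ids` as a hypothetical helper:

```python
from collections import Counter


def duplicate_ids(features, id_field):
    """Return {value: count} for every id_field value that occurs more than once."""
    counts = Counter(feature["properties"].get(id_field) for feature in features)
    return {value: n for value, n in counts.items() if n > 1}


# Stand-in features; real ones would come from geojson['features'].
features = [
    {"properties": {"SLDUST": "001", "GEOID": "26001"}},
    {"properties": {"SLDUST": "001", "GEOID": "39001"}},
    {"properties": {"SLDUST": "002", "GEOID": "26002"}},
]
print(duplicate_ids(features, "SLDUST"))
print(duplicate_ids(features, "GEOID"))
```

An empty result means the field is safe to pass as --id-field.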

Issue: Geometry not displaying

Cause: Invalid or empty geometries.

Solution: Count null, empty, or invalid geometries:

SELECT COUNT(*) FROM features
WHERE geometry IS NULL OR ST_IsEmpty(geometry) OR NOT ST_IsValid(geometry);

Issue: Wrong coordinate system

Cause: Source data not in WGS84 (EPSG:4326).

Solution: Reproject during conversion:

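# Reproject with ST_Transform, replacing 'EPSG:4269' (NAD83, used here as an
# example) with the source's actual CRS from its .prj file.
# always_xy := true keeps coordinates in longitude/latitude order.
conn.execute("""
    SELECT ST_Transform(geom, 'EPSG:4269', 'EPSG:4326', always_xy := true) AS geom
    FROM layer
""")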

Complete Example: Adding Ohio State Senate

# 1. Download
cd state-data
mkdir -p ohio && cd ohio
wget https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_39_sldu_500k.zip
unzip cb_2024_39_sldu_500k.zip

# 2. Convert to GeoJSON
source ~/.pyenv/versions/nominates/bin/activate
python3 << 'EOF'
import duckdb
import json

conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('cb_2024_39_sldu_500k.shp')")

features = []
for row in conn.execute("""
    SELECT SLDUST, NAMELSAD, NAME, STATEFP, GEOID, ALAND, AWATER,
           ST_AsGeoJSON(geom) as geom_json FROM layer
""").fetchall():
    features.append({
        "type": "Feature",
        "properties": {
            "SLDUST": row[0], "NAMELSAD": row[1], "NAME": row[2],
            "STATEFP": row[3], "GEOID": row[4], "ALAND": row[5], "AWATER": row[6]
        },
        "geometry": json.loads(row[7])
    })

with open('ohio_state_senate.geojson', 'w') as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)
print(f"Exported {len(features)} features")
EOF

# 3. Import
cd ../..
python scripts/import_layer.py \
  --source state-data/ohio/ohio_state_senate.geojson \
  --layer-id ohio-state-senate \
  --name "Ohio State Senate Districts" \
  --description "Ohio State Senate districts from 2024 Census TIGER/Line SLDU boundaries" \
  --id-field SLDUST \
  --name-field NAMELSAD \
  --scope state \
  --state OH \
  --source-name "U.S. Census Bureau TIGER/Line" \
  --source-url "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_39_sldu_500k.zip"

# 4. Verify
python3 -c "
import duckdb
conn = duckdb.connect('data/layers/oh/senate.duckdb', read_only=True)
print(f'Ohio Senate districts: {conn.execute(\"SELECT COUNT(*) FROM features\").fetchone()[0]}')
"

Checklist

  • Data source identified and downloaded
  • Shapefile inspected for field names
  • Converted to GeoJSON (if needed)
  • Correct --id-field and --name-field selected
  • Layer imported with import_layer.py
  • Feature count verified
  • Registry updated
  • API endpoints tested
  • Changes committed