Layer Onboarding Guide

Step-by-step instructions for adding new polygon layers to the cbdistricts platform.

Quick Reference

# Standard import command
python scripts/import_layer.py \
  --source <path-to-geojson> \
  --layer-id <unique-layer-id> \
  --name "<Display Name>" \
  --id-field <ID_COLUMN> \
  --name-field <NAME_COLUMN> \
  --scope state \
  --state <STATE_CODE> \
  --source-url "<data-source-url>"

Data Sources

Census Bureau TIGER/Line (Recommended)

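The Census Bureau publishes authoritative boundary files for congressional districts, state legislative districts, counties, and school districts. The cb_*_500k files listed below are its generalized (1:500,000) cartographic boundary files, which are derived from TIGER/Line.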

Base URL: https://www2.census.gov/geo/tiger/

Layer Type	Path Pattern	Example
Congressional Districts	`GENZ{YEAR}/shp/cb_{YEAR}_us_cd{CONGRESS}_500k.zip`	`cb_2024_us_cd119_500k.zip`
State Senate (SLDU)	`GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_sldu_500k.zip`	`cb_2024_26_sldu_500k.zip`
State House (SLDL)	`GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_sldl_500k.zip`	`cb_2024_26_sldl_500k.zip`
Counties	`GENZ{YEAR}/shp/cb_{YEAR}_us_county_500k.zip`	`cb_2024_us_county_500k.zip`
School Districts	`GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_unsd_500k.zip`	`cb_2024_26_unsd_500k.zip`

State FIPS Codes (full list available from the Census Bureau):

- Michigan: 26
- Ohio: 39
- Pennsylvania: 42
- Iowa: 19
- Kentucky: 21
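
The path patterns in the table above can be assembled programmatically. The following is a minimal sketch, not part of cbdistricts: the function name `census_url` and the short layer keys (`cd`, `county`, `sldu`, `sldl`, `unsd`) are hypothetical and chosen only to mirror the file names in the table.

```python
from typing import Optional

BASE_URL = "https://www2.census.gov/geo/tiger/"

# Layers published as one national file vs. one file per state (per the table).
NATIONAL_LAYERS = {"county"}
STATE_LAYERS = {"sldu", "sldl", "unsd"}


def census_url(layer: str, year: int, fips: Optional[str] = None,
               congress: Optional[int] = None) -> str:
    """Build a download URL from the path patterns in the table (hypothetical helper)."""
    if layer == "cd":
        if congress is None:
            raise ValueError("congressional districts need a congress number, e.g. 119")
        name = f"cb_{year}_us_cd{congress}_500k.zip"
    elif layer in NATIONAL_LAYERS:
        name = f"cb_{year}_us_{layer}_500k.zip"
    elif layer in STATE_LAYERS:
        if fips is None:
            raise ValueError(f"{layer} needs a two-digit state FIPS code, e.g. '26'")
        name = f"cb_{year}_{fips}_{layer}_500k.zip"
    else:
        raise ValueError(f"unknown layer type: {layer}")
    return f"{BASE_URL}GENZ{year}/shp/{name}"


# Michigan State Senate, 2024 vintage
print(census_url("sldu", 2024, fips="26"))
```

FIPS codes are passed as strings so that codes with a leading zero are not truncated.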

State GIS Portals

Many states maintain their own GIS data with richer attributes (legislator names, party, etc.).

State	Portal	Notes
Michigan	https://gis-michigan.opendata.arcgis.com	Has legislator/party data
Ohio	https://geohio.ohio.gov
Pennsylvania	https://www.pasda.psu.edu

Other Sources

Redistricting Data Hub: https://redistrictingdatahub.org (requires free account)
VEST/Harvard Dataverse: Election + precinct data with shapefiles

Supported Data Formats

Format	Extension	Notes
GeoJSON	`.geojson`	Direct import, preferred
Shapefile	`.shp`	Requires conversion to GeoJSON
GeoPackage	`.gpkg`	Requires conversion
KML	`.kml`	Requires conversion
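
To decide up front whether a downloaded file needs the conversion step, a small check on the file extension is enough. This is an illustrative sketch only; `needs_conversion` is a hypothetical helper, and the extension table simply restates the one above.

```python
from pathlib import Path

# Extensions from the table above; True means the file can be imported directly.
DIRECT_IMPORT = {".geojson": True, ".shp": False, ".gpkg": False, ".kml": False}


def needs_conversion(path: str) -> bool:
    """Return True when the file must be converted to GeoJSON before import."""
    ext = Path(path).suffix.lower()
    if ext not in DIRECT_IMPORT:
        raise ValueError(f"unsupported format: {ext or '(no extension)'}")
    return not DIRECT_IMPORT[ext]


print(needs_conversion("cb_2024_26_sldu_500k.shp"))
```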

Step-by-Step Import Process

Step 1: Identify the Data Source

Determine what layer you need and find the authoritative source.

Example: Michigan State Senate

- Layer type: State Legislative Districts Upper (SLDU)
- State FIPS: 26
- Census URL: https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip

Step 2: Download the Data

# Create state data directory (this guide uses the lowercase state name, e.g. michigan)
mkdir -p state-data/michigan
cd state-data/michigan

# Download from Census
wget https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip
unzip cb_2024_26_sldu_500k.zip

Step 3: Inspect the Data

Use DuckDB to examine the shapefile fields:

source ~/.pyenv/versions/nominates/bin/activate
python3 << 'EOF'
import duckdb
conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('cb_2024_26_sldu_500k.shp')")

# Show schema
print("SCHEMA:")
print(conn.execute('DESCRIBE layer').df())

# Show sample data
print("\nSAMPLE DATA:")
print(conn.execute('SELECT * FROM layer LIMIT 5').df())
EOF

Common Census TIGER/Line Fields:

Field	Description	Use As
`GEOID`	Geographic identifier	`id_field`
`NAME`	Short name (e.g., "1")
`NAMELSAD`	Full name (e.g., "State Senate District 1")	`name_field`
`SLDUST`	Senate district number	`id_field` alternative
`SLDLST`	House district number	`id_field` alternative
`STATEFP`	State FIPS code	Property
`ALAND`	Land area (sq meters)	Property
`AWATER`	Water area (sq meters)	Property
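
Once the schema is known, the choice of `--id-field` and `--name-field` can be made mechanically. The sketch below is an assumption-laden illustration: `suggest_fields` is a hypothetical helper, its preference order follows the "Use As" column above, and falling back to `NAME` for the display name is my own assumption rather than documented behavior.

```python
def suggest_fields(columns):
    """Suggest (id_field, name_field) from a list of column names."""
    # Preference order follows the "Use As" column in the table above.
    id_candidates = ["GEOID", "SLDUST", "SLDLST"]
    # NAME as a fallback display name is an assumption, not from the guide.
    name_candidates = ["NAMELSAD", "NAME"]
    available = set(columns)
    id_field = next((c for c in id_candidates if c in available), None)
    name_field = next((c for c in name_candidates if c in available), None)
    return id_field, name_field


columns = ["STATEFP", "SLDUST", "GEOID", "NAMELSAD", "ALAND", "AWATER", "geom"]
print(suggest_fields(columns))
```

A result of `None` for either field means the file uses non-Census column names and the fields have to be picked by hand.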

Step 4: Convert to GeoJSON

If your source is a shapefile, convert it to GeoJSON:

import duckdb
import json

conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('your_file.shp')")

# Build GeoJSON
features = []
rows = conn.execute("""
    SELECT SLDUST, NAMELSAD, NAME, STATEFP, GEOID, ALAND, AWATER,
           ST_AsGeoJSON(geom) as geom_json
    FROM layer
""").fetchall()

for row in rows:
    feature = {
        "type": "Feature",
        "properties": {
            "SLDUST": row[0],
            "NAMELSAD": row[1],
            "NAME": row[2],
            "STATEFP": row[3],
            "GEOID": row[4],
            "ALAND": row[5],
            "AWATER": row[6]
        },
        "geometry": json.loads(row[7])
    }
    features.append(feature)

geojson = {"type": "FeatureCollection", "features": features}
with open('output.geojson', 'w') as f:
    json.dump(geojson, f)

Step 5: Import the Layer

Run the import script from the project root:

cd /path/to/cbdistricts
source ~/.pyenv/versions/nominates/bin/activate

python scripts/import_layer.py \
  --source state-data/michigan/michigan_state_senate.geojson \
  --layer-id michigan-state-senate \
  --name "Michigan State Senate Districts" \
  --description "Michigan State Senate districts (38 total) from 2024 Census TIGER/Line SLDU boundaries" \
  --id-field SLDUST \
  --name-field NAMELSAD \
  --scope state \
  --state MI \
  --source-name "U.S. Census Bureau TIGER/Line" \
  --source-url "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip"

Import Script Parameters:

Parameter	Required	Description
`--source`	Yes	Path to GeoJSON file
`--layer-id`	Yes	Unique identifier (e.g., `michigan-state-senate`)
`--name`	Yes	Human-readable name
`--description`	No	Layer description
`--id-field`	Yes	Property to use as feature ID
`--name-field`	No	Property to use as display name (defaults to id-field)
`--scope`	No	`federal`, `state`, or `local` (default: `state`)
`--state`	No	State code for state/local layers (e.g., `MI`)
`--source-name`	No	Data source name
`--source-url`	No	Data source URL
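
When importing several layers, it can help to assemble the command line from Python instead of retyping it. The sketch below only builds an argument list from the flags in the table above; `build_import_command` is a hypothetical helper, and nothing here describes how `import_layer.py` parses its arguments.

```python
import shlex

# Optional flags from the parameter table, keyed by Python-friendly names.
OPTIONAL_FLAGS = {"description", "name_field", "scope", "state",
                  "source_name", "source_url"}


def build_import_command(source, layer_id, name, id_field, **optional):
    """Assemble the import_layer.py command line as an argv list."""
    argv = ["python", "scripts/import_layer.py",
            "--source", source,
            "--layer-id", layer_id,
            "--name", name,
            "--id-field", id_field]
    for key, value in optional.items():
        if key not in OPTIONAL_FLAGS:
            raise ValueError(f"unknown option: {key}")
        # name_field -> --name-field, source_url -> --source-url, ...
        argv += ["--" + key.replace("_", "-"), str(value)]
    return argv


argv = build_import_command(
    "state-data/ohio/ohio_state_senate.geojson",
    "ohio-state-senate",
    "Ohio State Senate Districts",
    "SLDUST",
    name_field="NAMELSAD",
    state="OH",
)
print(shlex.join(argv))  # shell-quoted form, for copy and paste
```

The list can be passed to `subprocess.run(argv, check=True)` from the project root.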

Step 6: Verify the Import

import duckdb
import json

# Check the database
conn = duckdb.connect('data/layers/mi/senate.duckdb', read_only=True)
conn.execute('LOAD spatial')

# Count features
count = conn.execute('SELECT COUNT(*) FROM features').fetchone()[0]
print(f'Total features: {count}')

# Sample features
print('\nSample features:')
rows = conn.execute('SELECT id, name FROM features ORDER BY id LIMIT 10').fetchall()
for row in rows:
    print(f'  {row[0]}: {row[1]}')

# Check registry
with open('data/layers/registry.json') as f:
    registry = json.load(f)
print('\nLayers in registry:')
for layer_id, layer in registry['layers'].items():
    print(f'  {layer_id}: {layer["name"]} ({layer["feature_count"]} features)')

Step 7: Test API Endpoints

# Start the API server
uvicorn api.main:app --reload

# Test endpoints
curl http://localhost:8000/api/v1/layers
curl http://localhost:8000/api/v1/layers/michigan-state-senate
curl http://localhost:8000/api/v1/layers/michigan-state-senate/features
curl http://localhost:8000/api/v1/layers/michigan-state-senate/geojson
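
The same four requests can be scripted for repeated use. This is a rough sketch using only the standard library; `endpoint_urls` and `smoke_test` are hypothetical helpers, and the URLs are copied from the curl commands above.

```python
import urllib.request

BASE = "http://localhost:8000/api/v1"


def endpoint_urls(layer_id: str, base: str = BASE) -> list:
    """Return the four URLs exercised by the curl commands above."""
    return [
        f"{base}/layers",
        f"{base}/layers/{layer_id}",
        f"{base}/layers/{layer_id}/features",
        f"{base}/layers/{layer_id}/geojson",
    ]


def smoke_test(layer_id: str) -> None:
    """Request each endpoint and print its HTTP status (server must be running)."""
    for url in endpoint_urls(layer_id):
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=10) as response:
            print(response.status, url)
```

With the API server running, call `smoke_test("michigan-state-senate")`; any failing endpoint stops the run with an exception.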

Layer ID Naming Convention

Use kebab-case with this pattern:

{state-or-scope}-{layer-type}

Examples:

- federal-congressional - U.S. Congressional Districts
- michigan-state-senate - Michigan State Senate
- michigan-state-house - Michigan State House
- ohio-state-senate - Ohio State Senate
- michigan-school-districts - Michigan School Districts
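
A layer ID can be checked against this convention before import. The regular expression below is one possible reading of the pattern (lowercase letters and digits, at least two hyphen-separated parts); it is an assumption, and the import script may enforce something different or nothing at all.

```python
import re

# At least two kebab-case segments: {state-or-scope}-{layer-type}
LAYER_ID_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")


def is_valid_layer_id(layer_id: str) -> bool:
    """Return True if layer_id is kebab-case with a scope part and a type part."""
    return LAYER_ID_RE.fullmatch(layer_id) is not None


for candidate in ["michigan-state-senate", "Michigan_State_Senate", "michigan"]:
    print(candidate, is_valid_layer_id(candidate))
```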

Output Structure

After import, files are created as follows:

data/layers/
├── registry.json                    # Updated with new layer
├── federal/
│   └── congressional_119.duckdb
└── mi/                              # State code, lowercase
    ├── house.duckdb
    └── senate.duckdb                # New layer database

Database Schema

Each layer database has this structure:

-- Features table
CREATE TABLE features (
    id VARCHAR PRIMARY KEY,     -- From --id-field
    name VARCHAR,               -- From --name-field
    geometry GEOMETRY,          -- Spatial geometry
    properties JSON             -- All other attributes
);

-- Spatial index for fast bbox queries
CREATE INDEX idx_features_geom ON features USING RTREE (geometry);

-- Metadata table
CREATE TABLE _metadata (
    key VARCHAR PRIMARY KEY,
    value JSON
);

Party Normalization

The import script automatically normalizes party values:

Input	Output
`R`	`Republican`
`D`	`Democrat`
`Republican`	`Republican`
`Democrat`	`Democrat`
`Democratic`	`Democrat`
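
The mapping in the table can be reproduced when preparing data outside the import script. The sketch below is not the script's actual code: `normalize_party` is a hypothetical function, and its case-insensitive matching and pass-through of unrecognized values are assumptions.

```python
# Mapping taken from the table above, keyed by lowercase input.
PARTY_MAP = {
    "r": "Republican",
    "republican": "Republican",
    "d": "Democrat",
    "democrat": "Democrat",
    "democratic": "Democrat",
}


def normalize_party(value):
    """Map a raw party value to its normalized form.

    Assumptions: matching ignores case and surrounding whitespace, and
    values not in the table (e.g. "Independent") are returned unchanged.
    """
    if value is None:
        return None
    return PARTY_MAP.get(value.strip().lower(), value)


for raw in ["R", "D", "Democratic", "Independent"]:
    print(raw, "->", normalize_party(raw))
```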

Adding Legislator/Party Data

Census TIGER/Line files don't include legislator names or party affiliations. To enrich the data:

Option 1: Use State GIS Portal Data

Many state GIS portals include legislator information. Example from Michigan:

# Michigan State House with legislator data
wget "https://gis-michigan.opendata.arcgis.com/.../Michigan_State_House_Districts.geojson"

Option 2: Manual Enrichment

Create a CSV with district ID and metadata, then join:

district_id,legislator,party
001,John Smith,Democrat
002,Jane Doe,Republican

import pandas as pd
import json

# Load GeoJSON
with open('districts.geojson') as f:
    geojson = json.load(f)

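# Load enrichment data; read district_id as text so leading zeros ("001") survive
# and the values match the string IDs stored in SLDUST
enrichment = pd.read_csv('enrichment.csv', dtype={'district_id': str})
enrichment_dict = enrichment.set_index('district_id').to_dict('index')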

# Merge
for feature in geojson['features']:
    district_id = feature['properties']['SLDUST']
    if district_id in enrichment_dict:
        feature['properties'].update(enrichment_dict[district_id])

# Save
with open('districts_enriched.geojson', 'w') as f:
    json.dump(geojson, f)
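
A join like this can fail silently when IDs do not line up, so it is worth reporting which districts received no enrichment. The following is a standard-library sketch of the same merge that also returns the unmatched IDs; `merge_enrichment` is a hypothetical helper, and the sample data is made up for illustration.

```python
import csv
import io


def merge_enrichment(geojson, rows, id_property="SLDUST", key="district_id"):
    """Copy enrichment columns into matching features; return unmatched IDs."""
    lookup = {row[key]: {k: v for k, v in row.items() if k != key} for row in rows}
    unmatched = []
    for feature in geojson["features"]:
        district_id = feature["properties"][id_property]
        if district_id in lookup:
            feature["properties"].update(lookup[district_id])
        else:
            unmatched.append(district_id)
    return unmatched


# csv.DictReader yields every value as a string, so "001" keeps its zeros.
sample_csv = "district_id,legislator,party\n001,John Smith,Democrat\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
geojson = {"features": [{"properties": {"SLDUST": "001"}},
                        {"properties": {"SLDUST": "002"}}]}
print(merge_enrichment(geojson, rows))  # IDs with no enrichment row
```

A non-empty result usually points to a formatting mismatch between the CSV and the GeoJSON IDs, such as missing zero padding.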

Common Issues

Issue: "No features found in GeoJSON"

Cause: Empty or malformed GeoJSON file.

Solution: Verify the file structure:

python3 -c "import json; print(len(json.load(open('file.geojson'))['features']))"
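
The one-liner raises a bare exception on a malformed file. For a more descriptive report, a structural check along these lines may help; `check_feature_collection` is a hypothetical helper and covers only the basic shape of a FeatureCollection, not full GeoJSON validation.

```python
def check_feature_collection(data):
    """Return a list of structural problems; an empty list means the shape is OK."""
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    problems = []
    if data.get("type") != "FeatureCollection":
        problems.append("'type' is not 'FeatureCollection'")
    features = data.get("features")
    if not isinstance(features, list) or not features:
        problems.append("'features' is missing or empty")
        return problems
    for index, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            problems.append(f"feature {index}: not a Feature object")
        elif feature.get("geometry") is None:
            problems.append(f"feature {index}: geometry is missing or null")
    return problems


sample = {"type": "FeatureCollection", "features": []}
print(check_feature_collection(sample))
```

To check a file on disk, load it with `json.load` and pass the result to the function.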

Issue: Duplicate ID error

Cause: The --id-field has duplicate values.

Solution: Use a unique field or combine fields:

# In conversion script (rows are tuples: row[3] is STATEFP, row[0] is SLDUST)
feature_id = f"{row[3]}-{row[0]}"
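
Before choosing a different field, it helps to see which values actually collide. The sketch below counts ID values in a loaded GeoJSON document; `duplicate_ids` is a hypothetical helper and the sample features are invented.

```python
from collections import Counter


def duplicate_ids(geojson, id_field):
    """Return {value: count} for every id_field value that appears more than once."""
    counts = Counter(
        feature["properties"].get(id_field) for feature in geojson["features"]
    )
    return {value: n for value, n in counts.items() if n > 1}


sample = {"features": [
    {"properties": {"SLDUST": "001"}},
    {"properties": {"SLDUST": "001"}},
    {"properties": {"SLDUST": "002"}},
]}
print(duplicate_ids(sample, "SLDUST"))
```

A key of `None` in the result means some features lack the field entirely.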

Issue: Geometry not displaying

Cause: Invalid or empty geometries.

Solution: Check for null, empty, or invalid geometries:

SELECT COUNT(*) FROM features
WHERE geometry IS NULL OR ST_IsEmpty(geometry) OR NOT ST_IsValid(geometry);

Issue: Wrong coordinate system

Cause: Source data not in WGS84 (EPSG:4326).

Solution: Reproject during conversion:

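# ST_Transform needs the source CRS stated explicitly.
# Replace EPSG:4269 (NAD83, used by Census boundary files) with your data's CRS.
# The final argument (always_xy) keeps coordinates in longitude/latitude order.
conn.execute("""
    SELECT ST_Transform(geom, 'EPSG:4269', 'EPSG:4326', true) as geom
    FROM layer
""")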

Complete Example: Adding Ohio State Senate

# 1. Download
cd state-data
mkdir -p ohio && cd ohio
wget https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_39_sldu_500k.zip
unzip cb_2024_39_sldu_500k.zip

# 2. Convert to GeoJSON
source ~/.pyenv/versions/nominates/bin/activate
python3 << 'EOF'
import duckdb
import json

conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('cb_2024_39_sldu_500k.shp')")

features = []
for row in conn.execute("""
    SELECT SLDUST, NAMELSAD, NAME, STATEFP, GEOID, ALAND, AWATER,
           ST_AsGeoJSON(geom) as geom_json FROM layer
""").fetchall():
    features.append({
        "type": "Feature",
        "properties": {
            "SLDUST": row[0], "NAMELSAD": row[1], "NAME": row[2],
            "STATEFP": row[3], "GEOID": row[4], "ALAND": row[5], "AWATER": row[6]
        },
        "geometry": json.loads(row[7])
    })

with open('ohio_state_senate.geojson', 'w') as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)
print(f"Exported {len(features)} features")
EOF

# 3. Import
cd ../..
python scripts/import_layer.py \
  --source state-data/ohio/ohio_state_senate.geojson \
  --layer-id ohio-state-senate \
  --name "Ohio State Senate Districts" \
  --description "Ohio State Senate districts from 2024 Census TIGER/Line SLDU boundaries" \
  --id-field SLDUST \
  --name-field NAMELSAD \
  --scope state \
  --state OH \
  --source-name "U.S. Census Bureau TIGER/Line" \
  --source-url "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_39_sldu_500k.zip"

# 4. Verify
python3 -c "
import duckdb
conn = duckdb.connect('data/layers/oh/senate.duckdb', read_only=True)
print(f'Ohio Senate districts: {conn.execute(\"SELECT COUNT(*) FROM features\").fetchone()[0]}')
"

Checklist

Data source identified and downloaded
Shapefile inspected for field names
Converted to GeoJSON (if needed)
Correct --id-field and --name-field selected
Layer imported with import_layer.py
Feature count verified
Registry updated
API endpoints tested
Changes committed