Layer Onboarding Guide¶
Step-by-step instructions for adding new polygon layers to the cbdistricts platform.
Quick Reference¶
# Standard import command
python scripts/import_layer.py \
--source <path-to-geojson> \
--layer-id <unique-layer-id> \
--name "<Display Name>" \
--id-field <ID_COLUMN> \
--name-field <NAME_COLUMN> \
--scope state \
--state <STATE_CODE> \
--source-url "<data-source-url>"
Data Sources¶
Census Bureau TIGER/Line (Recommended)¶
The Census Bureau provides authoritative boundary files for all U.S. political districts.
Base URL: https://www2.census.gov/geo/tiger/
| Layer Type | Path Pattern | Example |
|---|---|---|
| Congressional Districts | GENZ{YEAR}/shp/cb_{YEAR}_us_cd{CONGRESS}_500k.zip | cb_2024_us_cd119_500k.zip |
| State Senate (SLDU) | GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_sldu_500k.zip | cb_2024_26_sldu_500k.zip |
| State House (SLDL) | GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_sldl_500k.zip | cb_2024_26_sldl_500k.zip |
| Counties | GENZ{YEAR}/shp/cb_{YEAR}_us_county_500k.zip | cb_2024_us_county_500k.zip |
| School Districts | GENZ{YEAR}/shp/cb_{YEAR}_{FIPS}_unsd_500k.zip | cb_2024_26_unsd_500k.zip |
State FIPS Codes: Full list
- Michigan: 26
- Ohio: 39
- Pennsylvania: 42
- Iowa: 19
- Kentucky: 21
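The path patterns above can be filled in programmatically. The sketch below is only an illustration: the function name `census_url` and the short layer keys (`sldu`, `sldl`, `unsd`, `county`, `cd`) are assumptions made for this example and are not part of the cbdistricts scripts.

```python
from typing import Optional

BASE_URL = "https://www2.census.gov/geo/tiger/"

# State FIPS codes listed in this guide; extend as needed.
STATE_FIPS = {"MI": "26", "OH": "39", "PA": "42", "IA": "19", "KY": "21"}

# Layer keys whose files are published per state (hypothetical keys that
# mirror the file-name suffixes in the table above).
STATE_LAYERS = {"sldu", "sldl", "unsd"}


def census_url(layer: str, year: int, state: Optional[str] = None,
               congress: Optional[int] = None) -> str:
    """Build a download URL from the path patterns in the table above."""
    if layer in STATE_LAYERS:
        if state is None:
            raise ValueError(f"layer {layer!r} needs a state code")
        geo, suffix = STATE_FIPS[state], layer
    elif layer == "county":
        geo, suffix = "us", "county"
    elif layer == "cd":
        if congress is None:
            raise ValueError("congressional districts need a congress number")
        geo, suffix = "us", f"cd{congress}"
    else:
        raise ValueError(f"unknown layer type: {layer!r}")
    return f"{BASE_URL}GENZ{year}/shp/cb_{year}_{geo}_{suffix}_500k.zip"


print(census_url("sldu", 2024, state="MI"))
print(census_url("cd", 2024, congress=119))
```

Not every year and layer combination is necessarily published, so check that the URL resolves before scripting a download.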
State GIS Portals¶
Many states maintain their own GIS data with richer attributes (legislator names, party, etc.).
| State | Portal | Notes |
|---|---|---|
| Michigan | https://gis-michigan.opendata.arcgis.com | Has legislator/party data |
| Ohio | https://geohio.ohio.gov | |
| Pennsylvania | https://www.pasda.psu.edu | |
Other Sources¶
- Redistricting Data Hub: https://redistrictingdatahub.org (requires free account)
- VEST/Harvard Dataverse: Election + precinct data with shapefiles
Supported Data Formats¶
| Format | Extension | Notes |
|---|---|---|
| GeoJSON | .geojson | Direct import, preferred |
| Shapefile | .shp | Requires conversion to GeoJSON |
| GeoPackage | .gpkg | Requires conversion |
| KML | .kml | Requires conversion |
Step-by-Step Import Process¶
Step 1: Identify the Data Source¶
Determine what layer you need and find the authoritative source.
Example: Michigan State Senate
- Layer type: State Legislative Districts Upper (SLDU)
- State FIPS: 26
- Census URL: https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip
Step 2: Download the Data¶
# Create state data directory
mkdir -p state-data/{state-code}
cd state-data/{state-code}
# Download from Census
wget https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip
unzip cb_2024_26_sldu_500k.zip
Step 3: Inspect the Data¶
Use DuckDB to examine the shapefile fields:
source ~/.pyenv/versions/nominates/bin/activate
python3 << 'EOF'
import duckdb
conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('cb_2024_26_sldu_500k.shp')")
# Show schema
print("SCHEMA:")
print(conn.execute('DESCRIBE layer').df())
# Show sample data
print("\nSAMPLE DATA:")
print(conn.execute('SELECT * FROM layer LIMIT 5').df())
EOF
Common Census TIGER/Line Fields:
| Field | Description | Use As |
|---|---|---|
| GEOID | Geographic identifier | id_field |
| NAME | Short name (e.g., "1") | |
| NAMELSAD | Full name (e.g., "State Senate District 1") | name_field |
| SLDUST | Senate district number | id_field alternative |
| SLDLST | House district number | id_field alternative |
| STATEFP | State FIPS code | Property |
| ALAND | Land area (sq meters) | Property |
| AWATER | Water area (sq meters) | Property |
Step 4: Convert to GeoJSON¶
If your source is a shapefile, convert it to GeoJSON:
import duckdb
import json
conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('your_file.shp')")
# Build GeoJSON
features = []
rows = conn.execute("""
    SELECT SLDUST, NAMELSAD, NAME, STATEFP, GEOID, ALAND, AWATER,
           ST_AsGeoJSON(geom) as geom_json
    FROM layer
""").fetchall()
for row in rows:
    feature = {
        "type": "Feature",
        "properties": {
            "SLDUST": row[0],
            "NAMELSAD": row[1],
            "NAME": row[2],
            "STATEFP": row[3],
            "GEOID": row[4],
            "ALAND": row[5],
            "AWATER": row[6]
        },
        "geometry": json.loads(row[7])
    }
    features.append(feature)
geojson = {"type": "FeatureCollection", "features": features}
with open('output.geojson', 'w') as f:
    json.dump(geojson, f)
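The script above hard-codes the SLDU column list, so it needs editing for every other layer type. One possible generalization, sketched here under the assumption that you also have the column names of the query result (for example from the cursor's `description` attribute), builds the properties from whatever columns are present. The helper name `rows_to_feature_collection` is hypothetical.

```python
import json


def rows_to_feature_collection(columns, rows, geom_column="geom_json"):
    """Turn query rows into a GeoJSON FeatureCollection.

    columns: column names, in the same order as the values in each row
    rows: sequence of tuples; the geom_column value is a GeoJSON string
    """
    geom_index = columns.index(geom_column)
    features = []
    for row in rows:
        # Every column except the geometry becomes a feature property.
        properties = {
            name: value
            for i, (name, value) in enumerate(zip(columns, row))
            if i != geom_index
        }
        features.append({
            "type": "Feature",
            "properties": properties,
            "geometry": json.loads(row[geom_index]),
        })
    return {"type": "FeatureCollection", "features": features}


# Stand-in data shaped like the query result above.
columns = ["SLDUST", "NAMELSAD", "geom_json"]
rows = [("001", "State Senate District 1",
         '{"type": "Point", "coordinates": [-84.5, 42.7]}')]
collection = rows_to_feature_collection(columns, rows)
print(collection["features"][0]["properties"])
```

With this shape, switching from SLDU to SLDL or county files only changes the SELECT list, not the Python loop.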
Step 5: Import the Layer¶
Run the import script from the project root:
cd /path/to/cbdistricts
source ~/.pyenv/versions/nominates/bin/activate
python scripts/import_layer.py \
--source state-data/michigan/michigan_state_senate.geojson \
--layer-id michigan-state-senate \
--name "Michigan State Senate Districts" \
--description "Michigan State Senate districts (38 total) from 2024 Census TIGER/Line SLDU boundaries" \
--id-field SLDUST \
--name-field NAMELSAD \
--scope state \
--state MI \
--source-name "U.S. Census Bureau TIGER/Line" \
--source-url "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_26_sldu_500k.zip"
Import Script Parameters:
| Parameter | Required | Description |
|---|---|---|
| --source | Yes | Path to GeoJSON file |
| --layer-id | Yes | Unique identifier (e.g., michigan-state-senate) |
| --name | Yes | Human-readable name |
| --description | No | Layer description |
| --id-field | Yes | Property to use as feature ID |
| --name-field | No | Property to use as display name (defaults to id-field) |
| --scope | No | federal, state, or local (default: state) |
| --state | No | State code for state/local layers (e.g., MI) |
| --source-name | No | Data source name |
| --source-url | No | Data source URL |
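When onboarding several layers in a row, it can help to assemble the command from data instead of retyping it. The following is a minimal sketch that uses only the flags from the table above; `build_import_command` is a hypothetical helper, not something the project ships.

```python
import shlex

# Optional flags from the parameter table, written as Python keyword names.
OPTIONAL_FLAGS = {"description", "name_field", "scope", "state",
                  "source_name", "source_url"}


def build_import_command(source, layer_id, name, id_field, **optional):
    """Assemble the import_layer.py argument list from the documented flags."""
    unknown = set(optional) - OPTIONAL_FLAGS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    command = [
        "python", "scripts/import_layer.py",
        "--source", source,
        "--layer-id", layer_id,
        "--name", name,
        "--id-field", id_field,
    ]
    for key, value in optional.items():
        # name_field -> --name-field, and so on
        command += [f"--{key.replace('_', '-')}", value]
    return command


command = build_import_command(
    "state-data/ohio/ohio_state_senate.geojson",
    "ohio-state-senate",
    "Ohio State Senate Districts",
    "SLDUST",
    name_field="NAMELSAD",
    scope="state",
    state="OH",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the project root.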
Step 6: Verify the Import¶
import duckdb
import json
# Check the database
conn = duckdb.connect('data/layers/mi/senate.duckdb', read_only=True)
conn.execute('LOAD spatial')
# Count features
count = conn.execute('SELECT COUNT(*) FROM features').fetchone()[0]
print(f'Total features: {count}')
# Sample features
print('\nSample features:')
rows = conn.execute('SELECT id, name FROM features ORDER BY id LIMIT 10').fetchall()
for row in rows:
    print(f' {row[0]}: {row[1]}')
# Check registry
with open('data/layers/registry.json') as f:
    registry = json.load(f)
print('\nLayers in registry:')
for layer_id, layer in registry['layers'].items():
    print(f' {layer_id}: {layer["name"]} ({layer["feature_count"]} features)')
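The registry check above prints every layer and leaves the comparison to the reader. A small assertion-style check, sketched here with a hypothetical helper `check_registry`, compares the registry entry against the count you expect (38 for the Michigan State Senate). It relies only on the `layers`, `name`, and `feature_count` keys used in the script above.

```python
def check_registry(registry, layer_id, expected_count):
    """Return a list of problems for layer_id; an empty list means it looks fine."""
    layers = registry.get("layers", {})
    if layer_id not in layers:
        return [f"{layer_id} is missing from the registry"]
    problems = []
    actual = layers[layer_id].get("feature_count")
    if actual != expected_count:
        problems.append(
            f"{layer_id}: expected {expected_count} features, registry says {actual}"
        )
    return problems


# Stand-in for the parsed contents of data/layers/registry.json.
registry = {
    "layers": {
        "michigan-state-senate": {
            "name": "Michigan State Senate Districts",
            "feature_count": 38,
        }
    }
}
print(check_registry(registry, "michigan-state-senate", 38))
```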
Step 7: Test API Endpoints¶
# Start the API server
uvicorn api.main:app --reload
# Test endpoints
curl http://localhost:8000/api/v1/layers
curl http://localhost:8000/api/v1/layers/michigan-state-senate
curl http://localhost:8000/api/v1/layers/michigan-state-senate/features
curl http://localhost:8000/api/v1/layers/michigan-state-senate/geojson
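The same four requests can be scripted as a smoke test. This sketch assumes only what the curl commands show (the URL layout and a server on localhost:8000). It records HTTP status codes and makes no assumption about the response bodies. The function names are hypothetical.

```python
import urllib.error
import urllib.request

API_ROOT = "http://localhost:8000/api/v1"


def layer_urls(layer_id, api_root=API_ROOT):
    """List the endpoints exercised by the curl commands above."""
    return [
        f"{api_root}/layers",
        f"{api_root}/layers/{layer_id}",
        f"{api_root}/layers/{layer_id}/features",
        f"{api_root}/layers/{layer_id}/geojson",
    ]


def smoke_test(layer_id, api_root=API_ROOT):
    """Request each endpoint and return {url: HTTP status}.

    Requires the API server to be running.
    """
    results = {}
    for url in layer_urls(layer_id, api_root):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                results[url] = response.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses raise; record the code instead of stopping.
            results[url] = err.code
    return results


for url in layer_urls("michigan-state-senate"):
    print(url)
```

Calling `smoke_test("michigan-state-senate")` with the server running should report 200 for each URL if the import succeeded.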
Layer ID Naming Convention¶
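Use kebab-case. The examples below put the jurisdiction first (a state name, or federal for nationwide layers), followed by the layer type.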
Examples:
- federal-congressional - U.S. Congressional Districts
- michigan-state-senate - Michigan State Senate
- michigan-state-house - Michigan State House
- ohio-state-senate - Ohio State Senate
- michigan-school-districts - Michigan School Districts
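A layer ID in this style can be generated and validated mechanically. The sketch below infers the convention from the examples above. Both helpers are hypothetical, and the import script may apply its own rules.

```python
import re

# Lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def make_layer_id(jurisdiction, layer_type):
    """Join the two parts and normalize them to lowercase kebab-case."""
    raw = f"{jurisdiction} {layer_type}".lower()
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")


def is_valid_layer_id(layer_id):
    """Check that a layer ID is lowercase kebab-case."""
    return bool(KEBAB_CASE.match(layer_id))


print(make_layer_id("Michigan", "State Senate"))
print(is_valid_layer_id("Ohio_State_Senate"))
```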
Output Structure¶
After import, files are created as follows:
data/layers/
├── registry.json # Updated with new layer
├── federal/
│ └── congressional_119.duckdb
└── mi/ # State code, lowercase
├── house.duckdb
└── senate.duckdb # New layer database
Database Schema¶
Each layer database has this structure:
-- Features table
CREATE TABLE features (
id VARCHAR PRIMARY KEY, -- From --id-field
name VARCHAR, -- From --name-field
geometry GEOMETRY, -- Spatial geometry
properties JSON -- All other attributes
);
-- Spatial index for fast bbox queries
CREATE INDEX idx_features_geom ON features USING RTREE (geometry);
-- Metadata table
CREATE TABLE _metadata (
key VARCHAR PRIMARY KEY,
value JSON
);
Party Normalization¶
The import script automatically normalizes party values:
| Input | Output |
|---|---|
| R | Republican |
| D | Democrat |
| Republican | Republican |
| Democrat | Democrat |
| Democratic | Democrat |
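To preview how enrichment data will look after import, the table above can be expressed as a lookup. This is an illustration of the mapping only, not the import script's actual code. Stripping whitespace and passing unknown values through unchanged are assumptions made for this sketch.

```python
# Mapping copied from the normalization table above.
PARTY_MAP = {
    "R": "Republican",
    "D": "Democrat",
    "Republican": "Republican",
    "Democrat": "Democrat",
    "Democratic": "Democrat",
}


def normalize_party(value):
    """Map a raw party value to its normalized form.

    Assumption: values not in the table (e.g. independents) are returned
    unchanged apart from whitespace stripping.
    """
    if value is None:
        return None
    cleaned = value.strip()
    return PARTY_MAP.get(cleaned, cleaned)


print(normalize_party("D"))
print(normalize_party("Democratic"))
```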
Adding Legislator/Party Data¶
Census TIGER/Line files don't include legislator names or party affiliations. To enrich the data:
Option 1: Use State GIS Portal Data¶
Many state GIS portals include legislator information. Example from Michigan:
# Michigan State House with legislator data
wget "https://gis-michigan.opendata.arcgis.com/.../Michigan_State_House_Districts.geojson"
Option 2: Manual Enrichment¶
Create a CSV with district ID and metadata, then join:
import pandas as pd
import json
# Load GeoJSON
with open('districts.geojson') as f:
    geojson = json.load(f)
# Load enrichment data; read district_id as text so values like "001" keep
# their leading zeros and match the SLDUST strings in the GeoJSON
enrichment = pd.read_csv('enrichment.csv', dtype={'district_id': str})
enrichment_dict = enrichment.set_index('district_id').to_dict('index')
# Merge
for feature in geojson['features']:
    district_id = feature['properties']['SLDUST']
    if district_id in enrichment_dict:
        feature['properties'].update(enrichment_dict[district_id])
# Save
with open('districts_enriched.geojson', 'w') as f:
    json.dump(geojson, f)
Common Issues¶
Issue: "No features found in GeoJSON"¶
Cause: Empty or malformed GeoJSON file.
Solution: Verify that the file is a FeatureCollection whose features array exists and is not empty.
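A quick structural check might look like the following sketch. The helper name `geojson_problems` is hypothetical, and the checks cover only the top-level structure, not individual features.

```python
import json


def geojson_problems(data):
    """Return a list of structural problems in parsed GeoJSON data."""
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if data.get("type") != "FeatureCollection":
        problems.append(
            f"type is {data.get('type')!r}, expected 'FeatureCollection'"
        )
    features = data.get("features")
    if not isinstance(features, list):
        problems.append("'features' is missing or not a list")
    elif not features:
        problems.append("'features' is empty")
    return problems


# An empty collection is valid JSON but has nothing to import.
empty = json.loads('{"type": "FeatureCollection", "features": []}')
print(geojson_problems(empty))
```

For a real file, load it with `json.load` and pass the result to the function. A `json.JSONDecodeError` at load time means the file itself is malformed.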
Issue: Duplicate ID error¶
Cause: The --id-field has duplicate values.
Solution: Use a field whose values are unique, or combine several fields into a composite ID.
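As one way to do this, the sketch below lists the duplicated values of a candidate field and then builds a composite ID by concatenating fields such as STATEFP and SLDUST. The helper names and the `COMPOSITE_ID` property are hypothetical. Pass whatever property name you create to `--id-field`.

```python
from collections import Counter


def duplicate_ids(features, id_field):
    """Return the values of id_field that occur on more than one feature."""
    counts = Counter(f["properties"].get(id_field) for f in features)
    return sorted((value for value, n in counts.items() if n > 1), key=str)


def add_composite_id(features, fields, new_field="COMPOSITE_ID"):
    """Write a new property made by concatenating the given fields."""
    for feature in features:
        props = feature["properties"]
        props[new_field] = "".join(str(props[field]) for field in fields)
    return features


# Two states both have a district "001", so SLDUST alone is not unique.
features = [
    {"type": "Feature", "properties": {"STATEFP": "26", "SLDUST": "001"}, "geometry": None},
    {"type": "Feature", "properties": {"STATEFP": "39", "SLDUST": "001"}, "geometry": None},
]
print(duplicate_ids(features, "SLDUST"))
add_composite_id(features, ["STATEFP", "SLDUST"])
print(duplicate_ids(features, "COMPOSITE_ID"))
```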
Issue: Geometry not displaying¶
Cause: Invalid or empty geometries.
Solution: Check for features whose geometry is null or has no coordinates.
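A minimal check over the GeoJSON file might look like this. The helper name is hypothetical, and the check finds only missing or empty geometries, not self-intersections or other validity problems.

```python
def features_without_geometry(features, id_field):
    """Return the IDs of features whose geometry is null or empty."""
    missing = []
    for feature in features:
        geometry = feature.get("geometry")
        # GeometryCollection stores its members under "geometries".
        has_shape = bool(
            geometry
            and (geometry.get("coordinates") or geometry.get("geometries"))
        )
        if not has_shape:
            missing.append(feature.get("properties", {}).get(id_field))
    return missing


features = [
    {"type": "Feature", "properties": {"SLDUST": "001"},
     "geometry": {"type": "Point", "coordinates": [-84.5, 42.7]}},
    {"type": "Feature", "properties": {"SLDUST": "002"}, "geometry": None},
    {"type": "Feature", "properties": {"SLDUST": "003"},
     "geometry": {"type": "Polygon", "coordinates": []}},
]
print(features_without_geometry(features, "SLDUST"))
```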
Issue: Wrong coordinate system¶
Cause: Source data not in WGS84 (EPSG:4326).
Solution: Reproject during conversion:
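# ST_Transform takes the source CRS first and the target CRS second.
# Replace EPSG:3857 below with the CRS of your source data.
# If the output comes back in latitude/longitude order, add
# always_xy := true as a fourth argument.
conn.execute("""
    SELECT ST_Transform(geom, 'EPSG:3857', 'EPSG:4326') AS geom
    FROM layer
""")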
Complete Example: Adding Ohio State Senate¶
# 1. Download
cd state-data
mkdir -p ohio && cd ohio
wget https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_39_sldu_500k.zip
unzip cb_2024_39_sldu_500k.zip
# 2. Convert to GeoJSON
source ~/.pyenv/versions/nominates/bin/activate
python3 << 'EOF'
import duckdb
import json
conn = duckdb.connect()
conn.execute('INSTALL spatial; LOAD spatial;')
conn.execute("CREATE TABLE layer AS SELECT * FROM ST_Read('cb_2024_39_sldu_500k.shp')")
features = []
for row in conn.execute("""
    SELECT SLDUST, NAMELSAD, NAME, STATEFP, GEOID, ALAND, AWATER,
           ST_AsGeoJSON(geom) as geom_json FROM layer
""").fetchall():
    features.append({
        "type": "Feature",
        "properties": {
            "SLDUST": row[0], "NAMELSAD": row[1], "NAME": row[2],
            "STATEFP": row[3], "GEOID": row[4], "ALAND": row[5], "AWATER": row[6]
        },
        "geometry": json.loads(row[7])
    })
with open('ohio_state_senate.geojson', 'w') as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)
print(f"Exported {len(features)} features")
EOF
# 3. Import
cd ../..
python scripts/import_layer.py \
--source state-data/ohio/ohio_state_senate.geojson \
--layer-id ohio-state-senate \
--name "Ohio State Senate Districts" \
--description "Ohio State Senate districts from 2024 Census TIGER/Line SLDU boundaries" \
--id-field SLDUST \
--name-field NAMELSAD \
--scope state \
--state OH \
--source-name "U.S. Census Bureau TIGER/Line" \
--source-url "https://www2.census.gov/geo/tiger/GENZ2024/shp/cb_2024_39_sldu_500k.zip"
# 4. Verify
python3 -c "
import duckdb
conn = duckdb.connect('data/layers/oh/senate.duckdb', read_only=True)
conn.execute('LOAD spatial')
print(f'Ohio Senate districts: {conn.execute(\"SELECT COUNT(*) FROM features\").fetchone()[0]}')
"
Checklist¶
- Data source identified and downloaded
- Shapefile inspected for field names
- Converted to GeoJSON (if needed)
- Correct --id-field and --name-field selected
- Layer imported with import_layer.py
- Registry updated
- API endpoints tested
- Commit changes