
Performance Architecture Plan

Current Pain Points

  1. Wikipedia API - 5-7 seconds per request on a cold cache
  2. No Redis caching - every request hits DuckDB
  3. No pre-built assets - polygons are generated per request
  4. No CDN/static serving - all data flows through FastAPI

Proposed Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           CLIENT                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────────┐ │
│  │ Static CDN  │  │  API calls  │  │  Pre-built Bundle (gzip)    │ │
│  │ /bundles/*  │  │  /api/v1/*  │  │  Download once, cache local │ │
│  └──────┬──────┘  └──────┬──────┘  └──────────────┬──────────────┘ │
└─────────┼────────────────┼────────────────────────┼─────────────────┘
          │                │                        │
          ▼                ▼                        ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         NGINX                                        │
│  ┌─────────────────┐  ┌─────────────────────────────────────────┐  │
│  │ Static Files    │  │ Proxy to FastAPI                        │  │
│  │ /bundles/*.gz   │  │ /api/v1/* → localhost:32406             │  │
│  └─────────────────┘  └─────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                      FastAPI (32406)                                 │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    Redis Cache Layer                         │   │
│  │   Key: {data_class}:{geoid}:{endpoint}                      │   │
│  │   TTL: 7 days (static), 1 day (wikipedia)                   │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                               │                                      │
│              ┌────────────────┼────────────────┐                    │
│              ▼                ▼                ▼                    │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐          │
│  │   DuckDB      │  │  File Cache   │  │  External API │          │
│  │  (polygons,   │  │  (wikipedia)  │  │  (refresh)    │          │
│  │   census)     │  │               │  │               │          │
│  └───────────────┘  └───────────────┘  └───────────────┘          │
└─────────────────────────────────────────────────────────────────────┘

Data Classes Architecture

Design for multiple polygon types with associated metadata:

data/
├── releases/
│   └── v1.0.0/
│       ├── manifest.json              # Version, checksums, sizes
│       ├── congressional_districts/
│       │   ├── all.geojson.gz         # Full bundle (polygons + all metadata)
│       │   ├── polygons.geojson.gz    # Just boundaries
│       │   ├── census.json.gz         # Demographics only
│       │   ├── wikipedia.json.gz      # Rep, party, PVI
│       │   └── by_state/
│       │       ├── 06.geojson.gz      # California bundle
│       │       └── ...
│       ├── state_legislative/         # Future
│       │   ├── upper/
│       │   └── lower/
│       └── county/                    # Future
├── cache/
│   ├── redis/                         # Redis persistence (optional)
│   └── wikipedia/                     # Current file cache
└── output/
    └── cbdistricts.duckdb             # Source of truth
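
The manifest's checksums let any consumer verify a downloaded bundle before trusting it. A minimal sketch, assuming the SHA-256 digests written by the Phase 2 build script:

import hashlib
import json
from pathlib import Path

def verify_bundle(release_dir: Path, data_class: str, filename: str) -> bool:
    """Return True if a file's SHA-256 digest matches its manifest entry."""
    manifest = json.loads((release_dir / "manifest.json").read_text())
    expected = manifest["data_classes"][data_class]["files"][filename]["checksum"]
    actual = hashlib.sha256((release_dir / data_class / filename).read_bytes()).hexdigest()
    return actual == expected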

Implementation Phases

Phase 1: Redis Caching Layer (Quick Win)

Add Redis caching to all endpoints:

# api/cache/redis_cache.py
import hashlib
import json
from functools import wraps

import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

CACHE_TTLS = {
    'polygons': 7 * 24 * 3600,      # 7 days
    'census': 7 * 24 * 3600,        # 7 days
    'wikipedia': 24 * 3600,          # 1 day
    'default': 3600,                 # 1 hour
}

def cache_response(data_class: str, ttl_key: str = 'default'):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Build a deterministic cache key from the function name and args.
            # hashlib replaces hash(), whose value differs between processes
            # and would make keys useless across workers and restarts.
            arg_hash = hashlib.sha256(f"{args}:{kwargs}".encode()).hexdigest()[:16]
            cache_key = f"{data_class}:{func.__name__}:{arg_hash}"

            # Check cache
            cached = redis_client.get(cache_key)
            if cached:
                return json.loads(cached)

            # Execute and cache
            result = await func(*args, **kwargs)
            redis_client.setex(
                cache_key,
                CACHE_TTLS.get(ttl_key, CACHE_TTLS['default']),
                json.dumps(result)
            )
            return result
        return wrapper
    return decorator
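
Applied inside the FastAPI app module, the decorator might be used like this (a sketch; the route path and fetch_wikipedia_data helper are assumptions):

@app.get("/api/v1/congressional_districts/{geoid}/wikipedia")
@cache_response("congressional_districts", ttl_key="wikipedia")
async def get_district_wikipedia(geoid: str):
    return await fetch_wikipedia_data(geoid)  # hypothetical upstream fetcher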

Estimated impact: 10-100x speedup on repeat requests

Phase 2: Pre-built Asset Bundles

Build pipeline to generate static assets:

# scripts/build_release.py
from datetime import datetime, timezone
from pathlib import Path

DATA_DIR = Path("data")  # project data root (assumption)

def build_release(version: str):
    """Build a complete release bundle."""
    release_dir = DATA_DIR / "releases" / version
    release_dir.mkdir(parents=True, exist_ok=True)

    # 1. Export full GeoJSON bundle with all metadata
    build_full_bundle(release_dir / "congressional_districts")

    # 2. Build per-state bundles
    build_state_bundles(release_dir / "congressional_districts" / "by_state")

    # 3. Build manifest
    manifest = {
        "version": version,
        "built_at": datetime.utcnow().isoformat(),
        "data_classes": {
            "congressional_districts": {
                "count": 441,
                "files": {
                    "all.geojson.gz": {"size": ..., "checksum": ...},
                    "polygons.geojson.gz": {"size": ..., "checksum": ...},
                    ...
                }
            }
        }
    }

    # 4. Gzip everything
    gzip_directory(release_dir)

    return manifest
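
Two of the helpers referenced above might look like this (a minimal sketch, assuming the exporters write plain .json/.geojson files that are compressed afterwards):

import gzip
import hashlib
import shutil

def gzip_directory(release_dir: Path) -> None:
    """Compress every JSON/GeoJSON file in place, producing the *.gz artifacts."""
    for path in release_dir.rglob("*"):
        # Keep manifest.json readable; compress everything else
        if path.suffix in {".json", ".geojson"} and path.name != "manifest.json":
            with path.open("rb") as src, gzip.open(f"{path}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()

def file_entry(path: Path) -> dict:
    """Size and SHA-256 checksum for one 'files' entry in the manifest."""
    return {
        "size": path.stat().st_size,
        "checksum": hashlib.sha256(path.read_bytes()).hexdigest(),
    }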

Client usage:

// Download once, cache in IndexedDB (localStorage is too small for
// multi-MB bundles; `set` here is idb-keyval's helper)
import { set } from 'idb-keyval';

const resp = await fetch('/bundles/v1.0.0/congressional_districts/all.geojson.gz');
// Raw .gz file, so decompress explicitly (with Content-Encoding: gzip
// the browser would do this transparently)
const data = await new Response(resp.body.pipeThrough(new DecompressionStream('gzip'))).text();
await set('districts_v1.0.0', data);

Estimated impact: Instant load after first download (~2-5MB gzipped)

Phase 3: Background Refresh Jobs

Scheduled jobs to keep the cache warm:

# api/jobs/cache_warmer.py
import asyncio

from api.cache.redis_cache import redis_client  # Phase 1 client

async def warm_wikipedia_cache():
    """Run daily to refresh Wikipedia data."""
    districts = get_all_geoids()  # all district GEOIDs from DuckDB

    for geoid in districts:
        # Refresh entries that expire soon. Redis TTL returns -2 for a
        # missing key and -1 for no expiry, so both also trigger a refresh.
        cache_key = f"congressional_districts:wikipedia:{geoid}"
        ttl = redis_client.ttl(cache_key)

        if ttl < 12 * 3600:  # Less than 12 hours left
            await fetch_and_cache_wikipedia(geoid)
            await asyncio.sleep(1)  # Rate limit outbound Wikipedia requests

# Run via systemd timer or cron
# 0 3 * * * /path/to/warm_cache.py
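
Alternatively, since APScheduler is already on the dependency list, the warmer can run in-process (a minimal sketch mirroring the 03:00 cron line above):

# api/jobs/scheduler.py (sketch)
from apscheduler.schedulers.asyncio import AsyncIOScheduler

from api.jobs.cache_warmer import warm_wikipedia_cache

scheduler = AsyncIOScheduler()
scheduler.add_job(warm_wikipedia_cache, "cron", hour=3)  # daily at 03:00

# Call scheduler.start() from a FastAPI startup hook so it shares the
# running event loop.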

Phase 4: Multi-Class Data Support

Abstract the data model:

# api/models/data_class.py
from typing import Optional

from pydantic import BaseModel

from api.models.census import CensusData          # existing models; import
from api.models.wikipedia import WikipediaData    # paths are assumptions

class DataClass(BaseModel):
    """Base class for all geographic data types."""
    id: str                           # Unique identifier (GEOID, etc.)
    name: str                         # Human-readable name
    geometry: Optional[dict]          # GeoJSON geometry
    metadata: dict                    # Class-specific metadata

class CongressionalDistrict(DataClass):
    state_fips: str
    district_number: int
    census: Optional[CensusData]
    wikipedia: Optional[WikipediaData]

class StateLegislativeDistrict(DataClass):
    state_fips: str
    chamber: str                      # "upper" or "lower"
    district_number: str

class County(DataClass):
    state_fips: str
    county_fips: str
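
A small registry can then map URL segments to models; the generic /api/v1/{data_class} endpoints below dispatch through it (a sketch):

from fastapi import HTTPException

DATA_CLASSES: dict[str, type[DataClass]] = {
    "congressional_districts": CongressionalDistrict,
    "state_legislative": StateLegislativeDistrict,
    "county": County,
}

def resolve_data_class(name: str) -> type[DataClass]:
    """Look up a data class by its URL segment, or raise 404."""
    try:
        return DATA_CLASSES[name]
    except KeyError:
        raise HTTPException(status_code=404, detail=f"Unknown data class: {name}")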

Redis Key Schema

# Pattern: {data_class}:{data_type}:{identifier}

congressional_districts:polygon:1903
congressional_districts:census:1903
congressional_districts:wikipedia:1903
congressional_districts:full:1903           # All data combined
congressional_districts:list:state:19       # All districts in Iowa
congressional_districts:geojson:all         # Full GeoJSON (large)

state_legislative:polygon:19:upper:01
state_legislative:census:19:upper:01

# Metadata
_meta:releases:current                      # Current release version
_meta:releases:v1.0.0:manifest             # Release manifest
_meta:cache_stats                          # Hit/miss counters
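
The keys.py module from the file plan below can encode this schema in one place so hand-built f-strings don't drift (a sketch):

# api/cache/keys.py (sketch)
def cache_key(data_class: str, data_type: str, *parts: str) -> str:
    """Join key segments with ':' per the schema above."""
    return ":".join((data_class, data_type, *parts))

cache_key("congressional_districts", "wikipedia", "1903")
# -> "congressional_districts:wikipedia:1903"
cache_key("state_legislative", "polygon", "19", "upper", "01")
# -> "state_legislative:polygon:19:upper:01"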

API Changes

New Endpoints

GET  /api/v1/releases                      # List available releases
GET  /api/v1/releases/{version}/manifest   # Get release manifest
GET  /api/v1/releases/{version}/{class}/bundle.geojson.gz  # Download bundle

POST /api/v1/admin/cache/warm              # Warm all caches
POST /api/v1/admin/cache/clear             # Clear cache
GET  /api/v1/admin/cache/stats             # Cache statistics

GET  /api/v1/{data_class}                  # Generic list endpoint
GET  /api/v1/{data_class}/{id}             # Generic detail endpoint
GET  /api/v1/{data_class}/{id}/full        # All data combined
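
The generic detail endpoint might be wired to the Phase 4 registry like this (a sketch; load_record stands in for a hypothetical DuckDB lookup):

@app.get("/api/v1/{data_class}/{id}")
async def get_detail(data_class: str, id: str):
    model = resolve_data_class(data_class)  # 404s on unknown classes
    record = load_record(model, id)         # hypothetical DuckDB lookup
    if record is None:
        raise HTTPException(status_code=404, detail=f"{data_class} {id} not found")
    return record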

Response Headers

X-Cache: HIT|MISS
X-Cache-TTL: 604800
X-Data-Version: v1.0.0
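
These could be attached in one place with FastAPI middleware (a sketch; request.state.cache_hit and CURRENT_RELEASE are assumptions, set by the cache layer and release metadata respectively):

from fastapi import Request

CURRENT_RELEASE = "v1.0.0"  # assumption: read from _meta:releases:current

@app.middleware("http")
async def add_cache_headers(request: Request, call_next):
    response = await call_next(request)
    hit = getattr(request.state, "cache_hit", False)  # set by the cache layer
    response.headers["X-Cache"] = "HIT" if hit else "MISS"
    response.headers["X-Data-Version"] = CURRENT_RELEASE
    return response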

Performance Targets

Endpoint             Current         With Redis    With Bundles
List all districts   ~200ms          ~10ms         N/A (client-side)
Single district      ~50ms           ~5ms          N/A (client-side)
Wikipedia data       5-7 sec (cold)  ~5ms (warm)   ~1ms (pre-built)
GeoJSON bundle       ~500ms          ~50ms         Instant (static file)

Refresh Strategy

Data Type   Refresh Frequency        Method
Polygons    Yearly (redistricting)   Manual release
Census      Yearly (ACS release)     Manual release
Wikipedia   Daily                    Background job
Elections   As needed                Manual release

Implementation Priority

  1. Week 1: Redis caching layer for all endpoints
  2. Week 2: Pre-built bundle generation + nginx serving
  3. Week 3: Background cache warming jobs
  4. Week 4: Multi-class data abstraction

Dependencies to Add

redis>=5.0.0          # Includes redis.asyncio for async use (the separate aioredis package is deprecated)
apscheduler>=3.10.0   # For background jobs

Files to Create

api/
├── cache/
│   ├── __init__.py
│   ├── redis_client.py      # Redis connection
│   ├── decorators.py        # @cache_response decorator
│   └── keys.py              # Key schema constants
├── jobs/
│   ├── __init__.py
│   ├── cache_warmer.py      # Background refresh
│   └── scheduler.py         # Job scheduling
scripts/
├── build_release.py         # Generate release bundles
└── warm_cache.py            # CLI cache warmer