# Performance Architecture Plan
## Current Pain Points

- **Wikipedia API**: 5-7 s/request on a cold cache
- **No Redis caching**: every request hits DuckDB
- **No pre-built assets**: polygons are generated per request
- **No CDN/static serving**: all data flows through FastAPI
## Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ CLIENT │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ Static CDN │ │ API calls │ │ Pre-built Bundle (gzip) │ │
│ │ /bundles/* │ │ /api/v1/* │ │ Download once, cache local │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────────┬──────────────┘ │
└─────────┼────────────────┼────────────────────────┼─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ NGINX │
│ ┌─────────────────┐ ┌─────────────────────────────────────────┐ │
│ │ Static Files │ │ Proxy to FastAPI │ │
│ │ /bundles/*.gz │ │ /api/v1/* → localhost:32406 │ │
│ └─────────────────┘ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ FastAPI (32406) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Redis Cache Layer │ │
│ │ Key: {data_class}:{geoid}:{endpoint} │ │
│ │ TTL: 7 days (static), 1 day (wikipedia) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ DuckDB │ │ File Cache │ │ External API │ │
│ │ (polygons, │ │ (wikipedia) │ │ (refresh) │ │
│ │ census) │ │ │ │ │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
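The nginx layer in the diagram can be expressed in a few directives. A sketch, assuming bundles are pre-compressed on disk and the FastAPI port from the diagram; the `/srv/data/releases/` path is an assumption, and `gzip_static` requires nginx to be built with `ngx_http_gzip_static_module`:

```nginx
server {
    listen 80;

    # Pre-built bundles: serve .gz files straight from disk.
    # With gzip_static on, a request for /bundles/.../all.geojson is
    # answered with the pre-compressed .gz and Content-Encoding: gzip.
    location /bundles/ {
        alias /srv/data/releases/;   # assumed path to release bundles
        gzip_static on;
        expires 7d;
        add_header Cache-Control "public, immutable";
    }

    # Everything else is proxied to FastAPI
    location /api/v1/ {
        proxy_pass http://127.0.0.1:32406;
        proxy_set_header Host $host;
    }
}
```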
## Data Classes Architecture
Design for multiple polygon types with associated metadata:
```
data/
├── releases/
│   └── v1.0.0/
│       ├── manifest.json            # Version, checksums, sizes
│       ├── congressional_districts/
│       │   ├── all.geojson.gz       # Full bundle (polygons + all metadata)
│       │   ├── polygons.geojson.gz  # Just boundaries
│       │   ├── census.json.gz       # Demographics only
│       │   ├── wikipedia.json.gz    # Rep, party, PVI
│       │   └── by_state/
│       │       ├── 06.geojson.gz    # California bundle
│       │       └── ...
│       ├── state_legislative/       # Future
│       │   ├── upper/
│       │   └── lower/
│       └── county/                  # Future
│
├── cache/
│   ├── redis/                       # Redis persistence (optional)
│   └── wikipedia/                   # Current file cache
│
└── output/
    └── cbdistricts.duckdb           # Source of truth
```
## Implementation Phases
### Phase 1: Redis Caching Layer (Quick Win)
Add Redis caching to all endpoints:
```python
# api/cache/redis_cache.py
import hashlib
import json
from functools import wraps

import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

CACHE_TTLS = {
    'polygons': 7 * 24 * 3600,   # 7 days
    'census': 7 * 24 * 3600,     # 7 days
    'wikipedia': 24 * 3600,      # 1 day
    'default': 3600,             # 1 hour
}

def cache_response(data_class: str, ttl_key: str = 'default'):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Build a stable cache key from the function name and args.
            # hashlib is used instead of hash(): string hashes are salted
            # per-process, so hash()-based keys would not survive a restart.
            digest = hashlib.sha256(
                (str(args) + str(sorted(kwargs.items()))).encode()
            ).hexdigest()
            cache_key = f"{data_class}:{func.__name__}:{digest}"

            # Check cache
            cached = redis_client.get(cache_key)
            if cached:
                return json.loads(cached)

            # Execute and cache
            result = await func(*args, **kwargs)
            redis_client.setex(
                cache_key,
                CACHE_TTLS.get(ttl_key, CACHE_TTLS['default']),
                json.dumps(result),
            )
            return result
        return wrapper
    return decorator
```
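Applied to a route, the decorator wraps the handler transparently. A minimal self-contained demo of the hit/miss flow, with an in-memory dict standing in for Redis so it runs without a server (all names here are illustrative, not the project's actual API):

```python
import asyncio
import json
from functools import wraps

fake_redis = {}          # stand-in for redis_client
calls = {"count": 0}     # counts real (uncached) executions

def cache_response(data_class: str):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            key = f"{data_class}:{func.__name__}:{args}:{sorted(kwargs.items())}"
            if key in fake_redis:
                return json.loads(fake_redis[key])   # cache hit
            result = await func(*args, **kwargs)     # cache miss: execute
            fake_redis[key] = json.dumps(result)
            return result
        return wrapper
    return decorator

@cache_response("congressional_districts")
async def get_district(geoid: str):
    calls["count"] += 1
    return {"geoid": geoid, "name": f"District {geoid}"}

async def main():
    first = await get_district("1903")
    second = await get_district("1903")  # served from the cache
    return first, second

first, second = asyncio.run(main())
```

The second call returns the cached payload, so the handler body runs only once.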
Estimated impact: 10-100x speedup on repeat requests
### Phase 2: Pre-built Asset Bundles
Build pipeline to generate static assets:
```python
# scripts/build_release.py
from datetime import datetime, timezone
from pathlib import Path

DATA_DIR = Path("data")

def build_release(version: str):
    """Build a complete release bundle."""
    release_dir = DATA_DIR / "releases" / version
    release_dir.mkdir(parents=True, exist_ok=True)

    # 1. Export full GeoJSON bundle with all metadata
    build_full_bundle(release_dir / "congressional_districts")

    # 2. Build per-state bundles
    build_state_bundles(release_dir / "congressional_districts" / "by_state")

    # 3. Build manifest
    manifest = {
        "version": version,
        "built_at": datetime.now(timezone.utc).isoformat(),
        "data_classes": {
            "congressional_districts": {
                "count": 441,  # 435 voting seats + 6 delegates
                "files": {
                    "all.geojson.gz": {"size": ..., "checksum": ...},
                    "polygons.geojson.gz": {"size": ..., "checksum": ...},
                    # ...
                },
            },
        },
    }

    # 4. Gzip everything
    gzip_directory(release_dir)
    return manifest
```
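The `gzip_directory` step can be as simple as compressing every non-`.gz` file in place and recording a size and checksum for the manifest. A sketch under that assumption (the return shape is illustrative, not the project's actual manifest format):

```python
import gzip
import hashlib
from pathlib import Path

def gzip_directory(release_dir: Path) -> dict:
    """Gzip every non-.gz file under release_dir; return {name: {size, checksum}}."""
    entries = {}
    # sorted() materializes the listing first, so the .gz files we
    # create below are not picked up by the same walk
    for path in sorted(release_dir.rglob("*")):
        if path.is_dir() or path.suffix == ".gz":
            continue
        gz_path = path.with_name(path.name + ".gz")
        # mtime=0 makes the gzip output byte-stable, so checksums
        # are reproducible across builds
        gz_path.write_bytes(gzip.compress(path.read_bytes(), mtime=0))
        entries[str(gz_path.relative_to(release_dir))] = {
            "size": gz_path.stat().st_size,
            "checksum": hashlib.sha256(gz_path.read_bytes()).hexdigest(),
        }
    return entries

# demo: gzip a tiny release directory built in a temp dir
import tempfile
with tempfile.TemporaryDirectory() as td:
    root = Path(td)
    (root / "all.geojson").write_text('{"type": "FeatureCollection", "features": []}')
    manifest_entries = gzip_directory(root)
```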
Client usage:
// Download once, cache locally. localStorage (shown for brevity) caps out
// around 5 MB; IndexedDB is the better fit for multi-megabyte bundles.
const res = await fetch('/bundles/v1.0.0/congressional_districts/all.geojson.gz');
// nginx serves the file with Content-Encoding: gzip, so fetch decompresses it
const data = await res.json();
localStorage.setItem('districts_v1.0.0', JSON.stringify(data));
Estimated impact: Instant load after first download (~2-5MB gzipped)
### Phase 3: Background Refresh Jobs
Scheduled jobs to keep cache warm:
```python
# api/jobs/cache_warmer.py
import asyncio

from api.cache.redis_cache import redis_client

async def warm_wikipedia_cache():
    """Run daily to refresh Wikipedia data."""
    districts = get_all_geoids()
    for geoid in districts:
        # Check whether the cached entry is expiring soon. redis ttl()
        # returns -2 (key missing) and -1 (no expiry), both of which
        # also fall below the threshold and trigger a refresh.
        cache_key = f"congressional_districts:wikipedia:{geoid}"
        ttl = redis_client.ttl(cache_key)
        if ttl < 12 * 3600:  # less than 12 hours left
            await fetch_and_cache_wikipedia(geoid)
            await asyncio.sleep(1)  # rate limit against Wikipedia

# Run via systemd timer or cron:
# 0 3 * * * /path/to/warm_cache.py
```
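If systemd timers are preferred over cron, the equivalent is a oneshot service plus a timer unit. A sketch; the unit names are assumptions:

```ini
# /etc/systemd/system/warm-cache.service
[Unit]
Description=Refresh Wikipedia cache

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /path/to/warm_cache.py

# /etc/systemd/system/warm-cache.timer
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

`Persistent=true` runs a missed 3:00 job at next boot, which cron does not do.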
### Phase 4: Multi-Class Data Support
Abstract the data model:
```python
# api/models/data_class.py
from typing import Optional

from pydantic import BaseModel

class DataClass(BaseModel):
    """Base class for all geographic data types."""
    id: str                    # Unique identifier (GEOID, etc.)
    name: str                  # Human-readable name
    geometry: Optional[dict]   # GeoJSON geometry
    metadata: dict             # Class-specific metadata

class CongressionalDistrict(DataClass):
    state_fips: str
    district_number: int
    census: Optional[CensusData]        # defined elsewhere in api/models
    wikipedia: Optional[WikipediaData]  # defined elsewhere in api/models

class StateLegislativeDistrict(DataClass):
    state_fips: str
    chamber: str               # "upper" or "lower"
    district_number: str

class County(DataClass):
    state_fips: str
    county_fips: str
```
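One payoff of the shared base class is that the generic `/api/v1/{data_class}` endpoints can dispatch through a single registry mapping the path segment to a model. A self-contained sketch, with stdlib dataclasses standing in for the pydantic models so it runs anywhere (the registry and `make` helper are illustrative names):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataClass:
    id: str
    name: str
    geometry: Optional[dict] = None
    metadata: dict = field(default_factory=dict)

@dataclass
class CongressionalDistrict(DataClass):
    state_fips: str = ""
    district_number: int = 0

@dataclass
class County(DataClass):
    state_fips: str = ""
    county_fips: str = ""

# One registry: the generic endpoints resolve {data_class} here
DATA_CLASSES = {
    "congressional_districts": CongressionalDistrict,
    "county": County,
}

def make(data_class: str, **fields) -> DataClass:
    return DATA_CLASSES[data_class](**fields)

district = make("congressional_districts", id="1903", name="Iowa 3rd",
                state_fips="19", district_number=3)
```

Adding a new data class then means registering one model, not adding a new set of routes.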
## Redis Key Schema
```
# Pattern: {data_class}:{data_type}:{identifier}
congressional_districts:polygon:1903
congressional_districts:census:1903
congressional_districts:wikipedia:1903
congressional_districts:full:1903        # All data combined
congressional_districts:list:state:19    # All districts in Iowa
congressional_districts:geojson:all      # Full GeoJSON (large)

state_legislative:polygon:19:upper:01
state_legislative:census:19:upper:01

# Metadata
_meta:releases:current                   # Current release version
_meta:releases:v1.0.0:manifest           # Release manifest
_meta:cache_stats                        # Hit/miss counters
```
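This schema is worth centralizing in `api/cache/keys.py` (listed under "Files to Create") so every caller builds keys identically. A minimal sketch; the helper names are assumptions:

```python
# api/cache/keys.py
def district_key(data_type: str, geoid: str) -> str:
    """Key for a congressional district, e.g. congressional_districts:census:1903."""
    return f"congressional_districts:{data_type}:{geoid}"

def state_leg_key(data_type: str, state_fips: str, chamber: str, district: str) -> str:
    """Key for a state legislative district, e.g. state_legislative:polygon:19:upper:01."""
    return f"state_legislative:{data_type}:{state_fips}:{chamber}:{district}"

def meta_key(*parts: str) -> str:
    """Key in the _meta namespace, e.g. _meta:releases:current."""
    return ":".join(("_meta",) + parts)
```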
## API Changes
### New Endpoints
```
GET  /api/v1/releases                                     # List available releases
GET  /api/v1/releases/{version}/manifest                  # Get release manifest
GET  /api/v1/releases/{version}/{class}/bundle.geojson.gz # Download bundle

POST /api/v1/admin/cache/warm                             # Warm all caches
POST /api/v1/admin/cache/clear                            # Clear cache
GET  /api/v1/admin/cache/stats                            # Cache statistics

GET  /api/v1/{data_class}                                 # Generic list endpoint
GET  /api/v1/{data_class}/{id}                            # Generic detail endpoint
GET  /api/v1/{data_class}/{id}/full                       # All data combined
```
### Response Headers
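Cached responses could advertise their lifetime and provenance. A plausible set, matching the TTLs above; `Cache-Control` and `ETag` are standard, while the `X-Cache` and `X-Data-Version` names are assumptions:

```
Cache-Control: public, max-age=604800    # matches the 7-day Redis TTL
ETag: "<release-version>-<checksum>"     # lets clients revalidate cheaply
X-Cache: HIT | MISS                      # Redis hit/miss, for debugging
X-Data-Version: v1.0.0                   # release the payload came from
```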
## Performance Targets
| Endpoint | Current | With Redis | With Bundles |
|---|---|---|---|
| List all districts | ~200 ms | ~10 ms | N/A (client-side) |
| Single district | ~50 ms | ~5 ms | N/A (client-side) |
| Wikipedia data | 5-7 s (cold) | ~5 ms (warm) | ~1 ms (pre-built) |
| GeoJSON bundle | ~500 ms | ~50 ms | Instant (static file) |
## Refresh Strategy
| Data Type | Refresh Frequency | Method |
|---|---|---|
| Polygons | Yearly (redistricting) | Manual release |
| Census | Yearly (ACS release) | Manual release |
| Wikipedia | Daily | Background job |
| Elections | As needed | Manual release |
## Implementation Priority
- Week 1: Redis caching layer for all endpoints
- Week 2: Pre-built bundle generation + nginx serving
- Week 3: Background cache warming jobs
- Week 4: Multi-class data abstraction
## Dependencies to Add

- `redis`: Python client for the Phase 1 cache layer

Phase 3 scheduling runs through systemd timers or cron, so no additional Python scheduling dependency is required.
## Files to Create
```
api/
├── cache/
│   ├── __init__.py
│   ├── redis_client.py      # Redis connection
│   ├── decorators.py        # @cache_response decorator
│   └── keys.py              # Key schema constants
├── jobs/
│   ├── __init__.py
│   ├── cache_warmer.py      # Background refresh
│   └── scheduler.py         # Job scheduling
scripts/
├── build_release.py         # Generate release bundles
└── warm_cache.py            # CLI cache warmer
```