CB Radio Roadmap¶
Current State Summary¶
| Metric | Value |
|---|---|
| Total Documents | 5,085 |
| Pending | 1,295 (25%) |
| Completed | 1,254 (25%) |
| Needs Review | 2,090 (41%) |
| Failed | 446 (9%) |
| Total Rates Extracted | ~13,000+ |
| Unique Stations | 1,908 |
Phase 1: Finish Document Processing¶
Goal: Process remaining 1,295 pending documents to maximize rate coverage.
1.1 Run Batch Processing¶
# Generate fresh batches from pending documents
python3 scripts/generate_batches.py --status pending --batch-size 100
# Process batches (can run multiple in parallel terminals)
python3 -u src/workers/document_processor.py --batch-file batches/batch_0000.json
python3 -u src/workers/document_processor.py --batch-file batches/batch_0001.json
# ... continue for all batches
Expected time: ~6-8 hours for 1,295 documents at ~200 docs/hour
1.2 Triage Failed Documents (446)¶
Current failure breakdown: - JSON parsing failures from Claude (~60%) - OCR failures - poor quality scans (~25%) - Timeout errors (~15%)
Actions: 1. Export failed document list with error messages 2. Retry with increased timeout (180s → 300s) for timeout errors 3. Convert PDF→PNG for OCR failures (95% success rate for images vs 65% for PDFs) 4. Improve prompt for JSON parsing failures (add structured output format)
1.3 Review Low-Confidence Extractions (2,090)¶
Documents marked needs_review have:
- Confidence score < 0.5, OR
- No rates extracted despite successful OCR
Actions:
1. Query documents by confidence bands:
- 0.4-0.5: Likely salvageable with minor fixes
- 0.2-0.4: May need re-extraction with better prompt
- < 0.2: Likely not rate cards (agreements, emails, misc)
2. Reclassify non-rate-card documents as document_type = 'misc' and skip
3. Re-run extraction on legitimate rate cards with improved prompts
1.4 Deduplication¶
Current data quality issues: - ~37% potential duplicate rates - Same document processed multiple times from different sources
Actions:
1. Add unique constraint: (station_id, ad_type, duration_seconds, effective_date, slot_name)
2. Implement UPSERT logic - newer extraction wins
3. Run deduplication script on existing rates
1.5 Deliverables¶
- All 1,295 pending docs processed
- Failed count reduced to < 100
- Needs review count reduced to < 500
- Duplicate rates removed
- Total unique rates: Target 15,000+
Phase 2: Rate Cards API¶
Goal: Deliver rate data in a standardized format for API consumers.
2.1 Target Format¶
Based on docs/rate-cards/radio-rate-card-sample.xlsx, the consumer-facing rate card format:
{
"rate_card_id": "uuid",
"station": {
"callsign": "WXXX-AM",
"market": "Detroit, MI",
"format": "News/Talk",
"nielsen_dma": "Detroit"
},
"year": 2026,
"effective_date": "2026-01-01",
"expiration_date": "2026-12-31",
"is_in_network": true,
"rates": [
{
"ad_type": "candidate",
"preemption": "non_preemptible",
"daypart": "Morning Drive",
"days": "M-F",
"time_start": "06:00",
"time_end": "10:00",
"durations": {
"30": { "gross": 150.00, "net": 127.50 },
"60": { "gross": 275.00, "net": 233.75 }
}
}
]
}
2.2 Out-of-Network Rate Cards¶
Located in docs/rate-cards/amfm-rates/ (20 Excel files from external sources):
- Different market formats (Akron OH, Detroit MI, Phoenix AZ, etc.)
- Naming pattern: {Year}{Type}Rates_{Market}_{Timestamp}.xlsx
Ingestion approach:
1. Create scripts/ingest_external_rates.py
2. Parse Excel files using openpyxl
3. Map columns to standardized schema
4. Tag stations as is_in_network = false
5. Store with source tracking (source = 'amfm')
2.3 New Endpoints¶
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /api/v1/rate-cards/export |
Export rate cards in consumer format |
| GET | /api/v1/rate-cards/export/{callsign} |
Export single station's rate card |
| GET | /api/v1/rate-cards/by-market/{dma} |
Get all rate cards for a DMA market |
| GET | /api/v1/rate-cards/active |
Get only current/active rate cards |
| POST | /api/v1/rate-cards/import |
Bulk import rate cards (for external data) |
2.4 Versioning Logic¶
Rate cards are versioned by date:
- effective_date determines when rates become active
- Most recent effective_date <= today is the "active" version
- Future rate cards (effective_date > today) are "scheduled"
- Past rate cards are archived but queryable
-- Get active rate card for a station
SELECT * FROM rate_card
WHERE station_id = ?
AND effective_date <= CURRENT_DATE
ORDER BY effective_date DESC
LIMIT 1
2.5 Schema Changes¶
Add to rate_card table:
ALTER TABLE rate_card ADD COLUMN is_in_network BOOLEAN DEFAULT TRUE;
ALTER TABLE rate_card ADD COLUMN source VARCHAR DEFAULT 'internal';
ALTER TABLE rate_card ADD COLUMN source_file VARCHAR;
2.6 Deliverables¶
- External rate card ingestion pipeline
- Rate card export endpoint in consumer format
- Versioning logic (active vs scheduled vs archived)
- In-network vs out-of-network tagging
- Documentation for API consumers
Phase 3: Proposals¶
Goal: CRUD for proposals and ingestion pipeline for historical proposals.
3.1 Data Model¶
class Proposal:
id: str # UUID
proposal_number: str # Human-readable ID (e.g., "P-2026-0001")
name: str # Proposal title
# Relationships
client_id: str
consultant_id: Optional[str]
buyer_agency_id: Optional[str]
# Campaign Parameters
ad_type: AdType # candidate, issue
election_type: Optional[str] # primary, general
start_date: date
end_date: date
election_date: Optional[date]
# Spot Configuration
spot_duration: int # 15, 30, 60, 90, 120
preemption: PreemptionStatus
time_slot_preference: Optional[str]
# Geographic Targeting
target_states: list[str]
target_dmas: Optional[list[str]]
rural_only: bool = False
# Financial
commission_rate: Decimal = 0.15
# Status
status: ProposalStatus # draft, pending, sent, accepted, rejected, converted
expires_at: Optional[datetime]
# Timestamps
created_at: datetime
updated_at: datetime
accepted_at: Optional[datetime]
converted_to_buy_id: Optional[str]
class ProposalLineItem:
id: str
proposal_id: str
station_id: str
station_callsign: str # Denormalized
# Quantity
spots_per_week: int
weeks: int
# Pricing (snapshot at proposal time)
rate_id: Optional[str] # Source rate reference
gross_rate_per_spot: Decimal
net_rate_per_spot: Decimal
total_gross: Decimal # Calculated
total_net: Decimal # Calculated
notes: Optional[str]
class ProposalStatus(Enum):
DRAFT = "draft"
PENDING_REVIEW = "pending_review"
SENT_TO_CLIENT = "sent"
ACCEPTED = "accepted"
REJECTED = "rejected"
EXPIRED = "expired"
CONVERTED = "converted" # Became a RadioBuy
3.2 Database Tables¶
CREATE TABLE proposal (
id VARCHAR PRIMARY KEY,
proposal_number VARCHAR UNIQUE NOT NULL,
name VARCHAR NOT NULL,
client_id VARCHAR,
consultant_id VARCHAR,
buyer_agency_id VARCHAR,
ad_type VARCHAR NOT NULL,
election_type VARCHAR,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
election_date DATE,
spot_duration INTEGER NOT NULL,
preemption VARCHAR NOT NULL,
time_slot_preference VARCHAR,
target_states VARCHAR, -- JSON array
target_dmas VARCHAR, -- JSON array
rural_only BOOLEAN DEFAULT FALSE,
commission_rate DECIMAL(5,4) DEFAULT 0.15,
status VARCHAR DEFAULT 'draft',
expires_at TIMESTAMP,
accepted_at TIMESTAMP,
rejected_at TIMESTAMP,
rejection_reason TEXT,
converted_to_buy_id VARCHAR,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE proposal_line_item (
id VARCHAR PRIMARY KEY,
proposal_id VARCHAR NOT NULL,
station_id VARCHAR,
station_callsign VARCHAR,
spots_per_week INTEGER NOT NULL,
weeks INTEGER NOT NULL,
rate_id VARCHAR,
gross_rate_per_spot DECIMAL(10,2) NOT NULL,
net_rate_per_spot DECIMAL(10,2) NOT NULL,
total_gross DECIMAL(12,2) NOT NULL,
total_net DECIMAL(12,2) NOT NULL,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_proposal_status ON proposal(status);
CREATE INDEX idx_proposal_client ON proposal(client_id);
CREATE INDEX idx_proposal_dates ON proposal(start_date, end_date);
CREATE INDEX idx_line_item_proposal ON proposal_line_item(proposal_id);
3.3 API Endpoints¶
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /api/v1/proposals |
List proposals with filters |
| GET | /api/v1/proposals/{id} |
Get proposal with line items |
| POST | /api/v1/proposals |
Create proposal |
| PATCH | /api/v1/proposals/{id} |
Update proposal |
| DELETE | /api/v1/proposals/{id} |
Delete proposal |
| POST | /api/v1/proposals/{id}/line-items |
Add stations to proposal |
| PATCH | /api/v1/proposals/{id}/line-items |
Update line items |
| DELETE | /api/v1/proposals/{id}/line-items |
Remove stations |
| POST | /api/v1/proposals/{id}/send |
Mark as sent to client |
| POST | /api/v1/proposals/{id}/accept |
Mark as accepted |
| POST | /api/v1/proposals/{id}/reject |
Mark as rejected |
| POST | /api/v1/proposals/{id}/convert |
Convert to RadioBuy |
| GET | /api/v1/proposals/{id}/summary |
Financial summary |
| GET | /api/v1/proposals/{id}/export |
Export as PDF/Excel |
3.4 Ingestion Pipeline¶
For historical proposals in docs/proposals/:
# scripts/ingest_proposals.py
class ProposalIngestionPipeline:
def ingest_file(self, file_path: str) -> Proposal:
"""
1. Detect format (PDF, Excel, CSV)
2. Extract data using appropriate parser
3. Map to Proposal schema
4. Validate against existing clients/stations
5. Create proposal with line items
6. Return created proposal
"""
def ingest_directory(self, dir_path: str) -> list[Proposal]:
"""Process all files in directory"""
3.5 Deliverables¶
- Proposal and ProposalLineItem models
- Database schema migration
- CRUD API endpoints
- Proposal status workflow (draft → sent → accepted/rejected → converted)
- Historical proposal ingestion pipeline
- Proposal export (PDF/Excel)
Phase 4: Proposal Generation (Stub)¶
Goal: Prepare the system for automated proposal generation.
4.1 Generation Flow (Future)¶
Client Request
↓
[Parameters]
- Target states/markets
- Ad type (candidate/issue)
- Duration preference
- Budget range
- Date range
- Preemption preference
↓
[Rate Lookup]
- Query active rates matching parameters
- Filter by station network status
- Sort by cost efficiency or reach
↓
[Station Selection]
- Auto-select optimal station mix
- Balance rural/non-rural coverage
- Respect budget constraints
↓
[Proposal Assembly]
- Create proposal with selected stations
- Calculate totals and commission
- Generate summary
↓
[Output]
- Draft proposal ready for review
- Exportable to client format
4.2 Stub Endpoint¶
@router.post("/api/v1/proposals/generate")
async def generate_proposal(
request: ProposalGenerationRequest
) -> Proposal:
"""
Generate a proposal based on campaign parameters.
This is a STUB - returns a placeholder proposal.
Full implementation pending business requirements.
"""
raise HTTPException(
status_code=501,
detail="Proposal generation not yet implemented. Use manual proposal creation."
)
4.3 Future Requirements (TBD)¶
When ready to implement, we'll need: - [ ] Station scoring/ranking algorithm - [ ] Budget optimization logic - [ ] Market coverage requirements - [ ] Reach/frequency calculations (if Nielsen data available) - [ ] Client preference learning (historical data)
Implementation Order¶
Session 1: Document Processing¶
- Generate batches for pending documents
- Run batch processing (background)
- Analyze and triage failed documents
- Start deduplication work
Session 2: Rate Cards¶
- Create external rate card ingestion script
- Implement rate card export endpoint
- Add versioning logic
- Tag in-network vs out-of-network
Session 3: Proposals Part 1¶
- Create proposal data models
- Run database migrations
- Implement CRUD endpoints
- Add status workflow endpoints
Session 4: Proposals Part 2¶
- Implement proposal line item management
- Add financial calculations
- Create proposal export (PDF/Excel)
- Build historical proposal ingestion
Session 5: Generation Stub + Polish¶
- Add stub generation endpoint
- Integration testing
- Documentation
- Performance optimization
Dependencies & Blockers¶
| Item | Depends On | Notes |
|---|---|---|
| Rate card export | Document processing | Need complete rate data |
| External rates | Sample files in amfm-rates/ | 20 files ready |
| Proposal ingestion | Sample proposals | Need files in docs/proposals/ |
| Proposal generation | Rate cards + Proposals | Both must be complete |
Success Metrics¶
| Metric | Target |
|---|---|
| Document processing completion | > 95% |
| Unique rates extracted | > 15,000 |
| Rate card API response time | < 200ms |
| Proposal CRUD operations | Full coverage |
| External rate cards ingested | 20 files |
Last updated: 2026-01-06