CB Radio Roadmap¶

Current State Summary¶

Metric	Value
Total Documents	5,085
Pending	1,295 (25%)
Completed	1,254 (25%)
Needs Review	2,090 (41%)
Failed	446 (9%)
Total Rates Extracted	~13,000+
Unique Stations	1,908

Phase 1: Finish Document Processing¶

Goal: Process remaining 1,295 pending documents to maximize rate coverage.

1.1 Run Batch Processing¶

# Generate fresh batches from pending documents
python3 scripts/generate_batches.py --status pending --batch-size 100

# Process batches (can run multiple in parallel terminals)
python3 -u src/workers/document_processor.py --batch-file batches/batch_0000.json
python3 -u src/workers/document_processor.py --batch-file batches/batch_0001.json
# ... continue for all batches

Expected time: ~6-8 hours for 1,295 documents at ~200 docs/hour

1.2 Triage Failed Documents (446)¶

Current failure breakdown: - JSON parsing failures from Claude (~60%) - OCR failures - poor quality scans (~25%) - Timeout errors (~15%)

Actions: 1. Export failed document list with error messages 2. Retry with increased timeout (180s → 300s) for timeout errors 3. Convert PDF→PNG for OCR failures (95% success rate for images vs 65% for PDFs) 4. Improve prompt for JSON parsing failures (add structured output format)

1.3 Review Low-Confidence Extractions (2,090)¶

Documents marked needs_review have: - Confidence score < 0.5, OR - No rates extracted despite successful OCR

Actions: 1. Query documents by confidence bands: - 0.4-0.5: Likely salvageable with minor fixes - 0.2-0.4: May need re-extraction with better prompt - < 0.2: Likely not rate cards (agreements, emails, misc) 2. Reclassify non-rate-card documents as document_type = 'misc' and skip 3. Re-run extraction on legitimate rate cards with improved prompts

1.4 Deduplication¶

Current data quality issues: - ~37% potential duplicate rates - Same document processed multiple times from different sources

Actions: 1. Add unique constraint: (station_id, ad_type, duration_seconds, effective_date, slot_name) 2. Implement UPSERT logic - newer extraction wins 3. Run deduplication script on existing rates

1.5 Deliverables¶

All 1,295 pending docs processed
Failed count reduced to < 100
Needs review count reduced to < 500
Duplicate rates removed
Total unique rates: Target 15,000+

Phase 2: Rate Cards API¶

Goal: Deliver rate data in a standardized format for API consumers.

2.1 Target Format¶

Based on docs/rate-cards/radio-rate-card-sample.xlsx, the consumer-facing rate card format:

{
  "rate_card_id": "uuid",
  "station": {
    "callsign": "WXXX-AM",
    "market": "Detroit, MI",
    "format": "News/Talk",
    "nielsen_dma": "Detroit"
  },
  "year": 2026,
  "effective_date": "2026-01-01",
  "expiration_date": "2026-12-31",
  "is_in_network": true,
  "rates": [
    {
      "ad_type": "candidate",
      "preemption": "non_preemptible",
      "daypart": "Morning Drive",
      "days": "M-F",
      "time_start": "06:00",
      "time_end": "10:00",
      "durations": {
        "30": { "gross": 150.00, "net": 127.50 },
        "60": { "gross": 275.00, "net": 233.75 }
      }
    }
  ]
}

2.2 Out-of-Network Rate Cards¶

Located in docs/rate-cards/amfm-rates/ (20 Excel files from external sources): - Different market formats (Akron OH, Detroit MI, Phoenix AZ, etc.) - Naming pattern: {Year}{Type}Rates_{Market}_{Timestamp}.xlsx

Ingestion approach: 1. Create scripts/ingest_external_rates.py 2. Parse Excel files using openpyxl 3. Map columns to standardized schema 4. Tag stations as is_in_network = false 5. Store with source tracking (source = 'amfm')

2.3 New Endpoints¶

Method	Endpoint	Purpose
GET	`/api/v1/rate-cards/export`	Export rate cards in consumer format
GET	`/api/v1/rate-cards/export/{callsign}`	Export single station's rate card
GET	`/api/v1/rate-cards/by-market/{dma}`	Get all rate cards for a DMA market
GET	`/api/v1/rate-cards/active`	Get only current/active rate cards
POST	`/api/v1/rate-cards/import`	Bulk import rate cards (for external data)

2.4 Versioning Logic¶

Rate cards are versioned by date: - effective_date determines when rates become active - Most recent effective_date <= today is the "active" version - Future rate cards (effective_date > today) are "scheduled" - Past rate cards are archived but queryable

-- Get active rate card for a station
SELECT * FROM rate_card
WHERE station_id = ?
  AND effective_date <= CURRENT_DATE
ORDER BY effective_date DESC
LIMIT 1

2.5 Schema Changes¶

Add to rate_card table:

ALTER TABLE rate_card ADD COLUMN is_in_network BOOLEAN DEFAULT TRUE;
ALTER TABLE rate_card ADD COLUMN source VARCHAR DEFAULT 'internal';
ALTER TABLE rate_card ADD COLUMN source_file VARCHAR;

2.6 Deliverables¶

External rate card ingestion pipeline
Rate card export endpoint in consumer format
Versioning logic (active vs scheduled vs archived)
In-network vs out-of-network tagging
Documentation for API consumers

Phase 3: Proposals¶

Goal: CRUD for proposals and ingestion pipeline for historical proposals.

3.1 Data Model¶

class Proposal:
    id: str                          # UUID
    proposal_number: str             # Human-readable ID (e.g., "P-2026-0001")
    name: str                        # Proposal title

    # Relationships
    client_id: str
    consultant_id: Optional[str]
    buyer_agency_id: Optional[str]

    # Campaign Parameters
    ad_type: AdType                  # candidate, issue
    election_type: Optional[str]     # primary, general
    start_date: date
    end_date: date
    election_date: Optional[date]

    # Spot Configuration
    spot_duration: int               # 15, 30, 60, 90, 120
    preemption: PreemptionStatus
    time_slot_preference: Optional[str]

    # Geographic Targeting
    target_states: list[str]
    target_dmas: Optional[list[str]]
    rural_only: bool = False

    # Financial
    commission_rate: Decimal = 0.15

    # Status
    status: ProposalStatus           # draft, pending, sent, accepted, rejected, converted
    expires_at: Optional[datetime]

    # Timestamps
    created_at: datetime
    updated_at: datetime
    accepted_at: Optional[datetime]
    converted_to_buy_id: Optional[str]

class ProposalLineItem:
    id: str
    proposal_id: str
    station_id: str
    station_callsign: str            # Denormalized

    # Quantity
    spots_per_week: int
    weeks: int

    # Pricing (snapshot at proposal time)
    rate_id: Optional[str]           # Source rate reference
    gross_rate_per_spot: Decimal
    net_rate_per_spot: Decimal
    total_gross: Decimal             # Calculated
    total_net: Decimal               # Calculated

    notes: Optional[str]

class ProposalStatus(Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    SENT_TO_CLIENT = "sent"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    EXPIRED = "expired"
    CONVERTED = "converted"          # Became a RadioBuy

3.2 Database Tables¶

CREATE TABLE proposal (
    id VARCHAR PRIMARY KEY,
    proposal_number VARCHAR UNIQUE NOT NULL,
    name VARCHAR NOT NULL,

    client_id VARCHAR,
    consultant_id VARCHAR,
    buyer_agency_id VARCHAR,

    ad_type VARCHAR NOT NULL,
    election_type VARCHAR,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    election_date DATE,

    spot_duration INTEGER NOT NULL,
    preemption VARCHAR NOT NULL,
    time_slot_preference VARCHAR,

    target_states VARCHAR,           -- JSON array
    target_dmas VARCHAR,             -- JSON array
    rural_only BOOLEAN DEFAULT FALSE,

    commission_rate DECIMAL(5,4) DEFAULT 0.15,

    status VARCHAR DEFAULT 'draft',
    expires_at TIMESTAMP,
    accepted_at TIMESTAMP,
    rejected_at TIMESTAMP,
    rejection_reason TEXT,
    converted_to_buy_id VARCHAR,

    notes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE proposal_line_item (
    id VARCHAR PRIMARY KEY,
    proposal_id VARCHAR NOT NULL,
    station_id VARCHAR,
    station_callsign VARCHAR,

    spots_per_week INTEGER NOT NULL,
    weeks INTEGER NOT NULL,

    rate_id VARCHAR,
    gross_rate_per_spot DECIMAL(10,2) NOT NULL,
    net_rate_per_spot DECIMAL(10,2) NOT NULL,
    total_gross DECIMAL(12,2) NOT NULL,
    total_net DECIMAL(12,2) NOT NULL,

    notes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_proposal_status ON proposal(status);
CREATE INDEX idx_proposal_client ON proposal(client_id);
CREATE INDEX idx_proposal_dates ON proposal(start_date, end_date);
CREATE INDEX idx_line_item_proposal ON proposal_line_item(proposal_id);

3.3 API Endpoints¶

Method	Endpoint	Purpose
GET	`/api/v1/proposals`	List proposals with filters
GET	`/api/v1/proposals/{id}`	Get proposal with line items
POST	`/api/v1/proposals`	Create proposal
PATCH	`/api/v1/proposals/{id}`	Update proposal
DELETE	`/api/v1/proposals/{id}`	Delete proposal
POST	`/api/v1/proposals/{id}/line-items`	Add stations to proposal
PATCH	`/api/v1/proposals/{id}/line-items`	Update line items
DELETE	`/api/v1/proposals/{id}/line-items`	Remove stations
POST	`/api/v1/proposals/{id}/send`	Mark as sent to client
POST	`/api/v1/proposals/{id}/accept`	Mark as accepted
POST	`/api/v1/proposals/{id}/reject`	Mark as rejected
POST	`/api/v1/proposals/{id}/convert`	Convert to RadioBuy
GET	`/api/v1/proposals/{id}/summary`	Financial summary
GET	`/api/v1/proposals/{id}/export`	Export as PDF/Excel

3.4 Ingestion Pipeline¶

For historical proposals in docs/proposals/:

# scripts/ingest_proposals.py
class ProposalIngestionPipeline:
    def ingest_file(self, file_path: str) -> Proposal:
        """
        1. Detect format (PDF, Excel, CSV)
        2. Extract data using appropriate parser
        3. Map to Proposal schema
        4. Validate against existing clients/stations
        5. Create proposal with line items
        6. Return created proposal
        """

    def ingest_directory(self, dir_path: str) -> list[Proposal]:
        """Process all files in directory"""

3.5 Deliverables¶

Proposal and ProposalLineItem models
Database schema migration
CRUD API endpoints
Proposal status workflow (draft → sent → accepted/rejected → converted)
Historical proposal ingestion pipeline
Proposal export (PDF/Excel)

Phase 4: Proposal Generation (Stub)¶

Goal: Prepare the system for automated proposal generation.

4.1 Generation Flow (Future)¶

Client Request
    ↓
[Parameters]
  - Target states/markets
  - Ad type (candidate/issue)
  - Duration preference
  - Budget range
  - Date range
  - Preemption preference
    ↓
[Rate Lookup]
  - Query active rates matching parameters
  - Filter by station network status
  - Sort by cost efficiency or reach
    ↓
[Station Selection]
  - Auto-select optimal station mix
  - Balance rural/non-rural coverage
  - Respect budget constraints
    ↓
[Proposal Assembly]
  - Create proposal with selected stations
  - Calculate totals and commission
  - Generate summary
    ↓
[Output]
  - Draft proposal ready for review
  - Exportable to client format

4.2 Stub Endpoint¶

@router.post("/api/v1/proposals/generate")
async def generate_proposal(
    request: ProposalGenerationRequest
) -> Proposal:
    """
    Generate a proposal based on campaign parameters.

    This is a STUB - returns a placeholder proposal.
    Full implementation pending business requirements.
    """
    raise HTTPException(
        status_code=501,
        detail="Proposal generation not yet implemented. Use manual proposal creation."
    )

4.3 Future Requirements (TBD)¶

When ready to implement, we'll need: - [ ] Station scoring/ranking algorithm - [ ] Budget optimization logic - [ ] Market coverage requirements - [ ] Reach/frequency calculations (if Nielsen data available) - [ ] Client preference learning (historical data)

Implementation Order¶

Session 1: Document Processing¶

Generate batches for pending documents
Run batch processing (background)
Analyze and triage failed documents
Start deduplication work

Session 2: Rate Cards¶

Create external rate card ingestion script
Implement rate card export endpoint
Add versioning logic
Tag in-network vs out-of-network

Session 3: Proposals Part 1¶

Create proposal data models
Run database migrations
Implement CRUD endpoints
Add status workflow endpoints

Session 4: Proposals Part 2¶

Implement proposal line item management
Add financial calculations
Create proposal export (PDF/Excel)
Build historical proposal ingestion

Session 5: Generation Stub + Polish¶

Add stub generation endpoint
Integration testing
Documentation
Performance optimization

Dependencies & Blockers¶

Item	Depends On	Notes
Rate card export	Document processing	Need complete rate data
External rates	Sample files in amfm-rates/	20 files ready
Proposal ingestion	Sample proposals	Need files in docs/proposals/
Proposal generation	Rate cards + Proposals	Both must be complete

Success Metrics¶

Metric	Target
Document processing completion	> 95%
Unique rates extracted	> 15,000
Rate card API response time	< 200ms
Proposal CRUD operations	Full coverage
External rate cards ingested	20 files

Last updated: 2026-01-06