CB Radio Roadmap

Current State Summary

Metric                  Value
Total Documents         5,085
Pending                 1,295 (25%)
Completed               1,254 (25%)
Needs Review            2,090 (41%)
Failed                  446 (9%)
Total Rates Extracted   ~13,000+
Unique Stations         1,908

Phase 1: Finish Document Processing

Goal: Process remaining 1,295 pending documents to maximize rate coverage.

1.1 Run Batch Processing

# Generate fresh batches from pending documents
python3 scripts/generate_batches.py --status pending --batch-size 100

# Process batches (can run multiple in parallel terminals)
python3 -u src/workers/document_processor.py --batch-file batches/batch_0000.json
python3 -u src/workers/document_processor.py --batch-file batches/batch_0001.json
# ... continue for all batches

Expected time: ~6-8 hours for 1,295 documents at ~200 docs/hour

1.2 Triage Failed Documents (446)

Current failure breakdown:

  • JSON parsing failures from Claude (~60%)
  • OCR failures on poor-quality scans (~25%)
  • Timeout errors (~15%)

Actions:

  1. Export the failed document list with error messages
  2. Retry timeout errors with an increased timeout (180s → 300s)
  3. Convert PDF→PNG for OCR failures (95% success rate for images vs 65% for PDFs)
  4. Improve the prompt for JSON parsing failures (add a structured output format)
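The triage above can be sketched as a small helper that buckets failed documents by error message so each group gets the right retry strategy. The error-string patterns and record shape here are assumptions for illustration, not the actual processor's output:

```python
# Hypothetical triage helper: groups failed documents by the retry action to
# apply. The error-message substrings matched below are assumptions.
from collections import defaultdict

RETRY_ACTIONS = {
    "timeout": "retry with --timeout 300",
    "json": "retry with structured-output prompt",
    "ocr": "convert PDF to PNG, then retry",
}

def triage(failed_docs: list[dict]) -> dict[str, list[str]]:
    """Map each retry action to the list of document IDs needing it."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for doc in failed_docs:
        error = doc.get("error", "").lower()
        if "timeout" in error:
            buckets[RETRY_ACTIONS["timeout"]].append(doc["id"])
        elif "json" in error or "parse" in error:
            buckets[RETRY_ACTIONS["json"]].append(doc["id"])
        else:
            # Remaining failures are treated as OCR problems by default.
            buckets[RETRY_ACTIONS["ocr"]].append(doc["id"])
    return dict(buckets)
```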

1.3 Review Low-Confidence Extractions (2,090)

Documents marked needs_review have:

  • Confidence score < 0.5, OR
  • No rates extracted despite successful OCR

Actions:

  1. Query documents by confidence band:
     • 0.4-0.5: Likely salvageable with minor fixes
     • 0.2-0.4: May need re-extraction with a better prompt
     • < 0.2: Likely not rate cards (agreements, emails, misc)
  2. Reclassify non-rate-card documents as document_type = 'misc' and skip them
  3. Re-run extraction on legitimate rate cards with improved prompts
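The banding rule is simple enough to pin down in code. A minimal sketch, using the thresholds from the plan (the band labels are illustrative):

```python
# Map a confidence score to the triage band from the plan above.
def confidence_band(score: float) -> str:
    if score >= 0.4:
        return "minor-fixes"      # 0.4-0.5: likely salvageable
    if score >= 0.2:
        return "re-extract"       # 0.2-0.4: retry with a better prompt
    return "reclassify-misc"      # < 0.2: probably not a rate card
```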

1.4 Deduplication

Current data quality issues:

  • ~37% potential duplicate rates
  • The same document processed multiple times from different sources

Actions:

  1. Add a unique constraint: (station_id, ad_type, duration_seconds, effective_date, slot_name)
  2. Implement UPSERT logic - the newer extraction wins
  3. Run a deduplication script on existing rates
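The "newer extraction wins" rule can be prototyped in memory before touching the database. In this sketch the key fields mirror the unique constraint above; the `extracted_at` timestamp column is an assumption:

```python
# In-memory sketch of the UPSERT rule: rates sharing the unique key collapse
# to one row, and the newest extraction wins. "extracted_at" is assumed.
def dedupe(rates: list[dict]) -> list[dict]:
    key_fields = ("station_id", "ad_type", "duration_seconds",
                  "effective_date", "slot_name")
    kept: dict[tuple, dict] = {}
    for rate in rates:
        key = tuple(rate[f] for f in key_fields)
        current = kept.get(key)
        if current is None or rate["extracted_at"] > current["extracted_at"]:
            kept[key] = rate
    return list(kept.values())
```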

1.5 Deliverables

  • All 1,295 pending docs processed
  • Failed count reduced to < 100
  • Needs review count reduced to < 500
  • Duplicate rates removed
  • Total unique rates: Target 15,000+

Phase 2: Rate Cards API

Goal: Deliver rate data in a standardized format for API consumers.

2.1 Target Format

Based on docs/rate-cards/radio-rate-card-sample.xlsx, the consumer-facing rate card format:

{
  "rate_card_id": "uuid",
  "station": {
    "callsign": "WXXX-AM",
    "market": "Detroit, MI",
    "format": "News/Talk",
    "nielsen_dma": "Detroit"
  },
  "year": 2026,
  "effective_date": "2026-01-01",
  "expiration_date": "2026-12-31",
  "is_in_network": true,
  "rates": [
    {
      "ad_type": "candidate",
      "preemption": "non_preemptible",
      "daypart": "Morning Drive",
      "days": "M-F",
      "time_start": "06:00",
      "time_end": "10:00",
      "durations": {
        "30": { "gross": 150.00, "net": 127.50 },
        "60": { "gross": 275.00, "net": 233.75 }
      }
    }
  ]
}
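In the sample above, each net figure appears to be the gross less a 15% commission (matching the default commission_rate used later in this document). That is an inference from the numbers, not a documented rule:

```python
# Derive net from gross under the inferred 15% commission default.
from decimal import Decimal

def net_rate(gross: Decimal, commission: Decimal = Decimal("0.15")) -> Decimal:
    """Net = gross less commission, using Decimal to avoid float rounding."""
    return gross * (Decimal(1) - commission)
```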

2.2 Out-of-Network Rate Cards

Located in docs/rate-cards/amfm-rates/ (20 Excel files from external sources):

  • Different market formats (Akron OH, Detroit MI, Phoenix AZ, etc.)
  • Naming pattern: {Year}{Type}Rates_{Market}_{Timestamp}.xlsx

Ingestion approach:

  1. Create scripts/ingest_external_rates.py
  2. Parse the Excel files with openpyxl
  3. Map columns to the standardized schema
  4. Tag stations as is_in_network = false
  5. Store with source tracking (source = 'amfm')

2.3 New Endpoints

Method  Endpoint                                Purpose
GET     /api/v1/rate-cards/export               Export rate cards in consumer format
GET     /api/v1/rate-cards/export/{callsign}    Export a single station's rate card
GET     /api/v1/rate-cards/by-market/{dma}      Get all rate cards for a DMA market
GET     /api/v1/rate-cards/active               Get only current/active rate cards
POST    /api/v1/rate-cards/import               Bulk import rate cards (for external data)

2.4 Versioning Logic

Rate cards are versioned by date:

  • effective_date determines when rates become active
  • The most recent effective_date <= today is the "active" version
  • Future rate cards (effective_date > today) are "scheduled"
  • Past rate cards are archived but remain queryable

-- Get active rate card for a station
SELECT * FROM rate_card
WHERE station_id = ?
  AND effective_date <= CURRENT_DATE
ORDER BY effective_date DESC
LIMIT 1;
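The same rule, expressed in Python for a single station's set of cards; a sketch assuming cards are dicts with an `id` and an `effective_date`:

```python
# Classify each rate card for one station as active / scheduled / archived,
# following the versioning rule: newest effective_date <= today is active.
from datetime import date

def classify_cards(cards: list[dict], today: date) -> dict[str, str]:
    past = [c for c in cards if c["effective_date"] <= today]
    active_id = max(past, key=lambda c: c["effective_date"])["id"] if past else None
    out = {}
    for c in cards:
        if c["id"] == active_id:
            out[c["id"]] = "active"
        elif c["effective_date"] > today:
            out[c["id"]] = "scheduled"
        else:
            out[c["id"]] = "archived"
    return out
```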

2.5 Schema Changes

Add to rate_card table:

ALTER TABLE rate_card ADD COLUMN is_in_network BOOLEAN DEFAULT TRUE;
ALTER TABLE rate_card ADD COLUMN source VARCHAR DEFAULT 'internal';
ALTER TABLE rate_card ADD COLUMN source_file VARCHAR;

2.6 Deliverables

  • External rate card ingestion pipeline
  • Rate card export endpoint in consumer format
  • Versioning logic (active vs scheduled vs archived)
  • In-network vs out-of-network tagging
  • Documentation for API consumers

Phase 3: Proposals

Goal: CRUD for proposals and ingestion pipeline for historical proposals.

3.1 Data Model

from datetime import date, datetime
from decimal import Decimal
from enum import Enum
from typing import Optional

# AdType and PreemptionStatus are assumed to be enums defined elsewhere.

class Proposal:
    id: str                          # UUID
    proposal_number: str             # Human-readable ID (e.g., "P-2026-0001")
    name: str                        # Proposal title

    # Relationships
    client_id: str
    consultant_id: Optional[str]
    buyer_agency_id: Optional[str]

    # Campaign Parameters
    ad_type: AdType                  # candidate, issue
    election_type: Optional[str]     # primary, general
    start_date: date
    end_date: date
    election_date: Optional[date]

    # Spot Configuration
    spot_duration: int               # 15, 30, 60, 90, 120
    preemption: PreemptionStatus
    time_slot_preference: Optional[str]

    # Geographic Targeting
    target_states: list[str]
    target_dmas: Optional[list[str]]
    rural_only: bool = False

    # Financial
    commission_rate: Decimal = 0.15

    # Status
    status: ProposalStatus           # draft, pending, sent, accepted, rejected, converted
    expires_at: Optional[datetime]

    # Timestamps
    created_at: datetime
    updated_at: datetime
    accepted_at: Optional[datetime]
    converted_to_buy_id: Optional[str]

class ProposalLineItem:
    id: str
    proposal_id: str
    station_id: str
    station_callsign: str            # Denormalized

    # Quantity
    spots_per_week: int
    weeks: int

    # Pricing (snapshot at proposal time)
    rate_id: Optional[str]           # Source rate reference
    gross_rate_per_spot: Decimal
    net_rate_per_spot: Decimal
    total_gross: Decimal             # Calculated
    total_net: Decimal               # Calculated

    notes: Optional[str]

class ProposalStatus(Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    SENT_TO_CLIENT = "sent"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    EXPIRED = "expired"
    CONVERTED = "converted"          # Became a RadioBuy
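
The statuses imply a workflow, which a transition table can make explicit. The exact allowed transitions below are an assumption to confirm against business rules, not a documented state machine:

```python
# Sketch of the proposal status workflow as an explicit transition table.
ALLOWED_TRANSITIONS = {
    "draft": {"pending_review", "sent"},
    "pending_review": {"draft", "sent"},
    "sent": {"accepted", "rejected", "expired"},
    "accepted": {"converted"},
    "rejected": set(),       # terminal
    "expired": set(),        # terminal
    "converted": set(),      # terminal
}

def can_transition(current: str, new: str) -> bool:
    return new in ALLOWED_TRANSITIONS.get(current, set())
```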

3.2 Database Tables

CREATE TABLE proposal (
    id VARCHAR PRIMARY KEY,
    proposal_number VARCHAR UNIQUE NOT NULL,
    name VARCHAR NOT NULL,

    client_id VARCHAR,
    consultant_id VARCHAR,
    buyer_agency_id VARCHAR,

    ad_type VARCHAR NOT NULL,
    election_type VARCHAR,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    election_date DATE,

    spot_duration INTEGER NOT NULL,
    preemption VARCHAR NOT NULL,
    time_slot_preference VARCHAR,

    target_states VARCHAR,           -- JSON array
    target_dmas VARCHAR,             -- JSON array
    rural_only BOOLEAN DEFAULT FALSE,

    commission_rate DECIMAL(5,4) DEFAULT 0.15,

    status VARCHAR DEFAULT 'draft',
    expires_at TIMESTAMP,
    accepted_at TIMESTAMP,
    rejected_at TIMESTAMP,
    rejection_reason TEXT,
    converted_to_buy_id VARCHAR,

    notes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE proposal_line_item (
    id VARCHAR PRIMARY KEY,
    proposal_id VARCHAR NOT NULL,
    station_id VARCHAR,
    station_callsign VARCHAR,

    spots_per_week INTEGER NOT NULL,
    weeks INTEGER NOT NULL,

    rate_id VARCHAR,
    gross_rate_per_spot DECIMAL(10,2) NOT NULL,
    net_rate_per_spot DECIMAL(10,2) NOT NULL,
    total_gross DECIMAL(12,2) NOT NULL,
    total_net DECIMAL(12,2) NOT NULL,

    notes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_proposal_status ON proposal(status);
CREATE INDEX idx_proposal_client ON proposal(client_id);
CREATE INDEX idx_proposal_dates ON proposal(start_date, end_date);
CREATE INDEX idx_line_item_proposal ON proposal_line_item(proposal_id);

3.3 API Endpoints

Method  Endpoint                                Purpose
GET     /api/v1/proposals                       List proposals with filters
GET     /api/v1/proposals/{id}                  Get proposal with line items
POST    /api/v1/proposals                       Create proposal
PATCH   /api/v1/proposals/{id}                  Update proposal
DELETE  /api/v1/proposals/{id}                  Delete proposal
POST    /api/v1/proposals/{id}/line-items       Add stations to proposal
PATCH   /api/v1/proposals/{id}/line-items       Update line items
DELETE  /api/v1/proposals/{id}/line-items       Remove stations
POST    /api/v1/proposals/{id}/send             Mark as sent to client
POST    /api/v1/proposals/{id}/accept           Mark as accepted
POST    /api/v1/proposals/{id}/reject           Mark as rejected
POST    /api/v1/proposals/{id}/convert          Convert to RadioBuy
GET     /api/v1/proposals/{id}/summary          Financial summary
GET     /api/v1/proposals/{id}/export           Export as PDF/Excel
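The /summary endpoint's math follows directly from the line-item model: totals are spots_per_week × weeks × rate, and commission is the gross/net spread. A minimal sketch of that calculation:

```python
# Line-item and proposal-level totals, mirroring the calculated fields in
# ProposalLineItem (total_gross, total_net).
from decimal import Decimal

def line_item_totals(spots_per_week: int, weeks: int,
                     gross_rate: Decimal, net_rate: Decimal) -> dict:
    spots = spots_per_week * weeks
    return {"total_gross": gross_rate * spots, "total_net": net_rate * spots}

def proposal_summary(items: list[dict]) -> dict:
    gross = sum((i["total_gross"] for i in items), Decimal(0))
    net = sum((i["total_net"] for i in items), Decimal(0))
    return {"total_gross": gross, "total_net": net, "commission": gross - net}
```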

3.4 Ingestion Pipeline

For historical proposals in docs/proposals/:

# scripts/ingest_proposals.py
class ProposalIngestionPipeline:
    def ingest_file(self, file_path: str) -> Proposal:
        """
        1. Detect format (PDF, Excel, CSV)
        2. Extract data using appropriate parser
        3. Map to Proposal schema
        4. Validate against existing clients/stations
        5. Create proposal with line items
        6. Return created proposal
        """

    def ingest_directory(self, dir_path: str) -> list[Proposal]:
        """Process all files in directory"""

3.5 Deliverables

  • Proposal and ProposalLineItem models
  • Database schema migration
  • CRUD API endpoints
  • Proposal status workflow (draft → sent → accepted/rejected → converted)
  • Historical proposal ingestion pipeline
  • Proposal export (PDF/Excel)

Phase 4: Proposal Generation (Stub)

Goal: Prepare the system for automated proposal generation.

4.1 Generation Flow (Future)

Client Request
[Parameters]
  - Target states/markets
  - Ad type (candidate/issue)
  - Duration preference
  - Budget range
  - Date range
  - Preemption preference
[Rate Lookup]
  - Query active rates matching parameters
  - Filter by station network status
  - Sort by cost efficiency or reach
[Station Selection]
  - Auto-select optimal station mix
  - Balance rural/non-rural coverage
  - Respect budget constraints
[Proposal Assembly]
  - Create proposal with selected stations
  - Calculate totals and commission
  - Generate summary
[Output]
  - Draft proposal ready for review
  - Exportable to client format

4.2 Stub Endpoint

from fastapi import APIRouter, HTTPException

router = APIRouter()

@router.post("/api/v1/proposals/generate")
async def generate_proposal(
    request: ProposalGenerationRequest
) -> Proposal:
    """
    Generate a proposal based on campaign parameters.

    This is a STUB - returns a placeholder proposal.
    Full implementation pending business requirements.
    """
    raise HTTPException(
        status_code=501,
        detail="Proposal generation not yet implemented. Use manual proposal creation."
    )

4.3 Future Requirements (TBD)

When ready to implement, we'll need:

  - [ ] Station scoring/ranking algorithm
  - [ ] Budget optimization logic
  - [ ] Market coverage requirements
  - [ ] Reach/frequency calculations (if Nielsen data is available)
  - [ ] Client preference learning (from historical data)


Implementation Order

Session 1: Document Processing

  1. Generate batches for pending documents
  2. Run batch processing (background)
  3. Analyze and triage failed documents
  4. Start deduplication work

Session 2: Rate Cards

  1. Create external rate card ingestion script
  2. Implement rate card export endpoint
  3. Add versioning logic
  4. Tag in-network vs out-of-network

Session 3: Proposals Part 1

  1. Create proposal data models
  2. Run database migrations
  3. Implement CRUD endpoints
  4. Add status workflow endpoints

Session 4: Proposals Part 2

  1. Implement proposal line item management
  2. Add financial calculations
  3. Create proposal export (PDF/Excel)
  4. Build historical proposal ingestion

Session 5: Generation Stub + Polish

  1. Add stub generation endpoint
  2. Integration testing
  3. Documentation
  4. Performance optimization

Dependencies & Blockers

Item                 Depends On                     Notes
Rate card export     Document processing            Need complete rate data
External rates       Sample files in amfm-rates/    20 files ready
Proposal ingestion   Sample proposals               Need files in docs/proposals/
Proposal generation  Rate cards + Proposals         Both must be complete

Success Metrics

Metric                          Target
Document processing completion  > 95%
Unique rates extracted          > 15,000
Rate card API response time     < 200ms
Proposal CRUD operations        Full coverage
External rate cards ingested    20 files

Last updated: 2026-01-06