cbintel Next Steps

Planning document for upcoming development work on the cbintel intelligence toolkit.

1. OpenWRT Device Management Enhancement

Current State

  • 16 OpenWRT workers (17.0.0.10-25) managed via LuCI RPC
  • Basic worker status via /api/v1/workers/
  • WireGuard tunnels (wg01-wg15) for direct routing

Proposed Enhancements

1.1 Device Registry Model

Create a comprehensive Pydantic model for device tracking:

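from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class DeviceStatus(str, Enum):
    """Device health states"""
    ONLINE = "online"
    OFFLINE = "offline"
    DEGRADED = "degraded"


class DeviceRegistry(BaseModel):
    """Complete device registry entry"""
    id: int                                 # Unique device ID (1-16)
    hostname: str                           # e.g., "lazarus-worker-01"
    lan_ip: str                             # e.g., "17.0.0.10"

    # WireGuard tunnel info (None when no tunnel is configured)
    wg_interface: Optional[str] = None      # e.g., "wg01"
    wg_tunnel_ip: Optional[str] = None      # e.g., "10.200.1.2"

    # VPN connection info (None when no VPN is connected)
    vpn_profile: Optional[str] = None       # e.g., "es-123.protonvpn.udp.ovpn"
    vpn_country: Optional[str] = None       # e.g., "Spain"
    vpn_region: Optional[str] = None        # e.g., "Madrid"
    external_ip: Optional[str] = None       # e.g., "1.23.34.56"

    # Status
    status: DeviceStatus                    # online/offline/degraded
    last_seen: datetime
    uptime: int                             # seconds

    # Performance metrics (None until a test has been run)
    ping_latency_ms: Optional[float] = None
    download_speed_mbps: Optional[float] = None
    upload_speed_mbps: Optional[float] = None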

1.2 New API Endpoints

Endpoint                        Method  Description
/api/v1/devices/                GET     List all devices with full registry info
/api/v1/devices/{id}            GET     Get single device details
/api/v1/devices/{id}            PUT     Update device metadata
/api/v1/devices/{id}/ping       POST    Run ping test, return latency
/api/v1/devices/{id}/speedtest  POST    Run bandwidth test
/api/v1/devices/{id}/status     GET     Quick status check
/api/v1/devices/{id}/reboot     POST    Reboot device
/api/v1/devices/{id}/execute    POST    Execute command on device

1.3 Utility Functions

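# cbintel/cluster/services/device_service.py
# Return types (PingResult, SpeedTestResult, ...) are result models still to be defined.
from __future__ import annotations  # lets the stubs reference them before they exist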

class DeviceService:
    async def ping_device(self, device_id: int) -> PingResult:
        """Ping device and return latency metrics"""

    async def get_external_ip(self, device_id: int) -> ExternalIPInfo:
        """Get current external IP with geolocation"""

    async def run_speedtest(self, device_id: int) -> SpeedTestResult:
        """Run bandwidth test on device"""

    async def execute_command(self, device_id: int, cmd: str) -> CommandResult:
        """Execute arbitrary command on device"""

    async def get_system_info(self, device_id: int) -> SystemInfo:
        """Get CPU, memory, disk usage"""

    async def update_firmware(self, device_id: int) -> UpdateResult:
        """Trigger firmware update"""

    async def sync_time(self, device_id: int) -> bool:
        """Sync device time via NTP"""
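
As a rough illustration of what ping_device might do with a device's raw ping output, the sketch below parses the summary line into latency figures. PingStats and parse_ping_summary are hypothetical names, not existing cbintel code, and the regex assumes the summary formats commonly printed by BusyBox ("round-trip min/avg/max = ...") and iputils ("rtt min/avg/max/mdev = ...").

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingStats:
    """Hypothetical container for parsed latency figures (milliseconds)."""
    min_ms: float
    avg_ms: float
    max_ms: float


# Matches "round-trip min/avg/max = 1.2/3.4/5.6 ms" (BusyBox) and
# "rtt min/avg/max/mdev = 1.2/3.4/5.6/0.7 ms" (iputils).
_SUMMARY = re.compile(
    r"(?:round-trip|rtt) min/avg/max(?:/mdev)? = "
    r"([\d.]+)/([\d.]+)/([\d.]+)"
)


def parse_ping_summary(output: str) -> Optional[PingStats]:
    """Return latency stats, or None if no summary line is present."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    lo, avg, hi = (float(g) for g in match.groups())
    return PingStats(min_ms=lo, avg_ms=avg, max_ms=hi)
```

Returning None when the summary is missing (for example on 100% packet loss) lets the caller decide whether to mark the device offline or degraded.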

1.4 Device Registry Storage

Options:

  • JSON file at /var/lib/vpn-banks/device-registry.json
  • SQLite database for queryability
  • Integration with existing bank state
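
To make the "queryability" argument for SQLite concrete, here is a minimal sketch using only the standard library. The table layout, the column subset, and the function names are assumptions chosen for illustration; a real schema would cover the full DeviceRegistry model.

```python
import sqlite3

# Hypothetical schema covering a subset of the DeviceRegistry fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS devices (
    id        INTEGER PRIMARY KEY,   -- device ID (1-16)
    hostname  TEXT NOT NULL,
    lan_ip    TEXT NOT NULL,
    status    TEXT NOT NULL,         -- online / offline / degraded
    last_seen TEXT NOT NULL          -- ISO 8601 timestamp
)
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the registry database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_device(conn, device_id, hostname, lan_ip, status, last_seen):
    """Insert a device row, or overwrite the existing row with the same ID."""
    conn.execute(
        "INSERT OR REPLACE INTO devices VALUES (?, ?, ?, ?, ?)",
        (device_id, hostname, lan_ip, status, last_seen),
    )
    conn.commit()


def devices_with_status(conn, status):
    """Return (id, hostname) pairs for devices in the given status."""
    rows = conn.execute(
        "SELECT id, hostname FROM devices WHERE status = ? ORDER BY id",
        (status,),
    )
    return rows.fetchall()
```

A query such as devices_with_status(conn, "offline") is a single SELECT here, whereas the JSON-file option would need to load and filter the whole file.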


2. cbintel Sub-Service Integration

Current Services

Service          Status  Description
cbintel.crawl    Active  AI-powered web crawling with iterative batches
cbintel.cluster  Active  VPN cluster management API (port 32203)

Planned Sub-Services

2.1 Lazarus - Historical Web Archive

Source: extern/lazarus

Purpose: Retrieve historical web content from Common Crawl and Internet Archive.

Components to integrate:

  • cdx_toolkit integration for CDX API queries
  • URL discovery via gau binary
  • Content retrieval and caching
  • Temporal analysis (how content changed over time)

Proposed structure:

cbintel/lazarus/
├── __init__.py
├── cdx.py           # CDX API client
├── archive.py       # Internet Archive integration
├── discovery.py     # URL discovery (gau wrapper)
└── temporal.py      # Time-series content analysis
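
For orientation, the shape of a CDX query can be sketched without cdx_toolkit by building the request URL for the public Wayback Machine CDX endpoint and parsing its JSON output. This is a swapped-in, standard-library illustration rather than the planned cdx.py; build_cdx_url and parse_cdx_json are hypothetical names, and no network request is made here.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, from_ts=None, to_ts=None, limit=None):
    """Build a CDX query URL requesting JSON output."""
    params = {"url": url, "output": "json"}
    if from_ts:
        params["from"] = from_ts
    if to_ts:
        params["to"] = to_ts
    if limit:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body):
    """Turn the CDX JSON array-of-arrays into a list of dicts.

    The first row holds the field names; the remaining rows are captures.
    """
    rows = json.loads(body)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in captures]
```

Each resulting dict carries a timestamp field, which is the natural input for the temporal analysis step: sorting captures of one URL by timestamp gives the sequence of versions to compare.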

2.2 Vectl - Vector Search

Source: extern/vectl

Purpose: High-performance vector storage, clustering, and semantic search.

Components to integrate:

  • Embedding generation (via Ollama nomic-embed-text)
  • Vector storage and indexing
  • Similarity search
  • Clustering algorithms

Proposed structure:

cbintel/vectl/
├── __init__.py
├── embeddings.py    # Generate embeddings
├── storage.py       # Vector storage backend
├── search.py        # Similarity search
├── cluster.py       # Clustering operations
└── index.py         # Index management
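
The similarity search component can be illustrated with a brute-force cosine search in pure Python. This is only a sketch of the idea, not vectl's implementation: the function names are hypothetical, the vectors stand in for embeddings that would come from the embedding step, and a real index would avoid scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the k (doc_id, score) pairs most similar to the query.

    `index` maps document IDs to embedding vectors.
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is a reasonable baseline for checking results from a faster backend, which is one way to approach the C++ versus pure Python evaluation.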

2.3 Topics - Topic Explorer

Purpose: Discover and track topics across crawled content.

Features:

  • Topic extraction from documents
  • Topic clustering and hierarchy
  • Trend detection over time
  • Related topic suggestions

Proposed structure:

cbintel/topics/
├── __init__.py
├── extract.py       # Topic extraction
├── cluster.py       # Topic clustering
├── trends.py        # Trend analysis
└── graph.py         # Topic relationship graph
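
One simple reading of "trend detection over time" is to count topic mentions per period and flag topics whose count grows sharply. The sketch below assumes topics have already been extracted as (period, topic) pairs; the function names and the growth-ratio rule are illustrative assumptions, not a settled design for trends.py.

```python
from collections import Counter, defaultdict


def count_by_period(mentions):
    """Group (period, topic) pairs into {period: Counter(topic -> count)}."""
    counts = defaultdict(Counter)
    for period, topic in mentions:
        counts[period][topic] += 1
    return counts


def rising_topics(counts, previous, current, min_ratio=2.0):
    """Topics whose count grew by at least `min_ratio` between two periods.

    Topics absent from the previous period are treated as having one
    mention, so that brand-new topics need at least `min_ratio` mentions.
    """
    rising = []
    for topic, now in counts[current].items():
        before = max(counts[previous][topic], 1)
        if now / before >= min_ratio:
            rising.append(topic)
    return sorted(rising)
```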

2.4 Screenshots - Visual Capture

Source: extern/guzl, extern/ferret

Purpose: Browser automation for screenshots and visual analysis.

Components to integrate:

  • Playwright-based screenshot capture
  • PDF generation
  • DOM extraction
  • Visual diff detection

Proposed structure:

cbintel/screenshots/
├── __init__.py
├── capture.py       # Screenshot capture
├── pdf.py           # PDF generation
├── dom.py           # DOM extraction
└── diff.py          # Visual diff detection
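
Visual diff detection can start from something as simple as the fraction of pixels that changed between two captures. The sketch below assumes both screenshots have already been decoded to raw pixel buffers of identical dimensions (decoding PNG output is out of scope here); changed_fraction is a hypothetical helper, not part of guzl or ferret.

```python
def changed_fraction(before: bytes, after: bytes, channels: int = 3) -> float:
    """Fraction of pixels that differ between two raw pixel buffers.

    Both buffers must hold the same image size, stored as `channels`
    bytes per pixel (for example RGB).
    """
    if len(before) != len(after):
        raise ValueError("buffers must have the same size")
    if len(before) % channels != 0:
        raise ValueError("buffer length is not a whole number of pixels")
    total = len(before) // channels
    if total == 0:
        return 0.0
    # Compare one pixel (a slice of `channels` bytes) at a time.
    changed = sum(
        before[i:i + channels] != after[i:i + channels]
        for i in range(0, len(before), channels)
    )
    return changed / total
```

An exact pixel comparison is sensitive to anti-aliasing and dynamic content, so a practical diff would likely apply a tolerance or a threshold on this fraction before reporting a change.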

Integration Architecture

┌────────────────────────────────────────────────────────┐
│                      cbintel CLI                       │
│   cbintel-crawl | cbintel-cluster | cbintel-lazarus    │
└────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│                     Core Libraries                     │
│  cbintel.ai | cbintel.net | cbintel.io                 │
└────────────────────────────────────────────────────────┘
┌──────────┬──────────┬──────────┬──────────┬────────────┐
│  crawl   │  cluster │  lazarus │  vectl   │ screenshots│
│          │          │          │          │            │
│ Pipeline │ VPN Mgmt │ Archives │ Vectors  │ Visual     │
│ Batches  │ Workers  │ CDX      │ Search   │ Capture    │
│ Evaluate │ Banks    │ Temporal │ Cluster  │ DOM        │
└──────────┴──────────┴──────────┴──────────┴────────────┘
┌────────────────────────────────────────────────────────┐
│                 External Dependencies                  │
│  Ollama | Anthropic | Playwright | cdx_toolkit | gau   │
└────────────────────────────────────────────────────────┘

Unified API Gateway

Consider a unified FastAPI gateway that aggregates all sub-services:

/api/v1/crawl/*       → cbintel.crawl
/api/v1/cluster/*     → cbintel.cluster (port 32203)
/api/v1/lazarus/*     → cbintel.lazarus
/api/v1/vectl/*       → cbintel.vectl
/api/v1/screenshots/* → cbintel.screenshots
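
The prefix mapping above amounts to a small routing table. As a framework-independent sketch (in a FastAPI gateway this role would more likely be played by routers mounted under each prefix), the lookup could be written as follows; ROUTES and resolve_service are hypothetical names.

```python
from typing import Optional

# Prefix -> sub-service name, mirroring the mapping above.
ROUTES = {
    "/api/v1/crawl/": "cbintel.crawl",
    "/api/v1/cluster/": "cbintel.cluster",
    "/api/v1/lazarus/": "cbintel.lazarus",
    "/api/v1/vectl/": "cbintel.vectl",
    "/api/v1/screenshots/": "cbintel.screenshots",
}


def resolve_service(path: str) -> Optional[str]:
    """Return the sub-service that should handle `path`, or None."""
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            return service
    return None
```

Because no prefix in the table is a prefix of another, the first match is unambiguous; a table with nested prefixes would need longest-prefix matching instead.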

3. Documentation Consolidation

Current Documentation

Document          Location                  Status
Project Overview  docs/index.md             Active
VPN Cluster API   docs/vpn-cluster-api.md   New
WireGuard Tunnel  docs/wireguard-tunnel.md  Active
OpenVPN Tunnel    docs/tunnel-openvpn.md    Legacy
CRAWWL Analysis   docs/CRAWWL-ANALYSIS.md   Reference

Documentation Improvements

3.1 Architecture Documentation

Create docs/architecture.md:

  • System overview diagram
  • Component relationships
  • Data flow between services
  • Deployment topology

3.2 API Reference

Create docs/api-reference.md:

  • Consolidated endpoint documentation
  • Authentication (if added)
  • Rate limiting
  • Error codes

3.3 Developer Guide

Create docs/developer-guide.md:

  • Local development setup
  • Adding new sub-services
  • Testing guidelines
  • Code style conventions

3.4 Operations Guide

Create docs/operations.md:

  • Systemd service management
  • Log locations and rotation
  • Monitoring and alerting
  • Backup and recovery

3.5 Configuration Reference

Create docs/configuration.md:

  • All environment variables
  • .env file structure
  • Per-service configuration
  • Secrets management

Documentation Structure

docs/
├── index.md                 # Project overview
├── NEXT-STEPS.md            # This file
├── architecture.md          # System architecture
├── api-reference.md         # API documentation
├── developer-guide.md       # Development guide
├── operations.md            # Operations guide
├── configuration.md         # Configuration reference
├── services/
│   ├── cluster.md           # VPN Cluster API
│   ├── crawl.md             # Crawl pipeline
│   ├── lazarus.md           # Historical archives
│   ├── vectl.md             # Vector search
│   └── screenshots.md       # Visual capture
└── infrastructure/
    ├── wireguard-tunnel.md  # WireGuard setup
    ├── openvpn-tunnel.md    # OpenVPN setup
    └── openwrt-devices.md   # Device management

Priority Order

Phase 1: Device Management (High Priority) ✅ COMPLETED

  • Create DeviceRegistry model
  • Implement device service with utility functions
  • Add device management endpoints
  • Update cluster API documentation

Phase 2: Lazarus Integration (High Priority) ✅ COMPLETED

  • Copy lazarus components to cbintel
  • Adapt for cbintel patterns
  • Create CLI entry point
  • Document usage

Phase 3: Vectl Integration (Medium Priority) ✅ COMPLETED

  • Evaluate vectl C++ vs pure Python approach
  • Integrate embedding generation
  • Implement search functionality
  • Add clustering capabilities

Phase 4: Screenshots (Medium Priority) ✅ COMPLETED

  • Integrate guzl/ferret components
  • Add to cluster API or separate service
  • Implement visual diff

Phase 5: Documentation (Ongoing) ✅ COMPLETED

  • Architecture documentation (docs/architecture.md)
  • API reference consolidation (docs/api-reference.md)
  • Updated main README with all modules

Future Work

Potential next phases for cbintel development:

Phase 6: Unified API Gateway

  • Single FastAPI gateway aggregating all sub-services
  • Authentication and rate limiting
  • OpenAPI documentation consolidation

Phase 7: Topics Integration

  • Topic extraction from crawled content
  • Topic clustering and hierarchy
  • Trend detection over time
  • Related topic suggestions

Phase 8: Visual Diff Detection

  • Compare screenshots over time
  • Detect visual changes in web pages
  • Integration with lazarus for historical comparisons

Phase 9: Production Hardening

  • Add authentication to cluster API
  • Systemd service templates for all components
  • Monitoring and alerting integration
  • Log aggregation

Notes

  • All sub-services should follow the same patterns as cbintel.cluster
  • Use Pydantic models for all data structures
  • Async-first design for network operations
  • Centralized configuration via .env
  • Systemd services for production deployment