cbintel Next Steps
Planning document for upcoming development work on the cbintel intelligence toolkit.
1. OpenWRT Device Management Enhancement
Current State
- 16 OpenWRT workers (17.0.0.10-25) managed via LuCI RPC
- Basic worker status via `/api/v1/workers/`
- WireGuard tunnels (wg01-wg15) for direct routing
Proposed Enhancements
1.1 Device Registry Model
Create a comprehensive Pydantic model for device tracking:
from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel

# Assumed definition: the plan only names the three states.
class DeviceStatus(str, Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    DEGRADED = "degraded"

class DeviceRegistry(BaseModel):
    """Complete device registry entry"""
    id: int                              # Unique device ID (1-16)
    hostname: str                        # e.g., "lazarus-worker-01"
    lan_ip: str                          # e.g., "17.0.0.10"
    # WireGuard tunnel info
    wg_interface: Optional[str]          # e.g., "wg01"
    wg_tunnel_ip: Optional[str]          # e.g., "10.200.1.2"
    # VPN connection info
    vpn_profile: Optional[str]           # e.g., "es-123.protonvpn.udp.ovpn"
    vpn_country: Optional[str]           # e.g., "Spain"
    vpn_region: Optional[str]            # e.g., "Madrid"
    external_ip: Optional[str]           # e.g., "1.23.34.56"
    # Status
    status: DeviceStatus                 # online/offline/degraded
    last_seen: datetime
    uptime: int                          # seconds
    # Performance metrics
    ping_latency_ms: Optional[float]
    download_speed_mbps: Optional[float]
    upload_speed_mbps: Optional[float]
1.2 New API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/devices/` | GET | List all devices with full registry info |
| `/api/v1/devices/{id}` | GET | Get single device details |
| `/api/v1/devices/{id}` | PUT | Update device metadata |
| `/api/v1/devices/{id}/ping` | POST | Run ping test, return latency |
| `/api/v1/devices/{id}/speedtest` | POST | Run bandwidth test |
| `/api/v1/devices/{id}/status` | GET | Quick status check |
| `/api/v1/devices/{id}/reboot` | POST | Reboot device |
| `/api/v1/devices/{id}/execute` | POST | Execute command on device |
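The ping endpoint has to turn raw `ping` output into latency numbers. The following is a minimal sketch of that parsing step, assuming the device runs either BusyBox `ping` (the OpenWrt default) or iputils `ping`. The `PingResult` fields and the `parse_ping_summary` name are hypothetical; the plan does not define them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the summary line printed by BusyBox ping ("round-trip ...") and by
# iputils ping ("rtt ..."); the "/mdev" field only appears in iputils output.
_RTT_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max(?:/mdev)? = "
    r"([\d.]+)/([\d.]+)/([\d.]+)"
)


@dataclass
class PingResult:  # hypothetical shape, for illustration only
    min_ms: float
    avg_ms: float
    max_ms: float


def parse_ping_summary(output: str) -> Optional[PingResult]:
    """Return RTT statistics from ping output, or None if no summary line exists."""
    match = _RTT_RE.search(output)
    if match is None:
        return None  # e.g. 100% packet loss prints no RTT summary
    lo, avg, hi = (float(group) for group in match.groups())
    return PingResult(min_ms=lo, avg_ms=avg, max_ms=hi)
```

Returning `None` on total packet loss lets the caller distinguish "unreachable" from "reachable but slow" when setting the device status.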
1.3 Utility Functions
# cbintel/cluster/services/device_service.py
from __future__ import annotations  # result types (PingResult, etc.) are defined elsewhere


class DeviceService:
    async def ping_device(self, device_id: int) -> PingResult:
        """Ping device and return latency metrics"""

    async def get_external_ip(self, device_id: int) -> ExternalIPInfo:
        """Get current external IP with geolocation"""

    async def run_speedtest(self, device_id: int) -> SpeedTestResult:
        """Run bandwidth test on device"""

    async def execute_command(self, device_id: int, cmd: str) -> CommandResult:
        """Execute arbitrary command on device"""

    async def get_system_info(self, device_id: int) -> SystemInfo:
        """Get CPU, memory, disk usage"""

    async def update_firmware(self, device_id: int) -> UpdateResult:
        """Trigger firmware update"""

    async def sync_time(self, device_id: int) -> bool:
        """Sync device time via NTP"""
1.4 Device Registry Storage
Options:
- JSON file at /var/lib/vpn-banks/device-registry.json
- SQLite database for queryability
- Integration with existing bank state
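To make the SQLite option more concrete, here is a rough sketch using only the standard-library `sqlite3` module. The table layout, column subset, and function names are assumptions chosen for illustration, not a settled schema.

```python
import sqlite3

# Hypothetical schema covering a subset of the DeviceRegistry fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS devices (
    id          INTEGER PRIMARY KEY,
    hostname    TEXT NOT NULL,
    lan_ip      TEXT NOT NULL,
    status      TEXT NOT NULL,
    vpn_country TEXT,
    last_seen   TEXT NOT NULL
)
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the registry database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(SCHEMA)
    return conn


def upsert_device(conn: sqlite3.Connection, device: dict) -> None:
    """Insert a device row, or overwrite the existing row with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO devices "
        "(id, hostname, lan_ip, status, vpn_country, last_seen) "
        "VALUES (:id, :hostname, :lan_ip, :status, :vpn_country, :last_seen)",
        device,
    )
    conn.commit()


def devices_by_status(conn: sqlite3.Connection, status: str) -> list:
    """Return hostnames of all devices in the given status, ordered by id."""
    rows = conn.execute(
        "SELECT hostname FROM devices WHERE status = ? ORDER BY id", (status,)
    )
    return [row["hostname"] for row in rows]
```

Queries such as `devices_by_status(conn, "offline")` are the kind of lookup that would need a full scan and manual filtering with the JSON-file option.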
2. cbintel Sub-Service Integration
Current Services
| Service | Status | Description |
|---|---|---|
| `cbintel.crawl` | Active | AI-powered web crawling with iterative batches |
| `cbintel.cluster` | Active | VPN cluster management API (port 32203) |
Planned Sub-Services
2.1 Lazarus - Historical Web Archive
Source: extern/lazarus
Purpose: Retrieve historical web content from Common Crawl and Internet Archive.
Components to integrate:
- cdx_toolkit integration for CDX API queries
- URL discovery via gau binary
- Content retrieval and caching
- Temporal analysis (how content changed over time)
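As an illustration of the CDX query and temporal-analysis components, the sketch below talks to the Internet Archive's public CDX endpoint directly rather than through `cdx_toolkit`, whose API is not shown in this plan. It builds a query URL, converts the JSON response (a header row followed by data rows) into records, and lists the captures at which the content digest changed. The function names are hypothetical.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url: str, date_from: str, date_to: str) -> str:
    """Build a Wayback CDX query returning JSON rows for one URL."""
    params = {"url": url, "output": "json", "from": date_from, "to": date_to}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_records(rows: list) -> list:
    """Convert CDX JSON output (header row + data rows) into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def change_points(records: list) -> list:
    """Return timestamps where the digest differs from the previous capture.

    The first capture always counts as a change point (first appearance).
    """
    changes, previous = [], None
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        if rec["digest"] != previous:
            changes.append(rec["timestamp"])
            previous = rec["digest"]
    return changes
```

Fetching the URL (for example with `urllib.request` or an async HTTP client) is left out so the parsing logic can be tested without network access.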
Proposed structure:
cbintel/lazarus/
├── __init__.py
├── cdx.py # CDX API client
├── archive.py # Internet Archive integration
├── discovery.py # URL discovery (gau wrapper)
└── temporal.py # Time-series content analysis
2.2 Vectl - Vector Embeddings & Search
Source: extern/vectl
Purpose: High-performance vector storage, clustering, and semantic search.
Components to integrate:
- Embedding generation (via Ollama nomic-embed-text)
- Vector storage and indexing
- Similarity search
- Clustering algorithms
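The similarity-search component can be pictured as a brute-force cosine ranking over stored vectors. This is only a pure-Python sketch of the idea; it says nothing about how vectl itself stores or indexes vectors, and the `top_k` name and dict-based index are assumptions.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index: dict, k: int = 3) -> list:
    """Return up to k (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a few thousand documents; larger collections are where a dedicated index would start to matter.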
Proposed structure:
cbintel/vectl/
├── __init__.py
├── embeddings.py # Generate embeddings
├── storage.py # Vector storage backend
├── search.py # Similarity search
├── cluster.py # Clustering operations
└── index.py # Index management
2.3 Topics - Topic Explorer
Purpose: Discover and track topics across crawled content.
Features:
- Topic extraction from documents
- Topic clustering and hierarchy
- Trend detection over time
- Related topic suggestions
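Topic extraction could start from something as simple as term frequency. The sketch below is a deliberately naive baseline, not the planned method: it counts lowercase words of three or more letters, drops a small hand-picked stopword list, and returns the most frequent terms. The function name and stopword set are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real extractor would use a fuller one.
STOPWORDS = frozenset(
    {"the", "and", "for", "with", "that", "this", "from", "are", "was"}
)


def extract_topics(text: str, top_n: int = 3) -> list:
    """Return the top_n most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

An embedding-based approach (for instance clustering the vectors from the Vectl section) would likely replace this, but a frequency baseline is useful for sanity-checking results.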
Proposed structure:
cbintel/topics/
├── __init__.py
├── extract.py # Topic extraction
├── cluster.py # Topic clustering
├── trends.py # Trend analysis
└── graph.py # Topic relationship graph
2.4 Screenshots - Visual Capture
Source: extern/guzl, extern/ferret
Purpose: Browser automation for screenshots and visual analysis.
Components to integrate:
- Playwright-based screenshot capture
- PDF generation
- DOM extraction
- Visual diff detection
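Visual diff detection can be reduced to comparing two decoded images pixel by pixel. The sketch below assumes both screenshots have already been decoded into raw RGB byte buffers of identical dimensions (decoding PNGs is out of scope here), and the `changed_fraction` name and `threshold` parameter are hypothetical.

```python
def changed_fraction(before: bytes, after: bytes, threshold: int = 0) -> float:
    """Fraction of RGB pixels where any channel differs by more than threshold."""
    if len(before) != len(after) or len(before) % 3 != 0:
        raise ValueError("buffers must be equal-length RGB data")
    total_pixels = len(before) // 3
    if total_pixels == 0:
        return 0.0
    changed = 0
    for i in range(0, len(before), 3):
        # A pixel counts as changed if any of its three channels moved enough.
        if any(abs(before[i + c] - after[i + c]) > threshold for c in range(3)):
            changed += 1
    return changed / total_pixels
```

A nonzero `threshold` helps ignore anti-aliasing and compression noise, so that only meaningful layout or content changes are reported.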
Proposed structure:
cbintel/screenshots/
├── __init__.py
├── capture.py # Screenshot capture
├── pdf.py # PDF generation
├── dom.py # DOM extraction
└── diff.py # Visual diff detection
Integration Architecture
┌─────────────────────────────────────────────────────────┐
│                       cbintel CLI                       │
│    cbintel-crawl | cbintel-cluster | cbintel-lazarus    │
└─────────────────────────────────────────────────────────┘
                             │
┌─────────────────────────────────────────────────────────┐
│                     Core Libraries                      │
│          cbintel.ai | cbintel.net | cbintel.io          │
└─────────────────────────────────────────────────────────┘
                             │
┌──────────┬──────────┬──────────┬──────────┬────────────┐
│ crawl    │ cluster  │ lazarus  │ vectl    │ screenshots│
│          │          │          │          │            │
│ Pipeline │ VPN Mgmt │ Archives │ Vectors  │ Visual     │
│ Batches  │ Workers  │ CDX      │ Search   │ Capture    │
│ Evaluate │ Banks    │ Temporal │ Cluster  │ DOM        │
└──────────┴──────────┴──────────┴──────────┴────────────┘
                             │
┌─────────────────────────────────────────────────────────┐
│                  External Dependencies                  │
│   Ollama | Anthropic | Playwright | cdx_toolkit | gau   │
└─────────────────────────────────────────────────────────┘
Unified API Gateway
Consider a unified FastAPI gateway that aggregates all sub-services:
/api/v1/crawl/* → cbintel.crawl
/api/v1/cluster/* → cbintel.cluster (port 32203)
/api/v1/lazarus/* → cbintel.lazarus
/api/v1/vectl/* → cbintel.vectl
/api/v1/screenshots/* → cbintel.screenshots
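The dispatch logic behind such a gateway amounts to prefix matching on the request path. The following framework-free sketch shows only that step; in practice it would sit inside the FastAPI application, and the `resolve_service` name and return shape are assumptions.

```python
from typing import Optional, Tuple

# Prefix-to-service table taken from the routing list above.
ROUTES = {
    "/api/v1/crawl/": "cbintel.crawl",
    "/api/v1/cluster/": "cbintel.cluster",
    "/api/v1/lazarus/": "cbintel.lazarus",
    "/api/v1/vectl/": "cbintel.vectl",
    "/api/v1/screenshots/": "cbintel.screenshots",
}


def resolve_service(path: str) -> Optional[Tuple[str, str]]:
    """Map a request path to (service, sub-path), or None if no prefix matches."""
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            # Keep the trailing slash of the prefix as the sub-path's leading slash.
            return service, path[len(prefix) - 1:]
    return None
```

For example, `resolve_service("/api/v1/cluster/workers/")` yields the `cbintel.cluster` service with sub-path `/workers/`, which the gateway could then forward to the cluster API on port 32203.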
3. Documentation Consolidation
Current Documentation
| Document | Location | Status |
|---|---|---|
| Project Overview | `docs/index.md` | Active |
| VPN Cluster API | `docs/vpn-cluster-api.md` | New |
| WireGuard Tunnel | `docs/wireguard-tunnel.md` | Active |
| OpenVPN Tunnel | `docs/tunnel-openvpn.md` | Legacy |
| CRAWWL Analysis | `docs/CRAWWL-ANALYSIS.md` | Reference |
Documentation Improvements
3.1 Architecture Documentation
Create docs/architecture.md:
- System overview diagram
- Component relationships
- Data flow between services
- Deployment topology
3.2 API Reference
Create docs/api-reference.md:
- Consolidated endpoint documentation
- Authentication (if added)
- Rate limiting
- Error codes
3.3 Developer Guide
Create docs/developer-guide.md:
- Local development setup
- Adding new sub-services
- Testing guidelines
- Code style conventions
3.4 Operations Guide
Create docs/operations.md:
- Systemd service management
- Log locations and rotation
- Monitoring and alerting
- Backup and recovery
3.5 Configuration Reference
Create docs/configuration.md:
- All environment variables
- .env file structure
- Per-service configuration
- Secrets management
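When documenting the `.env` file structure, it may help to pin down exactly which syntax is accepted. The sketch below parses the common `KEY=VALUE` form with `#` comments and optional quotes. The variable names in the usage note are hypothetical, and a real deployment might rely on an existing loader instead.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one layer of matching-style quotes.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config
```

For instance, a file containing `CBINTEL_CLUSTER_PORT=32203` (an assumed variable name, using the cluster port mentioned earlier) would parse to `{"CBINTEL_CLUSTER_PORT": "32203"}`; values stay strings, so type conversion belongs in the per-service configuration layer.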
Documentation Structure
docs/
├── index.md # Project overview
├── NEXT-STEPS.md # This file
├── architecture.md # System architecture
├── api-reference.md # API documentation
├── developer-guide.md # Development guide
├── operations.md # Operations guide
├── configuration.md # Configuration reference
│
├── services/
│ ├── cluster.md # VPN Cluster API
│ ├── crawl.md # Crawl pipeline
│ ├── lazarus.md # Historical archives
│ ├── vectl.md # Vector search
│ └── screenshots.md # Visual capture
│
└── infrastructure/
├── wireguard-tunnel.md # WireGuard setup
├── openvpn-tunnel.md # OpenVPN setup
└── openwrt-devices.md # Device management
Priority Order
Phase 1: Device Management (High Priority) ✅ COMPLETED
- Create DeviceRegistry model
- Implement device service with utility functions
- Add device management endpoints
- Update cluster API documentation
Phase 2: Lazarus Integration (High Priority) ✅ COMPLETED
- Copy lazarus components to cbintel
- Adapt for cbintel patterns
- Create CLI entry point
- Document usage
Phase 3: Vectl Integration (Medium Priority) ✅ COMPLETED
- Evaluate vectl C++ vs pure Python approach
- Integrate embedding generation
- Implement search functionality
- Add clustering capabilities
Phase 4: Screenshots (Medium Priority) ✅ COMPLETED
- Integrate guzl/ferret components
- Add to cluster API or separate service
- Implement visual diff
Phase 5: Documentation (Ongoing) ✅ COMPLETED
- Architecture documentation (`docs/architecture.md`)
- API reference consolidation (`docs/api-reference.md`)
- Updated main README with all modules
Future Work
Potential next phases for cbintel development:
Phase 6: Unified API Gateway
- Single FastAPI gateway aggregating all sub-services
- Authentication and rate limiting
- OpenAPI documentation consolidation
Phase 7: Topics Integration
- Topic extraction from crawled content
- Topic clustering and hierarchy
- Trend detection over time
- Related topic suggestions
Phase 8: Visual Diff Detection
- Compare screenshots over time
- Detect visual changes in web pages
- Integration with lazarus for historical comparisons
Phase 9: Production Hardening
- Add authentication to cluster API
- Systemd service templates for all components
- Monitoring and alerting integration
- Log aggregation
Notes
- All sub-services should follow the same patterns as `cbintel.cluster`
- Use Pydantic models for all data structures
- Async-first design for network operations
- Centralized configuration via `.env`
- Systemd services for production deployment
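To illustrate what async-first design buys for network operations, the sketch below checks all 16 workers concurrently while capping the number of in-flight requests. `check_worker` is a stand-in for a real network call such as a LuCI RPC request, and the function names and concurrency limit are assumptions.

```python
import asyncio


async def check_worker(ip: str) -> tuple:
    """Stand-in for a real network call; always reports the worker as online."""
    await asyncio.sleep(0)  # yields control; a real check would await I/O here
    return ip, "online"


async def check_all(ips: list, max_concurrent: int = 4) -> dict:
    """Check every worker concurrently, at most max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(ip: str) -> tuple:
        async with limit:
            return await check_worker(ip)

    results = await asyncio.gather(*(guarded(ip) for ip in ips))
    return dict(results)


# Worker LAN addresses 17.0.0.10-25, as listed in the current-state section.
WORKER_IPS = [f"17.0.0.{n}" for n in range(10, 26)]
```

Calling `asyncio.run(check_all(WORKER_IPS))` returns a mapping from IP to status; the semaphore keeps a slow or unreachable device from tying up every connection at once.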