cbmodels CLI Reference

Complete command reference for the cbmodels data analysis and ETL toolkit.

Quick Start

# Activate environment
source ~/.pyenv/versions/nominates/bin/activate

# Test Snowflake connection
cbmodels extract test

# Run autonomous pipeline (extract + consolidate)
cbmodels pipeline run CONSERVATIVECONNECTOR.PIPELINE.MY_TABLE

# Check pipeline status
cbmodels pipeline status CONSERVATIVECONNECTOR.PIPELINE.MY_TABLE

# Load parquet into DuckDB
cbmodels ingest load data/cc/sources/my_table_20251226.parquet --db analysis.db

Pipeline Commands (Autonomous ETL)

The pipeline automates: Extract (chunked) → Consolidate → Load

cbmodels pipeline run

Run the full autonomous ETL pipeline.

cbmodels pipeline run TABLE [OPTIONS]

# Examples:
# Extract and consolidate only (no DuckDB loading)
cbmodels pipeline run CONSERVATIVECONNECTOR.KPI.CAMPAIGNS

# Full pipeline with DuckDB loading
cbmodels pipeline run CONSERVATIVECONNECTOR.KPI.CAMPAIGNS --db analysis.db -t campaigns

# Large table with custom chunk size
cbmodels pipeline run CONSERVATIVECONNECTOR.PIPELINE.BIG_TABLE --chunk-size 500000

# Start fresh (ignore previous progress)
cbmodels pipeline run CONSERVATIVECONNECTOR.KPI.MY_TABLE --no-resume

| Option | Description | Default |
|--------|-------------|---------|
| TABLE | Fully qualified Snowflake table name | Required |
| --db | Target DuckDB database (loading is skipped if not provided) | None |
| -t, --table | Target table name in DuckDB | Source table name |
| --chunk-size | Rows per extraction chunk | 1,000,000 |
| --work-dir | Directory for temporary chunk files | ./data/cc/work |
| --output-dir | Directory for consolidated parquet | ./data/cc/sources |
| --no-resume | Start fresh, ignoring previous state | False |
| --env | Path to .env file with credentials | .env |

Output: {output-dir}/{table_name}_{YYYYMMDD}.parquet

cbmodels pipeline status

Check the current state of a pipeline.

cbmodels pipeline status TABLE

# Example:
cbmodels pipeline status CONSERVATIVECONNECTOR.KPI.EMAIL_CLICKERS

Shows: status, total rows, extracted rows, chunks, consolidated flag, loaded rows.

cbmodels pipeline list

List all pipeline states.

cbmodels pipeline list

cbmodels pipeline cleanup

Remove work files (chunks) after successful extraction.

cbmodels pipeline cleanup TABLE [--keep-output]

# Example: Clean up chunks but keep the consolidated parquet
cbmodels pipeline cleanup CONSERVATIVECONNECTOR.KPI.MY_TABLE

cbmodels pipeline preview

Dry-run: show row count and time estimates without extracting.

cbmodels pipeline preview TABLE

# Example:
cbmodels pipeline preview CONSERVATIVECONNECTOR.KPI.CAMPAIGNS

Output shows: rows, columns, estimated chunks, time, and file size.

cbmodels pipeline sources

List all extracted parquet files with sizes and dates.

cbmodels pipeline sources
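
The same information can be read straight off the filesystem; the consolidated files live in the default output directory:

# Assumes the default --output-dir
ls -lh data/cc/sources/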

Extract Commands (Manual Extraction)

For manual/ad-hoc extraction without the full pipeline.
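
For a small table, the manual path is roughly: run a single-query extract, then load the resulting parquet. A minimal sketch, assuming the extract writes small_table.parquet into ./data/extracts (the actual output filename may differ):

cbmodels extract snowflake -t PIPELINE.SMALL_TABLE -o ./data/extracts
cbmodels ingest load ./data/extracts/small_table.parquet --db analysis.db -t small_table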

cbmodels extract test

Test Snowflake connection.

cbmodels extract test

cbmodels extract tables

List available tables in a Snowflake schema.

cbmodels extract tables [-s SCHEMA]

# Examples:
cbmodels extract tables
cbmodels extract tables -s PIPELINE
cbmodels extract tables -s KPI

cbmodels extract snowflake

Manual extraction (single query, no chunking).

# Extract a table with row limit
cbmodels extract snowflake -t PIPELINE.MY_TABLE -l 1000 -o ./data/extracts

# Run a custom query
cbmodels extract snowflake -q "SELECT * FROM table WHERE date > '2024-01-01'" -n my_extract

| Option | Description |
|--------|-------------|
| -t, --table | Table to extract |
| -q, --query | Custom SQL query |
| -o, --output | Output directory |
| -n, --name | Output file name |
| -f, --format | Format: parquet, json, csv |
| -l, --limit | Row limit |
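
Flags can be combined; for example, to pull a capped CSV sample under a custom name (table name and output path here are placeholders):

cbmodels extract snowflake -t KPI.CAMPAIGNS -l 5000 -f csv -n campaigns_sample -o ./data/extracts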

Ingest Commands (DuckDB Loading)

Load parquet files into DuckDB.

cbmodels ingest load

Quickly load a parquet file directly into DuckDB.

cbmodels ingest load PARQUET_FILE [OPTIONS]

# Examples:
cbmodels ingest load data/cc/sources/campaigns_20251226.parquet --db analysis.db -t campaigns

# Load, dropping any existing table first
cbmodels ingest load data.parquet --db mydb.db --drop

| Option | Description | Default |
|--------|-------------|---------|
| --db | Target DuckDB database | analysis.db |
| -t, --table | Target table name | File stem |
| --drop/--no-drop | Drop existing table | True |
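
To spot-check a load, query the database directly with the DuckDB CLI (assuming it is installed; the table name is whatever -t was set to, or the file stem):

duckdb analysis.db "SELECT COUNT(*) FROM campaigns;"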

cbmodels ingest analyze

Analyze a parquet file and generate a mapping spec (for complex transforms).

cbmodels ingest analyze data.parquet -o spec.json --table my_data

cbmodels ingest run

Run ingestion with a mapping spec (for complex ETL with transforms).

cbmodels ingest run spec.json data.parquet --drop
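
A typical (hypothetical) two-step flow: generate the spec, review and adjust the column mappings, then run the ingest against the same file:

cbmodels ingest analyze data/cc/sources/campaigns_20251226.parquet -o campaigns_spec.json --table campaigns
# ... edit campaigns_spec.json as needed ...
cbmodels ingest run campaigns_spec.json data/cc/sources/campaigns_20251226.parquet --drop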

Analysis Commands (Model Building)

Build and query data models from DuckDB databases.

cbmodels build

Build a data model with correlations, outliers, and patterns.

cbmodels build DATABASE.db -o model.json

cbmodels info

Display model summary.

cbmodels info model.json

cbmodels tables

List tables in the model.

cbmodels tables model.json

cbmodels schema

Display database schema.

cbmodels schema model.json [-t TABLE]

cbmodels correlations

Show correlations between numeric columns.

cbmodels correlations model.json --min 0.3 [-t TABLE]

cbmodels outliers

Show detected outliers.

cbmodels outliers model.json [-t TABLE]

cbmodels patterns

Show detected patterns (missing data, imbalance, skew).

cbmodels patterns model.json [-p TYPE] [-t TABLE]

cbmodels stats

Show detailed statistics for a table/column.

cbmodels stats model.json TABLE [-c COLUMN]

Environment Setup

Required Environment Variables

Create a .env file in the project root:

SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_DATABASE=CONSERVATIVECONNECTOR
SNOWFLAKE_SCHEMA=PIPELINE
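
Once the file is in place, confirm the credentials are picked up:

# Reads .env from the project root and attempts a Snowflake connection
cbmodels extract test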

Directory Structure

data/cc/
├── work/                    # Temporary chunk files (auto-cleaned)
│   └── {table_name}/
│       ├── chunk_0000.parquet
│       ├── chunk_0001.parquet
│       └── ...
├── sources/                 # Consolidated parquet files
│   ├── campaigns_20251226.parquet
│   ├── email_clickers_20251226.parquet
│   └── list_link_phone_email_map_20251226.parquet
└── databases/               # DuckDB databases
    └── analysis.db
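
If these directories do not already exist (the pipeline may create them on demand), they can be created up front:

mkdir -p data/cc/work data/cc/sources data/cc/databases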

Typical Workflows

1. Extract a New Table

# Run pipeline (autonomous)
cbmodels pipeline run CONSERVATIVECONNECTOR.PIPELINE.NEW_TABLE

# Monitor progress
cbmodels pipeline status CONSERVATIVECONNECTOR.PIPELINE.NEW_TABLE

# Clean up work files when done
cbmodels pipeline cleanup CONSERVATIVECONNECTOR.PIPELINE.NEW_TABLE

2. Load Multiple Tables into DuckDB

# Load each parquet file
cbmodels ingest load data/cc/sources/campaigns_20251226.parquet --db analysis.db -t campaigns
cbmodels ingest load data/cc/sources/email_clickers_20251226.parquet --db analysis.db -t email_clickers
cbmodels ingest load data/cc/sources/donations_20251226.parquet --db analysis.db -t donations
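
To load every consolidated file in one pass, a shell loop works; this sketch strips the trailing _YYYYMMDD stamp so table names stay clean (otherwise the default table name is the full file stem, date included):

for f in data/cc/sources/*.parquet; do
  # campaigns_20251226.parquet -> table "campaigns"
  base=$(basename "$f" .parquet)
  cbmodels ingest load "$f" --db analysis.db -t "${base%_*}"
done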

3. Build Analysis Model

# Build model from DuckDB
cbmodels build analysis.db -o model.json

# Explore results
cbmodels tables model.json
cbmodels correlations model.json --min 0.5
cbmodels patterns model.json
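
Individual tables and columns can then be drilled into with the commands above (table and column names here are placeholders):

cbmodels outliers model.json -t donations
cbmodels stats model.json donations -c amount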

4. Resume Failed Pipeline

# Pipeline auto-resumes from last checkpoint
cbmodels pipeline run CONSERVATIVECONNECTOR.PIPELINE.BIG_TABLE

# Check where it left off
cbmodels pipeline status CONSERVATIVECONNECTOR.PIPELINE.BIG_TABLE
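
If the previous run's state is suspect, discard the checkpoint and start over:

cbmodels pipeline run CONSERVATIVECONNECTOR.PIPELINE.BIG_TABLE --no-resume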

Troubleshooting

Connection Timeouts

For very large tables, the pipeline automatically:

- Retries failed chunks (up to 5 times)
- Reconnects after network errors
- Uses 10-minute network/socket timeouts

If timeouts persist:

# Use smaller chunks
cbmodels pipeline run TABLE --chunk-size 500000
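
Smaller chunks mean more, shorter queries: a 10,000,000-row table at --chunk-size 500000 extracts in roughly 20 chunks. Preview first to sanity-check the estimate:

cbmodels pipeline preview CONSERVATIVECONNECTOR.PIPELINE.BIG_TABLE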

Schema Mismatches

The consolidation phase automatically normalizes schemas:

- Date columns → String (for consistency across chunks)
- Handles type variations between extraction batches

Disk Space

Work files are stored in ./data/cc/work/. Clean up after successful extraction:

cbmodels pipeline cleanup TABLE
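
To see how much space the chunks are using before cleaning up:

du -sh data/cc/work/*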