Analysis Workflows Cheatsheet

Repeatable, step-by-step analysis patterns for the most common VisiData use cases.

Workflow 1: Frequency Distribution

Goal: Count how many rows exist for each value in a column.

1. Open file: vd data.csv
2. Move to column: h/l or arrow keys
3. Cast if needed: # (int) or ~ (str)
4. Open freq table: Shift+F
5. Sort by count: ] (descending)
6. Drill into group: Enter
7. Return: q
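The same count can be reproduced in plain Python when you need it in a script; the `status` column and the inline sample data are hypothetical, and `Counter.most_common` gives the descending sort that `]` provides in VisiData:

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for data.csv
data = io.StringIO("status\nok\nerror\nok\nok\n")
counts = Counter(row["status"] for row in csv.DictReader(data))

# Sort by count, descending (VisiData's ] on the count column)
for value, n in counts.most_common():
    print(value, n)
# ok 3
# error 1
```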

Workflow 2: Pivot Table (Cross-Tabulation)

Goal: Sum/mean of a value column grouped by two categorical columns.

1. Open file: vd data.csv
2. Cast value column: % or #
3. Add aggregator: + → enter: sum
4. Mark row key: ! on first category column
5. Mark col key: ! on second category column
6. Move to value col: h/l
7. Open pivot: Shift+W
8. Explore results: arrow keys, Enter for drill-down
9. Return: q
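A pivot is a nested sum keyed by the two categories. This sketch uses hypothetical `region`/`quarter`/`amount` columns to show what `!` + `Shift+W` computes:

```python
from collections import defaultdict

# Hypothetical rows: region and quarter are the two key columns,
# amount is the value column with the sum aggregator
rows = [
    {"region": "east", "quarter": "Q1", "amount": 10.0},
    {"region": "east", "quarter": "Q2", "amount": 5.0},
    {"region": "west", "quarter": "Q1", "amount": 7.0},
    {"region": "east", "quarter": "Q1", "amount": 3.0},
]

# pivot[row_key][col_key] -> summed value
pivot = defaultdict(lambda: defaultdict(float))
for r in rows:
    pivot[r["region"]][r["quarter"]] += r["amount"]

print(dict(pivot["east"]))  # {'Q1': 13.0, 'Q2': 5.0}
```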

Workflow 3: Data Profiling (Instant Audit)

Goal: See nulls, distinct counts, min/max for all columns.

1. Open file: vd data.csv
2. Open Describe: Shift+I
3. Read: nulls column → which columns have missing data?
4. Read: distinct → is a column constant? (distinct=1 → useless)
5. Read: min/max → spot outliers
6. Drill into column: Enter → opens frequency table for that column
7. Return: q
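The Describe Sheet's core statistics are simple per-column reductions. A minimal sketch over hypothetical sample data (note: it compares raw strings for min/max, whereas VisiData compares typed values):

```python
import csv
import io

# Hypothetical sample: one missing score
data = io.StringIO("id,score\n1,10\n2,\n3,10\n")
reader = csv.DictReader(data)
cols = {name: [] for name in reader.fieldnames}
for row in reader:
    for name, val in row.items():
        cols[name].append(val)

profile = {}
for name, vals in cols.items():
    non_null = [v for v in vals if v != ""]
    profile[name] = {
        "nulls": len(vals) - len(non_null),      # missing data?
        "distinct": len(set(non_null)),          # constant column?
        "min": min(non_null),                    # outliers (string compare here)
        "max": max(non_null),
    }

print(profile["score"])  # {'nulls': 1, 'distinct': 1, 'min': '10', 'max': '10'}
```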

Workflow 4: Filter and Export

Goal: Extract a subset of rows matching a condition.

1. Open file: vd data.csv
2. Select matching: | regex (by regex in current column)
   z| expr (by Python expression)
   , (by matching current cell value)
3. Verify count: check status bar (N rows selected)
4. Open filtered: "
5. Save filtered: Ctrl+S → new_filename.csv
6. Return: q
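Select-then-save is a filter-and-write in script form. A sketch with hypothetical `name`/`env` columns, matching rows by regex as `|` does on the current column:

```python
import csv
import io
import re

# Hypothetical input standing in for data.csv
src = io.StringIO("name,env\napi,prod\nweb,dev\ndb,prod\n")
out = io.StringIO()  # stands in for new_filename.csv

reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()

# "Select matching" (| prod) then "Save filtered" (Ctrl+S)
selected = [row for row in reader if re.search(r"prod", row["env"])]
writer.writerows(selected)

print(len(selected))  # 2
```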

Workflow 5: Log Parsing

Goal: Parse unstructured log lines into columns.

1. Open log: vd /var/log/nginx/access.log
2. Parse with regex: ; (semicolon)
   Enter: (?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)
3. Type columns: Move to status → #
   Move to bytes → #
   Move to date → @
4. Analyze: Shift+F on status → distribution
5. Drill into 500s: Enter on 500 row in freq table
6. Export: Ctrl+S → report.csv
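The same named-capture regex works directly in Python's `re`; the sample log line below is made up for illustration:

```python
import re

# The regex from step 2, split for readability
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)'
)

# Hypothetical access-log line
line = '1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512'
m = pattern.match(line)

print(m.group("status"), m.group("path"))  # 200 /index.html
```

Each named group becomes a new column in VisiData; here they become match groups you can read by name.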

Workflow 6: Data Cleaning

Goal: Clean a messy CSV for import or analysis.

1. Open file: vd raw_data.csv
2. Inspect: Shift+I (Describe Sheet)
3. Fix types: # % @ on numeric/date columns
4. Fill nulls: f on columns with propagating values
   ge "default" on columns needing a fixed fill (select the null rows first)
5. Normalize text: gs → g* → (?i)ACTIVE<Tab>active
6. Remove invalids: z| expr → gd
7. Check duplicates: Shift+F on key column → look for count > 1
8. Verify: Shift+I again → confirm nulls=0, types correct
9. Export: Ctrl+S → clean_data.csv
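Steps 3–5 amount to casting, filling, and case-normalizing each row. A sketch with hypothetical `state`/`score` columns, mirroring the `(?i)` regex replace from step 5:

```python
import re

# Hypothetical messy rows from raw_data.csv
rows = [
    {"state": "ACTIVE", "score": "10"},
    {"state": "Active", "score": ""},
]

for r in rows:
    # Normalize text case, like gs → g* → (?i)ACTIVE → active
    r["state"] = re.sub(r"(?i)^active$", "active", r["state"])
    # Fixed fill for nulls plus an int cast, like ge "0" then #
    r["score"] = int(r["score"]) if r["score"] else 0

print(rows[1])  # {'state': 'active', 'score': 0}
```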

Workflow 7: Sysadmin Process Inspection

Goal: Interactively inspect running processes.

1. Launch: ps aux | vd -f fixed --skip 1
2. Type CPU: % on %CPU column
3. Type MEM: % on %MEM column
4. Sort by CPU: ] (descending)
5. Filter high-CPU: z| → CPU > 5 (rename %CPU to CPU first with ^; Python expressions need valid identifiers)
6. Open filtered: "
7. Inspect PIDs: note the PID column values

Workflow 8: Batch File Conversion

Goal: Convert data format without interactive UI.

# CSV → JSON
vd -b input.csv -o output.json

# JSONL → CSV
vd -b events.jsonl -o events.csv

# Excel → TSV
vd -b report.xlsx -o report.tsv

# With row limit (sample conversion)
vd -b --max-rows 10000 input.csv -o sample.json

# Replay a recorded session on new input
vd --play clean_workflow.vdj --batch new_input.csv -o cleaned.csv

Workflow 9: Join Two Datasets

Goal: Enrich one dataset with columns from another.

1. Open both files: vd file1.csv file2.csv
2. Set key on file1: ! on 'id' column
3. Set key on file2: ! on 'id' column (switch with Ctrl+^)
4. Open Sheets Sheet: Shift+S
5. Select both: s on each sheet row
6. Join: & → choose jointype (inner/outer/full)
7. Result sheet opens: verify row count and columns
8. Save: Ctrl+S → joined.csv
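An inner join on a shared key is a dict lookup. This sketch uses hypothetical `orders`/`customers` data joined on `id`, matching what `&` with jointype inner produces:

```python
# Hypothetical file1.csv rows and file2.csv indexed by its key column
orders = [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]
customers = {1: {"name": "Ada"}, 2: {"name": "Lin"}}

# Inner join: keep only rows whose key exists on both sheets
joined = [
    {**o, **customers[o["id"]]}
    for o in orders
    if o["id"] in customers
]

print(joined[0])  # {'id': 1, 'total': 9.5, 'name': 'Ada'}
```

An outer join would instead keep unmatched rows and fill the missing columns with nulls.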

Workflow 10: CommandLog Replay (Reproducible Analysis)

Goal: Save and replay an analysis for documentation or automation.

During interactive session:
1. Perform all your analysis steps
2. Open CommandLog: Shift+D
3. Save: Ctrl+S → workflow.vdj

Replay later:
vd --play workflow.vdj # interactive replay
vd --play workflow.vdj --batch -o result.csv # batch replay
vd --play workflow.vdj --replay-wait 2 # demo (2s between steps)

Quick Analysis Decision Matrix

What you need                VisiData command
Count by category            Shift+F (frequency table)
Sum/mean by category         + aggregator, then Shift+F
Cross-tab two categories     key columns with !, then Shift+W
All column statistics        Shift+I (Describe Sheet)
Filter to matching rows      | to select, then " to open selected
Plot a numeric column        . (after setting key column with !)
Compare two variables        ! on x column, move to y column, . (scatterplot)
Join two files               Shift+S → select both → &
Convert file format          vd -b input.ext -o output.ext
Parse log fields             ; with named-capture regex

Common Command Sequences

# Open → frequency → drill → export
vd f.csv → Shift+F → ] → Enter → Ctrl+S

# Open → type → pivot → export
vd f.csv → # on value → + sum → ! on keys → Shift+W → Ctrl+S

# Open → parse log → analyze → export
vd log → ; regex → # on status → Shift+F → | 500 → " → Ctrl+S

# Open → filter → clean → export
vd f.csv → Shift+I → # % @ → f → gs → g* → gd → Ctrl+S

Next Steps

You have completed the VisiData curriculum. Return to any module for deeper practice: