Performance Tuning

VisiData is fast by default, but very large files (millions of rows) and complex derived columns can slow it down. This lesson covers the key levers for improving performance.

Learning Focus

The most impactful optimizations are --max-rows for sampling, options.load_lazy for deferred loading, and TSV instead of CSV for faster parsing.

Row Limit for Exploration

The single most effective optimization when exploring large files:

# Load only first 100,000 rows
vd --max-rows 100000 large_file.csv

# Or set globally in ~/.visidatarc
options.max_rows = 1000000 # 1 million (0 = unlimited)

This does not modify the file — it just limits the in-memory dataset.
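
To make the "limits the in-memory dataset" point concrete, here is a minimal Python sketch of reading only the first N rows of a CSV stream. It is not VisiData's implementation; the function name first_rows is hypothetical, and treating 0 as "unlimited" simply mirrors the comment in the config above.

```python
import csv
import itertools


def first_rows(lines, max_rows):
    """Return (header, rows) with at most max_rows data rows.

    `lines` is any iterable of text lines (an open file, a list, ...).
    max_rows == 0 means no limit, mirroring the config comment above.
    """
    reader = csv.reader(lines)
    header = next(reader, [])
    limit = max_rows or None  # islice(..., None) reads to the end
    # islice stops pulling lines from the source once the limit is reached
    rows = list(itertools.islice(reader, limit))
    return header, rows
```

With a real file this could be called as first_rows(open("large_file.csv", newline=""), 100_000). The source is only read, never written, and lines past the limit are never parsed.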

Lazy Loading of Subsheets

# ~/.visidatarc
options.load_lazy = True
# Subsheets (frequency tables, describe sheets) load only when accessed

Without lazy loading, VisiData preloads all referenced subsheets. For deeply nested data (JSON with many arrays), this can be slow.
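
The idea behind deferred loading can be sketched in a few lines of Python. This is an illustration of the general pattern only; the LazySubsheet class and its load_count counter are hypothetical and do not reflect VisiData internals.

```python
from functools import cached_property


class LazySubsheet:
    """Hypothetical stand-in for a subsheet whose rows are expensive to build."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable that produces the rows
        self.load_count = 0    # how many times the loader actually ran

    @cached_property
    def rows(self):
        # Runs on first access only; later accesses reuse the stored list
        self.load_count += 1
        return list(self._loader())
```

Creating many LazySubsheet objects costs almost nothing; the loader runs only for the ones whose rows are actually read.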

Format Performance Comparison

| Format | Relative Load Speed | Notes |
| --- | --- | --- |
| TSV | ⚡ Fastest | No quoting logic |
| CSV | ⚡ Fast | Standard |
| JSON (small) | ✅ Fast | Single object |
| JSONL | ✅ Fast | One object per line |
| JSON (large nested) | ⚠️ Slower | Deep nesting expands columns |
| Excel (XLSX) | ⚠️ Slower | Requires openpyxl |
| Parquet | ✅ Fast for large files | Columnar, great for analytics |
| SQLite | ✅ Fast | Loads table on Enter |

Recommendation: Pre-convert Excel to CSV/TSV with vd -b file.xlsx -o file.csv before interactive analysis.
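
The "no quoting logic" note for TSV can be explored with a rough Python comparison. This sketch is an assumption-laden illustration rather than a measurement of VisiData's loaders: parse_csv and parse_tsv are hypothetical helpers, the data is synthetic, and timings will vary by machine and Python version.

```python
import csv
import io
import timeit


def parse_csv(text):
    # csv.reader tracks quote state so commas inside quoted fields survive
    return list(csv.reader(io.StringIO(text)))


def parse_tsv(text):
    # Plain split: valid only if fields never contain tabs or newlines
    return [line.split("\t") for line in text.splitlines()]


# Synthetic data: 50,000 rows of three simple fields
rows = [[str(i), f"name{i}", str(i * 2)] for i in range(50_000)]
csv_text = "\n".join(",".join(r) for r in rows)
tsv_text = "\n".join("\t".join(r) for r in rows)

csv_s = timeit.timeit(lambda: parse_csv(csv_text), number=3)
tsv_s = timeit.timeit(lambda: parse_tsv(tsv_text), number=3)
print(f"csv: {csv_s:.3f}s  tsv: {tsv_s:.3f}s")
```

Both parsers return the same rows for this data; the difference is only in how much work each does per line.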

Column Cache

For derived columns with expensive computations:

# ~/.visidatarc
options.col_cache_size = 256 # cache last 256 rows' computed values

This is useful when expressions involve regex, Python function calls, or string parsing that runs on every scroll.
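
The effect of a bounded cache on an expensive expression can be illustrated with Python's functools.lru_cache. This is a sketch of the general idea, not of how VisiData implements col_cache_size; the status_code function and the log-line format it parses are assumptions made for the example.

```python
import re
from functools import lru_cache

# Matches the status code that follows the quoted request in a log line
STATUS_RE = re.compile(r'" (\d{3}) ')


@lru_cache(maxsize=256)  # keep the 256 most recently used results
def status_code(line):
    """Extract the HTTP status code from a log line, or None if absent."""
    m = STATUS_RE.search(line)
    return int(m.group(1)) if m else None
```

Calling status_code twice with the same line runs the regex once; status_code.cache_info() reports the hit and miss counts.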

Memory Management

# ~/.visidatarc
# Stop loading if free memory drops below N MB
options.min_memory_mb = 200

VisiData will pause loading and show a warning when memory is low.
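
A threshold check of this kind might look like the following Python sketch. It is not VisiData's code: the helper names are hypothetical, the parsing assumes Linux /proc/meminfo text (values in kB), and treating a threshold of 0 as "check disabled" is a choice made for this sketch.

```python
def available_mb(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # kB -> MB
    return None  # field not present


def should_stop_loading(free_mb, min_memory_mb=200):
    """True when free memory is known and below the threshold."""
    # A threshold of 0 disables the check in this sketch
    return min_memory_mb > 0 and free_mb is not None and free_mb < min_memory_mb
```

On Linux, available_mb(open("/proc/meminfo").read()) gives the current value. With 100 MB free, a threshold of 200 trips the check while a threshold of 50 does not.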

Profiling Slow Operations

# ~/.visidatarc or CLI
options.profile = True
# Enables Python profiling on background threads
# View results: Ctrl+T (Threads Sheet)
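
To test a slow expression outside VisiData, Python's standard cProfile module gives the same kind of per-function breakdown. The sketch below is independent of VisiData; slow_derive is a hypothetical stand-in for an expensive derived column.

```python
import cProfile
import io
import pstats
import re


def slow_derive(lines):
    # Regex-heavy work standing in for an expensive derived column
    return [len(re.findall(r"\d+", line)) for line in lines]


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    # Sort by cumulative time so the outermost slow call appears first
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```

Printing the report from profile_call(slow_derive, lines) shows where the time goes, which helps decide whether an expression is worth caching or rewriting.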

Async Thread Management

VisiData loads data in background threads. Inspect and control them:

Ctrl+T # open Threads Sheet (view all async threads)
Ctrl+C # cancel current user input or abort threads on current sheet
g Ctrl+C # abort ALL secondary threads
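
The load-in-background-and-cancel pattern can be sketched with Python's threading module. This uses a cooperative cancel flag for illustration; VisiData's own cancellation mechanism may work differently, and load_rows with its on_row hook is hypothetical.

```python
import threading


def load_rows(source, rows, cancel, on_row=None):
    """Append items from source to rows until exhausted or cancel is set."""
    for item in source:
        if cancel.is_set():  # cooperative check, once per row
            break
        rows.append(item)
        if on_row:
            on_row(len(rows))  # hook, e.g. for progress display


# Start a loader in the background; rows fills in while the caller stays free
cancel = threading.Event()
rows = []
worker = threading.Thread(target=load_rows, args=(range(1000), rows, cancel))
worker.start()
worker.join()  # a real UI would keep running here instead of waiting
```

Calling cancel.set() from another thread makes the loader stop at its next row, leaving the rows loaded so far available.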

Fastest Workflow for Large Logs

# 1. Sample first
vd --max-rows 10000 /var/log/nginx/access.log

# 2. Learn the format — develop your regex
;
# Enter: regex...

# 3. Test your workflow on the sample

# 4. Remove max-rows limit for full analysis
vd /var/log/nginx/access.log # re-open without limit

Batch Conversion Performance

For batch conversion of large files, use -b (batch mode), which skips the TUI and so finishes faster:

# Much faster than interactive for pure conversion
time vd -b large.csv -o large.json

# Parallel batch conversion (multiple files)
parallel vd -b {} -o {.}.json ::: *.csv
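
Where GNU parallel is not installed, the same fan-out can be scripted. The Python sketch below only builds and runs the vd -b SRC -o DEST command shown above; the helper names convert_cmd and convert_all and the worker count of 4 are assumptions for illustration.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def convert_cmd(src, out_ext=".json"):
    """Build the batch command: vd -b SRC -o DEST (same stem, new extension)."""
    src = Path(src)
    return ["vd", "-b", str(src), "-o", str(src.with_suffix(out_ext))]


def convert_all(paths, workers=4):
    """Run one vd process per file, at most `workers` at a time."""
    if shutil.which("vd") is None:
        raise RuntimeError("vd not found on PATH")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads only wait on child processes, so the GIL is not a bottleneck
        results = pool.map(lambda p: subprocess.run(convert_cmd(p)), paths)
        return [r.returncode for r in results]
```

A call such as convert_all(Path(".").glob("*.csv")) returns one exit code per file, so failed conversions are easy to spot.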

Practical Benchmark Reference

On a modern Linux VPS (4 vCPU, 8GB RAM):

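| File size | Rows | Format | Load time |
| --- | --- | --- | --- |
| 10 MB | 200,000 | CSV | ~1 second |
| 100 MB | 2,000,000 | CSV | ~8 seconds |
| 500 MB | 10,000,000 | CSV | ~40 seconds |
| 100 MB | 2,000,000 | TSV | ~5 seconds |
| 100 MB | | Parquet | ~2 seconds |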

Use --max-rows 500000 to keep load time to a few seconds regardless of file size.
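
The CSV figures in the table imply a roughly constant throughput, which gives a quick way to estimate load time for a given row cap. The sketch below just does that arithmetic on the table's approximate numbers; the function names are hypothetical and real timings depend on hardware and column count.

```python
# (rows, seconds) pairs copied from the CSV lines of the table above
CSV_BENCH = [(200_000, 1), (2_000_000, 8), (10_000_000, 40)]


def rows_per_second(bench):
    """Throughput implied by each (rows, seconds) measurement."""
    return [rows / secs for rows, secs in bench]


def estimate_seconds(n_rows, rate):
    """Estimated load time for n_rows at a given rows-per-second rate."""
    return n_rows / rate


rates = rows_per_second(CSV_BENCH)         # 200k, 250k, 250k rows/s
capped = estimate_seconds(500_000, 250_000)  # 2.0 seconds
```

At about 250,000 rows per second, a 500,000-row cap corresponds to roughly 2 seconds of CSV loading on the hardware listed.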

Troubleshooting Slow Performance

| Symptom | Cause | Fix |
| --- | --- | --- |
| Slow to open | Too many rows | --max-rows 100000 |
| Derived column slow to scroll | Expensive expression | Enable col_cache_size = 256 |
| Frequency table slow | Many distinct values | Sample first with " on subset |
| JSON expansion hangs | Deep nested structure | Avoid g( on deeply nested JSON |
| Memory warning shown | Low system RAM | options.min_memory_mb = 50 |

Hands-On Practice

# Generate a large CSV for testing
python3 -c "
import random
# Header row, then 500,000 rows of random data
print('id,name,value,status')
for i in range(500000):
    print(f'{i},{random.choice([\"Alice\",\"Bob\",\"Carol\"])},{random.randint(1,1000)},{random.choice([\"active\",\"inactive\"])}')
" > /tmp/large.csv

# Open with limit first
time vd --max-rows 10000 /tmp/large.csv

# Compare with full load
time vd /tmp/large.csv
# Use Ctrl+C to cancel if too slow

# Optimal: sample → analyze → full run
