Performance Tuning
VisiData is fast by default, but very large files (millions of rows) and complex derived columns can slow it down. This lesson covers the key levers for improving performance.
The most impactful optimizations are: --max-rows for sampling, options.load_lazy for deferred loading, and using TSV instead of CSV for faster parsing.
Row Limit for Exploration
The single most effective optimization when exploring large files:
# Load only first 100,000 rows
vd --max-rows 100000 large_file.csv
# Or set globally in ~/.visidatarc
options.max_rows = 1000000 # 1 million (0 = unlimited)
This does not modify the file — it just limits the in-memory dataset.
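If you want a physical sample file instead (for example, to pass around or keep under version control), a plain shell pre-sample works with any loader. A minimal sketch using coreutils; the paths are illustrative:
# keep the header line plus the first 100,000 data rows
head -n 100001 large_file.csv > /tmp/sample.csv
vd /tmp/sample.csv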
Lazy Loading of Subsheets
# ~/.visidatarc
options.load_lazy = True
# Subsheets (frequency tables, describe sheets) load only when accessed
Without lazy loading, VisiData preloads all referenced subsheets. For deeply nested data (JSON with many arrays), this can be slow.
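Assuming VisiData's usual convention that any option can also be passed as a command-line flag (underscores become hyphens), lazy loading can be enabled for a single session instead of globally:
# enable lazy loading for this invocation only
vd --load-lazy nested.json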
Format Performance Comparison
| Format | Relative Load Speed | Notes |
|---|---|---|
| TSV | ⚡ Fastest | No quoting logic |
| CSV | ⚡ Fast | Standard |
| JSON (small) | ✅ Fast | Single object |
| JSONL | ✅ Fast | One object per line |
| JSON (large nested) | ⚠️ Slower | Deep nesting expands columns |
| Excel (XLSX) | ⚠️ Slower | Requires openpyxl |
| Parquet | ✅ Fast for large files | Columnar — great for analytics |
| SQLite | ✅ Fast | Loads table on Enter |
Recommendation: Pre-convert Excel to CSV/TSV with vd -b file.xlsx -o file.csv before interactive analysis.
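A typical flow is a one-time batch conversion followed by interactive work on the faster format (file names are placeholders):
# convert once in batch mode, then analyze the TSV interactively
vd -b report.xlsx -o report.tsv
vd report.tsv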
Column Cache
For derived columns with expensive computations:
# ~/.visidatarc
options.col_cache_size = 256  # cache up to 256 computed cell values per column
This is useful when expressions involve regex, Python function calls, or string parsing that runs on every scroll.
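As an illustration of the kind of expression that benefits, consider a derived column added with = whose formula runs a regex on every cell; the column name request below is hypothetical:
# derived column expression (entered after pressing =):
#   re.search(r'/api/(\w+)', request).group(1) if request else None
# Without a cache the regex re-runs for every visible row on each redraw;
# with col_cache_size set, previously computed cells are reused while scrolling.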
Memory Management
# ~/.visidatarc
# Stop loading if free memory drops below N MB
options.min_memory_mb = 200
VisiData will pause loading and show a warning when memory is low.
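Putting the options from this lesson together, a conservative starting profile for a memory-constrained machine might look like the following (values are illustrative, tune them to your workload):
# ~/.visidatarc — illustrative profile for a memory-constrained machine
options.max_rows = 2000000       # cap the in-memory dataset
options.min_memory_mb = 500      # stop loading when free memory drops below 500 MB
options.col_cache_size = 256     # cache computed cells for derived columns
options.load_lazy = True         # defer subsheet loading until accessed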
Profiling Slow Operations
# ~/.visidatarc or CLI
options.profile = True
# Enables Python profiling on background threads
# View results: Ctrl+T (Threads Sheet)
Async Thread Management
VisiData loads data in background threads. Inspect and control them:
Ctrl+T # open Threads Sheet (view all async threads)
Ctrl+C # cancel current user input or abort threads on current sheet
g Ctrl+C # abort ALL secondary threads
Fastest Workflow for Large Logs
# 1. Sample first
vd --max-rows 10000 /var/log/nginx/access.log
# 2. Learn the format and develop your regex
;   # enter a regex; new columns are created from its capture groups
# 3. Test your workflow on the sample
# 4. Remove max-rows limit for full analysis
vd /var/log/nginx/access.log # re-open without limit
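The sample-then-full-run loop pairs well with VisiData's command log: record your steps on the sample, then replay them. A sketch (the .vdj filename is arbitrary):
# while working on the sample, press Ctrl+D to save the command log, e.g. workflow.vdj
# replay the recorded steps later:
vd -p workflow.vdj
# the log stores the input path it was recorded with; point it at the full file
# by editing that path in workflow.vdj (or re-record on the full file)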
Batch Conversion Performance
For batch conversion of large files, use -b (batch mode) which skips the TUI and processes faster:
# Much faster than interactive for pure conversion
time vd -b large.csv -o large.json
# Parallel batch conversion (multiple files)
ls *.csv | parallel vd -b {} -o {.}.json
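If GNU parallel is not installed, xargs -P gives a similar effect; a rough equivalent sketch (one file per invocation):
# run up to 4 conversions concurrently without GNU parallel
ls *.csv | xargs -P 4 -I{} sh -c 'vd -b "$1" -o "${1%.csv}.json"' _ {}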
Practical Benchmark Reference
On a modern Linux VPS (4 vCPU, 8GB RAM):
| File size | Rows | Format | Load time |
|---|---|---|---|
| 10 MB | 200,000 | CSV | ~1 second |
| 100 MB | 2,000,000 | CSV | ~8 seconds |
| 500 MB | 10,000,000 | CSV | ~40 seconds |
| 100 MB | 2,000,000 | TSV | ~5 seconds |
| 100 MB | — | Parquet | ~2 seconds |
Use --max-rows 500000 to keep interactive response near-instant regardless of file size.
Troubleshooting Slow Performance
| Symptom | Cause | Fix |
|---|---|---|
| Slow to open | Too many rows | --max-rows 100000 |
| Derived column slow to scroll | Expensive expression | Enable col_cache_size = 256 |
| Frequency table slow | Many distinct values | Build it on a duplicated subset with " (sketch after this table) |
| JSON expansion hangs | Deep nested structure | Avoid g( on deeply nested JSON |
| Memory warning shown | Low system RAM | options.min_memory_mb = 50 |
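A sketch of the frequency-on-a-subset fix from the table above, using standard bindings:
# 1. select a subset, e.g. press | and enter a regex to select matching rows in the current column
# 2. press " to open a duplicate sheet containing only the selected rows
# 3. press Shift+F there to build the frequency table on the smaller sheet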
Hands-On Practice
# Generate a large CSV for testing
python3 -c "
import random
print('id,name,value,status')
for i in range(500000):
    print(f'{i},{random.choice([\"Alice\",\"Bob\",\"Carol\"])},{random.randint(1,1000)},{random.choice([\"active\",\"inactive\"])}')
" > /tmp/large.csv
# Open with limit first
time vd --max-rows 10000 /tmp/large.csv
# Compare with full load
time vd /tmp/large.csv
# Use Ctrl+C to cancel if too slow
# Optimal: sample → analyze → full run