Performance Tuning
VisiData is fast by default, but very large files (millions of rows) and complex derived columns can slow it down. This lesson covers the key levers for improving performance.
The most impactful optimizations are: --max-rows for sampling, options.load_lazy for deferred loading, and using TSV instead of CSV for faster parsing.
Row Limit for Exploration
The single most effective optimization when exploring large files:
# Load only first 100,000 rows
vd --max-rows 100000 large_file.csv
# Or set globally in ~/.visidatarc
options.max_rows = 1000000 # 1 million (0 = unlimited)
This does not modify the file — it just limits the in-memory dataset.
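If you want a physical sample file instead (for example, to pass around or keep under version control), a plain shell pre-sample works with any loader. A minimal sketch using coreutils; the paths are illustrative:
# keep the header line plus the first 100,000 data rows
head -n 100001 large_file.csv > /tmp/sample.csv
vd /tmp/sample.csv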
Lazy Loading of Subsheets
# ~/.visidatarc
options.load_lazy = True
# Subsheets (frequency tables, describe sheets) load only when accessed
Without lazy loading, VisiData preloads all referenced subsheets. For deeply nested data (JSON with many arrays), this can be slow.
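Assuming VisiData's usual convention that any option can also be passed as a command-line flag (underscores become hyphens), lazy loading can be enabled for a single session instead of globally:
# enable lazy loading for this invocation only
vd --load-lazy nested.json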
Format Performance Comparison
| Format | Relative Load Speed | Notes |
|---|---|---|
| TSV | ⚡ Fastest | No quoting logic |
| CSV | ⚡ Fast | Standard |
| JSON (small) | ✅ Fast | Single object |
| JSONL | ✅ Fast | One object per line |
| JSON (large nested) | ⚠️ Slower | Deep nesting expands columns |
| Excel (XLSX) | ⚠️ Slower | Requires openpyxl |
| Parquet | ✅ Fast for large files | Columnar — great for analytics |
| SQLite | ✅ Fast | Loads table on Enter |
Recommendation: Pre-convert Excel to CSV/TSV with vd -b file.xlsx -o file.csv before interactive analysis.
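A typical flow is a one-time batch conversion followed by interactive work on the faster format (file names are placeholders):
# convert once in batch mode, then analyze the TSV interactively
vd -b report.xlsx -o report.tsv
vd report.tsv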
Column Cache
For derived columns with expensive computations:
# ~/.visidatarc
options.col_cache_size = 256  # cache up to 256 computed cell values per column
This is useful when expressions involve regex, Python function calls, or string parsing that runs on every scroll.
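As an illustration of the kind of expression that benefits, consider a derived column added with = whose formula runs a regex on every cell; the column name request below is hypothetical:
# derived column expression (entered after pressing =):
#   re.search(r'/api/(\w+)', request).group(1) if request else None
# Without a cache the regex re-runs for every visible row on each redraw;
# with col_cache_size set, previously computed cells are reused while scrolling.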
Memory Management
# ~/.visidatarc
# Stop loading if free memory drops below N MB
options.min_memory_mb = 200
VisiData will pause loading and show a warning when memory is low.
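Putting the options from this lesson together, a conservative starting profile for a memory-constrained machine might look like the following (values are illustrative, tune them to your workload):
# ~/.visidatarc — illustrative profile for a memory-constrained machine
options.max_rows = 2000000       # cap the in-memory dataset
options.min_memory_mb = 500      # stop loading when free memory drops below 500 MB
options.col_cache_size = 256     # cache computed cells for derived columns
options.load_lazy = True         # defer subsheet loading until accessed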
Profiling Slow Operations
# ~/.visidatarc or CLI
options.profile = True
# Enables Python profiling on background threads
# View results: Ctrl+T (Threads Sheet)
Async Thread Management
VisiData loads data in background threads. Inspect and control them:
Ctrl+T # open Threads Sheet (view all async threads)
Ctrl+C # cancel current user input or abort threads on current sheet
g Ctrl+C # abort ALL secondary threads
Fastest Workflow for Large Logs
# 1. Sample first
vd --max-rows 10000 /var/log/nginx/access.log
# 2. Learn the format and develop your regex
;   # enter a regex; new columns are created from its capture groups
# 3. Test your workflow on the sample
# 4. Remove max-rows limit for full analysis
vd /var/log/nginx/access.log # re-open without limit
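The sample-then-full-run loop pairs well with VisiData's command log: record your steps on the sample, then replay them. A sketch (the .vdj filename is arbitrary):
# while working on the sample, press Ctrl+D to save the command log, e.g. workflow.vdj
# replay the recorded steps later:
vd -p workflow.vdj
# the log stores the input path it was recorded with; point it at the full file
# by editing that path in workflow.vdj (or re-record on the full file)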
Batch Conversion Performance
For batch conversion of large files, use -b (batch mode) which skips the TUI and processes faster:
# Much faster than interactive for pure conversion
time vd -b large.csv -o large.json
# Parallel batch conversion (multiple files)
ls *.csv | parallel vd -b {} -o {.}.json
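If GNU parallel is not installed, xargs -P gives a similar effect; a rough equivalent sketch (one file per invocation):
# run up to 4 conversions concurrently without GNU parallel
ls *.csv | xargs -P 4 -I{} sh -c 'vd -b "$1" -o "${1%.csv}.json"' _ {}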
Practical Benchmark Reference
On a modern Linux VPS (4 vCPU, 8GB RAM):
| File size | Rows | Format | Load time |
|---|---|---|---|
| 10 MB | 200,000 | CSV | ~1 second |
| 100 MB | 2,000,000 | CSV | ~8 seconds |
| 500 MB | 10,000,000 | CSV | ~40 seconds |
| 100 MB | 2,000,000 | TSV | ~5 seconds |
| 100 MB | — | Parquet | ~2 seconds |
Use --max-rows 500000 to keep interactive response near-instant regardless of file size.
Troubleshooting Slow Performance
| Symptom | Cause | Fix |
|---|---|---|
| Slow to open | Too many rows | --max-rows 100000 |
| Derived column slow to scroll | Expensive expression | Enable col_cache_size = 256 |
| Frequency table slow | Many distinct values | Build it on a duplicated subset with " (sketch after this table) |
| JSON expansion hangs | Deep nested structure | Avoid g( on deeply nested JSON |
| Memory warning shown | Low system RAM | options.min_memory_mb = 50 |
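A sketch of the frequency-on-a-subset fix from the table above, using standard bindings:
# 1. select a subset, e.g. press | and enter a regex to select matching rows in the current column
# 2. press " to open a duplicate sheet containing only the selected rows
# 3. press Shift+F there to build the frequency table on the smaller sheet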
Hands-On Practice
# Generate a large CSV for testing
python3 -c "
import random
print('id,name,value,status')
for i in range(500000):
    print(f'{i},{random.choice([\"Alice\",\"Bob\",\"Carol\"])},{random.randint(1,1000)},{random.choice([\"active\",\"inactive\"])}')
" > /tmp/large.csv
# Open with limit first
time vd --max-rows 10000 /tmp/large.csv
# Compare with full load
time vd /tmp/large.csv
# Use Ctrl+C to cancel if too slow
# Optimal: sample → analyze → full run