Scatterplots and Histograms

VisiData draws a scatterplot by plotting one numeric column (the current column, on the y-axis) against a numeric key column (on the x-axis). Histograms come from the frequency table, which includes a text histogram column. Both render entirely in the terminal with no external plotting libraries.

Learning Focus

Scatterplots reveal correlation between two variables. Histograms reveal the shape of a single variable's distribution. Learn when each is more informative than the raw counts in a frequency table.

Scatterplots

Two Numeric Variables

# Move to the 'response_time' column
# # cast to int
! # mark as key column (x-axis)

# Move to the 'bytes_sent' column
# # cast to int
. # plot bytes_sent (y) against response_time (x)

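When the key column has a numeric type, VisiData uses it as the x-axis and renders a true scatterplot. Columns loaded from text formats such as CSV start out untyped, which is why both columns are cast first.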

Color-Coded by Category

When a categorical key column is set alongside a numeric key column, VisiData assigns a distinct color to each category:

# Set categorical key column: e.g., 'method' (GET/POST/PUT)
# Move to 'method' column
! # categorical key

# Set numeric key column: e.g., 'response_time'
# Move to 'response_time' column
! # numeric key (x-axis)

# Move to 'bytes_sent' column
. # scatterplot: x=response_time, y=bytes_sent, color=method

Each HTTP method (GET, POST, PUT) appears in a distinct color on the canvas.
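The coloring described above amounts to grouping points by the categorical key and handing each group its own color. The following is a minimal sketch of that idea in plain Python, not VisiData's actual implementation; the `color_layers` helper, the `PALETTE` names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import cycle

PALETTE = ["green", "red", "yellow", "cyan"]  # hypothetical color names

def color_layers(rows, category, x, y):
    """Group (x, y) points by category and give each group the next palette color."""
    points = defaultdict(list)
    for row in rows:
        points[row[category]].append((row[x], row[y]))
    colors = cycle(PALETTE)  # wraps around if there are more categories than colors
    return {name: (next(colors), pts) for name, pts in points.items()}

# Hypothetical rows, shaped like the columns used in this section.
rows = [
    {"method": "GET", "response_time": 120, "bytes_sent": 1200},
    {"method": "POST", "response_time": 80, "bytes_sent": 800},
    {"method": "GET", "response_time": 350, "bytes_sent": 45000},
]
layers = color_layers(rows, "method", "response_time", "bytes_sent")
for name, (color, pts) in layers.items():
    print(name, color, pts)
```

Categories are colored in first-seen order here; the order VisiData itself uses may differ.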

Histograms from Frequency Tables

The frequency table includes a built-in histogram column, drawn as a bar of repeated characters whose length is proportional to each value's count:

# Open frequency table on any column
Shift+F

# The table includes a 'histogram' column automatically
# This is a text-based bar chart

# To get a canvas-based histogram:
# Move cursor to 'count' column in the frequency table
. # canvas graph of frequency counts
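To make the text histogram less of a black box, here is a small sketch of how such a column can be produced: count each distinct value, then scale every bar against the largest count. It illustrates the general technique only and is not taken from VisiData's source; `text_histogram`, its `width` and `mark` parameters, and the sample status list are assumptions made for the example.

```python
from collections import Counter

def text_histogram(values, width=20, mark="*"):
    """Return (value, count, bar) rows, most frequent first.

    Bars are scaled so the largest count fills `width` characters.
    """
    counts = Counter(values)
    if not counts:
        return []
    largest = max(counts.values())
    rows = []
    for value, count in counts.most_common():
        bar = mark * round(width * count / largest)
        rows.append((value, count, bar))
    return rows

# Hypothetical status codes: ten 200s, five 404s, one 500.
statuses = ["200"] * 10 + ["404"] * 5 + ["500"] * 1
for value, count, bar in text_histogram(statuses):
    print(f"{value:>4} {count:>5} {bar}")
```

Because bars are scaled to a fixed width, rare values can shrink to a very short bar or none at all, which is one reason to switch to the canvas graph.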

Configuring the Canvas

# Set x range manually
x
# Enter: 0 1000 (xmin xmax)

# Set y range manually
y
# Enter: 0 5000 (ymin ymax)

# Reset to auto-fit
_ # zoom to fit full extent
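Why setting a range helps can be seen by mapping values to canvas cells by hand. The sketch below is a simplified linear mapping, assumed for illustration and not VisiData's own scaling code; `to_cell` and the 40-cell width are hypothetical, and the values are the response times from the practice file in Hands-On Practice.

```python
def to_cell(value, vmin, vmax, cells):
    """Map a value to a cell index in [0, cells - 1], or None if out of range."""
    if not (vmin <= value <= vmax):
        return None  # outside the visible range, so not drawn
    if vmax == vmin:
        return 0
    # Integer arithmetic keeps the result exact for integer inputs.
    index = (value - vmin) * cells // (vmax - vmin)
    return min(index, cells - 1)

response_ms = [55, 80, 95, 120, 350, 430, 1800, 2100]
columns = 40  # hypothetical canvas width in cells

# Auto-fit: the range spans every value, including the two slow outliers.
auto_fit = [to_cell(v, min(response_ms), max(response_ms), columns) for v in response_ms]
# Manual range 0 to 500: the outliers drop out and the rest spread across the canvas.
zoomed = [to_cell(v, 0, 500, columns) for v in response_ms]

print(auto_fit)  # [0, 0, 0, 1, 5, 7, 34, 39]
print(zoomed)    # [4, 6, 7, 9, 28, 34, None, None]
```

With auto-fit, the six fast requests are squeezed into the first eight cells; with the manual range they occupy most of the width.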

Practical Use Cases

Correlation: Response Time vs. Bytes Sent

vd /var/log/nginx/access.log

# Cast 'response_time' to int: #, mark as key: !
# Cast 'bytes_sent' to int: #
.
# Scatterplot: do larger responses take longer? Look for a trend.

Distribution of HTTP Status Codes

vd /var/log/nginx/access.log

# Move to 'status' column
Shift+F
# Frequency table shows a text histogram:
# 200 ████████████████ 8500
# 301 ████ 400
# 404 ██ 150
# 500 █ 12

CPU Load vs Memory Usage

vd /var/log/system_metrics.csv

# Cast 'cpu_percent' to float: %, mark as key: !
# Cast 'mem_percent' to float: %
.
# Scatterplot showing correlation between CPU and memory load

Histogram of Order Values

vd /var/www/html/exports/orders.csv

# Cast 'amount' to float: %
Shift+F
# Frequency table of order amounts with text histogram
# Each distinct amount gets its own row, so this shows the most common exact values
# With the cursor on the count column, press ] to sort by count descending

Canvas Layer Toggling

Inside the canvas, toggle display of individual plot layers:

1 toggle layer 1 (first plotted column/category)
2 toggle layer 2
...
9 toggle layer 9

Useful when g. plots many overlapping columns — disable layers one at a time to isolate signals.

Reading the Canvas

Symbol | Meaning
· or ⠁⠂⠄⠈ | sparse data points (Braille cells with few dots)
⠿ or ⣿ | dense cluster of many data points in one cell
color | distinct categorical values (when a categorical key is set)
─ │ | axis lines
↑ → labels | axis labels and scale ticks
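The sparse and dense glyphs above follow from how Unicode Braille characters work: each character is a 2-wide by 4-tall grid of dots, and each dot is one bit added to the base code point U+2800. The sketch below shows that encoding; `braille_cell` is a hypothetical helper, and nothing here is claimed to match VisiData's internal drawing code.

```python
# Bit for each dot in a 2-wide by 4-tall Braille cell, indexed [row][column].
# Offsets follow the Unicode Braille Patterns block (U+2800 to U+28FF).
DOT_BITS = [
    [0x01, 0x08],
    [0x02, 0x10],
    [0x04, 0x20],
    [0x40, 0x80],
]

def braille_cell(dots):
    """Return the Braille character with the given (column, row) dots set."""
    mask = 0
    for col, row in dots:
        mask |= DOT_BITS[row][col]
    return chr(0x2800 + mask)

one_point = braille_cell([(0, 0)])                                # single dot, top left
six_dots = braille_cell([(c, r) for c in range(2) for r in range(3)])
full = braille_cell([(c, r) for c in range(2) for r in range(4)])  # all eight dots

print(one_point, six_dots, full)  # ⠁ ⠿ ⣿
```

One terminal cell can therefore hold up to eight distinct points, so a full glyph means at least eight points landed there.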

Troubleshooting Matrix

Problem | Cause | Fix
Scatterplot looks like vertical lines | X-axis column is categorical | Use a numeric column as the key
Canvas all one color | No categorical key set | Mark a categorical column as an additional key
Points too sparse to see | Data range too wide | Use x and y to set axis ranges
Canvas renders slowly | Very many distinct points | Filter to a sample: --max-rows 10000
Histogram too small to read | Terminal too narrow | Widen terminal or use canvas graph

Best Practices

  • Use scatterplots for correlation discovery between two numeric variables.
  • Use histograms (frequency table) for distribution shape of a single variable.
  • Use g. to plot multiple numeric columns simultaneously for multi-variable overview.
  • Set axis ranges manually (x and y) when outliers compress the visible range.

Hands-On Practice

cat > /tmp/requests.csv << 'EOF'
response_ms,bytes_sent,method,status
120,1200,GET,200
350,45000,GET,200
80,800,POST,201
2100,200,GET,500
95,1100,GET,200
430,52000,POST,201
55,600,GET,304
1800,150,GET,500
EOF

vd /tmp/requests.csv

# 1. Cast response_ms to int: #, mark as key: !
# 2. Cast bytes_sent to int: #
# 3. Press . → scatterplot: bytes vs response time
# 4. Press q → return
# 5. Move to 'method' column
# 6. Press Shift+F → histogram of methods
# 7. Press . on count column → canvas histogram
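As a cross-check on what the scatterplot in step 3 suggests, the correlation can also be computed directly. This is a minimal sketch outside VisiData, using only the Python standard library; the `pearson` helper is written out by hand for the example, and the data is the same eight rows as /tmp/requests.csv, inlined so the snippet runs on its own.

```python
import csv
import io
import math

# Same rows as /tmp/requests.csv above, inlined so the sketch is self-contained.
SAMPLE = """response_ms,bytes_sent,method,status
120,1200,GET,200
350,45000,GET,200
80,800,POST,201
2100,200,GET,500
95,1100,GET,200
430,52000,POST,201
55,600,GET,304
1800,150,GET,500
"""

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
r_all = pearson([int(r["response_ms"]) for r in rows],
                [int(r["bytes_sent"]) for r in rows])

# The two status-500 rows are slow but tiny; drop them and compare.
ok = [r for r in rows if r["status"] != "500"]
r_ok = pearson([int(r["response_ms"]) for r in ok],
               [int(r["bytes_sent"]) for r in ok])

print(f"all rows:     r = {r_all:.2f}")
print(f"without 500s: r = {r_ok:.2f}")
```

On this sample the coefficient is negative over all rows and strongly positive once the two failed requests are excluded. On the canvas, those two rows appear as isolated points far to the right, which a single summary number would hide.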