Day 13 – Networks Roasting on an Open Fire, Part 3: Feeling Warm and Looking Cool

by Geoffrey Broadwell

In parts 1 and 2 of these blog posts, I roughed out a simple ping chart program and then began to refactor and add features to improve the overall experience.

It’s functional, but there’s a lot to improve upon — it doesn’t use the screen real estate particularly well, there are some common network problems it can’t visualize, and frankly it just doesn’t look all that cool.

So let’s fix all that!

Another Dimension

A simple way to improve the chart’s overall information density is to encode more information into each rendered grid cell. Instead of always using the same glyph for every data point — providing no information other than its location — the shape, color, attributes, or pattern can be adjusted to show more useful information in each rendered cell of the screen.

The version of ping-chart in parts 1 and 2 only shows a relatively short history, since each grid column only represents at most one measurement (and possibly zero, if the “pong” reply packet was never received). Simply placing several measurements before moving on to the next column would improve that, but then overlaps become ambiguous. If ten measurements rendered as just three circles on the screen, how often did the measured ping times land on each of those circles? Were the measurements spread roughly evenly? Did most of them land on the highest or lowest circle?

The chart already looks a bit like a trail of bubbles or pebbles, so why not change the size of each pebble to indicate how often the measurement landed on a particular grid cell? There are many mappings usable for this, depending on which glyphs are available in the terminal font; here are a few obvious options:

ASCII:    . : o O 0
Latin-1:  · ° º o ö O Ö 0
WGL4:     · ◦ ∙ ●
Braille:  ⠄ ⠆ ⠦ ⠶ ⠷ ⠿ ⡿ ⣿

I’ll use ASCII for now, since every terminal font supports it. Making this work requires only a few changes to the update-chart sub. Instead of the original X coordinate calculation, I instead use:

    state    @counts;
    constant @glyphs = « ' ' . : o O 0 »;
    my $saturation   = @glyphs.end;
    my $x = ($id div $saturation) % $width + $x-offset;

This creates the @counts state variable to track how many measurements have landed on a particular Y coordinate and defines the glyphs to be used (including a leading space in the zero slot). The saturation point, meaning the most measurements that can be recorded in a single column before moving forward, is calculated as the last glyph index (@glyphs.end). Finally the sequence ID is (integer) divided by that saturation level before being wrapped to the chart width, which slows the horizontal movement appropriately.

Then I simply need to clear the @counts every time the chart moves to a new column:

        # Clear counts
        @counts = ();

And update the Y coordinate handling and glyph printing:

    # Calculate Y coord (from scaled ping time)
    # and cell content
    my $y =
      0 max $bottom - floor($time / $chart.ms-per-line);
    my $c = @glyphs[++@counts[$y]];

    # Draw glyph at (X, Y)
    $grid.print-cell($x, $y, $c);

The tiny change to the calculation of $y ensures that it can never be negative, and thus is always a valid array index. It’s then used to update the @counts, select the appropriate glyph based on the latest count, and print the chosen glyph in the right spot.
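
To see these pieces working together outside the chart, here is a minimal stand-alone sketch. The place sub and its named parameters are hypothetical stand-ins for the corresponding lines of update-chart, and the default values (77 usable columns, 5 ms per line, bottom row 23) are assumptions chosen only for illustration:

```raku
# Hypothetical stand-alone version of the column/glyph logic
my @glyphs     = ' ', '.', ':', 'o', 'O', '0';
my $saturation = @glyphs.end;   # 5 measurements per column

# Per-row hit counts; the real sub clears these on each new column
my @counts;

# Returns (column, row, glyph) for a single measurement
sub place(Int $id, Real $time, :$width = 77, :$x-offset = 3,
          :$bottom = 23, :$ms-per-line = 5) {
    my $x = ($id div $saturation) % $width + $x-offset;
    my $y = 0 max $bottom - floor($time / $ms-per-line);
    my $c = @glyphs[++@counts[$y]];
    ($x, $y, $c)
}

# Three pings in the same column, two of them in the same row
say place(0, 21.3);   # row 19, first hit:  '.'
say place(1, 23.9);   # row 19, second hit: ':'
say place(2, 12.0);   # row 21, first hit:  '.'
```

Note how the second measurement upgrades the glyph in row 19 rather than drawing an identical pebble over the first.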

It looks like this now; instead of identical pebbles, the trail looks much more like various-sized pebbles on a scattering of sand:

ms│.    .              .   .             .         .
  │
  │
80│
  │
  │
  │
  │
60│
  │
  │
  │                                                                            .
  │
40│
  │                                .   .
  │     .        .   .                          .     .      ..
  │            .                           .  .                     .
  │  ... .           ..  .         :: .  .                .    ..       . ....
20│o:::.:.:.o ::: .. o.:: ...:..:.o:.:.O:.O:. ::::..oO. .:.....o.:..:oo:...o:.:.
  │.o:::.ooo:oo:oOOO. o.oOooOoOOoo: ::o o:.:O0.o:oOo:.o0OooOOoo.o:Oo:::ooOo.:ooo
  │    .   . :         .  .      .   .        .                  . .
  │
  │
 0│                 ^

It’s much easier to see where the most common measurements lie, and where the outliers are. As a bonus, forcing the Y coordinate onto the chart grid now makes it possible to see how often large outliers appear; they are no longer simply ignored when printing, but rather appear as a smattering of dots at the very top.

As there were five non-blank glyphs chosen, this version now shows five times as much history at once — a bit more than six minutes of history in a default width-80 terminal window.
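
The history arithmetic is easy to check. This sketch assumes one ping per second and 77 usable chart columns (an 80-column window minus three columns for the axis labels); the history-minutes sub is a hypothetical helper, not part of ping-chart:

```raku
# Minutes of history visible at once, given the number of chart
# columns, the measurements per column, and the ping interval
sub history-minutes(Int $columns, Int $saturation,
                    Real $interval = 1) {
    $columns * $saturation * $interval / 60
}

say history-minutes(77, 1);   # one measurement per column
say history-minutes(77, 5);   # five glyph sizes per column
```

With five measurements per column the result lands a little above six minutes, matching the figure above.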

A wider selection of @glyphs could further improve on that, but there are rapidly diminishing returns — too many different glyphs and the glanceability is lost because it becomes hard to tell the difference between them just by visual size or “density”. This is why I didn’t just choose the digits 1..9; there is very little density distinction between them all, and the overall effect is more confusing than enlightening.

Heating Up

Instead of changing the particular glyph drawn, we could also change its color and brightness; a bright line through a dark-background chart (or a dark line through a light-background chart) would then show where most ping times were clustered.

The original 16 color ANSI palette supported virtually everywhere is completely awful for this purpose, especially since every operating system and terminal program uses a different mapping for these colors. Thankfully there’s a better replacement: most modern terminal emulators support the xterm-256color extended colors and map them all equivalently.

These added colors are mapped in two blocks: a 6x6x6 RGB color cube and a 24-level gray scale. By choosing appropriate points on the color cube, I can create a decent heat map gradient:

# Calculate and convert the colormap once
constant @heatmap-colors =
  # Black to brick red
  (0,0,0), (1,0,0), (2,0,0), (3,0,0), (4,0,0),
  # Red to yellow-orange
  (5,0,0), (5,1,0), (5,2,0), (5,3,0), (5,4,0),
  # Bright to pale yellow
  (5,5,0), (5,5,1), (5,5,2), (5,5,3), (5,5,4),
  # White
  (5,5,5);

constant @heatmap-dark =
  @heatmap-colors.map: { ~(16 + 36 * .[0] + 6 * .[1] + .[2]) };

The formula used in the map converts a color cube RGB triple into a single color index in the range 16 to 231, which is what the terminal expects to see as a color specifier.
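
As a quick worked example of that formula, here it is pulled out into a small sub (cube-index is a hypothetical name used only for this illustration):

```raku
# Convert a color cube triple (each component 0..5) into an
# xterm-256color palette index
sub cube-index(Int $r, Int $g, Int $b) {
    16 + 36 * $r + 6 * $g + $b
}

say cube-index(0, 0, 0);   # 16, the first cube entry (black)
say cube-index(5, 3, 0);   # 214, an orange from mid-gradient
say cube-index(5, 5, 5);   # 231, the last cube entry (white)
```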

Another consideration is that subtly colored circles will probably be hard to distinguish; it would be clearer to just color the entire contents of each grid cell. The easiest way to do this is to set the cell background on a blank cell by using the on_ prefix for the color and a blank space for the “glyph”.

Let’s look at the calculations of $saturation and $c again:

    my $saturation = @heatmap-dark.end;
    # ...
    my $c = 'on_' ~ @heatmap-dark[++@counts[$y]];

Modifying the call to print-cell allows setting the color:

    $grid.print-cell($x, $y, { char => ' ', color => $c });
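
To preview the gradient independently of Terminal::Print, the colors can also be printed with raw escape sequences. This is only a sketch: it assumes a terminal that understands the xterm-256color 48;5;N background sequence, and the swatch sub is a hypothetical helper. The indices are the sixteen values that the heat map formula produces:

```raku
# Palette indices for the sixteen heat map colors, black to white
my @indices = 16, 52, 88, 124, 160, 196, 202, 208,
              214, 220, 226, 227, 228, 229, 230, 231;

# One blank cell with the given palette index as its background,
# followed by an attribute reset
sub swatch(Int $index) {
    "\e[48;5;{$index}m \e[0m"
}

# Print the whole gradient as a row of colored cells
say @indices.map(&swatch).join;
```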

Here’s what that looks like (as an image screenshot rather than a text capture now, in order to show the colors):

It’s beginning to look better, with a vague fiery look and a clear bright band where the ping times are concentrated. Furthermore with fifteen non-black colors in the map, this version of the program now has another three-fold history expansion over the five-glyph version in the previous section — almost 20 minutes of history across a default terminal window.

Precision Flames

While the heat map version has considerably improved information density horizontally, it’s done nothing to change the vertical density; the ping time resolution is just as bad now as it was in the very first version. And because terminal fonts usually make monospace character cells twice as tall as they are wide, the whole chart looks like it’s been smeared vertically. Time to fix that.

In the mid-1990s Microsoft and various type foundries standardized a list of glyphs that modern fonts should supply, called Windows Glyph List 4 or WGL4 for short. This standard was very well supported as a minimum subset for fonts (both free and proprietary), and its full character repertoire is found in Unicode 1.1, the first stable version of Unicode, cementing it as a solid compatibility baseline.

Among the many very useful glyphs in WGL4 (and thus Unicode 1.1) are the “half blocks”, which split each character cell in half either horizontally or vertically, displaying the foreground color on one half and the background color on the other half. Using the horizontal half blocks can effectively double the chart’s ping time resolution and simultaneously get rid of the vertical smearing effect.

This time all the changes occur in the last few lines of the update-chart sub, starting with a new Y calculation:

    # Calculate half-block resolution Y coord from
    # scaled ping time
    my $block-y = floor(2 * $time / $chart.ms-per-line);
    my $even-by = $block-y - $block-y % 2;
    my $y       = 0 max $bottom - $even-by div 2;
    @counts[$block-y]++;

    # Determine top and bottom counts for
    # half-block "pixels"
    my $c1 = @counts[$even-by + 1] // 0;
    my $c2 = @counts[$even-by]     // 0;

    # Create an appropriate colored cell, using half
    # blocks if needed
    my $c = $c1 == $c2
      ?? $grid.cell(' ', 'on_'  ~ @heatmap-dark[$c1])
      !! $grid.cell('▀',          @heatmap-dark[$c1] ~
           ' on_' ~ @heatmap-dark[$c2]);

    # Draw colored cell at (X, Y)
    $grid.change-cell($x, $y, $c);
    $grid.print-cell($x, $y);

Since each grid cell now represents two half-block “pixels”, it’s necessary to keep track of both the per-half-block counts and the actual cell Y coordinate that a given block falls into. In addition since each cell could be generated as either one flat color or as two different colors, the code takes care to make an optimal custom grid cell, assign it with change-cell, and print it.
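
The index arithmetic is easier to follow with concrete numbers. In this sketch the half-block sub is a hypothetical helper that mirrors the first three lines of the code above, and the defaults (5 ms per line, bottom row 23) are assumptions for illustration only:

```raku
# Returns the half-block row, the grid cell row, and which half
# of that cell a measurement lands in
sub half-block(Real $time, :$ms-per-line = 5, :$bottom = 23) {
    my $block-y = floor(2 * $time / $ms-per-line);
    my $even-by = $block-y - $block-y % 2;
    my $y       = 0 max $bottom - $even-by div 2;
    my $half    = $block-y %% 2 ?? 'lower' !! 'upper';
    ($block-y, $y, $half)
}

say half-block(21.3);   # block 8,  cell row 19, lower half
say half-block(23.9);   # block 9,  cell row 19, upper half
say half-block(26.0);   # block 10, cell row 18, lower half
```

The first two measurements share a grid cell but land in different halves of it, which is exactly the case that needs the two-color '▀' cell.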

Here’s the result:

Much better — lots more detail and no distracting smearing effect.

Errors and Outages

While the half-block chart does a pretty good job showing network latency when the connection is relatively stable, it doesn’t do a good job of showing various errors: outages, individual lost packets, reordered packets, and so on. These can be detected by watching the sequence IDs carefully, and can be displayed on the top line of the chart to give a glanceable view of such problems.

First the error count for the current column must be kept as a new state variable and reset when the @counts are cleared for each new column:

    state ($errors, @counts);
    # ...

        # Clear counts and errors
        @counts = ();
        $errors = 0;

Then the previous sequence ID must be kept as well, and used to detect sequencing problems and gaps:

    # Determine if we've had any dropped packets or
    # sequence errors, while heuristically accounting
    # for sequence number wraparound
    state $prev-id = 0;
          $prev-id = -1 if $prev-id > $id + 0x7FFF;

    $errors += $id  >  $prev-id + 1
      ?? $id - ($prev-id + 1)
      !! $id  <= $prev-id
        ?? 1
        !! 0;
    $prev-id = $id max $prev-id;

It’s not perfect — it can certainly get confused by particularly horrid conditions — but the above algorithm for sequence tracking is similar to the one used by ping itself and should be resilient to many common problems.
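
Here is the same logic wrapped in a sub so it can be tried on its own. The count-errors name is hypothetical, and it returns the errors for each ID instead of accumulating them per column, but the tracking itself is unchanged. The sample IDs start at 1 because an initial ID of 0 would compare equal to the starting $prev-id and be counted as a repeat:

```raku
# Returns the number of errors implied by each newly received
# sequence ID, remembering the highest ID seen so far
sub count-errors(Int $id) {
    state $prev-id = 0;
          $prev-id = -1 if $prev-id > $id + 0x7FFF;

    my $errors = $id > $prev-id + 1
      ?? $id - ($prev-id + 1)
      !! $id <= $prev-id
        ?? 1
        !! 0;
    $prev-id = $id max $prev-id;
    $errors
}

# IDs 3 and 4 never arrive, then ID 5 shows up twice
say (1, 2, 5, 5, 6).map(&count-errors);   # (0 0 2 1 0)
```

The gap is charged as two errors when ID 5 first arrives, and the repeated ID 5 counts as one more.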

If there are any errors, they should be marked after the normal ping time colors are drawn:

    # Mark errors if any
    if $errors {
        my $color = @heatmap-dark[$errors min $saturation];
        $grid.print-cell(
          $x, 0, {char => '╳', color => "black on_$color"}
        );
    }

This will indicate individual errors within a single column, but won’t show errors on skipped columns during an extended outage. To handle that, the first part of the code for moving to a new column needs some adjustment to update the error count and then mark the errors if any. Because update-chart is now going to do the exact same error marking in two different places, that code can be wrapped in a private helper sub and called where needed:

    # Helper routine for marking errors for a
    # particular column
    my sub mark-errors($x, $errors) {
        if $errors {
            my $color =
              @heatmap-dark[$errors min $saturation];
            $grid.print-cell($x, 0, {
              char => '╳', color => "black on_$color"
            });
        }
    }

    # ...

        # If there was a *valid* previous column,
        # finish it off
        if $prev-x >= $x-offset {
            # Missing packets are counted as errors
            $errors += 0 max $saturation - @counts.sum;
            mark-errors($prev-x, $errors);

            # Remove the old "current column" marker
            # if it hasn't been overdrawn
            $grid.print-cell($prev-x, $bottom, ' ')
              if $grid.grid[$bottom][$prev-x] eq '^';
        }

    # ...

    # Mark errors if any
    mark-errors($x, $errors);

Here’s what an outage of a little less than a minute looks like now, showing a bright error bar on the top of the chart during the outage:

One Last Bug

There is a remaining subtle bug in the handling of long ping times. Counts are adjusted individually for each (quantized) ping time seen, but long times could map to a quantization bucket arbitrarily far off the top of the chart. Given only fifteen chances in each column, it’s unlikely that any two overlong times will map to the same (off screen) @counts bucket. So even though $y is forced onto the chart before printing, it will likely only ever show the darkest red color even if several very long pings were measured in that column.

To fix this, all of the overlong pings should be counted in a single bucket and be displayed appropriately in the top chart row. As with $errors, let’s track the number of overlong pings in a given column with a new state variable, and reset it when moving to a new column:

    state ($errors, $over, @counts);
    # ...

        # Clear counts and errors
        @counts = ();
        $errors = 0;
        $over   = 0;

Then it’s simply a matter of special-casing the top row when drawing the ping time results:

    my $c = $y  <= 0
      ?? $grid.cell('▲', @heatmap-dark[++$over])
      !! $c1 == $c2
        ?? $grid.cell(' ', 'on_'  ~ @heatmap-dark[$c1])
        !! $grid.cell('▀',          @heatmap-dark[$c1] ~
             ' on_' ~ @heatmap-dark[$c2]);

Since this happens before the call to mark-errors, an actual error in a given column will replace any overlong mark that was already there. This is intentional: The top row of the chart is used for “problems”, lost packets have a worse effect on user experience than slow replies, and there’s not enough value in using the top two rows of the screen to separate the two problem types visually.

Here’s the final result, my network roasting on an open fire when the ping time variance has got rather bad for a while:

A Final Present

If you’ve made it this far, I’ve got one last little trick for you. You can change the window title by printing a short escape sequence, so that it’s easier to identify in the giant mess of windows on the typical desktop. (What, you’re going to try to claim your desktop doesn’t have dozens of windows open? Mine certainly does!)

Just add this right after initializing Terminal::Print in MAIN:

    # Set window title
    my $title = "$target - ping-chart";
    print "\e]2;$title\e\\";

And that’s it! Happy holidays to all!

Appendix: (Possibly) Frequently Asked Question

But, but … color theory! The color map isn’t perceptually even!

Well yes, I did gloss over that a bit didn’t I?

In terms of perceptual distance between colors, it would seem much better to jump directly from bright yellow to white without the fine details of light yellows. The perceptual differences in light yellows are much less obvious than in the reds and oranges, so a color palette made from the heat map above appears to have 10 clearly different reds and oranges, and then a subtly-varying “smear” of light yellow. Jumping directly from bright yellow to white sets the two apart decently (though still not as much as the reds and oranges), so the color swatches would look more evenly spaced.

However in practice such a “corrected” map looks worse for the actual ping chart! Ping time jitter makes it unlikely that the top couple of colors in the map will actually be shown, as that would require (nearly) every ping time to map to the very same pixel, and thus absolutely rock steady network performance. Aside from perhaps the loopback interface (the computer talking to itself), this is rather unlikely in actual practice. Thus to be able to produce the characteristic bright band of a mostly steady connection, the lightest colors need to be over-emphasized in the color map, which, as it happens, the smooth walk through the light yellows achieves nicely.