Day 9 – Networks Roasting on an Open Fire, Part 1: Whipuptitude

by Geoffrey Broadwell

My home Internet connection is less than ideal. On the good days it’s fine I suppose, but on the bad days — and there are a lot of them — well, my ISP seems to be doing its darnedest to be earning coal in its collective stockings. Meanwhile I hear shouts across the house of “DAAAD, the INTERNEEEET!” and have to diagnose yet again what’s causing the fuss.

Experience has shown that my ISP is easily overwhelmed by weekend and holiday traffic levels, and that it degrades in all sorts of interesting ways:

  • High latency
  • High jitter
  • Reduced bandwidth
  • Reordered packets
  • Lost packets
  • Intermittent connectivity

Some of these are easy to work around: “Kids, stop streaming extra stuff you’re not actually paying attention to, and kindly save the giant game updates for after peak hours!” Other problems (the last two especially) are miserable for everyone no matter how you slice it, and much more difficult to work around.

This year I decided to whip up a little display to let me know at a glance exactly how our Internet connection was failing, without having to investigate from scratch each time. First I needed a measurement tool, something that could generate a stream of data I could analyze to drive the glanceable display. Thankfully such a tool is quite easy to find.

Internet Ping Pong

Since the days of yore, just about every Internet-connected system has come with a simple utility called ping, which as its name implies acts a bit like a unidirectional sonar. It sends out a small “ping” packet (an ICMP echo request) and looks for a “pong” response (an ICMP echo reply); if there is one, it reports the round-trip time. By default it sends a new ping once a second, repeatedly, reporting on each pong received. The output format varies a bit depending on operating system, but here’s what it looks like on a modernish Linux system:

$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=19.8 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=14.3 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=55 time=21.0 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=55 time=17.7 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=55 time=14.3 ms
64 bytes from 1.1.1.1: icmp_seq=6 ttl=55 time=15.5 ms
^C
--- 1.1.1.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5009ms
rtt min/avg/max/mdev = 14.292/17.099/21.016/2.621 ms

Much of what ping prints is not terribly useful for my immediate needs. The repeated target IP address and packet size info doesn’t tell me anything I don’t already know, and the summary statistics are only printed when the pings stop; they’re not updated on the fly.
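If a running summary were wanted, it could be recomputed from the per-pong times alone. The sketch below is not part of the author's program: rtt-summary is a hypothetical name, and it assumes ping's mdev is the population standard deviation, sqrt(mean(t²) - mean(t)²). Fed the six rounded times shown above, it lands close to (but not exactly on) ping's own summary, which is computed from unrounded times.

```raku
# Hypothetical sketch: recompute ping's closing summary (min/avg/max/mdev)
# from the per-pong times seen so far.
# ASSUMPTION: mdev is the population standard deviation,
# i.e. sqrt(mean(t²) - mean(t)²).
sub rtt-summary(@times) {
    my $n    = @times.elems;
    my $avg  = @times.sum / $n;
    my $mdev = sqrt(@times.map({ $_ * $_ }).sum / $n - $avg ** 2);
    return @times.min, $avg, @times.max, $mdev;
}

# The six (rounded) times from the sample ping run above
my ($min, $avg, $max, $mdev) = rtt-summary([19.8, 14.3, 21.0, 17.7, 14.3, 15.5]);
printf "min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f ms\n", $min, $avg, $max, $mdev;
# min/avg/max/mdev = 14.300/17.100/21.000/2.617 ms
```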

What I can easily use are the per-pong result lines, the ones that look like this:

64 bytes from 1.1.1.1: icmp_seq=4 ttl=55 time=17.7 ms

The time measurements will let me visualize latency and jitter problems, and the sequence numbers (icmp_seq) will let me track reordered or lost packets and periods of lost connectivity. Perfect; now I just need a UI.
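To make the sequence-number idea concrete, here is a small standalone sketch (not the author's code; seq-anomalies is a hypothetical name) that walks icmp_seq values in arrival order. A jump past the highest number seen so far marks the skipped numbers as missing, and a number at or below that high-water mark counts as a late, reordered arrival. It assumes sequence numbers start at 1 and that there are no duplicate replies.

```raku
# Hypothetical sketch: classify icmp_seq values given in arrival order.
# ASSUMPTIONS: numbering starts at 1; no duplicate replies.
sub seq-anomalies(@seqs) {
    my $max = 0;      # highest sequence number seen so far
    my %missing;      # skipped numbers not (yet) seen: lost or still in flight
    my @reordered;    # numbers that arrived after a higher one
    for @seqs -> $seq {
        if $seq > $max {
            # Everything between the old maximum and this one was skipped
            %missing{$_} = True for $max + 1 .. $seq - 1;
            $max = $seq;
        }
        else {
            # A late arrival: it was reordered, not lost
            @reordered.push($seq);
            %missing{$seq}:delete;
        }
    }
    return %missing.keys.map(*.Int).sort.List, @reordered.List;
}

my ($missing, $reordered) = seq-anomalies([1, 2, 4, 3, 7]);
say "missing: $missing; reordered: $reordered";   # missing: 5 6; reordered: 3
```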

Sketching Out the Basics

I normally keep quite a few terminal windows open on my desktop, so keeping one more open to continuously chart the ping results seemed like a good place to start. Besides, Raku has long had a simple module for producing custom full-terminal displays, Terminal::Print, which treats the terminal window as a grid of single-character cells, each with optional color and style info.

Sketching out the top level of the program was thus pretty easy:

#!/usr/bin/env raku

use Terminal::Print <T>;

#| Slowly draw a chart of ping times
sub MAIN($target = '1.1.1.1', #= Hostname/IP address to ping
        ) {
    # Initialize Terminal::Print and show a blank screen
    T.initialize-screen;

    # Draw a ping chart on the current screen grid
    my $grid = T.current-grid;
    ping-chart(:$grid, :$target);

    # Shut down, restore the original screen, and exit
    T.shutdown-screen;
}

#| Set up a `ping` child process and convert the ping times
#| to a Terminal::Print::Grid chart asynchronously
sub ping-chart(Terminal::Print::Grid:D :$grid!,
               Str:D :$target!,
              ){
    # To be written ...
}

The MAIN sub declares our basic command-line arguments and options (Raku also turns its #| and #= declarator comments into the usage message shown below), and the Terminal::Print module provides the T alias for terminal control. The simple logic in MAIN initializes full-screen control, hands off drawing to the ping-chart sub, and shuts down and restores the original screen on exit.

As a side note, it’s not actually necessary to pass the grid explicitly to the drawing routine — ping-chart could just assume it should use the current screen grid always — but it’s a good habit to get into, as more advanced UIs will likely have multiple different visual grids and drawing routines will then need to know which one to draw on.

After saving this as ping-chart and making it executable, here’s what I have so far:

$ ./ping-chart -?
Usage:
  ./ping-chart [<target>] -- Slowly draw a chart of ping times

    [<target>]    Hostname/IP address to ping [default: '1.1.1.1']

$ ./ping-chart
# Nothing appears to happen yet, but the program exits cleanly

Nothing appears to happen yet when the program is run without arguments, except perhaps a quick flash of blank terminal. On my system the program runs fast enough that the terminal emulator elides the flash completely.

Asynchronous Reactions

Next up, I filled in the ping-chart sub with some simple event reaction code:

# Prepare a `ping` child process that the reactor
# will listen to
my $ping = Proc::Async.new('ping', $target);

# Run main event reactor until interrupt signal
# or `ping` exits
react {
    # Parse `ping` results
    whenever $ping.stdout.lines {
        if $_ ~~ /'seq=' (\d+) .*? 'time=' (\d+ [\.\d+]?)/ {
            my ($id, $time) = +$0, +$1;
            update-chart(:$grid, :$id, :$time);
        }
    }

    # Quit on SIGINT (^C)
    whenever signal(SIGINT) { done }

    # Quit on child process exit
    whenever $ping.start    { done }
}

Starting an asynchronous child process is quite easy with Proc::Async, which takes the name of the program to run and its arguments, and produces a process object that is waiting to be started.

In order to process the ping output, I use the standard Raku react block to handle three different types of events:

  • Whenever a new line arrives from ping’s standard output, try to parse it and display the result in the chart.
  • Whenever the user sends an interrupt signal (SIGINT, usually the result of pressing ^C), stop the reactor using done.
  • Whenever the child process exits on its own (i.e. the Promise returned by $ping.start is kept), likewise stop the reactor using done.

Of course stopping the reactor will cause execution to leave the ping-chart sub, and the last line of MAIN will then shut down and restore the normal terminal screen using T.shutdown-screen.
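The Proc::Async plus react pattern can be tried in isolation. In this sketch the current Raku binary ($*EXECUTABLE) stands in for ping, purely as an assumption so the example runs anywhere Raku does. Because there is no SIGINT subscription here, no explicit done is needed: a react block also ends once every one of its whenever subscriptions has finished, i.e. once stdout has closed and the start Promise has been kept.

```raku
# Sketch: run a child process (the current Raku binary, standing in for
# `ping`) and collect its output lines inside a react block.
my $proc = Proc::Async.new($*EXECUTABLE.absolute, '-e', 'say $_ for 1..3');
my @lines;

react {
    # Subscribe to stdout *before* the process is started below
    whenever $proc.stdout.lines { @lines.push($_) }

    # The Promise from .start is kept when the child exits; with no
    # explicit `done`, react ends once every whenever has finished
    whenever $proc.start { }
}

say @lines;   # [1 2 3]
```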

The parsing reaction code looks a bit hairy, but is conceptually simple:

  1. Use a regular expression match on the current output line to try to capture the sequence number and recorded ping/pong time using capturing parentheses; if this fails, ignore the line and wait for another.
  2. Convert the resulting Match objects to regular numbers (with prefix +).
  3. Call update-chart to actually do the charting work using those numbers.
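Those steps can be checked without a terminal UI by wrapping the same regex in a small sub. parse-pong is a hypothetical name used only for this sketch; it returns the (id, time) pair for a pong line and Nil for anything else, such as ping's header line.

```raku
# Hypothetical helper: the article's regex, returning (id, time) or Nil
sub parse-pong(Str $line) {
    return Nil unless $line ~~ /'seq=' (\d+) .*? 'time=' (\d+ [\.\d+]?)/;
    return (+$0, +$1);   # prefix + turns each Match into a number
}

for '64 bytes from 1.1.1.1: icmp_seq=4 ttl=55 time=17.7 ms',
    'PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.' -> $line {
    my $result = parse-pong($line);
    say $result ?? "id=$result[0] time=$result[1]" !! "skipped: $line";
}
# id=4 time=17.7
# skipped: PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
```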

Here’s a minimal implementation of update-chart:

#| Update the chart with a new ping result
sub update-chart(Terminal::Print::Grid:D :$grid!,
                 UInt:D :$id!,
                 Real:D :$time!) {
    my $x = $id % $grid.w;
    my $y = $grid.h - 1 - floor($time / 4);

    $grid.print-cell($x, $y, 'o') if $y >= 0;
}

The simple steps are as follows:

  1. Select an X coordinate based on the sequence ID, wrapping around the grid width using % $grid.w.
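  2. Select a Y coordinate based on the recorded ping/pong time, scaled so that each row represents 4 ms, and accounting for terminal Y coordinates increasing from top to bottom instead of bottom to top as you might expect.
  3. Draw a circle on the chart at that (X, Y) location, skipping it if the time is too long to fit on the grid (a negative Y).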
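Worked through on an assumed 80×24 terminal, the coordinate math behaves like this. chart-coords and its :ms-per-row parameter are hypothetical names for this sketch only; the default of 4 matches the division in update-chart.

```raku
# Hypothetical helper mirroring update-chart's coordinate math.
# Returns (x, y), or Nil if the point would land above the top row.
sub chart-coords(Int $id, Real $time, Int $w, Int $h, Real :$ms-per-row = 4) {
    my $x = $id % $w;                                # wrap around the width
    my $y = $h - 1 - floor($time / $ms-per-row);     # row 0 is the top line
    return $y >= 0 ?? ($x, $y) !! Nil;
}

say chart-coords(4, 17.7, 80, 24);    # (4 19)
say chart-coords(85, 17.7, 80, 24);   # (5 19)
say chart-coords(6, 100, 80, 24);     # Nil
```

So with 24 rows at 4 ms each, anything at or above 96 ms falls off the top of the chart.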

Here’s what this minimal charting produces when the program is run without arguments:

o       o   o                   o
          o  o              o    oo    o
 ooooo o o o  oooooooooooooo ooo   oooo oooo
      o

Nothing all that fancy, but right off it’s obvious there’s some base latency plus some occasional timing jitter on top of that. Unfortunately, here’s what the screen looks like after the program is allowed to run for a few minutes:

                      o
                                          o
                                                                o
                            o                          o

                     o    o                          o
       o   o                                       o
                           o                o               o                  o
 o  o                  o     o       o              o   oo        o  o o  o  o
oooo  oooo oooo  o oooo oooo  o o  ooo o o ooo oo ooooo oo  oo  o  oo  ooo  oooo
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
ooo   oo o             o            o  o                o   o o o o

Since old cells are never cleared when the X coordinate wraps around, the most commonly detected ping times quickly fill up a solid band, though even this is somewhat useful as the scale of the outliers starts to be more obvious. Still, left long enough this would just fill a fair portion of the terminal solidly with circles with no indication of what the current connection status is — in fact the connection could completely go down at any point, and there’d be no obvious change!

Keeping the Screen Clean

There’s so little code in the minimalist update-chart above that it’s fairly easy to just rewrite it from scratch:

#| Update the chart with a new ping result
sub update-chart(Terminal::Print::Grid:D :$grid!,
                 UInt:D :$id!,
                 Real:D :$time!) {
    # Calculate chart edges (from grid size) and X coord
    # (from wrapped ID)
    my $bottom = $grid.h - 1;
    my $right  = $grid.w - 1;
    my $x      = $id % $grid.w;

    # Each time we start a new column, clear it and move
    # the "current column" marker; also make sure to
    # handle longer connection failures that might move
    # forward several columns at a time.
    state $prev-x  = -1;
    while $prev-x != $x {
        # Remove the old "current column" marker if it
        # hasn't been overdrawn
        $grid.print-cell($prev-x, $bottom, ' ')
          if $prev-x >= 0
          && $grid.grid[$bottom][$prev-x] eq '^';

        # Move prev-x forward, chasing current x; wrap
        # at right edge
        $prev-x = 0 if ++$prev-x > $right;

        # Clear the new column to blank
        $grid.print-cell($prev-x, $_, ' ') for ^($grid.h);

        # Move the "current column" marker to the newly
        # cleared column
        $grid.print-cell($prev-x, $bottom, '^');
    }

    # Calculate Y coord (from scaled ping time)
    my $y = $bottom - floor($time / 4);

    # Draw circle at (X, Y)
    $grid.print-cell($x, $y, 'o') if $y >= 0;
}

This does much better when left to run for several minutes:

                                                            o
                      o                                o  o     o
  o o      o o       o o o        o   oo                o  o o o  o o  o       o
 o o oooooo o ooooooo   o o oooooo ooo  ooooooooooooooo  o    o  o o oo ooooooo
o                          o

         ^

Rather than continuously building up into a solid bar, the chart now shows only the measurements from the most recent horizontal pass, much like a heart monitor. Furthermore the ^ marker shows which column is currently being updated, and because it is always printed on the bottom line of the screen grid (the 0–4 ms row), the gap between it and the lowest circles shows how large the “latency floor” is.
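The column-chasing while loop in update-chart is easier to reason about when pulled out on its own. columns-to-clear is a hypothetical helper for this sketch: it performs only the loop's index arithmetic and returns the columns that would be cleared, in order. As in the original, it assumes $x is always in 0 ..^ $width (which $id % $grid.w guarantees); otherwise the loop would never end.

```raku
# Hypothetical helper: the index arithmetic of update-chart's while loop,
# returning the columns that would be cleared (in order) when the current
# column moves from $prev-x to $x on a grid $width columns wide.
sub columns-to-clear(Int $prev-x is copy, Int $x, Int $width) {
    my $right = $width - 1;
    my @cleared;
    while $prev-x != $x {
        $prev-x = 0 if ++$prev-x > $right;   # step forward, wrap at right edge
        @cleared.push($prev-x);
    }
    @cleared
}

say columns-to-clear(-1, 3, 80);   # [0 1 2 3]   first call after startup
say columns-to-clear(78, 2, 80);   # [79 0 1 2]  a gap that wraps around
say columns-to-clear(10, 11, 80);  # [11]        the normal one-step case
```

One consequence worth knowing: a late packet whose column is just behind the current one (say from 10 back to 9) makes the loop go all the way around, clearing 79 of the 80 columns.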

More To Do

The current output still leaves a lot to be desired:

  • There are no axis ticks to give a sense of scale and help the eye determine how bad the ping latency has actually been.
  • It can be hard to see if there is a gap in the marks, and there’s no indication of other types of errors (such as reordered packets or ping times so long they are off the chart).
  • The chart only shows the last screen-width seconds (80 for a default terminal); it would be nice to show additional history without having to open an extremely wide terminal window to do so.
  • The vertical (ping time) detail is fairly poor as well; moving from pure ASCII glyphs to full Unicode provides a few ways around that problem.

I’ll take a look at each of these in the following parts. Until then, may your packets flow freely this holiday season!

Appendix: (Potentially) Frequently Asked Questions

Why use 1.1.1.1 as the default ping target?

That’s the primary public DNS server address for Cloudflare, a very large CDN (Content Delivery Network). It’s fairly responsive in most parts of the world, and if it’s down, a fair portion of the Internet at large will be quite unhappy. There are quite a few other public DNS servers with similar properties, such as 8.8.8.8 for Google DNS; a community-curated list of commonly used public DNS servers can be found at DuckDuckGo using this query:

https://duckduckgo.com/?q=public+dns+servers&t=lm&ia=answer&iax=answer

Note: Some of those servers are highly likely to be untrustworthy or actively privacy invading. Do some research and caveat hacker.

State variables: Aren’t they concurrency bugs waiting to happen?

update-chart is only ever called from a whenever block within the main event reactor. Raku guarantees that only one thread at a time can ever be running a whenever block inside a particular react or supply. So update-chart can freely use state variables all it wants, and there can never be a problem caused by concurrent access to them.
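That guarantee can be observed directly. In this sketch (count-event is a hypothetical name), four interval supplies emit from thread-pool threads, and each whenever calls a sub with an unprotected state counter. Because whenever blocks in one react run one at a time, no increment is lost, so the final call returns exactly 1000.

```raku
# A sub with an unprotected `state` counter, called only from whenever blocks
sub count-event() {
    state $count = 0;
    ++$count;
}

my $last = 0;
react {
    # Four independent tickers; Supply.interval emits from the thread pool
    for ^4 {
        whenever Supply.interval(0.001).head(250) {
            $last = count-event();
        }
    }
}
say $last;   # 1000
```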
