Day 3 – Helping the Elves Sort Their Mail

Lately, I’ve been a bit obsessed with Dave Thomas’s CodeKata series, and especially solving these problems in Raku. In this post, I want to talk about different ways of writing Raku and how to measure performance. We’ll focus on part 2 of Kata 11: Sorting It Out.

Approach #1: Let’s Use a Regex!

Raku, being a descendant of Perl, has an awesome regex engine. So, my first solution was straightforward enough:

sub sort-characters-regex($msg) {
    $msg
      .lc
      .subst(/:ignorecase <-[a..z]>/)
      .comb
      .sort
      .join
}

We convert the message to lowercase (.lc), case-insensitively remove characters that aren’t letters (.subst(/:ignorecase <-[ a..z]>/)), convert the message to a Seq of characters (.comb), sort them (.sort) and join them back to a string (.join). But just how fast (or slow!) is that? Let’s find out:

sub benchmark(&fn) {
    constant MESSAGE = 'When not studying nuclear physics, Bambi likes to play beach volleyball.';
    constant RUNS = 100_000;

    fn(MESSAGE) for ^RUNS;

    say "{RUNS} runs of &fn.name() took " ~ (now - ENTER now) ~ 's.';
}

benchmark &sort-characters-regex; # OUTPUT: «100000 runs of sort-characters-regex took 6.761854624s.␤»

6.76 seconds. Not bad, if performance isn’t a concern.

Approach #2: Imperative Programming

The original post says:

Are there any ways to perform this sort cheaply, and without using built-in libraries?

This got me thinking: why, that sounds like a Hash! Let’s try it.

sub sort-characters-imperatively($msg) {
    my %chars;
    constant LETTERS = ('a'..'z').join;
    for $msg.comb -> $ch {
        my $l = lc $ch;
        if LETTERS.contains($l) {
            %chars{$l}++;
        }
    }
    my $s = '';
    for LETTERS {
        $s ~= $_ x (%chars{$_} // 0);
    }
    return $s;
}

We iterate over each character of the message, lowercase it, check if it’s a letter and increment that letter’s count. Then, we initialize an empty string $s, and for each letter, we create a string containing said letter as many times as needed with the x operator, and append it to $s. We use // 0, which uses the definedness operator //, to provide a default value in case the letter has no associated count, i.e. is not present in the original message.

It’s quite a bit more code, but it’s also faster:

100000 runs of sort-characters-imperatively took 3.77489963s.

Approach #3: Functional Programming (or the Middle Road)

Is there another way to write this function? It’d be a strange question to ask if the answer were no, so… yes. We’ll use the Bag class. If you’ve ever used Python, it’s like a collections.Counter. If not, according to the docs, a Bag is

a collection of distinct elements in no particular order that each have an integer weight assigned to them signifying how many copies of that element are considered “in the bag.”

For example: "wheelbarrow".comb.Bag returns Bag(a b e(2) h l o r(2) w(2)). This means that “wheelbarrow” has one A, one B, two E’s, and so on. That sounds exactly like what we need:

sub sort-characters-functionally($msg) {
    my %frequency := $msg.lc.comb.Bag;
    return ('a'..'z').map({ $_ x %frequency{$_} }).join;
}

This approach is short and sweet, and somewhere in the middle of the other two in terms of performance:

100000 runs of sort-characters-functionally took 5.038593895s.

Note that Raku also enables us to effortlessly parallelize the code, just by adding .hyper:

sub sort-characters-functionally-hypered($msg) {
    my %frequency := $msg.lc.comb.Bag;
    return ('a'..'z').hyper.map({ $_ x %frequency{$_} }).join;
}

Sadly, that only hurts performance in this case:

100000 runs of sort-characters-functionally-hypered took 49.788108188s.

Ouch.

Still, hyper (and its cousin race) often make code run faster, and they are great tools to have in your toolbox.

Conclusion

Now, which approach should we use in production? Any of the three approaches seem fine. It just depends on what your goal is. Is it performance? Or readability? Or something in between? I love that Raku doesn’t force you to write code in any particular way. You can write it in a way that makes sense to you.

I hope you enjoyed this post, and I hope you learned something new. Merry Christmas!

Acknowledgments

This blog post could not have been written without the thoughtful feedback of Elizabeth Mattijsen. Thank you for all you’ve done for me and the Raku community at large.

5 thoughts on “Day 3 – Helping the Elves Sort Their Mail

  1. Howdy! New to Raku and trying approach #1. Then ENTER now expression in the say interpolated string isn’t returning the appropriate value… I get a runtime of like 0.00000838s. Interestingly, ENTER now outside the interpolated string works correctly. Is this because when the code is compiled, the ENTER now clause is evaluated then?

    I’m using rakudo 2023.11 and MoarVM 2023.11.

    Like

    1. Hi Anonymous, that’s an excellent question. I think this is happening because the curlies create a new block, so what you get is how long it took to subtract now from ENTER now. (which is clearly very fast), but I’m not sure. I’m hoping Elizabeth will chime in to tell me if I’m right or wrong.

      Like

      1. Indeed. I messed up while preparing the post for publication. The “now – ENTER now” should NOT be between curlies in the double quoted string, as the ENTER would then be applied to that scope, instead of the outer “sub” scope.

        Sorry for this mixup. The post is corrected now.

        Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.