Day 10 – Java Annotations in Raku or my @annotation is role;

Today, a little on the idea that new things are easier to absorb through things you already know. It so happens that I write Java for $dayjob, so I will approach from that side. Java 1.5 introduced an interesting syntactic form – annotations. It looks something like this:

/**
 * @deprecated use #getId() method instead
 */
@Deprecated
public String getName() {
  return "stub";
}

The example shows the @Deprecated annotation, which makes the compiler emit a warning whenever the getName method is used elsewhere. In addition, explanatory information has been added to the Javadoc.

In general, annotations in Java are a mechanism for attaching metadata to classes, objects, types, and so on, which can then be used at compilation, execution, or static-analysis time. With their help it is possible, for example, to implement a code decoupling strategy, so that program components work together without rigid connections. This strategy builds on the idea of Inversion of Control and is the core of the Spring framework.

But that’s enough Java. What does Raku have that resembles the annotation machinery? Raku has traits: a syntax that can be used to mark classes, routines, and objects. These labels are processed during compilation of the program, and, depending on the programmer’s wishes, the effect of that processing can influence how the program runs.

For example, here is Raku’s standard-library counterpart to the @Deprecated annotation:

sub get-name(--> Str) is DEPRECATED('get-id() method') {
  'stub'
}

is DEPRECATED is a trait. Its argument can name an alternative to the deprecated code. After a program in which the get-name function was called finishes, a message is displayed indicating where and how many times the obsolete code was executed:

Saw 1 occurrence of deprecated code.
======================================================================
Sub get-name (from GLOBAL) seen at:
  ~/advent.raku, line 13
Please use get-id() method instead.
----------------------------------------------------------------------
Please contact the author to have these occurrences of deprecated code adapted, so that this message will disappear!

Obsolete

is DEPRECATED is a trait from the standard library. To understand how it works, let’s try to write our analogue under the name obsolete. First, let’s define the storage of the collected information – a class that stores and updates the number of function calls and is able to display a report:

class ObsoleteTraitData {
  has $.routine-name is required;
  has $.user-hint;
  has $!execution-amount = 0;
  method executed() { $!execution-amount++ }
  method report() {
    return unless $!execution-amount;
    note "Obsolete routine $!routine-name is executed $!execution-amount times.";
    note $_ with $!user-hint;
  }
}

Now we declare a test trait – it is an ordinary multi sub named trait_mod:<is> with two arguments: the first is what the trait will be applied to (in our case, a Routine), the second is the trait’s name, passed as a required named argument:

say 'run-time';
multi trait_mod:<is>(Routine $r, :$obsolete!) {
  say 'compile-time'
}
sub get-name(--> Str) is obsolete {
  'stub'
}
say get-name;
# Output: compile-time
#         run-time
#         stub

The most important thing to understand about traits is that their bodies are executed at compile time, not at run time. This can be clearly seen from the output of the code above. Let’s remember what we want to achieve – a report on executions of obsolete code before the program terminates. We can obtain this information only at run time, so to have an effect there, the trait must modify the function in some way at compile time. In our case we can do this by adding the ENTER phaser to the function – a special block that is executed before the function’s first statement. That is, we make the function get-name look something like this:

sub get-name(--> Str) {
  ENTER { $obsolete-trait-data.executed }
  'stub'
}
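If ENTER is new to you, here is a tiny standalone demonstration of my own (not part of the obsolete machinery) showing that the phaser runs before the block’s first statement:

```raku
my @trace;
sub demo() {
    ENTER @trace.push('entering');  # runs on every entry, before the body
    @trace.push('body');
}
demo();
say @trace; # [entering body]
```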

We cannot touch the source code of the function itself, but we can perform the necessary manipulations during compilation. We take the function name and an optional hint for the user, create a new ObsoleteTraitData object, put it into the lexical hash %obsolete-trait-data, and add the necessary phaser:

my ObsoleteTraitData %obsolete-trait-data;

multi trait_mod:<is>(Routine $r, :$obsolete!) {
  my $routine-name = $r.name;
  my $user-hint = $obsolete ~~ Str ?? $obsolete !! Any;
  %obsolete-trait-data{$routine-name} =
    ObsoleteTraitData.new(:$routine-name, :$user-hint);
  $r.add_phaser('ENTER', -> {
    %obsolete-trait-data{$routine-name}.executed;
  });
}

Now, whenever the function get-name is executed, the ObsoleteTraitData object will update its state. Thus we have influenced the program’s execution flow during compilation. It only remains to display the report. To do this, we add an END phaser to the mainline code; its block is executed just before the program ends. Altogether, we get the following picture:

class ObsoleteTraitData { #`(described above) }

my ObsoleteTraitData %obsolete-trait-data;

END { .report for %obsolete-trait-data.values }

multi trait_mod:<is>(Routine $r, :$obsolete!) { #`(described above) }

sub get-name(--> Str) is obsolete('Please use get-id() instead.') {
  'stub'
}
sub another-obsolete() is obsolete {}

get-name();
another-obsolete();
get-name();

# Output:
# Obsolete routine get-name is executed 2 times.
# Please use get-id() instead.
# Obsolete routine another-obsolete is executed 1 times.

Override

Another commonly used annotation in Java is @Override, applied to a class method. If the annotated method does not in fact override a superclass method, it is a compilation error. Making a similar trait will not be difficult – we will not even have to go beyond the compilation stage. We declare a trait named override that applies only to methods:

multi trait_mod:<is>(Method $m, :$override!) {

We check that the method belongs to a class; otherwise we exit:

  return unless $m.package.HOW ~~ Metamodel::ClassHOW;

We check that the class owning the method has parents. To do this, we use the meta-method ^mro, which returns a list of all parent classes, including the class itself, Any, and Mu (which we filter out):

  my $class = $m.package;
  my $method-point = $class.^name ~ '::' ~ $m.name;
  my @parents = $class.^mro[1 ..^ *-2];
  die "is override trait cannot be used without parent class $method-point." unless @parents;

We go through all the parents and their methods in search of one that matches in both name and signature. Comparing method signatures is not a trivial task, so here we hide its implementation behind a function check-signature-eq:

  for @parents -> $parent {
    for $parent.^methods -> $parent-method {
      return if $parent-method.name eq $m.name &&
        check-signature-eq($parent-method.signature, $m.signature)
    }
  }

If no matching method was found among the parents, we throw an error:

  die "$method-point does not override any parent methods.";
}
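The check-signature-eq helper was left unspecified above. Here is one naive sketch of my own: two signatures match when their non-invocant parameters agree on name, namedness, and type. A real implementation would also need to consider subtyping, optional parameters, slurpies, and so on:

```raku
sub check-signature-eq(Signature $a, Signature $b --> Bool) {
    my sub key(Signature $s) {
        # Skip the invocant: its type legitimately differs between
        # parent and child methods.
        $s.params.grep({ !.invocant })
                 .map({ .name, .named, .type.^name })
                 .List
    }
    so key($a) eqv key($b)
}

class P { method m(:$r) {} }
class C { method m($r)  {} }
say check-signature-eq(P.^lookup('m').signature, P.^lookup('m').signature); # True
say check-signature-eq(P.^lookup('m').signature, C.^lookup('m').signature); # False
```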

As a result, we get the following:

multi trait_mod:<is>(Method $m, :$override!) { #`(described above) }

class A {
  method from-a(:$r) {}
}

class B is A {
  method from-a($r) is override { # note the missing colon: $r is positional here
    say 'from-b'
  }
}

# Output: B::from-a does not override any parent methods.
# Exit code: 1

Suppress

We have already implemented the logic of the Java annotations @Deprecated and @Override. Let’s try to implement the logic of @SuppressWarnings as well. This annotation is applied to a method and suppresses its warning messages; you can also specify which warnings are to be suppressed.

In Raku, warnings are produced by the warn function. It throws a special control exception, which is printed to the error stream, after which execution resumes where it left off. You can catch such an exception with the special CONTROL phaser. So, as in the @Deprecated case, we need to modify the function by adding the desired phaser. Let’s try something new and use the wrap method instead of add_phaser. How does it work? We replace the function with another one that can call the original (via the callsame routine) at its discretion. Inside this wrapper we put a CONTROL phaser that mimics the standard behaviour, but not for suppressed warnings:

multi trait_mod:<is>(Routine $b, :$suppress-warnings!) {
  my $regex = $suppress-warnings ~~ Str
    ?? / <$suppress-warnings> /
    !! Any;
  $b.wrap(sub with-control(|c) {
    callsame;
    CONTROL {
      when CX::Warn {
        .note if $regex.defined && $_.message !~~ $regex;
        .resume
      }
    }
  });
}

sub work-in-progress() is suppress-warnings('todo') {
  warn 'important warn';
  warn 'todo warn';
}

work-in-progress()
# Output:
# important warn
#   in sub work-in-progress at ~/trait-supress.raku line 15
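The CONTROL/CX::Warn mechanics can also be observed in isolation. In this standalone sketch of mine the warning is collected rather than printed, and execution resumes as usual:

```raku
my @caught;
{
    warn 'oops';
    say 'still running';        # reached, because CONTROL resumes
    CONTROL {
        when CX::Warn { @caught.push(.message); .resume }
    }
}
say @caught; # [oops]
```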

Serialize

All that remains is to discuss user-defined annotations. As I said above, Java annotations are a way to attach meta-information to a class or object. Later, at compile time or, more often, at run time, annotated objects are inspected for the information they carry. In Raku, roles are great for this. Consider the problem of adding the simplest possible serialization system to a class. Let’s write a class and mark it up with our future trait:

class Person is serialize-name('Passport') {
  has $.first;
  has $.second is serialize-name('Second name');
  has $.third is serialize-name('Honorific');
}

You can see that trait serialize-name applies to both the class itself and its attributes.

The trait for the attribute looks like this:

role SerializableAttribute {
  has $.serialize-name;
}

multi trait_mod:<is>(Attribute $a, :$serialize-name!) {
  $a does SerializableAttribute($serialize-name);
}

Above, the trait mixes the new SerializableAttribute role into the attribute. This role in turn injects a new attribute into the attribute 🙂 The value of that new attribute is supplied through the trait’s argument.
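To see the mixin mechanism on its own, here is a compact, self-contained variant (the class and attribute names are mine):

```raku
role SerializableAttribute {
    has $.serialize-name;
}

# Mixing the role into the Attribute object at class-composition time.
multi trait_mod:<is>(Attribute $a, :$serialize-name!) {
    $a does SerializableAttribute($serialize-name);
}

class Example {
    has $.plain;
    has $.labelled is serialize-name('A label');
}

my $attr = Example.^attributes.first(*.name eq '$!labelled');
say $attr.serialize-name; # A label
```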

The trait for the class looks like this:

role SerializableClass[$name] {
  method serialize() {
    say $name, ' | ', self.^name;
    say .serialize-name, ' <- ', .get_value(self)
      for self.^attributes(:local).grep(*.^can('serialize-name'));
  }
}

multi trait_mod:<is>(Mu:U $c, :$serialize-name!) {
  return unless $c.HOW ~~ Metamodel::ClassHOW;
  $c.^add_role(SerializableClass[$serialize-name]);
}

Above, you can see that the trait checks that it is applied to a class and adds the special SerializableClass role. This role adds a new serialize method to the class that implements all the serialization logic. In particular, it keeps only those class attributes that have a serialize-name method.

If we run all this, we get:

Person.new(:first<John>, :second<Hancock>, :third<Mr>).serialize();
# Output:
# Passport | Person
# Second name <- Hancock
# Honorific <- Mr

Conclusion

As we can see, traits are a pretty powerful tool, and like everything in the Raku world they can be used in very different ways. In Java, for example, when declaring an annotation the programmer must indicate how far its effect extends (source only, until the end of compilation, or for the lifetime of the application), whether the annotation is inherited by child classes, and whether it can be applied multiple times. Traits in Raku, on the other hand, give the programmer complete freedom of action. You now have the knowledge to write your own Inversion of Control/Dependency Injection system in the style of Java’s Spring Core, using Raku traits.

Day 9 – Raku code coverage

Although I love using Raku, the fact that it is still a relatively young language means that there is a fair amount that is lacking when it comes to tooling, etc. Until recently, this included a way to calculate code coverage: how much of the code in a library is exercised (=covered) by that library’s test suite.

Now, truth be told, this feature has been available for some time in the Comma IDE. But this (together with other arguably essential developer tools like profiling, etc) is only available in the “Complete” edition, which requires a paid subscription.

Still, I knew that the Raku compiler kept track of covered lines, so I always felt like this should be doable. It only needed someone to actually do it… and it looks like someone actually did.

So, consider my surprise when, while recently browsing raku.land, I came across App::RaCoCo, which claims to be ‘a Raku Code Coverage tool’. Sweet!

Let’s see how it works.

Running locally

The library ships with a racoco executable, which is what we’ll use to calculate the coverage. The first couple of times I ran it I got some scary output because it could not find the library to test, but after reading the documentation and trying a couple of things out, I managed to find the right set of options for me.

Let’s see it in action on my very own HTTP::Tiny:

$ racoco --exec='prove6 -l' --html
t/agent.t ......... ok
t/errors.t ........ ok
t/mirror.t ........ ok
t/online-async.t .. skipped
t/online-basic.t .. skipped
t/requests.t ...... ok
t/responses.t ..... ok
t/url-parsing.t ... ok
All tests successful.
Files=8, Tests=52,  9 wallclock secs
Result: PASS
Visualisation: file:///home/user/HTTP-Tiny/.racoco/report.html
Coverage: 81%

Success! We can run our test suite, with the development version, and we get a nice little summary at the bottom. Thanks to the --html option we even generated an HTML report we can examine in the browser, with line-by-line details on what was covered.

The tool is still young, and there are still quirks that should be ironed out. I’d expect the friction with the --exec flag to be one of those. But until then, we have a working tool we can use. Huzzah!

So we can run the tool locally, which is great. But can we run it on code that is hosted remotely? And how do we publish those results?

With a lot of my distributions, what I’ll do is send coverage output to Coveralls, which keeps track of it and renders it publicly, which is great.

However, racoco does not ship with a Coveralls exporter, and currently has no way to plug in custom reporters (like, say, the cover tool used in Perl). This feature is in development, but until then, we’ll need an alternative.

Running on GitLab

Since most of my Raku distributions are hosted on GitLab, that’s what I’ll be demonstrating, but a lot of these steps are likely the same or similar in other popular CI platforms.

The CI configuration I’ll be using will look something like this:

# In your .gitlab-ci.yml
coverage:
  image: rakuland/raku:latest
  before_script:
    - zef install --/test --deps-only --test-depends .
    - zef install --/test App::RaCoCo
  script:
    - racoco --exec='prove6 -Ilib' --html
    - mv .racoco public
    - find public -type f -not -name "*.html" -delete
  artifacts:
    paths:
      - public
    public: true

This defines a “coverage” job which will run in an environment where we install the dependencies of the library we are testing, as well as the App::RaCoCo distribution itself. We then use racoco to generate the report, and we make sure all the HTML files from the report are in the public directory, which we can then expose as a public artifact.

This means we can then view these in the browser via a link like this one.

But we can go one step further. Even though we cannot (yet) easily talk to external tools like Coveralls, we can still make use of the Gitlab features to put this link in a badge that nicely displays our coverage percentage.

For that, we’ll have to set a coverage parsing regex, which Gitlab will use to parse the coverage percentage from the CI job output. In this case, to work with the racoco output (such as the Coverage: 81% line above), we can use Coverage: \d+(\.\d+)?%.

The value that is parsed will then be available as a badge that can be set in the “General” project settings.

The fields in that section take placeholders, which means that these values should work for whatever project we are configuring. We can use this one for the path to the coverage badge:

https://gitlab.com/%{project_path}/badges/%{default_branch}/coverage.svg

And this one if we want to link to the published artifacts of the latest coverage job (do note that in this case we are referring to the job by name, so if you’ve used a different name you’ll have to update it):

https://gitlab.com/%{project_path}/-/jobs/artifacts/%{default_branch}/file/public/report.html?job=coverage

If all went well, the badge will display in the main page of your project as shown in the image at the top of this post. This will happen automatically (=you don’t have to manually add them to the readme, for example), and any badges you add will link to wherever you pointed them to.

Room to grow

As noted above, racoco is still young and there are still some rough edges. One in particular is that running the tool multiple times on the same test suite will sometimes generate slightly different results, and that some lines might not be flagged as coverable or covered even though they are. Some of this is because the tool is new, and some is due to the way Rakudo reports this data in the first place. In either case, these should be issues we can fix.

Despite the rough edges, the tool has already proved useful to me, and it’s become a part of my regular setup.

The future is bright, and there’s room to grow.

Day 8 – Practice… on Advent of Code

“Hrmpf!” mutter mutter mutter “Bah!”

The head elf Fooby Nimblecalmy was trying to read an interesting article on Ramsey Theory, but was having a hard time because the latest addition to Santa’s IT Operations, Buzz Bargoosey, was steaming like a kettle.

Anyway, Fooby was determined to go through the article, so decided to deliberately ignore Buzz.

“Grump! Moan… moan… moan…”

It wasn’t going to end any time soon, and Ramsey Theory definitely requires attention, so Fooby decided to bite the bullet and ask:

“Well… what’s up Buzz?”

“Uh, oh… sorry Sir Nimblecalmy, nothing Sir…”

“I told you not to call me Sir, and this isn’t going to end any time soon… so again, what’s up Buzz?”

“Well Si…AHEM Fooby, the fact is that I’m bored to death! There’s nothing to do here!”

Fooby looked at Buzz from over the glasses and noticed a shocking resemblance to… a younger Fooby, too many years ago. Only that, at the time, there were a lot of automation tasks, and there was this shiny new programming language, Perl…

Buzz had a point though. They had basically implemented everything implementable so far, so these days it was all maintenance every now and then. And not all the other elves were so much into mathematics.

Even though…

“You know… we will soon have to upgrade a few programs here, to take advantage of the more recent multi-core processors. I heard Raku is perfect for going parallel without too much hassle!”

“Oh, Raku yes… it would be great!”

Buzz’s face was bright and dreamy again, so Fooby was ready to delve back into Ramsey Theory. A tad too fast, though, because Buzz sighed in a loud and clear way.

“What’s up again, son?”

“Well… I know a little about Raku, but I need to exercise a lot and I don’t know what to do about it!”

Now it was Fooby’s turn to sigh. Ramsey Theory was quickly fading over the horizon… when light came, suddenly!

“Why don’t you try to solve a couple of puzzles a day, say up to Christmas day? You might start with something simple, and increase complexity as days go…”

“This would be brilliant Sir! Yes!”

“So… first of all don’t call me Sir, then head over to Advent of Code and start from the beginning! Each day you will be facing a puzzle, and another one will be available after you solve the first one.”

“OK yes… but how will this help me to learn Raku?!?”

“Well, just by solving the problems you will need to use the language and understand how things can be done in at least one way. Then… after you have solved the daily puzzles, or if you’re stuck, you can head to the Solution Megathreads in Reddit and look for other Raku solutions… there’s a lot of clever people there, although still a tad too few.”

Buzz was thrilled by the idea, but still unsure about it. So Fooby decided to show an example, just to get Buzz started.

“Let’s take day 1 as an example. In part 1, we have a list of numbers in a file, and we have to find out how many times a number is greater than the one immediately before”

“Oh, I know I know! This is how to do it” and started writing on the terminal:

my @numbers = $filename.IO.lines;
my $count = 0;
for 1 .. @numbers.end -> $i {
    ++$count if @numbers[$i - 1] < @numbers[$i];
}
put $count;

Fooby noticed that Buzz already had some confidence with Raku, but also that there was definitely space for improvement.

“Well, that’s a good start for sure! Now you can take a look at seaker’s solution for fun and learning” and showed Buzz this:

#!/bin/env raku

sub MAIN(Str:D $f where *.IO.e = 'input.txt') {
    my @n = $f.IO.words;
    put 'part 1: ', [+] @n Z< @n[1..*];
    put 'part 2: ', [+] @n Z< @n[3..*];
}

“You see? seaker is being more precise by using .words instead of .lines, which improves readability.”

“OK but… what’s with that way of calculating the result for part 1?!?”

“Oh, that’s part of the Christmas Magic! I mean, Raku‘s Magic. First, let’s consider the zip metaoperator Z:”

@n Z< @n[1..*]

“It takes one element from the left and one from the right, and applies the comparison operator < to the pair”.

“OK but… what’s with @n[1..*] on the right?” asked Buzz.

“Well, that’s a lazy slice of the array @n, skipping the first element. Raku takes elements only as they are needed, and nothing more. The zip stops as soon as the shorter side runs out – and @n[1..*] is one element shorter – so we get exactly one comparison per pair of neighbours, and going ‘past the end’ is never a problem”.
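Fooby’s point is easy to check at the REPL (the sample numbers are mine):

```raku
my @n = 10, 20, 30;
say @n[1..*]; # (20 30)
```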

“OK, so we’re left with an array of booleans, right?”

“Right. This is where the [+] comes into play. It’s a sum operator +, wrapped into a reduction metaoperator“.

“Oh I know, I know! The Red Auction is when we offer Christmas sweeties to get Santa’s hat on December 26th, right?”

Fooby took off the glasses and massaged the bridge of his nose a bit. Ramsey Theory had never been farther away…

“No… not the da…rling Red Auction, but reduction. It’s an operation on a list of values, that takes the first two items and applies an operator to get a result. Then it takes the result and the next item, applies the operation again, and so on. Reduction, because it reduces a list down to a single item. In this case, the operation is the sum and the result is the sum of all values.”

“OK, I get it. But wait! We’re summing booleans here… it that allowed?”

“Good catch!” replied Fooby. “Actually yes, because when a boolean is used as a number, it takes a value that is either 0 for False or 1 for True, which is exactly what we need here, because we’re just counting how many True values we have”.
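Fooby’s whole explanation fits in a couple of lines (again with made-up numbers):

```raku
my @n = 1, 5, 3, 4;
say @n Z< @n[1..*];     # (True False True)
say [+] @n Z< @n[1..*]; # 2
```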

“Right, I see… in the same spirit of Perl, I daresay. What’s with that signature? That is different from Perl!”

sub MAIN(Str:D $f where *.IO.e = 'input.txt') { ...

“Yes, it is indeed. It’s all for a single input parameter $f, actually. The Str part tells us that we’re expecting a string…”

“Oh!” interrupted Buzz. “Does this mean that we have to assign a type to each variable and parameter? Why don’t we do anything to @n? Which type…”. The flood gates were open!

“Hold on! Hold on a second” interrupted Fooby. “Raku has something called gradual typing, in that you assign a type to a variable only if you think it’s useful for you. In this case, the author thought it was useful and set it.”

“Uh, well, sorry Sir… please go on, what’s with the smiley?”

“Please… don’t call me Sir. That’s not a smiley, it’s a type constraint that asks Raku to check that the input is defined.”

“Anyway” intervened Buzz “I’m always happy when things are defined, so it’s still a smiley for me! Now that where…”

“That is an additional constraint on the input” answered Fooby. “The star is a placeholder for $f itself, and the expression asks Raku to check that the input string, when considered a file name, should refer to a file that actually exists in the filesystem.”

“Brilliant!” observed Buzz. “Then I think that the equal to input.txt part sets a default value, right?”

“Precisely!” agreed Fooby.
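Everything Fooby just described fits in one small signature. This standalone example (names are mine) swaps the file-existence check for a string-length check so it runs anywhere:

```raku
# Str:D   – argument must be a defined string
# where   – extra constraint; * stands for the parameter itself
# = ...   – default value used when no argument is passed
sub greet(Str:D $name where *.chars > 0 = 'World') {
    "Hello, $name!"
}
say greet();       # Hello, World!
say greet('Buzz'); # Hello, Buzz!
```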

“Uhm I think I got it… Fine! I’ll look into the other part of the solution, and try some puzzles myself. Thanks a lot Sir!”

Fooby was about to complain about being called Sir, then decided it was not the case and, at last, delved back to that interesting article on Ramsey Theory…

Day 7 – Neural Nets in Raku (Part 1)

Thinky the Elf was sitting in his office; it had been a closet, but he’d been given it as his office after the great baked beans incident. It wasn’t his fault: he was right that feeding the reindeer beans would give them a jet boost, but Santa had not been all that happy about it. And his tendency to stare off into space whenever a thought struck him wasn’t great on the shop floor, so it was safer to put him out of the way to do some thinking.

Recently he’d been thinking about how to sort children into naughty or nice. This was Santa’s big job all year, and Thinky thought there must be a way to simplify it. He’d spent some time watching videos on YouTube, and there was one that gave a brilliant description of Neural Networks (jump to 20 minutes in for that bit, though it’s an interesting video throughout). As Thinky watched it he couldn’t help thinking about Raku, and how the connections between nodes felt like Supplies.

With this he dived in and played about trying to build Neural Networks with Raku and Supplies; he tried a few things and got to a system that worked, though it has a few drawbacks.

Firstly, we start with a Neuron role. A Neuron might be an input, an intermediate grouping one, or a final output one, but they all share some functionality.

role Neuron {
    has %!input-vals;
    has Supply $!die;
    has Supply $!input;
    has Str $.id is required;
    has Str $.gene;
    has Bool $.scream;
    has Promise $.watch;

    submethod BUILD( :$!die, :$!id, :$!gene = '', :$!scream = False ) {}

%!input-vals stores the inputs this Neuron has received. The $!die Supply receives a message when the Neuron (and its containing Brain) is to stop, whilst the $!input Supply takes in all the input data. Each Neuron has a unique id and also knows the gene used to create the Brain it lives in. The $.scream Boolean will cause it to emit tracking info via note.

Then there are a few methods :

    method !process-inputs() {...}

process-inputs is a placeholder for the concrete Neuron classes to handle what to do with incoming data.

    method start() {
        my $alive = True;
        if ( ! $!input.defined ) {
            return;
        }
        return start react {
            whenever $!die -> $ {
                note "{$!gene} : {$!id} : DIE" if $!scream;
                $alive = False;
                done();
            }
            whenever $!input -> ($id, $v) {
                note "{$!gene} : {$!id} : {$id} : {$v} : {$alive}" if $!scream;
                %!input-vals{$id} = $v;
                self!process-inputs();
                done() unless $alive;
            }
        };
    }

start brings a Neuron to life and returns a Promise (or, if the Neuron isn’t wired up with inputs, nothing). The Neuron watches its two supplies and fires off process-inputs after updating the %!input-vals hash. If it receives a trigger on the $!die supply, it shuts itself down and, by clearing the internal $alive Boolean, tells the input handler to stop too.

    method attach-input(Supply $s) {
        if ( ! $!input ) {
            $!input = Supply.merge( $s );
        } else {
            $!input = $!input.merge($s);
        }
    }
}

attach-input uses the merge method to combine all the inputs passed into a Neuron into one single one that the start method watches.
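Merging can be seen in isolation in this standalone sketch of mine (not tied to the Neuron code):

```raku
my $a = Supplier.new;
my $b = Supplier.new;

my @seen;
my $merged = $a.Supply.merge($b.Supply);
$merged.tap({ @seen.push($_) });

# With live supplies, emits after the tap are delivered to it.
$a.emit('from a');
$b.emit('from b');
say @seen; # [from a from b]
```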

Two of the Neurons, the Input and Group, can have multiple outputs so we’ll make a Role for them.

role PassThruNeuron does Neuron {
    has @.outputs;

    method attach-output( Supplier $out ) {
        @!outputs.push( $out );
    }
}

Then we define the Input and Group Neurons as PassThrus.

class InputNeuron does PassThruNeuron {
    method !process-inputs() {
        if ( %!input-vals{$!id}.defined ) {
            .emit( ( $!id, %!input-vals{$!id} ) ) for @!outputs;
        }
    }
}

The Input neuron filters the input data it receives, keeping only entries matching its own id, and sends these on to its outputs. This allows us to have one shared input stream that all Input neurons can pull from.

class TanHGroupNeuron does PassThruNeuron {
    has Rat $!threshold = 0.1;
    has $!previous;

    method !process-inputs() {
        .emit( ( $!id, tanh( [+] %!input-vals.values ).round($!threshold) ) ) for @!outputs;
    }
}

The TanHGroupNeuron (named as such to allow for multiple types of Group Neuron later) computes the tanh of the sum of its inputs and sends the rounded value out.
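For instance, a TanHGroupNeuron that has received 0.3 and 0.4 from its inputs would emit (with the default 0.1 threshold):

```raku
say tanh(0.3 + 0.4).round(0.1); # 0.6
```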

And then we have the Output Neuron; it only has one output value, so it’s pretty simple.

class OutputNeuron does Neuron {
    has Num $!output;
    has Rat $!threshold = 0.1;

    method !process-inputs() {
        $!output = tanh( [+] %!input-vals.values );
    }

    method output() {
        $!output.defined ?? $!output.round($!threshold) !! 0;
    }
}

Once again we’re rounding the value output and if there isn’t a value set we return a 0. Note that we have $!threshold values to manage rounding. These are currently only set at the defaults but it’s there for the future.

With the Neurons built, we turn to the paths between them. A path can be defined by a start point (an input or group neuron), an end point (a group or output neuron), and a weight by which the value being sent is multiplied.

class PathSpec {
    has Str $.input;
    has Str $.output;
    has Rat() $.weight;
    method Str() { "{$.input}:{$.weight}:{$.output}" }
    method gist() { "{$.input} ==x{$.weight}==> {$.output}" }
    method COERCE( Str:D $str --> PathSpec:D ) {
        my ( $input, $weight, $output ) = $str.split(":");
        PathSpec.new( :$input, :$output, :$weight );
    }
}

The PathSpec class covers all this including the ability to transform them to or from Strings.

Finally we have the Brain, a collection of Neurons and Paths.

class Brain {
    my $killScheduler;
    my $pathScheduler;

    has Supplier $.inputStream;
    has Supplier $!killStream;
    has @.watch-list;
    has @.outputs;

A Brain has an $.inputStream carrying all the input data and a $!killStream that is connected to the $!die input on each Neuron in the Brain. The @.watch-list and @.outputs arrays contain the Promises from each start method and the OutputNeuron objects themselves. We also define two class-level attributes: schedulers that are shared between all the running brains, to stop the default scheduler from being overwhelmed.

    method kill() {
        $!killStream.emit(True);
        $!killStream.done();
    }

    submethod BUILD( :$!inputStream, :$!killStream, :@!watch-list, :@!outputs ) {}

    submethod DESTROY { self.kill(); }

A few methods to help with building and tearing down brains, and then we move to the make method. You can give it either a gene string (a list of PathSpec strings joined by commas) or a list of PathSpec objects.

    multi method make( Brain:U: Str :$gene!, :$inputStream, Bool :$scream) {
        my @paths = $gene.split(",").map( -> $g { my PathSpec(Str) $p = $g; $p });
        return Brain.make( :@paths, :$inputStream, :$scream );
    }

    multi method make( Brain:U: :@paths! is copy, :$inputStream = Supplier::Preserving.new(), Bool :$scream ) {
        my (@inputs, @outputs, @groups);

        my $gene = @paths.join(",");

        $pathScheduler //= ThreadPoolScheduler.new();
        $killScheduler //= ThreadPoolScheduler.new();
        my $killStream = Supplier.new();
        my $killSupply = $killStream.Supply().schedule-on($killScheduler);

Create the kill and path schedulers if they don’t already exist, along with an internal kill stream that will be assigned to the private attribute and a kill Supply to pass to the Neurons.

        my @combined;
        repeat {
            my $ps = @paths.shift;
            for (@paths) -> $check is rw {
                if ( $ps.input ~~ $check.input && $ps.output ~~ $check.output ) {
                    $check = PathSpec.new(
                        :input($ps.input),
                        :output($ps.output),
                        :weight($ps.weight + $check.weight)
                    );
                    $ps = Nil;
                    last;
                }
            }
            @combined.push($ps) if $ps.defined;
        } while @paths;

        @paths = @combined;

With randomly generated genes we may end up with multiple connections between the same pair of Neurons; this code combines them into single paths by summing their weights.
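For instance, with a hypothetical three-path gene:

```raku
# "i1:1.5:g1" and "i1:2:g1" share an input and an output, so the loop
# above folds them into one path of weight 3.5; "g1:-1:o1" is untouched.
my @paths = < i1:1.5:g1 i1:2:g1 g1:-1:o1 >.map(
    -> $g { my PathSpec(Str) $p = $g; $p }
);
# after combining: i1 ==x3.5==> g1  and  g1 ==x-1==> o1
```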

        for (@paths) -> $p {
            given $p.input {
                when m/^i/ { @inputs.push($_) }
                when m/^g/ { @groups.push($_) }
            }
            given $p.output {
                when m/^g/ { @groups.push($_) }
                when m/^o/ { @outputs.push($_) }
            }
        }
        @inputs .= unique;
        @outputs .= unique;
        @groups .= unique;

        my $inputSupply = $inputStream.Supply();

        my %nodes;
        my @final-outputs;
        for ( @inputs ) -> $id {
            %nodes{$id} = InputNeuron.new( :$gene, :$id, :die($killSupply), :$scream );
            %nodes{$id}.attach-input($inputSupply);
        }
        for ( @outputs ) -> $id {
            %nodes{$id} = OutputNeuron.new( :$gene, :$id, :die($killSupply), :$scream );
            @final-outputs.push( %nodes{$id} );
        }
        for ( @groups ) -> $id {
            %nodes{$id} = TanHGroupNeuron.new( :$gene, :$id, :die($killSupply), :$scream );
        }
        for ( @paths ) -> $ps {
            my $path = Supplier.new();
            %nodes{$ps.input}.attach-output($path);
            %nodes{$ps.output}.attach-input($path
                .Supply
                .map( -> ($i,$v) { ($i, $v * $ps.weight) })
                .throttle(1, 0.5)
                .schedule-on($pathScheduler)
            );
        }
        my @watch-list = %nodes.values.map( *.start() ).grep( *.defined ).list;

        return Brain.new( :$inputStream, :@watch-list, outputs => @final-outputs, :$killStream );
    }
}

Then we create our Neurons and join them up based on the paths passed in. The paths apply the weighting with map and use some throttling to control feedback loops. Finally we schedule these on the pathScheduler to ensure kill messages and other async processes can still run safely.
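As a standalone sketch of what the throttling does (names here are purely illustrative): throttle(1, 0.5) lets at most one value through per half second, so a runaway feedback loop between Neurons becomes a slow trickle instead of a message storm.

```raku
my $firehose = Supplier.new;
$firehose.Supply.throttle(1, 0.5).tap: -> $v { say "passed: $v" };
$firehose.emit($_) for 1..5;   # five values arrive at once...
sleep 3;                       # ...but pass through at most two per second
```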

Thinky was really happy with this system: his 16-core machine handled 1000 brains just fine. With 5000 things got a bit crazy, but it still ran, and he generally didn’t think he’d need that many. All in all he was really happy with them… now he just had to remember why he’d made them, and work out how to train them and mutate them. But that was a job for another day (hopefully this advent calendar, but this is very much a work in progress).

I (I mean Thinky) hope to get some of this released as a module soon if people are interested.

Here’s some example usage: creating 1000 brains that share an input stream, passing in some inputs, getting the outputs, and doing it again before shutting everything down.

my @ins = qw<i1 i2>;
my @gs = qw<g1 g2 g3>;
my @os = qw<o1 o2>;

my @paths;
my @is;
for (1..1000) {
    my @p;
    for (1..4) {
        @p.push( (
                    (|@ins,|@gs).pick,
                    (-40..40).pick / 10,
                    (|@os,|@gs).pick,
                ).join(":") );
    }
    @paths.push( @p );
}

my $inputStream = Supplier::Preserving.new();

@paths = @paths.map( -> @p {       
        my $gene = @p.join(",");
        note $gene;
        {
            gene => $gene,
            brain => Brain.make( :$inputStream, :$gene ),
        }
    }
);
note "Made brains";

sleep(1);

note "Emit i1 : 0";
$inputStream.emit(('i1',0,));
note "Emit i2 : 1";
$inputStream.emit(('i2',1,));

sleep(0.1);
note "Output?";
for ( @paths ) -> %p {
    for ( %p<brain>.outputs ) -> $o {
        say( "{%p<gene>} : {$o.id} : {$o.output // 0}");
    }
}

sleep(0.1);
note "Emit i1 : 1";
start $inputStream.emit(('i1',1,));
note "Emit i2 : 0";
start $inputStream.emit(('i2',0,));


sleep(0.1);

note "Killing Brains";
.<brain>.kill for @paths;
note "Closing Stream";
$inputStream.done();
note "Awaiting the end";
await | @paths.map( *<brain>.watch-list );
note "All done";
note "Output?";
for ( @paths ) -> %p {
    for ( %p<brain>.outputs ) -> $o {
        say( "{%p<gene>} : {$o.id} : {$o.output // 0}");
    }
}

Day 6 – Following the Unix philosophy without getting left-pad

The Unix philosophy famously holds that you should write software that “does one thing, and does it well”. There are other tenets as well, but I’m focusing on the core idea expressed in Programming Design in the UNIX Environment:

Whenever one needs a way to perform a new function, one faces the choice of whether to add a new option or write a new program…. The guiding principle for making the choice should be that each program does one thing.

For instance, if you’re writing a program that produces text in one format, don’t also have it print the text in eight alternative formats. Instead, leave that task for a different specialized program that can process your program’s output. Or, put differently, fight against your program’s inherent tendency to “attempt to expand until it can read mail” (Zawinski’s law).

Of course, you don’t want to follow the Unix philosophy off a cliff, and programmers have been arguing about exactly where to draw the line since well before Rob Pike complained that “cat(1) came back from Berkeley waving flags” 40 years ago. Nevertheless, the do-one-thing-and-do-it-well approach is well worth aiming for.

In the context of writing libraries, the Unix philosophy encourages the practice of writing micro-packages: small libraries, intentionally limited in scope, that serve exactly one purpose. Some programming language communities have this as an explicit goal; for example, one of the leading Node.js developers explicitly invoked the Unix philosophy in their advice to programmers:

Write modules that do one thing well. Write a new module rather than complicate an old one.

This practice of writing micro packages contrasts sharply with the practice of writing omnibus packages that attempt to provide a single, coherent API that aims to solve any problem a developer might encounter. And micro packages benefit from all the advantages that have made the Unix philosophy such good advice for 50 years. Most notably, micro packages tend to be simple enough (and small enough) that you can personally inspect the code – and, if necessary, debug any issues that come up.

The downside of micro packages

As this post’s title probably gave away, the problem with overusing micro packages is that it can lead to what happened with left-pad. Without rehashing all the details, there was an 11-line JavaScript package (left-pad) that did nothing other than pad each line of a string with a specified amount of whitespace. Yet, somehow, a huge percentage of the JavaScript ecosystem depended on this simple function – either directly or more commonly indirectly. As a result, when the developer removed the package (in a way that couldn’t happen anymore for reasons not relevant here), that same fraction of the JavaScript ecosystem fell over. I’m not sure exactly how many builds failed, but one source estimated that over 2.4 million software builds depended on left-pad every month. So not a small number.

In other words, someone finally pulled out the one domino that the entire Internet depended on:

xkcd 2347, CC BY-NC 2.5. [A tower of blocks is shown. The upper half consists of many tiny blocks balanced on top of one another to form smaller towers, labeled:]  All modern digital infrastructure  [The blocks rest on larger blocks lower down in the image, finally on a single large block. This is balanced on top of a set of blocks on the left, and on the right, a single tiny block placed on its side. This one is labeled:]  A project some random person in Nebraska has been thanklessly maintaining since 2003.  {alt-text:} Someday ImageMagick will finally break for good and we'll have a long period of scrambling as we try to reassemble civilization from the rubble.

And while left-pad may be an extreme example, the direct consequence of JavaScript’s embrace of the Unix philosophy is that JavaScript programs commonly depend on huge numbers of micro packages.

A 2020 study found that the typical JavaScript program depends on 377 packages (here, “typical” means “at the geometric mean”, which reduces the impact of outliers). And a full 10% depend on over 1,400 third-party libraries. Many of these dependencies are admirably tiny: one of the most depended-on packages (used by 86% of JavaScript packages – literally tens of millions of developers) is essentially just one line of code. It’s hard to take “do just one thing” to any greater extreme.

And yet.

And yet, I don’t believe that any developer can reasonably comprehend a system made up of hundreds (thousands?) of independent packages. It’s not just a matter of the total lines of code climbing to incomprehensible levels (though that famously happens and certainly doesn’t help). But even if the total lines of code were manageable, the interaction effects simply aren’t – remember, these packages weren’t designed to form a coherent whole, so they can and will make inconsistent assumptions or create inconsistent effects.

The many different problems that can arise from this abundance of micro packages leads some people to conclude that you should kill your dependencies. Or, as Joel Spolsky put it:

“Find the dependencies — and eliminate them.” When you’re working on a really, really good team with great programmers, everybody else’s code, frankly, is bug-infested garbage, and nobody else knows how to ship on time. When you’re a cordon bleu chef and you need fresh lavender, you grow it yourself instead of buying it in the farmers’ market, because sometimes they don’t have fresh lavender or they have old lavender which they pass off as fresh.

This principle, unfortunately, seems to be directly in conflict with the ideal of “code reuse good — reinventing wheel bad.”

A wild dilemma appears

At this point, I hope the tension is pretty clear: on the one hand, it’s great to keep components small, simple, and composable. On the other hand, it’s terrible to bury yourself in a tangle of different packages, no matter how tiny they are. The Unix philosophy and killing your dependencies pull in opposite directions.

Of course, this is hardly a new insight. It’s a point many people have made over the years; I particularly enjoyed how Rust-evangelist extraordinaire Steve Klabnik put it a couple of years ago:

tweet by Steve Klabnik with the text "developers are like 'The UNIX philosophy, the pinnacle of software dev: Make each program to one thing well.'  'lol your project has tons of tiny dependencies? leftpad lolol'" and an image of the Daily Struggle meme (https://knowyourmeme.com/memes/daily-struggle)

But I want to do more than note the tension: I want to provide a solution (or at least an outline of what I view the solution to be). Before I do so, however, I need to mention a few non-solutions that I reject.

First, I don’t think that we should resolve this dilemma by fully choosing one side or the other. Like Russ Cox, I acknowledge that installing a dependency entails allowing your “program’s execution [to] literally depend[] on code downloaded from [some] stranger on the Internet”; I don’t believe that doing so thousands of times will ever be a recipe for crafting robust software. As much wisdom as there is in the Unix philosophy, it simply won’t do to accept it 100% and embrace the micro-package dystopia.

At the same time, I also cannot fully embrace the “kill your dependencies” extreme. While it would be appealing to live in an ideal world where, like one developer I admire, “I [could] list the entire dependency graph, including transitive dependencies, off of the top of my head”, I don’t believe that’s a tenable solution. For one thing, the code reuse and code sharing that micro packages enable is a huge part of what gives open source and free software developers superpowers: If a project can only be done by a team of dozens, it will almost certainly be built by a for-profit company. But if relying on existing packages lets one or two hackers, working alone, create that software – well, then, there’s an excellent chance that we’ll have a free software version of the program. (Remember, mega-projects like Linux are very much the exception, not the rule – the median number of maintainers for free software projects is 1, as I’ve discussed at length elsewhere.)

Even setting aside the practical benefits of code reuse, I still wouldn’t agree that we should jettison micro packages. The inconvenient reality is that the Unix philosophy is just plain correct: for any given volume of code/features, it’ll be easier to reason about the system if it’s composed of many small, independent modules instead of being one massive blob. Killing our dependencies and replacing that code with our own implementation would, in many cases, just make a bad situation worse. So I reject the idea that we can “solve” this problem by picking one extreme or the other.

But I also view a naive compromise between the extremes to be a non-solution. Both extremes have real problems, but that doesn’t provide any guarantee that splitting the baby will be any better. Indeed, there’s a real risk that it’ll be worse: if you take a program that depends on 500 micro packages and re-architect it to instead depend on 200 larger packages, then you still have far too many packages to manually review and maintain. But now you are also dealing with packages that are each harder to understand when you do need to start debugging. Nice job breaking it, hero.

A less naive compromise

Having just rejected both extremes and a simple compromise, it’s clearly on me to come up with a better way to strike this balance. What we need is a way to limit the number of dependencies for any given software project without leading to a corresponding increase in the average size of each dependency. I have some ideas about how we can do so at the programming language level. (I’m going to discuss this in the context of my programming language of choice, Raku, but I believe these prescriptions to be more broadly relevant.)

I believe that a programming language/community can balance the Unix philosophy and dependency minimization by following three steps. In order from most to least fundamental, the programming language should:

  • maximize the language’s expressiveness;
  • have a great standard library; and
  • embrace a utility package (or a few utility packages).

I’ll discuss each of these in turn and then conclude with a few thoughts about more individual actions we can all take to protect our own code.

Maximize expressiveness

At first blush, “language expressiveness” may seem like an odd place to start. If the goal is to write libraries that “do just one thing”, then it seems like the number of words that it takes to implement that “one thing” shouldn’t really matter.

But what this ignores is that “one thing” is not well defined. Consider the output from the ls command.

colorized output from ls printed in three columns

Pretty much every Unix-based OS currently takes the view that ls is best thought of as “one thing”. But in that original Unix Environment paper, Pike and Kernighan argued that it is really two things: (1) listing the files and (2) formatting the output into columns. But I could see an argument for adding a third (colorizing the output) or even a fourth (determining whether the program is being run interactively or in a script).

My point isn’t that ls is “really” four things instead of one – it’s that there isn’t a single correct way to divide ls into packages that do only one thing each. Any division will inherently leave room for at least a bit of subjective judgment.

Moreover, that’s exactly what we should want: when we say “a library should do only one thing”, that’s a convenient shorthand but doesn’t need to be taken 100% literally. (Even Pike and Kernighan agree that it’s sometimes correct to add options to an existing program.) And when deciding what level of functionality to consider as “one thing”, we should (and inevitably do) consider factors such as code complexity and code length; a library that takes only 30 lines likely strikes many developers as accomplishing “one thing” in a way that the same functionality in a 3,000 line library might not.

This is especially true because one of the main reasons we want to follow the Unix philosophy is to write bug-free code – and as studies have repeatedly shown, longer code gives bugs more places to hide. This means that a library with only a few lines is much more likely to be correct – and thus can be said to better follow the Unix philosophy of doing just one thing.

As a result, the language’s overall support for writing concise, expressive code matters quite a bit. Highly expressive languages are less likely to need deep dependency graphs to keep each package to a Unix-philosophy-compliant size; packages can be “micro” in size (and complexity) without being “micro” in power. Fortunately for those of us writing Raku, it’s one of the most expressive languages, so we’re off to a strong start.

Great standard library

Next (and more obviously), you can avoid an abundance of left-pad-like micro packages by using a language with a great standard library. Standard libraries have an obvious direct impact: when a function is built into the standard library, no one needs to rely on a package that provides that function. As a concrete example: no one would ever write a left-pad package in Raku, because the standard library already has sprintf, and '%5s'.sprintf($str) does the job of left-pad($str, 5).
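For example:

```raku
say '%5s'.sprintf('hi');    # «   hi» – right-justified in a 5-char field
say '%05d'.sprintf(42);     # «00042» – the same built-in zero-pads numbers
```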

The direct effects of the standard library are fairly limited – each standard library function can only directly replace a single micro package. Fortunately, a great standard library can have a much larger indirect effect. If a standard library has many small, composable functions that can be put together in different ways, then a vast majority of “micro packages” can be trivially replaced with a simple call to two or three standard library functions – which means that those micro packages never get written in the first place. Again, this is an area where Rakoons are in luck, since we benefit from exactly that sort of composable standard library (which was the topic of my 2021 Raku Conf talk).
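Two quick examples of that composability; in a less composable ecosystem, each of these might be its own micro package:

```raku
# "chunk a list into pairs" is just rotor:
say (1..6).rotor(2);                        # ((1 2) (3 4) (5 6))

# "sum the values of a hash" is just a method chain:
say { a => 1, b => 2, c => 3 }.values.sum;  # 6
```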

One note on the subject of standard libraries: It’s very helpful for a language to have a great standard library, but that doesn’t mean it needs a huge one. It certainly doesn’t need a “batteries included” standard library – after all, when a standard library includes too many batteries, they tend to leak battery acid or at least need replacing. The difference between a “great” standard library and a “batteries included” one is that a great standard library includes all the composable functions you need to avoid left-pad-like micro packages, whereas a batteries-included standard library attempts to include non-micro packages (e.g., a web server) in the core standard library.

Utility package(s)

The final way for a language to reduce the size of dependency trees without giving up on the Unix philosophy is to collectively agree on a utility package that replaces numerous micro packages. I’ve listed this after “an expressive language” and “a great standard library” both because it’s a less ideal solution and because it can build on the foundations provided by the language and standard library.

For example, despite the somewhat harsh words I had for JavaScript earlier in this post, JS has some excellent utility packages. The current market leader, lodash, is a direct or indirect dependency of nearly 9 out of 10 JS packages and does a very good job of aggregating many common functions that might otherwise be micro packages. As large as the dependency trees are in JavaScript-land, they’d doubtless be even larger without lodash, underscore, and similar utility packages.

Lodash is, however, hamstrung a bit by JavaScript having a standard library that is (for various historical and standards-related reasons) somewhere between “small” and “non-existent”. This makes the job of a JavaScript utility library harder in two ways: first, it needs to devote a good chunk of its code just to implementing functions that would have been in a deeper standard library to begin with (or that would have been trivially derivable from standard library functions). And second, because a JavaScript utility library cannot itself use functions missing from JavaScript’s standard library, it’s limited in what it can concisely express. Despite these limitations, lodash enriches the JavaScript ecosystem and helps to contain the explosion of micro packages.

Or does it? I can already hear some of you objecting that lodash is just a collection of independent files. You might reasonably ask whether replacing two dozen micro packages with one package consisting of two dozen files really provides much benefit. While that is a reasonable question, it also has a reasonable answer.

This sort of consolidation provides at least three benefits: First, part of the complexity from micro-package multiplicity arises from packages that approach the same basic problem from inconsistent (or just confusingly different) directions. This could be anything from wanting to be called with arguments in a different order to providing data that’s in the wrong shape for another package. In either case, the cause is the same: because each micro package was developed independently, there’s no reason for either package to fit with the other. In contrast, with a utility package there’s a guarantee that each function within the package has been designed with the goal of fitting with the other utility functions; any misfit represents a bug in the package. And even though other packages don’t have any obligation to fit with the utility package, there’s a much greater chance that they will choose to do so (assuming that, as in lodash’s case, the package genuinely is widely used) than there is that two random micro packages will be designed to work well together.

The second advantage of consolidating micro packages into a utility package is that it avoids one of the big threats of micro packages: zombie dependencies. The problem is that, because a micro package is (by definition) pretty small, it’s possible for the package maintainer to disappear without the users noticing, at least for a while. But that results in a zombie package, shambling along without anyone to fix any bugs that do come up, or to even merge any bug fixes others may submit. In the worst case, this can result in a package being left with known security vulnerabilities or even being turned over to a malicious actor. By consolidating micro packages into a utility package, you avoid this risk. (Of course, you trade it for the risk that all of the maintainers of the utility package could disappear at the same time. But for a major package like lodash, that’s both much less likely to happen and much more likely to be something you’d hear about if it did happen.)

Finally, consolidating micro packages provides a third benefit that’s cultural rather than technical – but perhaps all the more important for that reason. By keeping the total number of dependencies low, utility packages make the act of adding new dependencies more psychologically meaningful. If the codebase currently depends on two modules, then most developers will put at least a bit of thought into whether they should add a third dependency. But if the codebase already has a dozen dependencies, making that a baker’s dozen is much more likely to feel like a rounding error. I’m not against people adding dependencies, but I am against them doing so thoughtlessly – so I like the idea of a utility package keeping total dependency count low enough that we’ll all think a bit more about each dependency we add.

Given these advantages, I think consolidating functionality into utility packages is a very good thing, and I think the JavaScript ecosystem is better off for the existence of packages like lodash. And I’m sad to say that the Raku ecosystem doesn’t have anything quite like that.

… or at least it doesn’t today. But keep following the Raku Advent calendar, because in part 2 of this post, I’ll be announcing a new utility package for Raku (!).

Conclusion

Both the Unix philosophy (“do one thing and do it well”) and the idea of killing your dependencies have merit – but they pull in opposite directions. They’re not 100% incompatible, but at the language level, it takes a great deal of thought to grow an ecosystem where libraries tend to follow the Unix philosophy without devolving into left-pad-like micro-package multiplicity. Three good ways to do so (again at the language level) are to prioritize language expressiveness; to have a great and composable standard library; and to embrace a utility package. Of these three, Raku currently does very well on the first two but is missing the third, at least today.

Of course, much of the job of balancing the Unix philosophy with the avoidance of left-pad cannot be handled at the collective level of the language or ecosystem – it needs to be handled at the individual level. In the code trenches, it’s always up to the author of each program to decide whether any particular dependency is worth adding or is better to rewrite (though heuristics like the ones in Surviving Software Dependencies may help). But with a bit of care at the language level, Raku can help make correctly striking this balance just a bit easier and, more importantly, just a bit more of the community norm.

Day 5 – Santa Claus is Rakuing Along

Part 1 – The Elven Journals

Prologue

A Christmas ditty sung to the tune of Santa Claus is Coming to Town:

He's making a list,
He's checking it closely,
He's gonna find out who's Rakuing mostly,
Santa Claus is Rakuing along.

Santa Claus Operations Update 2021

Santa’s operations were much improved year-over-year since his IT crew adopted Raku as their go-to programming language (see the articles from Raku Advent 2020). In addition, he and his non-techy elves found the language so easy for beginners to use, he decided to see if he could use it for managing the many task reports and other documents needed by individuals as well as managers.

Note that one of the improvements he had instituted was issuing mobile tablets to all Elves. The tablets are provisioned with Raku apps that provide a Raku REPL (Read Eval Print Loop) for easy code snippet use as well as a browser to access websites where more code can be run. As well, the tablets are equipped with a Terminal app that enables remote logins to the powerful servers in the IT Department.

He also instituted a new policy on record-keeping that requires Elves to keep a digital journal and make at least two entries per work shift. They log in to their own account on the server via a terminal app [‘termius’ is this author’s choice] to make or check an entry. When something is to be noted, they use vi to open their text journal (the file ‘$HOME/journal’, kept safe under central version control with git) and make an entry. The first format attempted was this:

=Time 2021/4/22 1340
=Entry Designed a new toy I'll call 'Herby' the harp seal, a
stuffed animal with a waterproof covering suitable
for a bath toy for toddlers.

Note the entry is in a simple format (using Pod abbreviated blocks; see https://docs.rakulang.site/language/pod#Abbreviated_blocks) to allow line parsing and further manipulation. One or more blank lines separate the new entry from any previous entry. The new entry opens with the year, month, and day separated by slashes, one or more spaces, then the hour and minute in local twenty-four-hour time, followed by one or more blank lines. Then come the journal notes, with paragraphs separated by one or more blank lines. (Of course, since Raku will be processing the journals, Unicode characters above the ASCII range are perfectly acceptable.)
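That opening date/time line lends itself to a small grammar; here is a hypothetical sketch (the names are invented for illustration, not taken from Santa’s actual code):

```raku
grammar EntryStamp {
    token TOP  { <date> \s+ <time> }
    token date { \d ** 4 '/' \d ** 1..2 '/' \d ** 1..2 }
    token time { \d ** 4 }
}

say so EntryStamp.parse('2021/4/22 1340');   # True
```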

Elves are expected to make a journal entry at the start and end of their work shift. (There is also a trial ongoing to use a voice-to-text system to ease the journaling effort, but its reliability is not very good at the moment. Also, the verbosity and silliness of the Elven people’s language makes filtering the sound quite a challenge for the IT system.)

Periodically all the journals are read and converted to a format which makes it easier to find specific entries by date, time, and Elf. The process also creates reports detailing task progress and Elf efficiency. (Details will be in Part 2 of this article.)

The initial processing of all the Elven journals looked something like this:

use SantaClaus::Utils;
my %j; # or %journal
for each elf
%j<e> = [];
read journal file
for each line
if empty
end current object
elsif a datetime entry
create a DateTime object
...

That was not going to work–too much room for erroneous parsing and cowboy coding!

The second approach was to tightly control the Pod components allowed, extract the Pod programmatically with Raku, and fail early on problems with helpful comments. But one of Santa’s concerns was the strange nature of Pod blocks. Most have Pod::Block as a base class, so they all share the following common attributes:

my class Pod::Block {
    has %.config;
    has @.contents;
}

Due to the @.contents array, Pod blocks can be nested infinitely deep, so handling unknown Pod can be tedious and error-prone. In addition, extracting Pod from another document requires more than a beginner’s knowledge of Raku. Thus it was decided to create a Raku journal management program which uses the Raku module Pod::Load to extract the Pod. Additionally, the current policy for journal entries precisely describes the journal entry format, which Raku can easily extract while reading the journal, so a helper program was added to initiate a journal entry with a templated Pod chunk with embedded instructions to ease the Elves’ input.

Here is an example of that template block newly created at the end of the current journal file for an Elf computer user name of ‘jerzi’ and default task ID of ‘build-toy’:

Z<Edit the following Entry as necessary. Add or delete Z comments as desired.>
=begin Entry :time<2021-11-30T07:12>
Z<Enter one of ':start' or ':end' in the following config line if applicable.>
=begin Task :id<build-toy> :employee<jerzi> :status
Z<Enter notes and comments here; use blank lines to separate paragraphs>
=end Task
Z<Add another Task if applicable; ensure the ':id' is correct before doing so.>
=end Entry

The user merely adds the appropriate data and (optionally) removes the Z<> Pod comments which are ignored by Santa’s journal reader (although they leave blank spaces after Pod::Load parsing, all text paragraphs are then normalized by the reader and blank paragraphs will disappear).
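On the reading side, a minimal sketch using Pod::Load might look like this (assuming its exported load routine, which returns the parsed Pod blocks):

```raku
use Pod::Load;

# Parse the journal text and dump each top-level Pod block;
# the Entry blocks carry their timestamp in the config hash.
for load('journal'.IO.slurp) -> $block {
    say $block.raku;
}
```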

Let’s see what happens when the Elf starts a brand new journal which will look like the above and then edits it to look like this:

=begin Entry :time<2021-11-30T07:12>
=begin Task :id<build-toy> :employee<jerzi> :status
=end Task
=end Entry

Now run the check:

$ check-journal
ERROR: Task id<build-toy> has a ':status' config key but no explanation

Journal entries require an explanation when it is a ‘:status’ entry without a ‘:start’ or ‘:end’ config key.

The typical Elf’s work shift now looks something like this:

  • Check messages for any special instructions
  • Login to the system and make a journal entry
    $ add-event
    See new Entry appended to file: journal
    $ vi journal
    ...
    :wq
    $ check-journal
    # make necessary edits until the journal
    # checks okay
  • Work the task(s)
  • Login and update the journal as necessary
  • Login and make the end-of-shift journal entry

Summary

The new system now creates a complete record of an Elf’s work and serves as an electronic time clock to replace the old punch cards and electro-mechanical clocks.

In addition, the individual journals provide the data for detailed operational and managerial reports for all supervisory levels.

Part 2 of this article will discuss those aspects of the new system and how Raku makes them easier to program for less experienced programmers.

Note: See all code used in this article in the repo at https://github.com/tbrowder/SantaClaus-Utils.

Santa’s Epilogue

Don’t forget the “reason for the season:” ✝

As I always end these jottings, in the words of Charles Dickens’ Tiny Tim, “may God bless Us, Every one!” [1]

Footnotes

  1. A Christmas Carol, a short story by Charles Dickens (1812-1870), a well-known and popular Victorian author whose many works include The Pickwick Papers, Oliver Twist, David Copperfield, Bleak House, Great Expectations, and A Tale of Two Cities.

Day 4 – Santa’s OCD Sorted

Santa has been around for a long time already. Santa remembers the days when bits were set by using a magnetic screwdriver! In those days, you made sure that things were orderly set up and sorted for quick access.

Santa likes the Raku Programming Language a lot, because it just works like Santa thinks. There’s just this one thing missing to make Santa feel at home again, just like in the olden days: an easy way to make sorted lists and easily insert new values into these lists to keep them up-to-date.

Sure, Santa knows there are hashes. And if you want to iterate over all keys alphabetically sorted, you can easily do:

  for %hash.keys.sort -> $key {
      ...
  }

or if you want both the key and the value:

  for %hash.sort(*.key) -> (:$key, :$value) {
      ...
  }

But that just feels like a lot of extra work on big hashes that were filled organically from keys and associated values.

So Santa went looking in the Raku ecosystem and was really glad when the Array::Sorted::Util distribution popped up on the search term “sort”.

So what does that do? Well, it exports a few subroutines, the simplest of which is inserts. You give it an array and an object, and it will insert the object into the array at the correct location to keep the array sorted:

  use Array::Sorted::Util;

  my str @names;
  inserts(@names,$_) for <Zaphod Arthur Ford>;
  say @names; # [Arthur Ford Zaphod]

But what if you had a list of Pairs with names and gifts? You’d need two arrays, and they would have to be kept in sync! Well, Santa found out that is also easily possible with inserts:

  use Array::Sorted::Util;

  my str @names;
  my str @gifts;
  for Zaphod => 'arm', Arthur => 'tea', Ford => 'blanket' {
      inserts(@names, .key, @gifts, .value);
  }
  say @names;  # [Arthur Ford Zaphod]
  say @gifts;  # [tea blanket arm]

And then, if you want to look up the gift of a specific person, you’d use finds:

  say @gifts[$_] with finds(@names, 'Arthur');  # tea
  say @gifts[$_] with finds(@names, 'Marvin');  # (no output)

So Santa made changes to the code to use two lists instead of a hash. But the elves really didn’t like that. So they went searching in the Raku ecosystem as well, and found Array::Sorted::Map. This allowed them to easily apply a Map interface to the two lists:

  use Array::Sorted::Util;
  my str @names;
  my str @gifts;
  for Zaphod => 'arm', Arthur => 'tea', Ford => 'blanket' {
      inserts(@names, .key, @gifts, .value);
  }

  use Array::Sorted::Map;
  my %gifts := Array::Sorted::Map.new(
    keys => @names, values => @gifts
  );
  .say with %gifts<Arthur>;  # tea
  .say with %gifts<Marvin>;  # (no output)

That was good as a temporary measure. But it still wouldn’t allow the elves to make changes to the hash without having to resort to things like finds, inserts or deletes operating on the underlying arrays.

The wise Santa realized that the elves are the future, so it was important to find a way that the elves as well as Santa would be happy with. A further search revealed the existence of a Hash::Sorted module, which promised that the keys of a hash created with that class would always be in sorted order.

When Santa proposed to have the elves use that module, they were very glad. Now the elves could use the familiar hash idioms and satisfy Santa’s need for order:

  use Hash::Sorted;

  my %gifts is Hash::Sorted;
  my @names := %gifts.keys;
  my @gifts := %gifts.values;

  %gifts = Zaphod => 'arm',
           Arthur => 'tea',
           Ford =>   'blanket';
  say @names;  # [Arthur Ford Zaphod]
  say @gifts;  # [tea blanket arm]

  .say with %gifts<Arthur>;  # tea
  .say with %gifts<Marvin>;  # (no output)

What the elves didn’t realize, was that the Hash::Sorted module is just a frontend for the subroutines provided by Array::Sorted::Util, and the Hash::Agnostic role. But Santa wisely didn’t tell that to the elves.

And all was well on the North Pole!

Day 3 – Silently

Santa was working on some programs to handle all of the intricacies of modern-day just-in-time package delivering, and got annoyed by some parts of the program getting noisy because some elf had left some debug statements in there. Ah, the joys of collaboration!

So Santa wondered whether there could be a way to be less distracted by what otherwise seemed to be a perfectly running program. Looking at the Wonderful Winter Raku Land, after a little bit of searching, Santa found the silently module. That was great! It’s a module that exports a single subroutine silently that takes a block to execute, and will capture all output made by the code running in that block.

Whereas Santa would first do:

    assign-optimal-trajectory(@gifts);

and get a lot of unwanted output, now Santa could just do:

    silently { assign-optimal-trajectory(@gifts) }

and get the same result without so much noise.

But alas, just before all gifts were on their way, it turned out that some gifts had somehow been lost, or at least not assigned a proper trajectory. Now, Santa had the option of running the same program again, but with all of the noise. And time was getting short! But then Santa realized that if something had gone wrong, there would be an error message on STDERR.

And guess what, the silently module actually only muffles whatever noise was generated! After running your code, you can still find out the noise it made on STDERR, because silently returns an object that you can call the .err method on to get all the text that was sent to STDERR. So the code became:

    my $muffled = silently {
        assign-optimal-trajectory(@gifts)
    }
    if $muffled.err -> $errors {
        say $errors;
    }

This allowed Santa to quickly find and fix the problem for the gifts that had gone wrong. And Christmas was saved once again!

Later, some elves were reprimanded for leaving debug statements in production code. They promised to not do it again.

Day 2 – Rotation of Log files in a nutshell

Santa has a cloud-based application that helps him deliver the gifts to the children. Once the gifts have been delivered, Santa registers the delivery operation in the deliveries.log file. Right afterwards, the inspector elves review this log file, comparing it with the list of children to ensure that all the children have received their gifts correctly.

The number of deliveries is very large, and so is the price of the cloud-based storage. To stay within the cloud budget it’s necessary to set a maximum log capacity such that:

  • The log information will be distributed in 5 log files
  • Each log file size should be about 20 MB

How will we do it?

Santa needs a process that will run on a regular basis. This process will rotate the log files when the size of the main log file reaches about 20 megabytes. The maximum number of log files will be 5, that is:

  • deliveries.log
  • deliveries.log1
  • deliveries.log2
  • deliveries.log3
  • deliveries.log4

When the deliveries.log file size reaches about 20 megabytes, it will be renamed to deliveries.log1.

Next time this happens, deliveries.log1 will be renamed to deliveries.log2 and deliveries.log will be renamed to deliveries.log1.

And so on, up to deliveries.log4. At that point, since there are already 5 log files, the deliveries.log4 file (the oldest) will be deleted and the deliveries.log3 file will be renamed to deliveries.log4.

Starting the Raku script

First we need to know the path of the log files, the name of the main log file, and its absolute path:

my $path_logs     = '/var/log/gifts/';
my $main_log      = 'deliveries.log';
my $path_main_log = "$path_logs$main_log";

As we see in "$path_logs$main_log", using double quotes to concatenate strings is very visual, but a more elegant approach is to use the ~ operator between the string variables:

my $path_main_log = $path_logs ~ $main_log;

Let’s keep going by setting the maximum size of the main log file (20 megabytes) in bytes:

my $max_size_log = 20000000;

Also, we need to set the maximum number of log files that we will rotate:

my $max_logs = 5;

We already have the ingredients; now we are going to set the requirements.

Requirements

Now we need to know if the main log file deliveries.log exists and to check its size.

Raku’s capacity to handle files with .IO is very robust, concise and complete. For instance, we can call the .e method on an absolute file path to check whether it exists, and exit the script if the file doesn’t exist. All in one line:

exit unless $path_main_log.IO.e;

Similarly, using the .s method we can get the size of a given file. We can exit if the main log size is smaller than the size specified in $max_size_log:

exit if $path_main_log.IO.s < $max_size_log;

As we see, this Raku code is very readable and concise, which also makes it easier to follow when debugging.

At this point, the deliveries.log file (the main log file) exists with a size of 20 megabytes or more. The next step is to find out how many log files exist in the logs folder.

How many log files do we have?

We can populate an array with the absolute paths of the log files using method chaining. Method chaining is part of Raku’s functional programming paradigm, a powerful feature that passes the result of one method to the next as an argument, much like pipes in the bash shell:

my @log_files = $path_logs.IO.dir.grep(/$main_log$ | $main_log\d+$/).sort;

Let’s dissect the snippet:

  • @log_files is an array that will be populated with the absolute paths of the log files.

  • First we use $path_logs, whose value is /var/log/gifts/, the absolute path of the logs folder. The next methods will search for the log files in this location.

  • The .IO.dir method returns the absolute paths of all the files and folders located in the $path_logs folder. As a curiosity, the dir command was already in use in the RT-11 operating system 51 years ago. Later, this command was also adopted by CP/M and MS-DOS. Some things never change.

  • The next method is .grep(/$main_log$ | $main_log\d+$/). The .grep method matches the regular expression provided as a parameter, /$main_log$ | $main_log\d+$/, against each value returned by the previous method. This pattern admits two possibilities: it matches the main log file name, $main_log$, that is deliveries.log, or (|) it matches the main log file name followed by a number, $main_log\d+$. This regex matches the absolute paths of log files like deliveries.log, deliveries.log1 or deliveries.log4, but never matches strings like deliveries.log.old.

  • The last method is .sort. This method sorts the elements returned by the previous method. This ordering is essential to continue with the next operations.
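
To make the pattern concrete, here is a tiny stand-alone check of that regex against a few candidate file names:

```raku
my $main_log = 'deliveries.log';
for <deliveries.log deliveries.log4 deliveries.log.old notes.txt> {
    # Interpolated strings match literally in Raku regexes,
    # so the dot in 'deliveries.log' is not a wildcard here
    say "$_ matches" if $_ ~~ /$main_log$ | $main_log\d+$/;
}
# deliveries.log matches
# deliveries.log4 matches
```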

Many operations in one line; with method chaining, that’s fine.

There are many log files, let’s fire the last

If the number of log files has reached the limit established in $max_logs, we need to remove the last log file, deliveries.log4 in our case, both from the array @log_files and from the filesystem. Your attention please:

@log_files.pop.unlink if @log_files.elems == $max_logs;

In this case we also use method chaining:

  • @log_files is the array with the absolute path of each log file.

  • The .pop method removes and returns the last element of the @log_files array. This element is the absolute path of the deliveries.log4 file.

  • The .unlink method removes a file from the filesystem. In this case it removes the element returned by the previous .pop method, that is the log file deliveries.log4.

Then comes the if condition, although it is evaluated first. This condition compares the number of elements of @log_files, using the .elems method, with $max_logs, whose value is 5. If the condition returns True, the deliveries.log4 element is removed, both from the @log_files array and from the filesystem.

The use of the methods .pop and .unlink seems amazing to me.

Moving log files to the right position

Now, we need to rename the log files as we saw before.

Here, the Raku magic helps us with the for iterator:

for @log_files.kv.reverse -> $file, $idx { $file.rename($path_main_log ~ $idx + 1); }

The -> $file, $idx is the signature of the block {} and is made up of the $file and $idx parameters, which are populated from the .kv method’s output. Each iteration provides the absolute path of the current log file in the $file variable and its index in the $idx variable. All this is done in reverse order using the .reverse method.

The body of the block {} simply uses the .rename method on the current log file $file to change its name to the name of the next log file. The name of the next log file is the main log file path $path_main_log plus the next index number, ~ $idx + 1.
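
To see why the reverse order matters, here is the same loop with .rename replaced by say, starting from three existing log files:

```raku
my @log_files = <deliveries.log deliveries.log1 deliveries.log2>;
# .kv yields (0, log, 1, log1, 2, log2); .reverse yields
# (log2, 2, log1, 1, log, 0), so the block receives the
# oldest file first and a rename never clobbers an existing file
for @log_files.kv.reverse -> $file, $idx {
    say "$file -> deliveries.log{$idx + 1}";
}
# deliveries.log2 -> deliveries.log3
# deliveries.log1 -> deliveries.log2
# deliveries.log -> deliveries.log1
```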

Beautiful one-line code running many operations. A scripter’s dream.

Putting all together

my $path_logs     = '/var/log/gifts/';
my $main_log      = 'deliveries.log';
my $path_main_log = $path_logs ~ $main_log;
my $max_size_log  = 20000000;
my $max_logs      = 5;

exit unless $path_main_log.IO.e;
exit if $path_main_log.IO.s < $max_size_log;

my @log_files = $path_logs.IO.dir.grep(/$main_log$ | $main_log\d+$/).sort;

@log_files.pop.unlink if @log_files.elems == $max_logs;

for @log_files.kv.reverse -> $file, $idx { $file.rename($path_main_log ~ $idx + 1); }

Epilogue

The multiparadigm approach of Raku provides an astonishing capacity to perform complex operations concisely and pushes scripting to the next level, and Santa knows it.

Day 1 – Batteries Included: Generating Thumbnails

It was a cold wintry night in the North Pole and Santa was in a mood.

“Naughty. Naughty. Naughty. Ni..aughty” he grumbled, checking his list. Then checking it again.

“Everything ok?” chipped cheerful Sparkleface the elf, bouncing into the room. “Isn’t it nice to have some cold weather for a change?”

Santa scowled at Sparkleface with an icy stare that froze all the water molecules in the room. He said nothing, gazing through Sparkleface into some distant place in another dimension.

Undeterred, Sparkleface continued: “did you see all those wonderful images we’ve received from the children of the world who are looking forward to the holiday, and have been sending us pictures of what they want for Christmas? Isn’t it great that everyone has cell phones these days and can so easily send us high resolution images instead of writing out lists by hand like in the olden days?”

This finally provoked a reply. “No. It is not great.”

Sparkleface started to say something when Santa lowered his voice
and continued —

“Let me tell you precisely why it is not great”

and began to Santasplain the situation:

“There are 2.2 billion children in the world, and so far we have received images from 90% of them. So that’s 1,980,000,000 images. Yes, most of them are high resolution — too high — many are over 20 megabytes each! and we need to email them out to our distributed team of elves! That’s too much data! We need to downscale them all into lower resolution versions before emailing them out. In other words, we need to convert them all into thumbnails.”

“So why don’t you write a script?” Sparkleface let the words come out before realizing the effect they would have.

“First of all” — Santa’s voice began to get louder — “in case you didn’t know, I am 1,750 years old. In fact, this year was my 1,750th birthday. Do you know how many times I have written a script in my lifetime to convert a directory of images into thumbnails?”

“Well…computers have only been around since…”

“Sixty three times. Here at the North Pole we have been using sophisticated technology since way before it was popular. We have always been early adopters — always needing the latest tech in order to help meet increasing demand — and have always had images of toys around, to keep track of things. And every year for some reason I need a new script for generating thumbnails — whether it’s for new image formats, compression schemes, or new use cases. This is the first year that it’s been because of high resolution images coming from cell phones, that we need to downscale.”

Santa continued: “Thumbnail generation scripts are tedious to write.” and listed the features that he wanted for this year’s script:

  • the ability to use all of the CPUs available — scaling images in parallel to maximize our resource usage, and minimize the time it takes
  • an accurate count of the number of images successfully scaled
  • a way to do a dry-run
  • some sort of verbose option to see what the program is doing
  • a way to re-run the program and force overwriting of previous images
  • some kind of documentation

Sparkleface smiled wryly, and sat down at Santa’s keyboard. “Give me a minute.”

Sparkleface then typed out a program faster than you can sing “Rudolph the Red Nosed Reindeer.”

“Okay, try it. This script is called sparkle-sizer; sparkle-sizer -h will show the options.”

Santa tried it:

$ sparkle-sizer -h
Usage:
  ./sparkle-sizer [-n] [-v] [--force] [--degree[=Int]] [<dir>]

    -n                dry run
    -v                verbose
    --force           force regeneration of existing thumbnails
    --degree[=Int]    degree of parallelism

Santa ran it with --degree=12 and watched with glee as htop showed all the CPUs in use.

“Not bad,” he said. “It’s using all of the CPUs!” And he proceeded to look at the source code:

#!/usr/bin/env raku

sub mk-thumb(IO::Path $src, Bool :$force) {
  my $dst = "{$src.dirname}/thumb-{$src.basename}";
  return False if $dst.IO.e && !$force;
  (shell "gm convert -auto-orient $src -thumbnail '400x400>' $dst") == 0;
}

multi files(IO::Path $f where *.f) { $f }
multi files(IO::Path $f where *.d) {
  my @ls = $f.IO.dir(
      test => { not .starts-with('thumb' | '.' ) }
     );
  @ls.map: { |files($_) }
}

multi MAIN($dir = "photos",
  Bool :$n,          #= dry run
  Bool :$v,          #= verbose
  Bool :$force,      #= force regenerate existing thumbnails
  Int  :$degree = 4, #= degree of parallelism
) {

  say "dry run!" if $n;
  &shell.wrap: -> |c { note c.raku if $n || $v; callsame unless $n }

  my atomicint $converted = 0;
  my atomicint $considered = 0;

  files($dir.IO).race(:$degree).map: -> $f {
    with ++⚛$considered {
      note "$_ files considered" if $_ %% 100;
    }
    ++⚛$converted if mk-thumb($f, :$force);
  }

  say "converted $converted out of $considered";
}

“No dependencies?” asked Santa. “How does it work? Tell me more.”

Now it was Sparkleface’s turn to explain:

The first routine, mk-thumb, converts a single image into a thumbnail using “shell”. Raku has gradual typing, so we can enforce type constraints. Also, if $force is true, we overwrite any old files.

Then we have “files”, which recursively traverses a directory. Note that Raku has multiple dispatch; the extra constraints on the arguments allow us to write a sort of Prolog-style set of directory-traversal routines.
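
A hypothetical stand-alone example of that dispatch style, using the same *.f (is it a file?) and *.d (is it a directory?) constraints:

```raku
# Dispatch is decided by the where constraint, not by an if/else
multi describe(IO::Path $p where *.f) { "file: $p" }
multi describe(IO::Path $p where *.d) { "directory: $p" }

say describe('.'.IO);  # directory: .
```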

We then have MAIN which declares the main program. Named arguments become command-line arguments, and comments are printed out as a help message.

We use “wrap” to get dry-run capability — wrapping the “shell” built-in — printing the arguments when we are verbose or doing a dry-run, and only calling the real “shell” for the non-dry-run case.
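
The same trick works on any routine. Here is a minimal sketch with a made-up deliver sub standing in for the shell built-in:

```raku
my $n = True;  # pretend the -n (dry run) flag was passed

sub deliver($gift) { say "delivering $gift" }

# The wrapper sees the capture |c, logs it to STDERR,
# and only falls through to the real body when not a dry run
&deliver.wrap: -> |c {
    note c.raku;
    callsame unless $n;
}

deliver('sled');  # only the note on STDERR; the real body is skipped
```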

We have two counters for keeping track of files we’ve considered and converted. We use an atomic int so we can increment it from various threads and not worry about race conditions.

We then use “race” to run the conversion in batches across multiple threads. Note the atomic operators which allow us to increment our native variables and not worry about race conditions.
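
A tiny illustration of why the atomics matter: incrementing a plain native int from many threads can lose updates, while ++⚛ on an atomicint cannot:

```raku
my atomicint $count = 0;

# 1000 increments spread across worker threads by race
(^1000).race(batch => 100).map: { ++⚛$count }

say $count;  # always 1000; no lost updates
```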

“Atomic operators,” Santa said. “I thought they were snowflakes.” Santa’s mood had lightened.

While running the program, Sparkleface thought he could see a little twinkle in Santa’s eyes and had hope that Santa now had one less thing to worry about before the big night.