Day 10 – Java Annotations in Raku or my @annotation is role;

Today, a little on the idea that new things are easier to absorb through things you already know. It so happens that I write Java for $dayjob, so I will approach from that side. Java 1.5 introduced an interesting syntactic form – annotations. It looks something like this:

/**
 * @deprecated use #getId() method instead
 */
@Deprecated
public String getName() {
  return "stub";
}

The example shows the @Deprecated annotation, which makes the compiler emit a warning whenever the getName method is used elsewhere. In addition, explanatory information has been added to the Javadoc.

In general, annotations in Java are a mechanism for attaching metadata to classes, objects, types, and so on, which can then be used at compilation, execution, or static-analysis time. With their help it is possible, for example, to implement a code decoupling strategy, so that program components work together without rigid connections. This strategy builds on the idea of Inversion of Control and is the core of the Spring framework.

But that’s enough Java. What does Raku have that resembles the annotation machinery? Raku has traits: a syntax that can be used to mark classes, routines, and objects. These labels are processed during compilation of the program, and, depending on the programmer’s wishes, the effect of that processing can influence how the program runs.

For example, here is Raku’s standard-library counterpart to the @Deprecated annotation:

sub get-name(--> Str) is DEPRECATED('get-id() method') {
  'stub'
}

is DEPRECATED is a trait. Its argument can name an alternative to the deprecated code. After a program in which the get-name function was called finishes, a message is displayed indicating where and how many times the obsolete code was executed:

Saw 1 occurrence of deprecated code.
======================================================================
Sub get-name (from GLOBAL) seen at:
  ~/advent.raku, line 13
Please use get-id() method instead.
----------------------------------------------------------------------
Please contact the author to have these occurrences of deprecated code adapted, so that this message will disappear!

Obsolete

is DEPRECATED is a trait from the standard library. To understand how it works, let’s try to write our analogue under the name obsolete. First, let’s define the storage of the collected information – a class that stores and updates the number of function calls and is able to display a report:

class ObsoleteTraitData {
  has $.routine-name is required;
  has $.user-hint;
  has $!execution-amount = 0;
  method executed() { $!execution-amount++ }
  method report() {
    return unless $!execution-amount;
    note "Obsolete routine $!routine-name is executed $!execution-amount times.";
    note $_ with $!user-hint;
  }
}

Now we declare a test trait – it is an ordinary multi sub named trait_mod:<is> with two arguments: the first is what the trait will be applied to (in our case, a Routine), the second is the trait’s name, passed as a required named argument:

say 'run-time';
multi trait_mod:<is>(Routine $r, :$obsolete!) {
  say 'compile-time'
}
sub get-name(--> Str) is obsolete {
  'stub'
}
say get-name;
# Output: compile-time
#         run-time
#         stub

The most important thing to understand about traits is that their bodies are executed at compile time, not at run time. This can be clearly seen from the output of the code above. Let’s remember what we want to achieve – a report on executions of obsolete code before the program terminates. We can obtain this information only at run time, so to have an effect there, the trait must modify the function in some way at compile time. In our case we can do this by adding the ENTER phaser to the function – a special block that is executed before the function’s first statement. That is, we make the function get-name look something like this:

sub get-name(--> Str) {
  ENTER { $obsolete-trait-data.executed }
  'stub'
}
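If ENTER is new to you, here is a tiny standalone demonstration of my own (not part of the obsolete machinery) showing that the phaser runs before the block’s first statement:

```raku
my @trace;
sub demo() {
    ENTER @trace.push('entering');  # runs on every entry, before the body
    @trace.push('body');
}
demo();
say @trace; # [entering body]
```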

We cannot touch the source code of the function itself, but we can perform the necessary manipulations during compilation. We take the function name and an optional hint for the user, create a new ObsoleteTraitData object, put it into the lexical hash %obsolete-trait-data, and add the necessary phaser:

my ObsoleteTraitData %obsolete-trait-data;

multi trait_mod:<is>(Routine $r, :$obsolete!) {
  my $routine-name = $r.name;
  my $user-hint = $obsolete ~~ Str ?? $obsolete !! Any;
  %obsolete-trait-data{$routine-name} =
    ObsoleteTraitData.new(:$routine-name, :$user-hint);
  $r.add_phaser('ENTER', -> {
    %obsolete-trait-data{$routine-name}.executed;
  });
}

Now, whenever the function get-name is executed, the ObsoleteTraitData object will update its state. Thus we have influenced the program’s execution flow during compilation. It only remains to display the report. To do this, we add an END phaser to the mainline code; its block is executed just before the program ends. Altogether, we get the following picture:

class ObsoleteTraitData { #`(described above) }

my ObsoleteTraitData %obsolete-trait-data;

END { .report for %obsolete-trait-data.values }

multi trait_mod:<is>(Routine $r, :$obsolete!) { #`(described above) }

sub get-name(--> Str) is obsolete('Please use get-id() instead.') {
  'stub'
}
sub another-obsolete() is obsolete {}

get-name();
another-obsolete();
get-name();

# Output:
# Obsolete routine get-name is executed 2 times.
# Please use get-id() instead.
# Obsolete routine another-obsolete is executed 1 times.

Override

Another commonly used annotation in Java is @Override, applied to a class method. If the annotated method does not in fact override a superclass method, it is a compilation error. Making a similar trait will not be difficult – we will not even have to go beyond the compilation stage. We declare a trait named override that applies only to methods:

multi trait_mod:<is>(Method $m, :$override!) {

We check that the method belongs to a class; otherwise we exit:

  return unless $m.package.HOW ~~ Metamodel::ClassHOW;

We check that the class owning the method has parents. To do this, we use the meta-method ^mro, which returns a list of all parent classes, including the class itself, Any, and Mu (which we filter out):

  my $class = $m.package;
  my $method-point = $class.^name ~ '::' ~ $m.name;
  my @parents = $class.^mro[1 ..^ *-2];
  die "is override trait cannot be used without parent class $method-point." unless @parents;

We go through all the parents and their methods in search of one that matches in both name and signature. Comparing method signatures is not a trivial task, so here we hide its implementation behind a function check-signature-eq:

  for @parents -> $parent {
    for $parent.^methods -> $parent-method {
      return if $parent-method.name eq $m.name &&
        check-signature-eq($parent-method.signature, $m.signature)
    }
  }

If no matching method was found among the parents, we throw an error:

  die "$method-point does not override any parent methods.";
}
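The check-signature-eq helper was left unspecified above. Here is one naive sketch of my own: two signatures match when their non-invocant parameters agree on name, namedness, and type. A real implementation would also need to consider subtyping, optional parameters, slurpies, and so on:

```raku
sub check-signature-eq(Signature $a, Signature $b --> Bool) {
    my sub key(Signature $s) {
        # Skip the invocant: its type legitimately differs between
        # parent and child methods.
        $s.params.grep({ !.invocant })
                 .map({ .name, .named, .type.^name })
                 .List
    }
    so key($a) eqv key($b)
}

class P { method m(:$r) {} }
class C { method m($r)  {} }
say check-signature-eq(P.^lookup('m').signature, P.^lookup('m').signature); # True
say check-signature-eq(P.^lookup('m').signature, C.^lookup('m').signature); # False
```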

As a result, we get the following:

multi trait_mod:<is>(Method $m, :$override!) { #`(described above) }

class A {
  method from-a(:$r) {}
}

class B is A {
  method from-a($r) is override { # note the missing colon: $r is positional here
    say 'from-b'
  }
}

# Output: B::from-a does not override any parent methods.
# Exit code: 1

Suppress

We have already implemented the logic of the Java annotations @Deprecated and @Override. Let’s try to implement the logic of @SuppressWarnings as well. This annotation is applied to a method and suppresses its warning messages; you can also specify which warnings are to be suppressed.

In Raku, warnings are produced by the warn function. It throws a special control exception, which is printed to the error stream, after which execution resumes where it left off. You can catch such an exception with the special CONTROL phaser. So, as in the @Deprecated case, we need to modify the function by adding the desired phaser. Let’s try something new and use the wrap method instead of add_phaser. How does it work? We replace the function with another one that can call the original (via the callsame routine) at its discretion. Inside this wrapper we put a CONTROL phaser that mimics the standard behaviour, but not for suppressed warnings:

multi trait_mod:<is>(Routine $b, :$suppress-warnings!) {
  my $regex = $suppress-warnings ~~ Str
    ?? / <$suppress-warnings> /
    !! Any;
  $b.wrap(sub with-control(|c) {
    callsame;
    CONTROL {
      when CX::Warn {
        .note if $regex.defined && $_.message !~~ $regex;
        .resume
      }
    }
  });
}

sub work-in-progress() is suppress-warnings('todo') {
  warn 'important warn';
  warn 'todo warn';
}

work-in-progress()
# Output:
# important warn
#   in sub work-in-progress at ~/trait-supress.raku line 15
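The CONTROL/CX::Warn mechanics can also be observed in isolation. In this standalone sketch of mine the warning is collected rather than printed, and execution resumes as usual:

```raku
my @caught;
{
    warn 'oops';
    say 'still running';        # reached, because CONTROL resumes
    CONTROL {
        when CX::Warn { @caught.push(.message); .resume }
    }
}
say @caught; # [oops]
```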

Serialize

All that remains is to discuss user-defined annotations. As I said above, Java annotations are a way to attach meta-information to a class or object. Later, at compile time or, more often, at run time, annotated objects are inspected for the information they carry. In Raku, roles are great for this. Consider the problem of adding the simplest possible serialization system to a class. Let’s write a class and mark it up with our future trait:

class Person is serialize-name('Passport') {
  has $.first;
  has $.second is serialize-name('Second name');
  has $.third is serialize-name('Honorific');
}

You can see that trait serialize-name applies to both the class itself and its attributes.

The trait for the attribute looks like this:

role SerializableAttribute {
  has $.serialize-name;
}

multi trait_mod:<is>(Attribute $a, :$serialize-name!) {
  $a does SerializableAttribute($serialize-name);
}

Above, the trait mixes the new SerializableAttribute role into the attribute. This role in turn injects a new attribute into the attribute 🙂 The value of that new attribute is supplied through the trait’s argument.
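To see the mixin mechanism on its own, here is a compact, self-contained variant (the class and attribute names are mine):

```raku
role SerializableAttribute {
    has $.serialize-name;
}

# Mixing the role into the Attribute object at class-composition time.
multi trait_mod:<is>(Attribute $a, :$serialize-name!) {
    $a does SerializableAttribute($serialize-name);
}

class Example {
    has $.plain;
    has $.labelled is serialize-name('A label');
}

my $attr = Example.^attributes.first(*.name eq '$!labelled');
say $attr.serialize-name; # A label
```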

The trait for the class looks like this:

role SerializableClass[$name] {
  method serialize() {
    say $name, ' | ', self.^name;
    say .serialize-name, ' <- ', .get_value(self)
      for self.^attributes(:local).grep(*.^can('serialize-name'));
  }
}

multi trait_mod:<is>(Mu:U $c, :$serialize-name!) {
  return unless $c.HOW ~~ Metamodel::ClassHOW;
  $c.^add_role(SerializableClass[$serialize-name]);
}

Above, you can see that the trait checks that it is applied to a class and adds the special SerializableClass role. This role adds a new serialize method to the class that implements all the serialization logic. In particular, it keeps only those class attributes that have a serialize-name method.

If we run all this, we get:

Person.new(:first<John>, :second<Hancock>, :third<Mr>).serialize();
# Output:
# Passport | Person
# Second name <- Hancock
# Honorific <- Mr

Conclusion

As we can see, traits are a pretty powerful tool, and like everything in the Raku world they can be used in very different ways. In Java, for example, when declaring an annotation the programmer must indicate how far its effect extends (source only, until the end of compilation, or for the lifetime of the application), whether the annotation is inherited by child classes, and whether it can be applied multiple times. Traits in Raku, on the other hand, give the programmer complete freedom of action. You now have the knowledge to write your own Inversion of Control/Dependency Injection system in the style of Java’s Spring Core, using Raku traits.

Day 9 – Raku code coverage

Although I love using Raku, the fact that it is still a relatively young language means that there is a fair amount that is lacking when it comes to tooling, etc. Until recently, this included a way to calculate code coverage: how much of the code in a library is exercised (=covered) by that library’s test suite.

Now, truth be told, this feature has been available for some time in the Comma IDE. But this (together with other arguably essential developer tools like profiling, etc) is only available in the “Complete” edition, which requires a paid subscription.

Still, I knew that the Raku compiler kept track of covered lines, so I always felt like this should be doable. It only needed someone to actually do it… and it looks like someone actually did.

So, consider my surprise when, while recently browsing raku.land, I came across App::RaCoCo, which claims to be ‘a Raku Code Coverage tool’. Sweet!

Let’s see how it works.

Running locally

The library ships with a racoco executable, which is what we’ll use to calculate the coverage. The first couple of times I ran it I got some scary output because it could not find the library to test, but after reading the documentation and trying a couple of things out, I managed to find the right set of options for me.

Let’s see it in action on my very own HTTP::Tiny:

$ racoco --exec='prove6 -l' --html
t/agent.t ......... ok
t/errors.t ........ ok
t/mirror.t ........ ok
t/online-async.t .. skipped
t/online-basic.t .. skipped
t/requests.t ...... ok
t/responses.t ..... ok
t/url-parsing.t ... ok
All tests successful.
Files=8, Tests=52,  9 wallclock secs
Result: PASS
Visualisation: file:///home/user/HTTP-Tiny/.racoco/report.html
Coverage: 81%

Success! We can run our test suite, with the development version, and we get a nice little summary at the bottom. Thanks to the --html option we even generated an HTML report we can examine in the browser, with line-by-line details on what was covered.

The tool is still young, and there are still quirks that should be ironed out. I’d expect the friction with the --exec flag to be one of those. But until then, we have a working tool we can use. Huzzah!

So we can run the tool locally, which is great. But can we run it on code that is hosted remotely? And how do we publish those results?

With a lot of my distributions, what I’ll do is send coverage output to Coveralls, which keeps track of it and renders it publicly, which is great.

However, racoco does not ship with a Coveralls exporter, and currently has no way to plug in custom reporters (like, say, the cover tool used in Perl). This feature is in development, but until then, we’ll need an alternative.

Running on GitLab

Since most of my Raku distributions are hosted on GitLab, that’s what I’ll be demonstrating, but a lot of these steps are likely the same or similar in other popular CI platforms.

The CI configuration I’ll be using will look something like this:

# In your .gitlab-ci.yml
coverage:
  image: rakuland/raku:latest
  before_script:
    - zef install --/test --deps-only --test-depends .
    - zef install --/test App::RaCoCo
  script:
    - racoco --exec='prove6 -Ilib' --html
    - mv .racoco public
    - find public -type f -not -name "*.html" -delete
  artifacts:
    paths:
      - public
    public: true

This defines a “coverage” job which will run in an environment where we install the dependencies of the library we are testing, as well as the App::RaCoCo distribution itself. We then use racoco to generate the report, and we make sure all the HTML files from the report are in the public directory, which we can then expose as a public artifact.

This means we can then view these in the browser via a link like this one.

But we can go one step further. Even though we cannot (yet) easily talk to external tools like Coveralls, we can still make use of the Gitlab features to put this link in a badge that nicely displays our coverage percentage.

For that, we’ll have to set a coverage parsing regex, which Gitlab will use to parse the coverage percentage from the CI job output. In this case, to work with the racoco output (such as the Coverage: 81% line above), we can use Coverage: \d+(\.\d+)?%.

The value that is parsed will then be available as a badge that can be set in the “General” project settings.

The fields in that section take placeholders, which means that these values should work for whatever project we are configuring. We can use this one for the path to the coverage badge:

https://gitlab.com/%{project_path}/badges/%{default_branch}/coverage.svg

And this one if we want to link to the published artifacts of the latest coverage job (do note that in this case we are referring to the job by name, so if you’ve used a different name you’ll have to update it):

https://gitlab.com/%{project_path}/-/jobs/artifacts/%{default_branch}/file/public/report.html?job=coverage

If all went well, the badge will display in the main page of your project as shown in the image at the top of this post. This will happen automatically (=you don’t have to manually add them to the readme, for example), and any badges you add will link to wherever you pointed them to.

Room to grow

As noted above, racoco is still young and there are still some rough edges. One in particular is that running the tool multiple times on the same test suite will sometimes generate slightly different results, and that some lines might not be flagged as coverable or covered even though they are. Some of this is because the tool is new, and some is due to the way Rakudo reports this data in the first place. In either case, these should be issues we can fix.

Despite the rough edges, the tool has already proved useful to me, and it’s become a part of my regular setup.

The future is bright, and there’s room to grow.

Day 8 – Practice… on Advent of Code

“Hrmpf!” mutter mutter mutter “Bah!”

The head elf Fooby Nimblecalmy was trying to read an interesting article on Ramsey Theory, but was having a hard time because the latest addition to Santa’s IT Operations, Buzz Bargoosey, was steaming like a kettle.

Anyway, Fooby was determined to go through the article, so decided to deliberately ignore Buzz.

“Grump! Moan… moan… moan…”

It wasn’t going to end any time soon, and Ramsey Theory definitely requires attention, so Fooby decided to bite the bullet and ask:

“Well… what’s up Buzz?”

“Uh, oh… sorry Sir Nimblecalmy, nothing Sir…”

“I told you not to call me Sir, and this isn’t going to end any time soon… so again, what’s up Buzz?”

“Well Si…AHEM Fooby, the fact is that I’m bored to death! There’s nothing to do here!”

Fooby looked at Buzz from over the glasses and noticed a shocking resemblance to… a younger Fooby, too many years ago. Only that, at the time, there were a lot of automation tasks, and there was this shiny new programming language, Perl…

Buzz had a point though. They had basically implemented everything implementable so far, so these days it was all maintenance every now and then. And not all the other elves were so much into mathematics.

Even though…

“You know… we will soon have to upgrade a few programs here, to take advantage of the more recent multi-core processors. I heard Raku is perfect for going parallel without too much hassle!”

“Oh, Raku yes… it would be great!”

Buzz’s face was bright and dreamy again, so Fooby was ready to delve back into Ramsey Theory. A tad too fast, though, because Buzz sighed in a loud and clear way.

“What’s up again, son?”

“Well… I know a little about Raku, but I need to exercise a lot and I don’t know what to do about it!”

Now it was Fooby’s turn to sigh. Ramsey Theory was quickly fading over the horizon… when light came, suddenly!

“Why don’t you try to solve a couple of puzzles a day, say up to Christmas day? You might start with something simple, and increase complexity as days go…”

“This would be brilliant Sir! Yes!”

“So… first of all don’t call me Sir, then head over to Advent of Code and start from the beginning! Each day you will be facing a puzzle, and another one will be available after you solve the first one.”

“OK yes… but how will this help me to learn Raku?!?”

“Well, just by solving the problems you will need to use the language and understand how things can be done in at least one way. Then… after you have solved the daily puzzles, or if you’re stuck, you can head to the Solution Megathreads in Reddit and look for other Raku solutions… there’s a lot of clever people there, although still a tad too few.”

Buzz was thrilled by the idea, but still unsure about it. So Fooby decided to show an example, just to get Buzz started.

“Let’s take day 1 as an example. In part 1, we have a list of numbers in a file, and we have to find out how many times a number is greater than the one immediately before”

“Oh, I know I know! This is how to do it” and started writing on the terminal:

my @numbers = $filename.IO.lines;
my $count = 0;
for 1 .. @numbers.end -> $i {
    ++$count if @numbers[$i - 1] < @numbers[$i];
}
put $count;

Fooby noticed that Buzz already had some confidence with Raku, but also that there was definitely space for improvement.

“Well, that’s a good start for sure! Now you can take a look at seaker’s solution for fun and learning” and showed Buzz this:

#!/bin/env raku

sub MAIN(Str:D $f where *.IO.e = 'input.txt') {
    my @n = $f.IO.words;
    put 'part 1: ', [+] @n Z< @n[1..*];
    put 'part 2: ', [+] @n Z< @n[3..*];
}

“You see? seaker is being more precise by using .words instead of .lines, which improves readability.”

“OK but… what’s with that way of calculating the result for part 1?!?”

“Oh, that’s part of the Christmas Magic! I mean, Raku‘s Magic. First, let’s consider the zip metaoperator Z:”

@n Z< @n[1..*]

“It takes one element from the left and one from the right, and applies the comparison operator < to the pair”.

“OK but… what’s with @n[1..*] on the right?” asked Buzz.

“Well, that’s a lazy slice of the array @n, skipping the first element. Raku takes elements only as they are needed, and nothing more. The zip stops as soon as the shorter side runs out – and @n[1..*] is one element shorter – so we get exactly one comparison per pair of neighbours, and going ‘past the end’ is never a problem”.
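Fooby’s point is easy to check at the REPL (the sample numbers are mine):

```raku
my @n = 10, 20, 30;
say @n[1..*]; # (20 30)
```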

“OK, so we’re left with an array of booleans, right?”

“Right. This is where the [+] comes into play. It’s a sum operator +, wrapped into a reduction metaoperator“.

“Oh I know, I know! The Red Auction is when we offer Christmas sweeties to get Santa’s hat on December 26th, right?”

Fooby took off the glasses and massaged the bridge of his nose a bit. Ramsey Theory had never been farther away…

“No… not the da…rling Red Auction, but reduction. It’s an operation on a list of values, that takes the first two items and applies an operator to get a result. Then it takes the result and the next item, applies the operation again, and so on. Reduction, because it reduces a list down to a single item. In this case, the operation is the sum and the result is the sum of all values.”

“OK, I get it. But wait! We’re summing booleans here… it that allowed?”

“Good catch!” replied Fooby. “Actually yes, because when a boolean is used as a number, it takes a value that is either 0 for False or 1 for True, which is exactly what we need here, because we’re just counting how many True values we have”.
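Fooby’s whole explanation fits in a couple of lines (again with made-up numbers):

```raku
my @n = 1, 5, 3, 4;
say @n Z< @n[1..*];     # (True False True)
say [+] @n Z< @n[1..*]; # 2
```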

“Right, I see… in the same spirit of Perl, I daresay. What’s with that signature? That is different from Perl!”

sub MAIN(Str:D $f where *.IO.e = 'input.txt') { ...

“Yes, it is indeed. It’s all for a single input parameter $f, actually. The Str part tells us that we’re expecting a string…”

“Oh!” interrupted Buzz. “Does this mean that we have to assign a type to each variable and parameter? Why don’t we do anything to @n? Which type…”. The flood gates were open!

“Hold on! Hold on a second” interrupted Fooby. “Raku has something called gradual typing, in that you assign a type to a variable only if you think it’s useful for you. In this case, the author thought it was useful and set it.”

“Uh, well, sorry Sir… please go on, what’s with the smiley?”

“Please… don’t call me Sir. That’s not a smiley, it’s a type constraint that asks Raku to check that the input is defined.”

“Anyway” intervened Buzz “I’m always happy when things are defined, so it’s still a smiley for me! Now that where…”

“That is an additional constraint on the input” answered Fooby. “The star is a placeholder for $f itself, and the expression asks Raku to check that the input string, when considered a file name, should refer to a file that actually exists in the filesystem.”

“Brilliant!” observed Buzz. “Then I think that the equal to input.txt part sets a default value, right?”

“Precisely!” agreed Fooby.
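Everything Fooby just described fits in one small signature. This standalone example (names are mine) swaps the file-existence check for a string-length check so it runs anywhere:

```raku
# Str:D   – argument must be a defined string
# where   – extra constraint; * stands for the parameter itself
# = ...   – default value used when no argument is passed
sub greet(Str:D $name where *.chars > 0 = 'World') {
    "Hello, $name!"
}
say greet();       # Hello, World!
say greet('Buzz'); # Hello, Buzz!
```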

“Uhm I think I got it… Fine! I’ll look into the other part of the solution, and try some puzzles myself. Thanks a lot Sir!”

Fooby was about to complain about being called Sir, then decided it was not the case and, at last, delved back to that interesting article on Ramsey Theory…

Day 7 – Neural Nets in Raku (Part 1)

Thinky the Elf was sitting in his office; it had been a closet, but he’d been given it as his office after the great baked beans incident. It wasn’t his fault: he was right that feeding the reindeer beans would give them a jet boost, but Santa had not been all that happy about it. And his tendency to stare off into space whenever a thought struck him wasn’t great on the shop floor, so it was safer to put him out of the way to do some thinking.

Recently he’d been thinking about how to sort children into naughty or nice. This was Santa’s big job all year, and Thinky thought there must be a way to simplify it. He’d spent some time watching videos on YouTube, and there was one that gave a brilliant description of Neural Networks (jump to 20 minutes in for that bit, though it’s an interesting video throughout). As Thinky watched it he couldn’t help thinking about Raku, and how the connections between nodes felt like Supplies.

With this he dived in and played about trying to build Neural Networks with Raku and Supplies; he tried a few things and got to a system that worked, though it has a few drawbacks.

Firstly, we start with a Neuron role. A Neuron might be an input, an intermediate grouping one, or a final output one, but they all share some functionality.

role Neuron {
    has %!input-vals;
    has Supply $!die;
    has Supply $!input;
    has Str $.id is required;
    has Str $.gene;
    has Bool $.scream;
    has Promise $.watch;

    submethod BUILD( :$!die, :$!id, :$!gene = '', :$!scream = False ) {}

%!input-vals stores the inputs this Neuron has received. The $!die Supply receives a message when the Neuron (and its containing Brain) is to stop, whilst the $!input Supply takes in all the input data. Each Neuron has a unique id and also knows the gene used to create the Brain it lives in. The $.scream Boolean will cause it to emit tracking info via note.

Then there are a few methods :

    method !process-inputs() {...}

process-inputs is a placeholder for the concrete Neuron classes to handle what to do with incoming data.

    method start() {
        my $alive = True;
        if ( ! $!input.defined ) {
            return;
        }
        return start react {
            whenever $!die -> $ {
                note "{$!gene} : {$!id} : DIE" if $!scream;
                $alive = False;
                done();
            }
            whenever $!input -> ($id, $v) {
                note "{$!gene} : {$!id} : {$id} : {$v} : {$alive}" if $!scream;
                %!input-vals{$id} = $v;
                self!process-inputs();
                done() unless $alive;
            }
        };
    }

start brings a Neuron to life and returns a Promise (or, if the Neuron isn’t wired up with inputs, nothing). The Neuron watches its two supplies and fires off process-inputs after updating the %!input-vals hash. If it receives a trigger on the $!die supply, it shuts itself down and, by clearing the internal $alive Boolean, tells the input handler to stop too.

    method attach-input(Supply $s) {
        if ( ! $!input ) {
            $!input = Supply.merge( $s );
        } else {
            $!input = $!input.merge($s);
        }
    }
}

attach-input uses the merge method to combine all the inputs passed into a Neuron into one single one that the start method watches.
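Merging can be seen in isolation in this standalone sketch of mine (not tied to the Neuron code):

```raku
my $a = Supplier.new;
my $b = Supplier.new;

my @seen;
my $merged = $a.Supply.merge($b.Supply);
$merged.tap({ @seen.push($_) });

# With live supplies, emits after the tap are delivered to it.
$a.emit('from a');
$b.emit('from b');
say @seen; # [from a from b]
```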

Two of the Neurons, the Input and Group, can have multiple outputs so we’ll make a Role for them.

role PassThruNeuron does Neuron {
    has @.outputs;

    method attach-output( Supplier $out ) {
        @!outputs.push( $out );
    }
}

Then we define the Input and Group Neurons as PassThrus.

class InputNeuron does PassThruNeuron {
    method !process-inputs() {
        if ( %!input-vals{$!id}.defined ) {
            .emit( ( $!id, %!input-vals{$!id} ) ) for @!outputs;
        }
    }
}

The Input neuron filters the input data it receives, keeping only entries matching its own id, and sends these on to its outputs. This allows us to have one shared input stream that all Input neurons can pull from.

class TanHGroupNeuron does PassThruNeuron {
    has Rat $!threshold = 0.1;
    has $!previous;

    method !process-inputs() {
        .emit( ( $!id, tanh( [+] %!input-vals.values ).round($!threshold) ) ) for @!outputs;
    }
}

The TanHGroupNeuron (named as such to allow for multiple types of Group Neuron later) computes the tanh of the sum of its inputs and sends the rounded value out.
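For instance, a TanHGroupNeuron that has received 0.3 and 0.4 from its inputs would emit (with the default 0.1 threshold):

```raku
say tanh(0.3 + 0.4).round(0.1); # 0.6
```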

And then we have the Output Neuron; it only has one output value, so it’s pretty simple.

class OutputNeuron does Neuron {
    has Num $!output;
    has Rat $!threshold = 0.1;

    method !process-inputs() {
        $!output = tanh( [+] %!input-vals.values );
    }

    method output() {
        $!output.defined ?? $!output.round($!threshold) !! 0;
    }
}

Once again we’re rounding the value output and if there isn’t a value set we return a 0. Note that we have $!threshold values to manage rounding. These are currently only set at the defaults but it’s there for the future.

With the Neurons built, we turn to the paths between them. A path can be defined by a start point (an input or group neuron), an end point (a group or output neuron), and a weight by which the value being sent is multiplied.

class PathSpec {
    has Str $.input;
    has Str $.output;
    has Rat() $.weight;
    method Str() { "{$.input}:{$.weight}:{$.output}" }
    method gist() { "{$.input} ==x{$.weight}==> {$.output}" }
    method COERCE( Str:D $str --> PathSpec:D ) {
        my ( $input, $weight, $output ) = $str.split(":");
        PathSpec.new( :$input, :$output, :$weight );
    }
}

The PathSpec class covers all this including the ability to transform them to or from Strings.

Finally we have the Brain, a collection of Neurons and Paths.

class Brain {
    my $killScheduler;
    my $pathScheduler;

    has Supplier $.inputStream;
    has Supplier $!killStream;
    has @.watch-list;
    has @.outputs;

A Brain has an $.inputStream carrying all the input data and a $!killStream that is connected to the $!die input on each Neuron in the Brain. The @.watch-list and @.outputs arrays contain the Promises from each start method and the OutputNeuron objects themselves. We also define two class-level attributes: schedulers that are shared between all the running brains, to stop the default scheduler from being overwhelmed.

    method kill() {
        $!killStream.emit(True);
        $!killStream.done();
    }

    submethod BUILD( :$!inputStream, :$!killStream, :@!watch-list, :@!outputs ) {}

    submethod DESTROY { self.kill(); }

A few methods to help with building and tearing down brains, and then we move to the make method. You can give it either a gene string (a list of PathSpec strings joined by commas) or a list of PathSpec objects.

    multi method make( Brain:U: Str :$gene!, :$inputStream, Bool :$scream) {
        my @paths = $gene.split(",").map( -> $g { my PathSpec(Str) $p = $g; $p });
        return Brain.make( :@paths, :$inputStream, :$scream );
    }

    multi method make( Brain:U: :@paths! is copy, :$inputStream = Supplier::Preserving.new(), Bool :$scream ) {
        my (@inputs, @outputs, @groups);

        my $gene = @paths.join(",");

        $pathScheduler //= ThreadPoolScheduler.new();
        $killScheduler //= ThreadPoolScheduler.new();
        my $killStream = Supplier.new();
        my $killSupply = $killStream.Supply().schedule-on($killScheduler);

Create the kill and path schedulers if they don’t already exist, along with an internal kill stream that will be assigned to the private attribute and a kill Supply to pass to the Neurons.

        my @combined;
        repeat {
            my $ps = @paths.shift;
            for (@paths) -> $check is rw {
                if ( $ps.input ~~ $check.input && $ps.output ~~ $check.output ) {
                    $check = PathSpec.new(
                        :input($ps.input),
                        :output($ps.output),
                        :weight($ps.weight + $check.weight)
                    );
                    $ps = Nil;
                    last;
                }
            }
            @combined.push($ps) if $ps.defined;
        } while @paths;

        @paths = @combined;

With randomly generated genes we may end up with multiple connections between the same pair of Neurons; this code combines them into single paths by summing their weights.
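For instance, with a hypothetical three-path gene:

```raku
# "i1:1.5:g1" and "i1:2:g1" share an input and an output, so the loop
# above folds them into one path of weight 3.5; "g1:-1:o1" is untouched.
my @paths = < i1:1.5:g1 i1:2:g1 g1:-1:o1 >.map(
    -> $g { my PathSpec(Str) $p = $g; $p }
);
# after combining: i1 ==x3.5==> g1  and  g1 ==x-1==> o1
```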

        for (@paths) -> $p {
            given $p.input {
                when m/^i/ { @inputs.push($_) }
                when m/^g/ { @groups.push($_) }
            }
            given $p.output {
                when m/^g/ { @groups.push($_) }
                when m/^o/ { @outputs.push($_) }
            }
        }
        @inputs .= unique;
        @outputs .= unique;
        @groups .= unique;

        my $inputSupply = $inputStream.Supply();

        my %nodes;
        my @final-outputs;
        for ( @inputs ) -> $id {
            %nodes{$id} = InputNeuron.new( :$gene, :$id, :die($killSupply), :$scream );
            %nodes{$id}.attach-input($inputSupply);
        }
        for ( @outputs ) -> $id {
            %nodes{$id} = OutputNeuron.new( :$gene, :$id, :die($killSupply), :$scream );
            @final-outputs.push( %nodes{$id} );
        }
        for ( @groups ) -> $id {
            %nodes{$id} = TanHGroupNeuron.new( :$gene, :$id, :die($killSupply), :$scream );
        }
        for ( @paths ) -> $ps {
            my $path = Supplier.new();
            %nodes{$ps.input}.attach-output($path);
            %nodes{$ps.output}.attach-input($path
                .Supply
                .map( -> ($i,$v) { ($i, $v * $ps.weight) })
                .throttle(1, 0.5)
                .schedule-on($pathScheduler)
            );
        }
        my @watch-list = %nodes.values.map( *.start() ).grep( *.defined ).list;

        return Brain.new( :$inputStream, :@watch-list, outputs => @final-outputs, :$killStream );
    }
}

Then we create our Neurons and join them up based on the paths passed in. The paths apply the weighting with map and use some throttling to control feedback loops. Finally we schedule these on the pathScheduler to ensure kill messages and other async processes can still run safely.
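As a standalone sketch of what the throttling does (names here are purely illustrative): throttle(1, 0.5) lets at most one value through per half second, so a runaway feedback loop between Neurons becomes a slow trickle instead of a message storm.

```raku
my $firehose = Supplier.new;
$firehose.Supply.throttle(1, 0.5).tap: -> $v { say "passed: $v" };
$firehose.emit($_) for 1..5;   # five values arrive at once...
sleep 3;                       # ...but pass through at most two per second
```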

Thinky was really happy with this system: his 16-core machine handled 1000 brains just fine. With 5000 things got a bit crazy, but it still ran, and he generally didn’t think he’d need that many. All in all he was really happy with them… now he just had to remember why he’d made them, and work out how to train them and mutate them. But that was a job for another day (hopefully this advent calendar, but this is very much a work in progress).

I (I mean Thinky) hope to get some of this released as a module soon if people are interested.

Here’s some example usage: creating 1000 brains that share an input stream, passing in some inputs, getting the outputs, and doing it again before shutting everything down.

my @ins = qw<i1 i2>;
my @gs = qw<g1 g2 g3>;
my @os = qw<o1 o2>;

my @paths;
my @is;
for (1..1000) {
    my @p;
    for (1..4) {
        @p.push( (
                    (|@ins,|@gs).pick,
                    (-40..40).pick / 10,
                    (|@os,|@gs).pick,
                ).join(":") );
    }
    @paths.push( @p );
}

my $inputStream = Supplier::Preserving.new();

@paths = @paths.map( -> @p {       
        my $gene = @p.join(",");
        note $gene;
        {
            gene => $gene,
            brain => Brain.make( :$inputStream, :$gene ),
        }
    }
);
note "Made brains";

sleep(1);

note "Emit i1 : 0";
$inputStream.emit(('i1',0,));
note "Emit i2 : 1";
$inputStream.emit(('i2',1,));

sleep(0.1);
note "Output?";
for ( @paths ) -> %p {
    for ( %p<brain>.outputs ) -> $o {
        say( "{%p<gene>} : {$o.id} : {$o.output // 0}");
    }
}

sleep(0.1);
note "Emit i1 : 1";
start $inputStream.emit(('i1',1,));
note "Emit i2 : 0";
start $inputStream.emit(('i2',0,));


sleep(0.1);

note "Killing Brains";
.<brain>.kill for @paths;
note "Closing Stream";
$inputStream.done();
note "Awaiting the end";
await | @paths.map( *<brain>.watch-list );
note "All done";
note "Output?";
for ( @paths ) -> %p {
    for ( %p<brain>.outputs ) -> $o {
        say( "{%p<gene>} : {$o.id} : {$o.output // 0}");
    }
}

Day 6 – Following the Unix philosophy without getting left-pad

The Unix philosophy famously holds that you should write software that “does one thing, and does it well”. There are other tenets as well, but I’m focusing on the core idea expressed in Programming Design in the UNIX Environment:

Whenever one needs a way to perform a new function, one faces the choice of whether to add a new option or write a new program…. The guiding principle for making the choice should be that each program does one thing.

For instance, if you’re writing a program that produces text in one format, don’t also have it print the text in eight alternative formats. Instead, leave that task for a different specialized program that can process your program’s output. Or, put differently, fight against your program’s inherent tendency to “attempt to expand until it can read mail” (Zawinski’s law).

Of course, you don’t want to follow the Unix philosophy off a cliff, and programmers have been arguing about exactly where to draw the line since well before Rob Pike complained that “cat(1) came back from Berkeley waving flags” 40 years ago. Nevertheless, the do-one-thing-and-do-it-well approach is well worth aiming for.

In the context of writing libraries, the Unix philosophy encourages the practice of writing micro-packages: small libraries, intentionally limited in scope, that serve exactly one purpose. Some programming language communities have this as an explicit goal; for example, one of the leading Node.js developers explicitly invoked the Unix philosophy in their advice to programmers:

Write modules that do one thing well. Write a new module rather than complicate an old one.

This practice of writing micro packages contrasts sharply with the practice of writing omnibus packages that attempt to provide a single, coherent API that aims to solve any problem a developer might encounter. And micro packages benefit from all the advantages that have made the Unix philosophy such good advice for 50 years. Most notably, micro packages tend to be simple enough (and small enough) that you can personally inspect the code – and, if necessary, debug any issues that come up.

The downside of micro packages

As this post’s title probably gave away, the problem with overusing micro packages is that it can lead to what happened with left-pad. Without rehashing all the details, there was an 11-line JavaScript package (left-pad) that did nothing other than pad each line of a string with a specified amount of whitespace. Yet, somehow, a huge percentage of the JavaScript ecosystem depended on this simple function – either directly or more commonly indirectly. As a result, when the developer removed the package (in a way that couldn’t happen anymore for reasons not relevant here), that same fraction of the JavaScript ecosystem fell over. I’m not sure exactly how many builds failed, but one source estimated that over 2.4 million software builds depended on left-pad every month. So not a small number.

In other words, someone finally pulled out the one domino that the entire Internet depended on:

xkcd 2347, CC BY-NC 2.5. [A tower of blocks is shown. The upper half consists of many tiny blocks balanced on top of one another to form smaller towers, labeled:]  All modern digital infrastructure  [The blocks rest on larger blocks lower down in the image, finally on a single large block. This is balanced on top of a set of blocks on the left, and on the right, a single tiny block placed on its side. This one is labeled:]  A project some random person in Nebraska has been thanklessly maintaining since 2003.  {alt-text:} Someday ImageMagick will finally break for good and we'll have a long period of scrambling as we try to reassemble civilization from the rubble.

And while left-pad may be an extreme example, the direct consequence of JavaScript’s embrace of the Unix philosophy is that JavaScript programs commonly depend on huge numbers of micro packages.

A 2020 study found that the typical JavaScript program depends on 377 packages (here, “typical” means “at the geometric mean”, which reduces the impact of outliers). And a full 10% depend on over 1,400 third-party libraries. Many of these dependencies are admirably tiny: one of the most depended-on packages (used by 86% of JavaScript packages – literally tens of millions of developers) is essentially just one line of code. It’s hard to take “do just one thing” to any greater extreme.

And yet.

And yet, I don’t believe that any developer can reasonably comprehend a system made up of hundreds (thousands?) of independent packages. It’s not just a matter of the total lines of code climbing to incomprehensible levels (though that famously happens and certainly doesn’t help). But even if the total lines of code were manageable, the interaction effects simply aren’t – remember, these packages weren’t designed to form a coherent whole, so they can and will make inconsistent assumptions or create inconsistent effects.

The many different problems that can arise from this abundance of micro packages leads some people to conclude that you should kill your dependencies. Or, as Joel Spolsky put it:

“Find the dependencies — and eliminate them.” When you’re working on a really, really good team with great programmers, everybody else’s code, frankly, is bug-infested garbage, and nobody else knows how to ship on time. When you’re a cordon bleu chef and you need fresh lavender, you grow it yourself instead of buying it in the farmers’ market, because sometimes they don’t have fresh lavender or they have old lavender which they pass off as fresh.

This principle, unfortunately, seems to be directly in conflict with the ideal of “code reuse good — reinventing wheel bad.”

A wild dilemma appears

At this point, I hope the tension is pretty clear: on the one hand, it’s great to keep components small, simple, and composable. On the other hand, it’s terrible to bury yourself in a tangle of different packages, no matter how tiny they are. The Unix philosophy and killing your dependencies pull in opposite directions.

Of course, this is hardly a new insight. It’s a point many people have made over the years; I particularly enjoyed how Rust-evangelist extraordinaire Steve Klabnik put it a couple of years ago:

tweet by Steve Klabnik with the text "developers are like 'The UNIX philosophy, the pinnacle of software dev: Make each program to one thing well.'  'lol your project has tons of tiny dependencies? leftpad lolol'" and an image of the Daily Struggle meme (https://knowyourmeme.com/memes/daily-struggle)

But I want to do more than note the tension: I want to provide a solution (or at least an outline of what I view the solution to be). Before I do so, however, I need to mention a few non-solutions that I reject.

First, I don’t think that we should resolve this dilemma by fully choosing one side or the other. Like Russ Cox, I acknowledge that installing a dependency entails allowing your “program’s execution [to] literally depend[] on code downloaded from [some] stranger on the Internet”; I don’t believe that doing so thousands of times will ever be a recipe for crafting robust software. As much wisdom as there is in the Unix philosophy, it simply won’t do to accept it 100% and embrace the micro-package dystopia.

At the same time, I also cannot fully embrace the “kill your dependencies” extreme. While it would be appealing to live in an ideal world where, like one developer I admire, “I [could] list the entire dependency graph, including transitive dependencies, off of the top of my head”, I don’t believe that’s a tenable solution. For one thing, the code reuse and code sharing that micro packages enable is a huge part of what gives open source and free software developers superpowers: If a project can only be done by a team of dozens, it will almost certainly be built by a for-profit company. But if relying on existing packages lets one or two hackers, working alone, create that software – well, then, there’s an excellent chance that we’ll have a free software version of the program. (Remember, mega-projects like Linux are very much the exception, not the rule – the median number of maintainers for free software projects is 1, as I’ve discussed at length elsewhere.)

Even setting aside the practical benefits of code reuse, I still wouldn’t agree that we should jettison micro packages. The inconvenient reality is that the Unix philosophy is just plain correct: for any given volume of code/features, it’ll be easier to reason about the system if it’s composed of many small, independent modules instead of being one massive blob. Killing our dependencies and replacing that code with our own implementation would, in many cases, just make a bad situation worse. So I reject the idea that we can “solve” this problem by picking one extreme or the other.

But I also view a naive compromise between the extremes to be a non-solution. Both extremes have real problems, but that doesn’t provide any guarantee that splitting the baby will be any better. Indeed, there’s a real risk that it’ll be worse: if you take a program that depends on 500 micro packages and re-architect it to instead depend on 200 larger packages, then you still have far too many packages to manually review and maintain. But now you are also dealing with packages that are each harder to understand when you do need to start debugging. Nice job breaking it, hero.

A less naive compromise

Having just rejected both extremes and a simple compromise, it’s clearly on me to come up with a better way to strike this balance. What we need is a way to limit the number of dependencies for any given software project without leading to a corresponding increase in the average size of each dependency. I have some ideas about how we can do so at the programming language level. (I’m going to discuss this in the context of my programming language of choice, Raku, but I believe these prescriptions to be more broadly relevant.)

I believe that a programming language/community can balance the Unix philosophy and dependency minimization by following three steps. In order from most to least fundamental, the programming language should:

  • maximize the language’s expressiveness;
  • have a great standard library; and
  • embrace a utility package (or a few utility packages).

I’ll discuss each of these in turn and then conclude with a few thoughts about more individual actions we can all take to protect our own code.

Maximize expressiveness

At first blush, “language expressiveness” may seem like an odd place to start. If the goal is to write libraries that “do just one thing”, then it seems like the number of words that it takes to implement that “one thing” shouldn’t really matter.

But what this ignores is that “one thing” is not well defined. Consider the output from the ls command.

colorized output from ls printed in three columns

Pretty much every Unix-based OS currently takes the view that ls is best thought of as “one thing”. But in that original Unix Environment paper, Pike and Kernighan argued that it is really two things: (1) listing the files and (2) formatting the output into columns. But I could see an argument for adding a third (colorizing the output) or even a fourth (determining whether the program is being run interactively or in a script).

My point isn’t that ls is “really” four things instead of one – it’s that there isn’t a single correct way to divide ls into packages that do only one thing each. Any division will inherently leave room for at least a bit of subjective judgment.

Moreover, that’s exactly what we should want: when we say “a library should do only one thing”, that’s a convenient shorthand but doesn’t need to be taken 100% literally. (Even Pike and Kernighan agree that it’s sometimes correct to add options to an existing program.) And when deciding what level of functionality to consider as “one thing”, we should (and inevitably do) consider factors such as code complexity and code length; a library that takes only 30 lines likely strikes many developers as accomplishing “one thing” in a way that the same functionality in a 3,000 line library might not.

This is especially true because one of the main reasons we want to follow the Unix philosophy is to write bug-free code – and as studies have repeatedly shown, longer code gives bugs more places to hide. This means that a library with only a few lines is much more likely to be correct – and thus can be said to better follow the Unix philosophy of doing just one thing.

As a result, the language’s overall support for writing concise, expressive code matters quite a bit. Highly expressive languages are less likely to need deep dependency graphs to keep each package to a Unix-philosophy-compliant size; packages can be “micro” in size (and complexity) without being “micro” in power. Fortunately for those of us writing Raku, it’s one of the most expressive languages, so we’re off to a strong start.

Great standard library

Next (and more obviously), you can avoid an abundance of left-pad-like micro packages by using a language with a great standard library. Standard libraries have an obvious direct impact: when a function is built into the standard library, no one needs to rely on a package that provides that function. As a concrete example: no one would ever write a left-pad package in Raku, because the standard library already has sprintf, and '%5s'.sprintf($str) does the job of left-pad($str, 5).
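For example:

```raku
say '%5s'.sprintf('hi');    # «   hi» – right-justified in a 5-char field
say '%05d'.sprintf(42);     # «00042» – the same built-in zero-pads numbers
```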

The direct effects of the standard library are fairly limited – each standard library function can only directly replace a single micro package. Fortunately, a great standard library can have a much larger indirect effect. If a standard library has many small, composable functions that can be put together in different ways, then a vast majority of “micro packages” can be trivially replaced with a simple call to two or three standard library functions – which means that those micro packages never get written in the first place. Again, this is an area where Rakoons are in luck, since we benefit from exactly that sort of composable standard library (which was the topic of my 2021 Raku Conf talk).
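Two quick examples of that composability; in a less composable ecosystem, each of these might be its own micro package:

```raku
# "chunk a list into pairs" is just rotor:
say (1..6).rotor(2);                        # ((1 2) (3 4) (5 6))

# "sum the values of a hash" is just a method chain:
say { a => 1, b => 2, c => 3 }.values.sum;  # 6
```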

One note on the subject of standard libraries: It’s very helpful for a language to have a great standard library, but that doesn’t mean it needs a huge one. It certainly doesn’t need a “batteries included” standard library – after all, when a standard library includes too many batteries, they tend to leak battery acid or at least need replacing. The difference between a “great” standard library and a “batteries included” one is that a great standard library includes all the composable functions you need to avoid left-pad-like micro packages, whereas a batteries-included standard library attempts to include non-micro packages (e.g., a web server) in the core standard library.

Utility package(s)

The final way for a language to reduce the size of dependency trees without giving up on the Unix philosophy is to collectively agree on a utility package that replaces numerous micro packages. I’ve listed this after “an expressive language” and “a great standard library” both because it’s a less ideal solution and because it can build on the foundations provided by the language and standard library.

For example, despite the somewhat harsh words I had for JavaScript earlier in this post, JS has some excellent utility packages. The current market leader, lodash, is a direct or indirect dependency of nearly 9 out of 10 JS packages and does a very good job of aggregating many common functions that might otherwise be micro packages. As large as the dependency trees are in JavaScript-land, they’d doubtless be even larger without lodash, underscore, and similar utility packages.

Lodash is, however, hamstrung a bit by JavaScript having a standard library that is (for various historical and standards-related reasons) somewhere between “small” and “non-existent”. This makes the job of a JavaScript utility library harder in two ways: first, it needs to devote a good chunk of its code just to implementing functions that would have been in a deeper standard library to begin with (or that would have been trivially derivable from standard library functions). And second, because a JavaScript utility library cannot itself use functions missing from JavaScript’s standard library, it’s limited in what it can concisely express. Despite these limitations, lodash enriches the JavaScript ecosystem and helps to contain the explosion of micro packages.

Or does it? I can already hear some of you objecting that lodash is just a collection of independent files. You might reasonably ask whether replacing two dozen micro packages with one package consisting of two dozen files really provides much benefit. While that is a reasonable question, it also has a reasonable answer.

This sort of consolidation provides at least three benefits: First, part of the complexity from micro-package multiplicity arises from packages that approach the same basic problem from inconsistent (or just confusingly different) directions. This could be anything from wanting to be called with arguments in a different order to providing data that’s in the wrong shape for another package. In either case, the cause is the same: because each micro package was developed independently, there’s no reason for either package to fit with the other. In contrast, with a utility package there’s a guarantee that each function within the package has been designed with the goal of fitting with the other utility functions; any misfit represents a bug in the package. And even though other packages don’t have any obligation to fit with the utility package, there’s a much greater chance that they will choose to do so (assuming that, as in lodash’s case, the package genuinely is widely used) than there is that two random micro packages will be designed to work well together.

The second advantage of consolidating micro packages into a utility package is that it avoids one of the big threats of micro packages: zombie dependencies. The problem is that, because a micro package is (by definition) pretty small, it’s possible for the package maintainer to disappear without the users noticing, at least for a while. But that results in a zombie package, shambling along without anyone to fix any bugs that do come up, or to even merge any bug fixes others may submit. In the worst case, this can result in a package being left with known security vulnerabilities or even being turned over to a malicious actor. By consolidating micro packages into a utility package, you avoid this risk. (Of course, you trade it for the risk that all of the maintainers of the utility package could disappear at the same time. But for a major package like lodash, that’s both much less likely to happen and much more likely to be something you’d hear about if it did happen.)

Finally, consolidating micro packages provides a third benefit that’s cultural rather than technical – but perhaps all the more important for that reason. By keeping the total number of dependencies low, utility packages make the act of adding new dependencies more psychologically meaningful. If the codebase currently depends on two modules, then most developers will put at least a bit of thought into whether they should add a third dependency. But if the codebase already has a dozen dependencies, making that a baker’s dozen is much more likely to feel like a rounding error. I’m not against people adding dependencies, but I am against them doing so thoughtlessly – so I like the idea of a utility package keeping total dependency count low enough that we’ll all think a bit more about each dependency we add.

Given these advantages, I think consolidating functionality into utility packages is a very good thing, and I think the JavaScript ecosystem is better off for the existence of packages like lodash. And I’m sad to say that the Raku ecosystem doesn’t have anything quite like that.

… or at least it doesn’t today. But keep following the Raku Advent calendar, because in part 2 of this post, I’ll be announcing a new utility package for Raku (!).

Conclusion

Both the Unix philosophy (“do one thing and do it well”) and the idea of killing your dependencies have merit – but they pull in opposite directions. They’re not 100% incompatible, but at the language level, it takes a great deal of thought to grow an ecosystem where libraries tend to follow the Unix philosophy without devolving into left-pad-like micro-package multiplicity. Three good ways to do so (again at the language level) are to prioritize language expressiveness; to have a great and composable standard library; and to embrace a utility package. Of these three, Raku currently does very well on the first two but is missing the third, at least today.

Of course, much of the job of balancing the Unix philosophy with the avoidance of left-pad cannot be handled at the collective level of the language or ecosystem – it needs to be handled at the individual level. In the code trenches, it’s always up to the author of each program to decide whether any particular dependency is worth adding or is better to rewrite (though heuristics like the ones in Surviving Software Dependencies may help). But with a bit of care at the language level, Raku can help make correctly striking this balance just a bit easier and, more importantly, just a bit more of the community norm.

Day 5 – Santa Claus is Rakuing Along

Part 1 – The Elven Journals

Prologue

A Christmas ditty sung to the tune of Santa Claus is Coming to Town:

He's making a list,
He's checking it closely,
He's gonna find out who's Rakuing mostly,
Santa Claus is Rakuing along.

Santa Claus Operations Update 2021

Santa’s operations were much improved year-over-year since his IT crew adopted Raku as their go-to programming language (see the articles from Raku Advent 2020). In addition, he and his non-techy elves found the language so easy for beginners to use, he decided to see if he could use it for managing the many task reports and other documents needed by individuals as well as managers.

Note that one of the improvements he had instituted was issuing mobile tablets to all Elves. The tablets are provisioned with Raku apps that provide a Raku REPL (Read Eval Print Loop) for easy code snippet use as well as a browser to access websites where more code can be run. As well, the tablets are equipped with a Terminal app that enables remote logins to the powerful servers in the IT Department.

He also instituted a new policy on record-keeping that requires Elves to keep a digital journal and make at least two entries per work shift. They log in to their own account on the server via a terminal app [‘termius’ is this author’s choice] to make or check an entry. When something is to be noted, they use vi to open their text journal (the file ‘$HOME/journal’, kept safe under central version control with git) and make an entry. The first format attempted was this:

=Time 2021/4/22 1340
=Entry Designed a new toy I'll call 'Herby' the harp seal, a
stuffed animal with a waterproof covering suitable
for a bath toy for toddlers.

Note the entry is in a simple format (using Pod abbreviated blocks; see https://docs.rakulang.site/language/pod#Abbreviated_blocks) to allow line parsing and further manipulation. One or more blank lines separate the new entry from any previous entry. The new entry opens with the year, month, and day separated by slashes, one or more spaces, then the hour and minute in local twenty-four-hour time, followed by one or more blank lines. Then come the journal notes, with paragraphs separated by one or more blank lines. (Of course, since Raku will be processing the journals, Unicode characters above the ASCII range are perfectly acceptable.)
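That opening date/time line lends itself to a small grammar; here is a hypothetical sketch (the names are invented for illustration, not taken from Santa’s actual code):

```raku
grammar EntryStamp {
    token TOP  { <date> \s+ <time> }
    token date { \d ** 4 '/' \d ** 1..2 '/' \d ** 1..2 }
    token time { \d ** 4 }
}

say so EntryStamp.parse('2021/4/22 1340');   # True
```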

Elves are expected to make a journal entry at the start and end of their work shift. (There is also a trial ongoing to use a voice-to-text system to ease the journaling effort, but its reliability is not very good at the moment. Also, the verbosity and silliness of the Elven people’s language makes filtering the sound quite a challenge for the IT system.)

Periodically all the journals are read and converted to a format which makes it easier to find specific entries by date, time, and Elf. The process also creates reports detailing task progress and Elf efficiency. (Details will be in Part 2 of this article.)

The initial processing of all the Elven journals looked something like this:

use SantaClaus::Utils;
my %j; # or %journal
for each elf
%j<e> = [];
read journal file
for each line
if empty
end current object
elsif a datetime entry
create a DateTime object
...

That was not going to work–too much room for erroneous parsing and cowboy coding!

The second approach was to tightly control the Pod components allowed, extract the Pod programmatically with Raku, and fail early on problems with helpful comments. But one of Santa’s concerns was the strange nature of Pod blocks. Most have Pod::Block as a base class, so they all share the following common attributes:

my class Pod::Block {
    has %.config;
    has @.contents;
}

Due to the @.contents array, Pod blocks can be nested infinitely deep, so handling unknown Pod can be tedious and error-prone. In addition, extracting Pod from another document requires more than a beginner’s knowledge of Raku. Thus it was decided to create a Raku journal management program which uses the Raku module Pod::Load to extract the Pod. Additionally, the current policy for journal entries precisely describes the journal entry format, which Raku can easily extract while reading the journal, so a helper program was added to initiate a journal entry with a templated Pod chunk with embedded instructions to ease the Elves’ input.

Here is an example of that template block newly created at the end of the current journal file for an Elf computer user name of ‘jerzi’ and default task ID of ‘build-toy’:

Z<Edit the following Entry as necessary. Add or delete Z comments as desired.>
=begin Entry :time<2021-11-30T07:12>
Z<Enter one of ':start' or ':end' in the following config line if applicable.>
=begin Task :id<build-toy> :employee<jerzi> :status
Z<Enter notes and comments here; use blank lines to separate paragraphs>
=end Task
Z<Add another Task if applicable; ensure the ':id' is correct before doing so.>
=end Entry

The user merely adds the appropriate data and (optionally) removes the Z<> Pod comments which are ignored by Santa’s journal reader (although they leave blank spaces after Pod::Load parsing, all text paragraphs are then normalized by the reader and blank paragraphs will disappear).
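On the reading side, a minimal sketch using Pod::Load might look like this (assuming its exported load routine, which returns the parsed Pod blocks):

```raku
use Pod::Load;

# Parse the journal text and dump each top-level Pod block;
# the Entry blocks carry their timestamp in the config hash.
for load('journal'.IO.slurp) -> $block {
    say $block.raku;
}
```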

Let’s see what happens when the Elf starts a brand new journal which will look like the above and then edits it to look like this:

=begin Entry :time<2021-11-30T07:12>
=begin Task :id<build-toy> :employee<jerzi> :status
=end Task
=end Entry

Now run the check:

$ check-journal
ERROR: Task id<build-toy> has a ':status' config key but no explanation

Journal entries require an explanation when it is a ‘:status’ entry without a ‘:start’ or ‘:end’ config key.

The typical Elf’s work shift now looks something like this:

  • Check messages for any special instructions
  • Login to the system and make a journal entry
    $ add-event
    See new Entry appended to file: journal
    $ vi journal
    ...
    :wq
    $ check-journal
    # make necessary edits until the journal
    # checks okay
  • Work the task(s)
  • Login and update the journal as necessary
  • Login and make the end-of-shift journal entry

Summary

The new system now creates a complete record of an Elf’s work and serves as an electronic time clock to replace the old punch cards and electro-mechanical clocks.

In addition, the individual journals provide the data for detailed operational and managerial reports for all supervisory levels.

Part 2 of this article will discuss those aspects of the new system and how Raku makes them easier to program for less experienced programmers.

Note: See all code used in this article in the repo at https://github.com/tbrowder/SantaClaus-Utils.

Santa’s Epilogue

Don’t forget the “reason for the season:” ✝

As I always end these jottings, in the words of Charles Dickens’ Tiny Tim, “may God bless Us, Every one!” [1]

Footnotes

  1. A Christmas Carol, a short story by Charles Dickens (1812-1870), a well-known and popular Victorian author whose many works include The Pickwick Papers, Oliver Twist, David Copperfield, Bleak House, Great Expectations, and A Tale of Two Cities.

Day 4 – Santa’s OCD Sorted

Santa has been around for a long time already. Santa remembers the days when bits were set by using a magnetic screwdriver! In those days, you made sure that things were orderly set up and sorted for quick access.

Santa likes the Raku Programming Language a lot, because it just works like Santa thinks. There’s just this one thing missing to make Santa feel at home again, just like in the olden days: an easy way to make sorted lists and easily insert new values into these lists to keep them up-to-date.

Sure, Santa knows there are hashes. And if you want to iterate over all keys alphabetically sorted, you can easily do:

  for %hash.keys.sort -> $key {
      ...
  }

or if you want both the key and the value:

  for %hash.sort(*.key) -> (:$key, :$value) {
      ...
  }

But that just feels like a lot of extra work on big hashes that were filled organically from keys and associated values.

So Santa went looking in the Raku ecosystem and was really glad when the Array::Sorted::Util distribution popped up on the search term “sort”.

So what does that do? Well, it exports a few subroutines, the simplest of which is inserts. You give it an array and an object, and it will insert the object into the array at the correct location to keep the array sorted:

  use Array::Sorted::Util;

  my str @names;
  inserts(@names,$_) for <Zaphod Arthur Ford>;
  say @names; # [Arthur Ford Zaphod]

But what if you had a list of Pairs with names and gifts? You’d need two arrays, and they would have to be kept in sync! Well, Santa found out that is also easily possible with inserts:

  use Array::Sorted::Util;

  my str @names;
  my str @gifts;
  for Zaphod => 'arm', Arthur => 'tea', Ford => 'blanket' {
      inserts(@names, .key, @gifts, .value);
  }
  say @names;  # [Arthur Ford Zaphod]
  say @gifts;  # [tea blanket arm]

And then, if you want to look up the gift of a specific person, you’d use finds:

  say @gifts[$_] with finds(@names, 'Arthur');  # tea
  say @gifts[$_] with finds(@names, 'Marvin');  # (no output)

So Santa made changes to the code to use two lists instead of a hash. But the elves really didn’t like that. So they went searching in the Raku ecosystem as well, and found Array::Sorted::Map. This allowed them to easily apply a Map interface to the two lists:

  use Array::Sorted::Util;
  my str @names;
  my str @gifts;
  for Zaphod => 'arm', Arthur => 'tea', Ford => 'blanket' {
      inserts(@names, .key, @gifts, .value);
  }

  use Array::Sorted::Map;
  my %gifts := Array::Sorted::Map.new(
    keys => @names, values => @gifts
  );
  .say with %gifts<Arthur>;  # tea
  .say with %gifts<Marvin>;  # (no output)

That was good as a temporary measure. But it still wouldn’t allow the elves to make changes to the hash without having to resort to things like finds, inserts or deletes operating on the underlying arrays.

The wise Santa realized that the elves are the future, so it was important to find a way that the elves as well as Santa would be happy with. A further search revealed the existence of a Hash::Sorted module, which promised that the keys of a hash created with that class would always be in sorted order.

When Santa proposed to have the elves use that module, they were very glad. Now the elves could use the familiar hash idioms and satisfy Santa’s need for order:

  use Hash::Sorted;

  my %gifts is Hash::Sorted;
  my @names := %gifts.keys;
  my @gifts := %gifts.values;

  %gifts = Zaphod => 'arm',
           Arthur => 'tea',
           Ford =>   'blanket';
  say @names;  # [Arthur Ford Zaphod]
  say @gifts;  # [tea blanket arm]

  .say with %gifts<Arthur>;  # tea
  .say with %gifts<Marvin>;  # (no output)

What the elves didn’t realize, was that the Hash::Sorted module is just a frontend for the subroutines provided by Array::Sorted::Util, and the Hash::Agnostic role. But Santa wisely didn’t tell that to the elves.

And all was well on the North Pole!

Day 3 – Silently

Santa was working on some programs to handle all of the intricacies of modern-day just-in-time package delivering, and got annoyed by some parts of the program getting noisy because some elf had left some debug statements in there. Ah, the joys of collaboration!

So Santa wondered whether there could be a way to be less distracted by what otherwise seemed to be a perfectly running program. Looking at the Wonderful Winter Raku Land, after a little bit of searching, Santa found the silently module. That was great! It’s a module that exports a single subroutine silently that takes a block to execute, and will capture all output made by the code running in that block.

Whereas Santa would first do:

    assign-optimal-trajectory(@gifts);

and get a lot of unwanted output, now Santa could just do:

    silently { assign-optimal-trajectory(@gifts) }

and get the same result without so much noise.

But alas, just before all gifts were on their way, it turned out that some gifts had somehow been lost, or at least not assigned a proper trajectory. Now, Santa had the option of running the same program again, but with all of the noise. And time was getting short! But then Santa realized that if something had gone wrong, there would be an error message on STDERR.

And guess what, the silently module actually only muffles whatever noise was generated! After running your code, you can still find out the noise it made on STDERR, because silently returns an object that you can call the .err method on to get all the text that was sent to STDERR. So the code became:

    my $muffled = silently {
        assign-optimal-trajectory(@gifts)
    }
    if $muffled.err -> $errors {
        say $errors;
    }

This allowed Santa to quickly find and fix the problem for the gifts that had gone wrong. And Christmas was saved once again!

Later, some elves were reprimanded for leaving debug statements in production code. They promised to not do it again.

Day 2 – Rotation of Log files in a nutshell

Santa has a cloud-based application that helps him deliver the gifts to the children. Once the gifts have been delivered, Santa registers the delivery operation in the deliveries.log file. Right afterwards, the inspector elves review this log file, comparing it with the list of children to ensure that all the children have received their gifts correctly.

The number of deliveries is very large, and so is the price of the cloud-based storage. To stay within the cloud budget it’s necessary to set a maximum log capacity such that:

  • The log information will be distributed in 5 log files
  • Each log file size should be about 20 MB

How will we do it?

Santa needs a process that will run on a regular basis. This process will rotate the log files when the size of the main log file reaches about 20 megabytes. The maximum number of log files will be 5, that is:

  • deliveries.log
  • deliveries.log1
  • deliveries.log2
  • deliveries.log3
  • deliveries.log4

When the deliveries.log file size reaches about 20 megabytes, it will be renamed to deliveries.log1.

Next time this happens, deliveries.log1 will be renamed to deliveries.log2 and deliveries.log will be renamed to deliveries.log1.

And so on, up to deliveries.log4. At that point, since there are already 5 log files, the deliveries.log4 file (the oldest) will be deleted and the deliveries.log3 file will be renamed to deliveries.log4.

Starting the Raku script

First we need to know the path of the log files, the name of the main log file, and its absolute path:

my $path_logs     = '/var/log/gifts/';
my $main_log      = 'deliveries.log';
my $path_main_log = "$path_logs$main_log";

As we see in "$path_logs$main_log", using double quotes to concatenate strings is very visual, but a more elegant approach is to use the ~ operator between the string variables:

my $path_main_log = $path_logs ~ $main_log;

Let’s keep going by setting the maximum size of the main log file (20 megabytes) in bytes:

my $max_size_log = 20000000;

Also, we need to set the maximum number of log files that we will rotate:

my $max_logs = 5;

We already have the ingredients; now we are going to set the requirements.

Requirements

Now we need to know if the main log file deliveries.log exists and to check its size.

Raku’s capacity to handle files with .IO is very robust, concise and complete. For instance, we can call the .e method on an absolute file path to check whether it exists, and exit the script if the file doesn’t exist. All in one line:

exit unless $path_main_log.IO.e;

Similarly, using the .s method we can get the size of a given file. We can exit if the main log size is smaller than the size specified in $max_size_log:

exit if $path_main_log.IO.s < $max_size_log;

As we see, this Raku code is very readable and concise, which also makes it easier to follow when debugging.

At this point, the deliveries.log file (the main log file) exists with a size of 20 megabytes or more. The next step is to find out how many log files exist in the logs folder.

How many log files do we have?

We can populate an array with the absolute paths of the log files using method chaining. Method chaining is part of Raku’s functional programming paradigm, a powerful feature that passes the result of one method to the next as an argument, much like pipes in the bash shell:

my @log_files = $path_logs.IO.dir.grep(/$main_log$ | $main_log\d+$/).sort;

Let’s dissect the snippet:

  • @log_files is an array that will be populated with the absolute paths of the log files.

  • First we use $path_logs, whose value is /var/log/gifts/, the absolute path of the logs folder. The next methods will search for the log files in this location.

  • The .IO.dir method returns the absolute paths of all the files and folders located in the $path_logs folder. As a curiosity, the dir command was already in use in the RT-11 operating system 51 years ago. Later, this command was also adopted by CP/M and MS-DOS. Some things never change.

  • The next method is .grep(/$main_log$ | $main_log\d+$/). The .grep method matches the regular expression provided as a parameter, /$main_log$ | $main_log\d+$/, against each value returned by the previous method. This pattern admits two possibilities: it matches the main log file name, $main_log$, that is deliveries.log, or (|) it matches the main log file name followed by a number, $main_log\d+$. This regex matches the absolute paths of log files like deliveries.log, deliveries.log1 or deliveries.log4, but never matches strings like deliveries.log.old.

  • The last method is .sort. This method sorts the elements returned by the previous method. This ordering is essential to continue with the next operations.
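
To make the pattern concrete, here is a tiny stand-alone check of that regex against a few candidate file names:

```raku
my $main_log = 'deliveries.log';
for <deliveries.log deliveries.log4 deliveries.log.old notes.txt> {
    # Interpolated strings match literally in Raku regexes,
    # so the dot in 'deliveries.log' is not a wildcard here
    say "$_ matches" if $_ ~~ /$main_log$ | $main_log\d+$/;
}
# deliveries.log matches
# deliveries.log4 matches
```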

Many operations in one line; with method chaining, that’s fine.

There are many log files, let’s fire the last

If the number of log files has reached the limit established in $max_logs, we need to remove the last log file, deliveries.log4 in our case, both from the array @log_files and from the filesystem. Your attention please:

@log_files.pop.unlink if @log_files.elems == $max_logs;

In this case we also use method chaining:

  • @log_files is the array with the absolute path of each log file.

  • The .pop method removes and returns the last element of the @log_files array. This element is the absolute path of the deliveries.log4 file.

  • The .unlink method removes a file from the filesystem. In this case it removes the element returned by the previous .pop method, that is the log file deliveries.log4.

Then comes the if condition, although it is evaluated first. This condition compares the number of elements of @log_files, using the .elems method, with $max_logs, whose value is 5. If the condition returns True, the deliveries.log4 element is removed, both from the @log_files array and from the filesystem.

The use of the methods .pop and .unlink seems amazing to me.

Moving log files to the right position

Now, we need to rename the log files as we saw before.

Here, the Raku magic helps us with the for iterator:

for @log_files.kv.reverse -> $file, $idx { $file.rename($path_main_log ~ $idx + 1); }

The -> $file, $idx is the signature of the block {} and is made up of the $file and $idx parameters, which are populated from the .kv method’s output. Each iteration provides the absolute path of the current log file in the $file variable and its index in the $idx variable. All this is done in reverse order using the .reverse method.

The body of the block {} simply uses the .rename method on the current log file $file to change its name to the name of the next log file. The name of the next log file is the main log file path $path_main_log plus the next index number, ~ $idx + 1.
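
To see why the reverse order matters, here is the same loop with .rename replaced by say, starting from three existing log files:

```raku
my @log_files = <deliveries.log deliveries.log1 deliveries.log2>;
# .kv yields (0, log, 1, log1, 2, log2); .reverse yields
# (log2, 2, log1, 1, log, 0), so the block receives the
# oldest file first and a rename never clobbers an existing file
for @log_files.kv.reverse -> $file, $idx {
    say "$file -> deliveries.log{$idx + 1}";
}
# deliveries.log2 -> deliveries.log3
# deliveries.log1 -> deliveries.log2
# deliveries.log -> deliveries.log1
```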

Beautiful one-line code running many operations. A scripter’s dream.

Putting all together

my $path_logs     = '/var/log/gifts/';
my $main_log      = 'deliveries.log';
my $path_main_log = $path_logs ~ $main_log;
my $max_size_log  = 20000000;
my $max_logs      = 5;

exit unless $path_main_log.IO.e;
exit if $path_main_log.IO.s < $max_size_log;

my @log_files = $path_logs.IO.dir.grep(/$main_log$ | $main_log\d+$/).sort;

@log_files.pop.unlink if @log_files.elems == $max_logs;

for @log_files.kv.reverse -> $file, $idx { $file.rename($path_main_log ~ $idx + 1); }

Epilogue

The multiparadigm approach of Raku provides an astonishing capacity to perform complex operations concisely and pushes scripting to the next level, and Santa knows it.

Day 1 – Batteries Included: Generating Thumbnails

It was a cold wintry night in the North Pole and Santa was in a mood.

“Naughty. Naughty. Naughty. Ni..aughty” he grumbled, checking his list. Then checking it again.

“Everything ok?” chipped cheerful Sparkleface the elf, bouncing into the room. “Isn’t it nice to have some cold weather for a change?”

Santa scowled at Sparkleface with an icy stare that froze all the water molecules in the room. He said nothing, gazing through Sparkleface into some distant place in another dimension.

Undeterred, Sparkleface continued: “did you see all those wonderful images we’ve received from the children of the world who are looking forward to the holiday, and have been sending us pictures of what they want for Christmas? Isn’t it great that everyone has cell phones these days and can so easily send us high resolution images instead of writing out lists by hand like in the olden days?”

This finally provoked a reply. “No. It is not great.”

Sparkleface started to say something when Santa lowered his voice
and continued —

“Let me tell you precisely why it is not great”

and began to Santasplain the situation:

“There are 2.2 billion children in the world, and so far we have received images from 90% of them. So that’s 1,980,000,000 images. Yes, most of them are high resolution — too high — many are over 20 megabytes each! and we need to email them out to our distributed team of elves! That’s too much data! We need to downscale them all into lower resolution versions before emailing them out. In other words, we need to convert them all into thumbnails.”

“So why don’t you write a script?” Sparkleface let the words come out before realizing the effect they would have.

“First of all” — Santa’s voice began to get louder — “in case you didn’t know, I am 1,750 years old. In fact, this year was my 1,750th birthday. Do you know how many times I have written a script in my lifetime to convert a directory of images into thumbnails?”

“Well…computers have only been around since…”

“Sixty three times. Here at the North Pole we have been using sophisticated technology since way before it was popular. We have always been early adopters — always needing the latest tech in order to help meet increasing demand — and have always had images of toys around, to keep track of things. And every year for some reason I need a new script for generating thumbnails — whether it’s for new image formats, compression schemes, or new use cases. This is the first year that it’s been because of high resolution images coming from cell phones, that we need to downscale.”

Santa continued: “Thumbnail generation scripts are tedious to write.” and listed the features that he wanted for this year’s script:

  • the ability to use all of the CPUs available — scaling images in parallel to maximize our resource usage, and minimize the time it takes
  • an accurate count of the number of images successfully scaled
  • a way to do a dry-run
  • some sort of verbose option to see what the program is doing
  • a way to re-run the program and force overwriting of previous images
  • some kind of documentation

Sparkleface smiled wryly, and sat down at Santa’s keyboard. “Give me a minute.”

Sparkleface then typed out a program faster than you can sing “Rudolph the Red Nosed Reindeer.”

“Okay, try it. This script is called sparkle-sizer; sparkle-sizer -h will show the options.”

Santa tried it:

$ sparkle-sizer -h
Usage:
  ./sparkle-sizer [-n] [-v] [--force] [--degree[=Int]] [<dir>]

    -n                dry run
    -v                verbose
    --force           force regeneration of existing thumbnails
    --degree[=Int]    degree of parallelism

Santa ran it with --degree=12 and watched with glee as htop showed all the CPUs in use.

“Not bad,” he said. “It’s using all of the CPUs!” And he proceeded to look at the source code:

#!/usr/bin/env raku

sub mk-thumb(IO::Path $src, Bool :$force) {
  my $dst = "{$src.dirname}/thumb-{$src.basename}";
  return False if $dst.IO.e && !$force;
  (shell "gm convert -auto-orient $src -thumbnail '400x400>' $dst") == 0;
}

multi files(IO::Path $f where *.f) { $f }
multi files(IO::Path $f where *.d) {
  my @ls = $f.IO.dir(
      test => { not .starts-with('thumb' | '.' ) }
     );
  @ls.map: { |files($_) }
}

multi MAIN($dir = "photos",
  Bool :$n,          #= dry run
  Bool :$v,          #= verbose
  Bool :$force,      #= force regenerate existing thumbnails
  Int  :$degree = 4, #= degree of parallelism
) {

  say "dry run!" if $n;
  &shell.wrap: -> |c { note c.raku if $n || $v; callsame unless $n }

  my atomicint $converted = 0;
  my atomicint $considered = 0;

  files($dir.IO).race(:$degree).map: -> $f {
    with ++⚛$considered {
      note "$_ files considered" if $_ %% 100;
    }
    ++⚛$converted if mk-thumb($f, :$force);
  }

  say "converted $converted out of $considered";
}

“No dependencies?” asked Santa. “How does it work? Tell me more.”

Now it was Sparkleface’s turn to explain:

The first routine, mk-thumb, converts a single image into a thumbnail using “shell”. Raku has gradual typing, so we can enforce type constraints. Also, if $force is true, we overwrite any old files.

Then we have “files”, which recursively traverses a directory. Note that Raku has multiple dispatch; the extra constraints on the arguments allow us to write a sort of Prolog-style set of directory-traversal routines.
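
A hypothetical stand-alone example of that dispatch style, using the same *.f (is it a file?) and *.d (is it a directory?) constraints:

```raku
# Dispatch is decided by the where constraint, not by an if/else
multi describe(IO::Path $p where *.f) { "file: $p" }
multi describe(IO::Path $p where *.d) { "directory: $p" }

say describe('.'.IO);  # directory: .
```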

We then have MAIN which declares the main program. Named arguments become command-line arguments, and comments are printed out as a help message.

We use “wrap” to get dry-run capability — wrapping the “shell” built-in — printing the arguments when we are verbose or doing a dry-run, and only calling the real “shell” for the non-dry-run case.
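
The same trick works on any routine. Here is a minimal sketch with a made-up deliver sub standing in for the shell built-in:

```raku
my $n = True;  # pretend the -n (dry run) flag was passed

sub deliver($gift) { say "delivering $gift" }

# The wrapper sees the capture |c, logs it to STDERR,
# and only falls through to the real body when not a dry run
&deliver.wrap: -> |c {
    note c.raku;
    callsame unless $n;
}

deliver('sled');  # only the note on STDERR; the real body is skipped
```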

We have two counters for keeping track of files we’ve considered and converted. We use an atomic int so we can increment it from various threads and not worry about race conditions.

We then use “race” to run the conversion in batches across multiple threads. Note the atomic operators which allow us to increment our native variables and not worry about race conditions.
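
A tiny illustration of why the atomics matter: incrementing a plain native int from many threads can lose updates, while ++⚛ on an atomicint cannot:

```raku
my atomicint $count = 0;

# 1000 increments spread across worker threads by race
(^1000).race(batch => 100).map: { ++⚛$count }

say $count;  # always 1000; no lost updates
```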

“Atomic operators,” Santa said. “I thought they were snowflakes.” Santa’s mood had lightened.

While running the program, Sparkleface thought he could see a little twinkle in Santa’s eyes and had hope that Santa now had one less thing to worry about before the big night.