In my previous post, I made the case that programming languages should use a utility library to provide small-but-commonly-needed functions. Today I’m introducing a new module that I hope will play this role for Raku.
In this post, I’ll introduce you to this new package as it exists today. Next, I’ll turn to plans for the future and how I’d like to see a Raku utility package grow over time. Then we’ll wrap up by taking a step back and discussing how all of this fits with the Unix philosophy.
First of all, the name: the utility package I’ve released is named _ (pronounced/also known as […]).
Instead, the name _ just falls naturally out of Raku’s topic variables: in Raku, if you want to refer to the current topic without giving it a specific name, you use the appropriate sigil followed by _ ($_, @_, or %_). So it’s only natural that a utilities package – which, by its nature, can’t have a particularly descriptive name – would use _ (in this case without a sigil, since modules/packages don’t use sigils).
Besides, if it wasn’t named _, the other obvious name would be util – but that name is more-or-less occupied by a fellow Rakoon. And, finally, this name gives us the helpfully short use statement of use _ – a nice feature for quick prototyping if _ ends up being widely used. (For production use, you might want to qualify that use statement, as I’ll discuss later in this post. But starting with a short name is also helpful if our fully qualified use statement risks getting a bit long.)
_’s purpose is to be a meta utility package that lets Raku programs avoid rewriting the same helper functions without embracing the excessive use of micro packages in the Raku ecosystem. When I say that _ is a meta utility package, I mean something analogous to a Linux distro metapackage. Specifically, unlike many utility packages, _ is composed of individual sub-packages. Each sub-package has its own documentation/tests and is an independent unit. My intent is that you can read the README for a _ sub-package and then use (and fully understand!) that sub-package without needing to know anything about any other _ sub-package.
Additionally, every sub-package in _ makes three promises:
- To have zero dependencies (with a grudging exception for _ files or Core modules)
- To have all its code in a single file (not counting tests/docs)
- To keep that file to 70 or fewer lines
If you have a package or script that meets those requirements and that you’d like to include, please feel free to open a PR. (Or even if it slightly exceeds the requirements; I’m willing to talk about how flexible _ will be.)
Why those rules?
These rules might strike some of you as a bit odd. In particular, why is _ so focused on keeping the total code size down? I talked a bit about the value of reducing lines of code in the previous post, but I know that not everyone was convinced. And it is a reasonable question – if taken too far, writing concise code can reduce readability, which is rather the opposite of our goal.
Here’s the answer: _ packages are short so you can fully understand them. And, by understanding them, trust them.
My goal for _ is that anyone fluent in Raku can open the file for a _ sub-package, read the code on their screen, and see 100% of the functionality that package implements. (That’s where the “70 lines” limit comes from – it’s my best guess for the number of lines that can fit on a typical screen.) Getting this global view will give you a very different level of confidence than we typically get from software – or at least that’s my hope.
I believe that _ can provide this much-higher-than-normal level of confidence because the three rules above cut sub-packages off from something our profession is absolutely enamored with: black box abstraction. The idea of black box abstraction is that you can implement some complex functionality, box it up, and expose it to the outside world so carefully that the world can totally ignore the implementation details and can care only about the inputs and outputs.
Now, don’t misunderstand me: I fully agree that it’s a phenomenally powerful tool. Without black box abstraction, there’s simply no way that the vast majority of software in use today – including Raku – would be remotely possible. Indeed, Raku makes great use of abstraction and I’m looking forward to the whole new set of abstractions that Jonathan Worthington’s work on the Raku AST seems poised to deliver soon-ish.
As programmers like to say, there’s no problem that can’t be solved by adding another layer of abstraction – and, as a profession, we sure have solved a lot of problems. But we’ve also created a lot of problems. And I think one reason we’ve created so many is that we often reach for black box abstraction too quickly, without putting enough consideration into the not-inconsiderable costs of additional abstraction.
In particular, whenever code depends on a black box, that means that the author of that code chose to rely on code that, by design, they didn’t need to understand. And they’re claiming that you also don’t need to understand that black-box code. But that means that you can never fully understand the code you’re currently reading either; the very best you can do is reach a partial understanding subject to the disclaimer “assuming both that I correctly understood the black box’s promises and that the black box keeps all its promises”.
The slight flaw is that black boxes never keep all their promises. Well, OK, that might be too strong; software sometimes works. But, at the least, you can never guarantee that any black box will keep any particular promise. As a result, anyone who relies on a black box will, sooner or later, need to open that box up and debug the tangled wires inside. And, as anyone who has ever followed a deep callstack can attest, that often means discovering all the various black boxes nested inside the first box and getting to play with your very own set of software matryoshka dolls.
In fact, you could say that the whole reason for _ is that we can’t fully trust black box abstraction. If we had a way to guarantee that our black boxes would Just Work™, then having thousands of dependencies would hardly be a problem at all. But, since that’s one thing we cannot guarantee, _ is deeply committed to not adding additional abstraction. And thus, _ packages follow the three rules above and present their entire codebase – dependencies and all – for you to view at once, on a single screen.
When viewing a _ sub-package, you can look at a single file and, without needing any outside context or info, see whether the code in that file is correct. After all, outside the fairly limited domain of formal methods, pretty much no software is provably correct. But, if we can make the code short and readable enough, maybe we can at least reach what mathematicians (jokingly) call “proof by inspection”: something so simple that we can tell it’s correct just by looking at it.
Or at least that’s the theory. But, as Knuth famously reminds us, don’t trust code if you’ve only proved it true, not run it. I’m sure this advice covers proof-by-inspection at least as much as proof by any other method and, accordingly, _’s brevity hasn’t stopped me from adding significant numbers of unit tests.
Even with the tests, and even as simple as each sub-package is, I’m sure that _ still contains plenty of bugs – probably far more than I’d like it to. But hopefully, the lack of abstraction in each of _’s sub-packages also means that we’ll all be able to more easily debug any issues that we encounter: doing so won’t require anyone to understand any code or systems outside of a single, short file. In other words, to borrow a phrase from Aaron Hsu, _ embraces “transparency over abstraction”.
The rules we just talked about limit what sub-packages _ can include – but these rules don’t indicate what _ should include. Let’s address that now.
_’s scope is easy enough to state in broad terms: _ should include a package if that package follows our rules and provides functionality that many Raku packages would get utility from. (That’s what “utility package” means!) If the package’s scope is small enough that it can be implemented in 70 lines, then having it as an independent package would create a micro package; if that micro package would be useful in a bunch of Raku programs, then it would likely become a widely used micro package. Since _’s goal is to limit the number of widely depended-on micro packages in the Raku ecosystem, any package that meets these two criteria is a good candidate for _.
But all that basically boils down to “_ should include packages that are (1) small and (2) useful”. While I doubt that many of you will disagree, knowing what “useful” means in practice is the hard part.
And I’m not entirely sure what exact view of usefulness will turn out to be the best fit for _. I do know that many utility libraries implement basic helper functions – sortBy, etc. – that wouldn’t have any utility in _ because they’re either already built in to Raku or a trivial combination of Raku builtins. So _ can and should include higher-level utilities; I guess that we’ll have to discover together what exactly that looks like. If you have ideas for _ packages, please let me know – or, even better, submit a PR!
That said, I do have three general categories of packages that would be a good fit for _:
Code that should be in Raku’s standard library (one day)
In addition to reducing the pressure for micro packages, _ can also help Rakoons to test out packages that might one day belong in Raku but that need a bit more user feedback/time to bake before Raku commits to adding them (and the fairly strong backwards compatibility guarantee entailed by inclusion in Raku itself). Raku’s use experimental pragma already fills part of this role, but _ could provide a good home for packages that are a bit too experimental even for that pragma.
Code that ought to stay out of Raku’s standard library
There are some small packages that we can reasonably expect to be widely used but that, for one reason or another, aren’t a good fit for Raku’s standard library; _ can provide a home for those. Just as the packages in the first category share a lot with packages behind the use experimental pragma, this category shares a lot with Raku’s Core modules. And, again, _’s role could be a testing ground of sorts for modules that might one day graduate to being added as a Core module. (Though of course most packages won’t and shouldn’t “graduate” in this sense: I don’t want to suggest that being a _ sub-package is or should be a temporary status. The vast majority of _ sub-packages will stay _ sub-packages, which is exactly as it should be.)
Code that is already in Raku(do)’s standard library but that we shouldn’t use
Raku and Rakudo both quite correctly make fairly strong guarantees about not breaking spec’d code. But, in return, they ask us not to rely on code outside that guarantee – that is, not to rely on implementation details. Unfortunately Rakoons, just like everybody else, are fairly rubbish at keeping up our end of that bargain – it’s all too easy to have thoughts like:
well, this function is already installed and does just what I need. And it’s in Rakudo, so I know it’s decently well-written. So what if it’s marked with is implementation-detail? I’m sure it’ll be fine.
I’m not judging those thoughts too harshly – I’ve had them myself – but the fact is that it’s not fine. When we, as a community, ignore signposts like is implementation-detail, the inevitable negative result is that we force Rakudo developers to choose between leaving the implementation detail unchanged and breaking user code. Even if they’re “allowed” to break that code under the terms of the agreement (the one that we users are ignoring by relying on the code!), none of the Rakudo devs enjoy breaking things.
If changing an implementation-detail makes Blin runs start failing, then devs will think twice about that change – even if it’s a good change. What’s worse is that the (totally understandable!) desire to discourage users from depending on implementation details risks tempting Rakudo devs to avoid fully documenting those details – which creates/exacerbates the problem of tacit knowledge (sometimes called “tribal knowledge”): knowledge possessed by many people in the community but not written down or otherwise accessible to new people. Tacit knowledge, in turn, creates barriers for new developers looking to understand how to improve Raku’s main language implementation, which hurts everyone. Accordingly, one additional goal for _ is to head this problem off by providing alternatives to any of Rakudo’s implementation-details that developers might be tempted to depend on.
So, let’s see: a package is a good fit for _ if it
- should be in the standard library but isn’t
- shouldn’t be in the standard library
- or is in the standard library but shouldn’t be used.
I think that covers all possible packages except for those that are in the standard library and should be used, so I’m not sure we really managed to narrow it down! But maybe taking a look at _’s initial packages will provide some examples of packages that, at least in my view, were worth including.
As of today (December 11, 2021), _ includes 7 sub-packages and is beta software.
_’s source code and documentation are on GitHub, and _ itself can be installed via Zef with the command:
zef install '_:ver<0.0.1>:auth<zef:codesections>'
The beta period will be fairly brief, but will last long enough to get initial feedback on the existing functions/APIs.
During this beta period, _ explicitly makes no guarantees about backwards compatibility. In particular, ensuring that _ is strongly backwards compatible once it promises to be may require breaking changes to _ when the 1.0.0 version is released. Because backwards compatibility is very important for a package like _, my goal is to reach 1.0.0 as soon as possible, with an exact date depending in part on what approach _ takes to compatibility (more on that below – and, as you’ll see, it’s pretty likely that _’s 1.0.0 version won’t actually be called “1.0.0”).
_ includes the following sub-packages. You can find more info and usage examples for each sub-package in its README file, linked from its name.
Pattern::Match – provides a choose function that enables pattern matching using Raku’s signature destructuring as an alternative […]. choose lets you bind variables to elements of the match, supports placeholders and literals, and can detect unreachable/shadowed patterns. (Fun fact: musing about a function like choose but not wanting to create a micro package is what first started me on the trail towards _.) [source code]
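The core Raku feature that makes signature-based matching possible is that a Capture can be smartmatched against a Signature to check whether those arguments would bind. The sketch below is a rough illustration of that idea only; it is not choose and does not mirror its API. The names sketch-choose and describe are hypothetical, and unlike choose the sketch does no unreachable-pattern detection: it just calls the first block whose signature accepts the arguments.

```raku
# Hypothetical sketch of signature-based pattern matching; NOT the API of
# _'s choose. Each "arm" is a block whose signature acts as the pattern.
sub sketch-choose(Capture $subject, *@arms) {
    for @arms -> &arm {
        # Smartmatching a Capture against a Signature checks whether the
        # arguments would bind, without calling the block.
        return arm(|$subject) if $subject ~~ &arm.signature;
    }
    die "no pattern matched {$subject.raku}";
}

sub describe(|c) {
    sketch-choose c,
        -> 0, $second     { "starts with a literal 0, then $second" },
        -> Int $n, Str $s { "an Int ($n) followed by a Str ($s)" },
        -> *@rest         { "fallback with {+@rest} argument(s)" };
}

say describe(1, 'a');   # an Int (1) followed by a Str (a)
say describe(0, 'b');   # starts with a literal 0, then b
say describe(3.5);      # fallback with 1 argument(s)
```

Arms are tried in order, so a more general arm placed first would shadow the ones after it; detecting that situation is one of the things the real choose adds.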
Print::Dbg – provides a dbg function designed to support more ergonomic print-debugging (compared to Rakudo’s dd). dbg accepts any number of arguments and returns the same values (i.e., it is effectively a no-op). As a side effect, dbg prints (to stderr) the file and line on which it was invoked and a .raku representation of each argument; if any of those arguments are variables, dbg prints the variable name. Because dbg returns the values it was passed, you can use it to add debugging code without altering the behavior of the code being debugged. An example: my $new-var = $old1 + dbg($old2) + $old2. dbg was inspired by Rust’s dbg! macro. Compare with guifa’s Debug::Transput, which provides similar functionality. [source code]
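The pass-through behavior is what lets this style of print debugging leave the surrounding code unchanged. A minimal sketch of just that part, assuming a hypothetical sketch-dbg that handles a single argument and, unlike dbg, reports neither the file, the line, nor the variable name:

```raku
# Hypothetical, simplified stand-in for dbg (NOT _'s implementation): it
# writes the argument's .raku representation to stderr via note and then
# returns the argument unchanged, so it can wrap any sub-expression.
sub sketch-dbg(Mu \value) is raw {
    note "dbg: {value.raku}";
    value
}

my $old1 = 1;
my $old2 = 2;
# Wrapping $old2 does not change the result; it only adds output on stderr.
my $new-var = $old1 + sketch-dbg($old2) + $old2;
say $new-var;   # 5
```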
[…] – provides &_ as an alias for &?ROUTINE and thus provides a “topic function” that allows for convenient self-recursion. Compare with APL’s ∇ function. [source code]
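For readers who haven’t used &?ROUTINE, here is the plain-Raku feature that &_ aliases, shown without _ (so it uses &?ROUTINE directly rather than the &_ alias): an anonymous sub that recurses by referring to itself.

```raku
# &?ROUTINE refers to the routine that is currently executing, so this
# anonymous sub can recurse without ever being given a name of its own.
my &factorial = sub (Int $n) {
    $n <= 1 ?? 1 !! $n * &?ROUTINE($n - 1)
};

say factorial(5);   # 120
```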
Text::Paragraphs – provides a paragraphs function analogous to Raku’s lines: that is, it splits a Str or the contents of a file into paragraphs. It can detect paragraphs that are separated by blank lines and/or paragraphs that are marked by first-line indentation. It is also able to distinguish between the start of a new paragraph and a bulleted or numbered list (which is not a new paragraph). [source code]
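To show only the simplest of the cases described above, here is a naive sketch that treats one or more blank (or whitespace-only) lines as a paragraph separator. The name naive-paragraphs is hypothetical, and the sketch deliberately ignores first-line indentation and list detection, which are the harder parts that Text::Paragraphs handles.

```raku
# Naive sketch: a newline followed by one or more blank or whitespace-only
# lines separates paragraphs. Empty pieces (e.g. from leading blank lines)
# are dropped. No indentation or list handling.
sub naive-paragraphs(Str $text) {
    $text.split(/ \n [ \h* \n ]+ /).grep(*.trim.chars)
}

my $text = "First paragraph,\nstill the first.\n\n  \nSecond paragraph.";
.raku.say for naive-paragraphs($text);
```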
[…] – provides wrap-words, a replacement for the Rakudo […]. _’s wrap-words is slightly less naive because it provides basic support for wide Unicode (supporting character width without knowing the font is impossible in theory but works OK in practice). Additionally, wrap-words respects the existing whitespace in between words so, unlike Rakudo’s version, it doesn’t need to have an opinion about how many spaces to put after a period (though, for the record, Rakudo’s view that periods should be followed by two spaces is the correct one). wrap-words uses the same greedy wrapping algorithm as Rakudo (if anyone is up for a challenge, I’d welcome a PR that implements the Knuth & Plass line-breaking algorithm … in under 70 lines of code – here’s a JS implementation in only ~300 lines to get you started!). [source code]
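For reference, greedy wrapping simply puts as many words on each line as will fit and then starts a new line. The sketch below (greedy-wrap is a hypothetical name, not _’s API) shows the algorithm in its plainest form; unlike wrap-words, it collapses runs of whitespace and measures width with .chars, so it neither preserves existing spacing nor accounts for wide characters.

```raku
# Greedy line wrapping: keep appending words to the current line while the
# line (plus one separating space) stays within $width; otherwise start a
# new line. Whitespace is collapsed and width is measured in .chars.
sub greedy-wrap(Str $text, Int :$width = 20 --> Str) {
    my @lines;
    my $current = '';
    for $text.words -> $word {
        if $current eq '' {
            $current = $word;
        }
        elsif $current.chars + 1 + $word.chars <= $width {
            $current ~= " $word";
        }
        else {
            @lines.push($current);
            $current = $word;
        }
    }
    @lines.push($current) if $current ne '';
    @lines.join("\n")
}

say greedy-wrap("the quick brown fox jumps over the lazy dog", :width(15));
# the quick brown
# fox jumps over
# the lazy dog
```

A word longer than :width ends up alone on its own over-long line rather than being split.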
Test::Doctest::Markdown – provides a doctest function that tests Raku code contained in a Markdown file, with the goal of testing example code in a README or other documentation. (Nothing’s worse than broken examples!) doctest tests each code block as follows: if the code block has […], doctest tests the code’s output against the expected output; if the code block doesn’t have […], doctest tests whether the code can be […]. doctest also supports adding configuration info by preceding the code block with a <!-- doctest --> comment; currently, the only config option is to provide setup code that’s run as part of the test without being displayed in the Markdown file. Inspired by Rust’s documentation tests. [source code]
Test::Fluent – provides a thin wrapper over Raku’s Core Test module that supports testing in a more fluent style, as shown in the example below. Most notably, this style supports providing test descriptions in pod6 declarator comments. Inspired by the Fluent Assertions (.NET) and Chai (JS) packages. [source code]
# with Raku's Test:
unlike escape-str($str), /<invalid-chars>/, "Escaped strings don't contain invalid characters";

# with Test::Fluent:
#| Escaped strings don't contain invalid characters
escape-str($str).is.not.like: /<invalid-chars>/;
As I mentioned earlier, you can import all of _’s sub-packages with use _. This imports all the non-test functions; to import the test functions, pass ‘Test’ as a named parameter: use _ :Test, or import both test and non-test functions with use _ :ALL. If you would like more control over the imports, you can pass a list of the specific functions you’d like. For example, to import only the two text-processing functions, you would write use _ <&paragraphs &wrap-words>.
I have one medium-term goal for _ that I’d like to take care of before a stable release. I also have several questions I’m pondering (thoughts/ideas appreciated!). And, of course, I’d like to keep building out the functionality and robustness (more tests!) of the existing sub-packages.
The goal – and the largest blocker for a 1.0.0 release for _ – is to figure out the best way for _ to version sub-packages and to implement that versioning system.
I’m still in the design phase for this part of _, but I’m optimistic.
Raku offers a nearly unique opportunity to get versioning right. With the exception of the in-alpha-testing language Unison, I’m not aware of any language that supports versioning as a first-class concept to the degree that Raku does; Raku goes so far as letting us set both ver and api info for nearly every language construct. Even better, Raku’s strong support for multiple dispatch lets us “grow” Raku functions without breaking them: when we define a new multi candidate with a narrower signature, we add something new without breaking any existing calls. (I’m using “grow” in the sense Rich Hickey introduced – you grow a function by either requiring less from or providing more to that function’s callers.)
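Here is a small sketch of what growing a function through multiple dispatch can look like. The greet function is hypothetical; the second candidate accepts an argument shape the first one rejected, so every call that worked before still reaches the original candidate unchanged.

```raku
# Version 1: a single candidate that requires exactly one Str argument.
multi greet(Str $name) { "Hello, $name!" }

# Later growth: a new candidate for calls the old version rejected (two
# arguments). Existing one-argument calls still dispatch to the original
# candidate, so no current caller sees different behavior.
multi greet(Str $name, Str $greeting) { "$greeting, $name!" }

say greet('Ada');              # Hello, Ada!
say greet('Ada', 'Welcome');   # Welcome, Ada!
```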
Given all these advantages, I’m hoping that _ 1.0.0 will manage versions in a way that gives users fine-grained control over which version of a _ function they use – but where exercising that control is largely optional for most users because nothing ever breaks.
But will that 1.0.0 release actually be a “1.0.0” release? I have always used semantic versioning and think it’s a useful tool for communicating changes to users. That said, it’s also true that semver has real problems. In particular, it seems like _’s nature – a collection of independent sub-packages, in which changes to one sub-package have no effect on any other – might not be a great fit for the binary nature of semver.
Consequently, I’m strongly considering calendar versioning (calver) or some other non-semver versioning scheme.
At the very least, having periodic scheduled _ releases would provide a natural way to bundle sub-package fixes. It might even make sense to track Raku’s version and backwards compatibility stance (which would mean not allowing any breaking (non-growing) changes except when a new Raku language version is released).
I also want to put some thought into letting users select a version at the sub-package (or even function) level. One of the advantages of a meta package like _ is that it bundles administrative issues like upgrading, so I don’t want to set anything up that would add work for users. At the same time, giving users more control would be a great feature. There’s also the question of what version to provide when users don’t specify an exact version: I don’t want to reinvent the wheel or to be needlessly inconsistent with zef, but the arguments for/against golang-style minimal version selection have me intrigued (especially the ones Russ Cox raised in this 2018 talk).
_ needs to have a decent responsible disclosure process before it’s ready for a stable release (maybe that’s not technically a “versioning” issue, but it’s close enough; a security bug would certainly lead to a new version!). The inherent simplicity of _ sub-packages should make security flaws much less likely – but that phrase has “famous last words” written all over it, so _ will definitely err on the side of caution. I don’t think there’s a whole lot to decide here; it’s just a matter of setting it up.
So, lots to think about, several decisions to make, and some implementation code to write.
Packages that outgrow _
Another question I’m mulling over is how _ should act when a package is removed from _. This seems like something that could happen because the package adds enough features that it can’t fit in 70 lines without sacrificing clarity – in which case it makes sense for the sub-package to spin off into a full package of its own. Or a sub-package might leave _ because it “graduates” into Raku’s standard library/Core modules. (Or a package could be removed because it was a bad idea in the first place, but that’s hopefully rare and can be handled as a normal deprecation/breaking change.)
If a sub-package is removed from _ and a user tries to use a function from that sub-package then, unless we handle that as a special case, the user would get an error. So the question is whether we want to add any special logic for removed packages. If that sub-package still exists but just lives elsewhere, then _ could import it as a dependency and re-export it as a sub-package. This would prevent needlessly breaking user code but would mean that someone could believe that they were using a _ sub-package but actually be using an external package – which risks drastically weakening _’s guarantees (and we’d no longer have 0 dependencies).
I suspect that the best answer here is to re-export the old packages but to emit an is DEPRECATED warning. But I’d like to put a bit more thought into whether there’s an alternative that would avoid the dependency.
Micro packages already in the ecosystem
Next, I’d like to put some thought into how (if at all) _ should approach existing micro packages in the Raku ecosystem. For the initial packages in _, I focused entirely on preventing new micro packages from becoming widely used dependencies. In particular, I avoided knowingly duplicating any existing Raku packages (well, with the slight exception of guifa’s Debug::Transput – but, as that package notes in its README, Debug::Transput was based on an idea I mentioned to guifa on IRC).
But it might make sense for _ to one day include the code from Raku packages (or slightly modified versions of them). To keep its guarantees, _ would need to create a sub-package based on the package’s code, i.e. fork the package. I would want to be very careful about this – even though forking and re-distributing a free software package is entirely allowed by the license, it can sometimes come off as a bit rude. And I don’t want anyone to think of _ as a package that’s interested in taking credit for other people’s work.
Despite those reservations, there’s one really compelling reason to consider forking packages: _’s purpose is to reduce the number of widely-used micro-package dependencies in the Raku ecosystem, and there’s no better way to do that than to find packages that are already widely used micro packages. (Or, said differently, to find packages that are furthest upstream in the Raku River.) And, fortunately, this sort of hard data for the Raku ecosystem is easy to get, either directly through zef or using the ModuleCitation module to generate a visual/interactive display similar to the Raku Ecosystem Citation Index.
This sort of info would let us find modules that are small and widely depended on; in short, ones that are perfect candidates for adding to _.
Given that we can do this, I’d like to put some thought into whether we should and, if so, how best to do so. I can see a few options: we could look for modules with a high citation index that are also good targets for re-writing (perhaps because they were written some time ago or with a different goal) and create sub-packages based on those (without forking them). Or we could look for packages that might be abandoned and, if so, fork them as sub-packages. Or we could try to work with package maintainers to have them add (a version of) the package to _. And I’m sure there are other approaches too; something else to ponder.
Finally, I’ve been musing about how _ can be as trustworthy as possible (even before the subject came up last time). The goal, of course, is for _ to be as close as possible to zero trust: because each sub-package is a single short, readable (I hope!) file with zero dependencies, you shouldn’t need to trust me – just read the code (and tests) and see for yourself.
That’s a fine theory, but in practice there’s still a big difference between “as close as possible to zero trust” and “actually zero trust”. And it’s true that at least some aspects of _ depend on its maintainers (i.e., right now, me) being trustworthy. That’s great if you trust me – which I’m kind of stuck with anyway! For the not-me people in the world, I hope I’ve earned the trust of many in the Raku community, but I can fully understand anyone who doesn’t share that trust, and I’d like to put some thought into the best ways to add some additional safeguards (both against malicious code and against insecure/buggy code).
In any event, that’s definitely a someday-well-after-1.0.0 question – after all, maybe no one else will find _ useful, and I’ll be its only user. If so, being trusted won’t be an issue at all.
_ and the Unix philosophy
I want to close with a few bigger-picture thoughts about _ and its relationship to the values that (imo) contribute to well-designed software. One comment I got on the first post in this series was “I’m all for left-pad-sized packages”. I think that was meant as a point of disagreement, but my immediate internal response was “me too!”
I love left-pad-sized packages; if I didn’t, I’d hardly have written seven of them for _’s initial release. This post has said something about “reducing the number of micro packages” so often that it’d be easy to forget, so I want to be perfectly clear: I think micro packages are great and that we should have more of them. If you’re considering writing a micro package, please do!
What I don’t like is having hundreds or thousands of dependencies. I especially don’t like having a reasonable number of direct dependencies but still having hundreds of transitive dependencies many layers deep. My goal with _ is to address that problem: I want to have my cake and eat it too.
I want each of my dependencies to be small and also to have as few transitive dependencies as possible. And I believe that _ (if successful) can help us all to get both. For example, if three of my dependencies need to wrap text and each uses a different micro package, then I’ve just picked up three new dependencies; but if they all use _, then I’ve only picked up one (or even zero, if one of them already used _). This sort of thing – where different dependencies import similar-but-different packages to perform the same task – happens all the time in many language ecosystems and is a major contributor to dependency bloat.
In all these ways, I hope that _ can flatten dependency trees and help us have both micro packages and fewer dependencies.
This dual goal of minimizing dependency size and dependency number also ties into a point that came up in an interesting and thoughtful set of reactions to my previous post. Paraphrasing a bit, the overall critique was that I’d misunderstood the Unix philosophy and that a correct understanding of that philosophy wouldn’t lead to massive dependency graphs or any of the other problems that I described as coming from following the Unix philosophy too far.
In some ways, this is a semantic disagreement: by “Unix philosophy”, do we mean the nuanced but not fully consistent set of practices that emerged in the early days of UNIX? Or do we mean the simplified version that most people mean when they refer to “the Unix philosophy” today? I’m not interested in debating the definitions of our terms but, to be clear, when I say that following “the Unix philosophy” leads to micro-package multiplicity, I’m using the phrase in its more contemporary, simplified sense – a.k.a., the way “many [people have] misapplied ‘The Unix Philosophy’ [as] justif[ying] ‘micro-packages’, when it really doesn’t”, according to at least one commenter (a friend and fellow Rakoon).
So it may well be that the True Unix Philosophy™ wouldn’t lead to programs with 1,000+ dependencies. I view _ as striking a balance between the Unix philosophy’s push towards micro packages and my simultaneous desire to keep my code’s dependency count in the double digits. But it’s fine if you view the Unix philosophy differently and say that (correctly understood) it doesn’t encourage micro packages in the first place. From that point of view, _ could be about correctly (albeit still partially) applying the Unix philosophy by encouraging shallower and narrower dependency trees.
I’m happy with either framing; either way is a path towards simpler, more reliable, and more composable software. I hope that, by reducing abstraction, having code that’s “correct by inspection”, and providing a coordination point for small-but-useful sub-packages, _ can play its small part in making that happen in the Raku ecosystem. And, regardless of how well _ fares, I hope that other languages embrace the use of utility packages and the role they can play in reducing the depth and breadth of dependency trees.