The 2022 Raku Advent Posts

(in chronological order, with comment references)

All the blog posts of 2021

It’s that time of the year

When we start all over again with advent calendars, publishing one article a day until Christmas. This is going to be the first full year with Raku being called Raku, and the second year on this new site. It's also going to be the 12th year in a row (counting this first article) with a Perl 6 or Raku calendar, previously published on the Perl 6 Advent Calendar blog, and the 5th year since the Christmas release, which was announced in that year's advent calendar.

Anyway. Here we go again! We have lined up a full (or eventually full, by the time the Advent Calendar is finished) set of articles on many different topics, but all of them about our beloved Raku.

So enjoy, stay healthy, and have -Ofun reading the articles this year will bring.

Day 25: Rakudo 2022 Review

In a year as eventful as 2022 was in the real world, it is a good idea to look back to see what one might have missed while life was messing with your (Raku) plans.

Rakudo saw about 1500 commits this year, about the same as the year before. Many of these were bug fixes and performance improvements, which you would normally not notice. But there were also commits that actually added features to the Raku Programming Language, so it seems worthwhile to describe those in more depth.

So here goes! Unless otherwise noted, all of these changes are in language level 6.d, and available thanks to several Rakudo compiler releases during 2022.

New REPL functionality

It is now possible to refer to values that were produced earlier, using the $*N syntax, where N is a number greater than or equal to 0.

$ raku
To exit type 'exit' or '^D'
[0] > 42
42
[1] > 666
666
[2] > $*0 + $*1
708

Note that the number shown before the prompt is the index under which the value produced by that line can be retrieved later.

New MAIN options

You can now affect the interpretation of command line arguments to MAIN by setting these options in the %*SUB-MAIN-OPTS hash:

allow-no

Allow negation of a named argument to be specified as --no-foo instead of --/foo.

numeric-suffix-as-value

Allow a numeric value to be specified directly after the name of a single-letter named argument, so that -j2 is the equivalent of --j=2.

So for example, by putting:

my %*SUB-MAIN-OPTS = :allow-no, :numeric-suffix-as-value;

at the top of your script, you would enable these features in the command-line argument parsing.

New types

Native unsigned integers (both as scalars and in (shaped) arrays) have finally become first-class citizens. This means that a native unsigned integer can now hold 18446744073709551615 as its largest positive value, up from 9223372036854775807 before. This also allowed for a number of internal optimisations, as the check for negative values could be removed. Simple as this sounds, getting this supported on all VM backends was quite an undertaking.

my uint  $foo = 42;
my uint8 $bar = 255;
my  int8 $baz = 255;

say $foo; # 42
say $bar; # 255
say $baz; # -1

say ++$foo; # 43
say ++$bar; # 0
say ++$baz; # 0

And yes, all of the other explicitly sized types, such as uint16, uint32 and uint64, are now also supported!
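To see the wrap-around beyond a single increment, here is a small sketch using arithmetic assignment and a uint16 as well. The variable names are only for illustration, and the example relies on nothing more than the truncation behaviour shown above.

```raku
# Sized native integers keep only as many bits as their type allows,
# so values wrap around instead of growing into big Ints.
my uint8  $byte   = 250;
$byte += 10;              # 260 needs 9 bits
say $byte;                # 4    (260 - 256)

my int8   $signed = 200;
say $signed;              # -56  (200 - 256)

my uint16 $word   = 65535;
say ++$word;              # 0
```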

New subroutines

A number of subroutines entered the global namespace this year. Please note that they will not interfere with any subroutines of the same name in your code, as yours will always take precedence.

NYI()

The NYI subroutine takes a string indicating a feature that is not yet implemented, and turns that into a Failure with an X::NYI exception at its core. You could consider it a version of ... with feedback, rather than just the generic "Stub code executed" message.

say NYI "Frobnication";
# Frobnication not yet implemented. Sorry.

chown()

The chown subroutine takes zero or more filenames, and changes the UID (with the :uid argument) and/or the GID (with the :gid argument) if possible. It returns the filenames that were successfully changed.

my @files   = ...;
my $uid     = +$*USER;
my @changed = chown @files, :$uid;
say "Converted UID of @changed.elems() / @files.elems() files";

Also available as a method on IO::Path, but then only applicable to a single path.

head(), skip(), tail()

The .head, .skip and .tail methods got their subroutine counterparts.

say head 3, ^10; # (0 1 2)
say skip 3, ^10; # (3 4 5 6 7 8 9)
say tail 3, ^10; # (7 8 9)

Note that the number of elements is always the first positional argument.
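As a quick sketch of the subroutine forms on some made-up scores (the data is purely illustrative), note how the count always comes first and the list second:

```raku
my @scores = 17, 42, 8, 99, 23, 64;

say head 3, @scores.sort(-*);   # (99 64 42)  three highest scores
say tail 2, @scores.sort;       # (64 99)     two highest, via ascending sort
say skip 4, @scores;            # (23 64)     everything after the first four
```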

New methods

Any.are

The .are method returns the type object that all of the values of the invocant have in common. This can be either a class or a role.

say (1, 42e0, .137).are;        # (Real)
say (1, 42e0, .137, "foo").are; # (Cool)
say (42, DateTime.now).are;     # (Any)

In some languages this functionality appears to be called infer, but this name was deemed to be too ComputerSciency for Raku.

IO::Path.inode|dev|devtype|created|chown

Some low level IO features were added to the IO::Path class, in the form of 5 new methods. Note that they may not actually work on your OS and/or filesystem. Looking at you there, Windows 🙂

  • .inode – the inode of the path (if available)
  • .dev – the device number of the filesystem (if available)
  • .devtype – the device identifier of the filesystem (if available)
  • .created – DateTime object when path got created (if available)
  • .chown – change uid and/or gid of path (if possible, method version of chown())

(Date|DateTime).days-in-year

The Date and DateTime classes already provide many powerful date and time manipulation features, but a few features were considered missing, so they were added this year.

A new .days-in-year class method was added to the Date and DateTime classes. It takes a year as positional argument:

say Date.days-in-year(2023);  # 365
say Date.days-in-year(2024);  # 366

This behaviour was also expanded to the .days-in-month method, when called as a class method:

say Date.days-in-month(2023, 2);  # 28
say Date.days-in-month(2024, 2);  # 29

They can also be called as instance methods, in which case the parameters default to the associated values in the object:

given Date.today {
    .say;                # 2022-12-25
    say .days-in-year;   # 365
    say .days-in-month;  # 31
}
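If you need the same number on a Rakudo release that predates the new class method, the day-of-year of December 31st provides it. A minimal sketch, with a hypothetical helper name:

```raku
# Hypothetical fallback for Rakudo releases without Date.days-in-year:
# the day-of-year of December 31st equals the number of days in that year.
sub days-in-year-compat(Int:D $year --> Int:D) {
    Date.new($year, 12, 31).day-of-year
}

say days-in-year-compat(2023);   # 365
say days-in-year-compat(2024);   # 366
say days-in-year-compat(1900);   # 365 (divisible by 100 but not by 400)
say days-in-year-compat(2000);   # 366 (divisible by 400)
```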

New Dynamic Variables

Dynamic variables provide a very powerful way to keep “global” variables. A number of them are provided by the Raku Programming Language. And now there is one more of them!

$*RAT-OVERFLOW

Determines the behaviour of rational numbers (aka Rats) when they run out of precision, more specifically when the denominator no longer fits in a native 64-bit integer. By default, Rats will be downgraded to floating point values (aka Nums). By setting the $*RAT-OVERFLOW dynamic variable, you can influence this behaviour.

The $*RAT-OVERFLOW dynamic variable is expected to contain a class (or an object) on which an UPGRADE-RAT method will be called. This method is expected to take the numerator and denominator as positional arguments, and is expected to return whatever representation one wants for the given arguments.

The following type objects can be specified using core features:

Num

Default. Silently convert to floating point. Sacrifices precision for speed.

CX::Warn

Downgrade to floating point, but issue a warning. Sacrifices precision for speed.

FatRat

Silently upgrade to FatRat, aka rational numbers with arbitrary precision. Sacrifices speed to preserve precision.

Failure

Return an appropriate Failure object, rather than doing a conversion. This will most likely throw an exception unless specifically handled.

Exception

Throw an appropriate exception.

Note that you can introduce any custom behaviour by creating a class with an UPGRADE-RAT method in it, and setting that class in the $*RAT-OVERFLOW dynamic variable.

class Meh {
    method UPGRADE-RAT($num, $denom) is hidden-from-backtrace {
        die "$num / $denom is meh"
    }
}
my $*RAT-OVERFLOW = Meh;
my $a = 1 / 0xffffffffffffffff;
say $a;     # 0.000000000000000000054
say $a / 2; # 1 / 36893488147419103230 is meh

Note that the is hidden-from-backtrace trait is only added so that any backtrace will show the location where the offending calculation was done, rather than a location inside the UPGRADE-RAT method itself.
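A custom handler does not have to throw. As a sketch, assuming only the UPGRADE-RAT protocol described above, here is a hypothetical class that upgrades to FatRat (like the core FatRat option) but also keeps count of how often that was needed:

```raku
# Hypothetical handler: keep full precision like the core FatRat option,
# and count how many times an upgrade happened.
my $upgrades = 0;

class CountingFatRat {
    method UPGRADE-RAT($num, $denom) {
        ++$upgrades;
        FatRat.new($num, $denom)
    }
}

my $*RAT-OVERFLOW = CountingFatRat;

my $r    = 1 / 0xffffffffffffffff;  # denominator still fits in 64 bits
my $half = $r / 2;                  # denominator overflows: handler runs

say $half.^name;        # FatRat
say $half.denominator;  # 36893488147419103230
say $upgrades;
```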

New Environment Variables

Quite a few environment variables are already checked by Rakudo whenever it starts. Two more were added in the past year:

RAKUDO_MAX_THREADS

This environment variable can be set to indicate the maximum number of OS-threads that Rakudo may use for its thread pool. The default is 64, or the number of CPU-cores times 8, whichever is larger. Apart from a numerical value, you can also specify "Inf" or "unlimited" to indicate that Rakudo should use as many OS-threads as it can.

These same values can also be used in a call to ThreadPoolScheduler.new with the :max_threads named argument.

my $*SCHEDULER =
  ThreadPoolScheduler.new(:max_threads<unlimited>);
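A minimal sketch of that second route: a dedicated pool capped at 4 threads (an arbitrary number chosen for the example), which start picks up through $*SCHEDULER:

```raku
# Cap this scheduler at 4 OS threads; start uses $*SCHEDULER by default,
# so the promises below all run on this pool.
my $*SCHEDULER = ThreadPoolScheduler.new(:max_threads(4));

my @squares = await (^8).map: -> $n { start { $n * $n } };
say @squares;   # [0 1 4 9 16 25 36 49]
```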

INSIDE_EMACS

This environment variable can be set to a true value if you do not want the REPL to check for installed modules to handle line editing. When set, the REPL will fall back to the behaviour it has when none of the supported line editing modules are installed. This appears to be handy for Emacs users, as the name implies 🙂

New experimental features

Some Raku features are not yet cast in stone, so there's no guarantee that any code written using these experimental features will continue to work in the future. Two new experimental features have been added in the past year:

:will-complain

If you add a use experimental :will-complain to your code, you can customize typecheck errors by specifying a will complain trait. The trait expects a Callable that will be given the offending value in question, and is expected to return a string to be added to the error message. For example:

use experimental :will-complain;
my Int $a will complain { "You cannot use -$_-, dummy!" }
$a = "foo";
# Type check failed in assignment to $a; You cannot use -foo-, dummy!

The will complain trait can be used anywhere you can specify a type constraint in Raku, so that includes parameters and attributes.

:rakuast

The RakuAST classes allow you to build an AST (Abstract Syntax Tree) programmatically, and have that converted to executable code. Previously, this was only possible by programmatically creating a piece of Raku source code (with all of its escaping issues), and then calling EVAL on it. But RakuAST not only allows you to build code programmatically (as seen in yesterday's blog post), it also allows you to introspect the AST, which opens up all sorts of syntax-checking / linting possibilities.

There is an associated effort to compile the Raku core itself using a grammar that uses RakuAST to build executable code. This effort is now capable of passing 585/1355 test-files in roast completely, and 83/131 of the Rakudo test-files completely. So there is still a lot of work to do, although it has now gotten to the point that implementing a single Raku feature in the new grammar often creates an avalanche of newly passing test-files.

So, if you add a use experimental :rakuast to your code, you will be able to use all of the currently available RakuAST classes to build code programmatically. This is an entirely new area of Raku development, which will be covered by many blog posts in the coming year. As of now, there is only some internal documentation.

A small example, showing how to build the expression "foo" ~ "bar":

use experimental :rakuast;

my $left  = RakuAST::StrLiteral.new("foo");
my $infix = RakuAST::Infix.new("~");
my $right = RakuAST::StrLiteral.new("bar");

my $ast = RakuAST::ApplyInfix.new(:$left, :$infix, :$right);
dd $ast;  # "foo" ~ "bar"

This is very verbose, agreed. Syntactic sugar for making this easier will certainly be developed, either in core or in module space.

Note how each element of the expression can be created separately, and then combined together. And that you can call dd to show the associated Raku source code (handy when debugging your ASTs).

For the very curious, you can check out a proof-of-concept of the use of RakuAST classes in the Rakudo core in the Formatter class, that builds executable code out of an sprintf format.

New arguments to existing functionality

roundrobin(…, :slip)

The roundrobin subroutine now also accepts a :slip named argument. When specified, it will produce all values as a single, flattened list.

say roundrobin (1,2,3), <a b c>;        # ((1 a) (2 b) (3 c))
say roundrobin (1,2,3), <a b c>, :slip; # (1 a 2 b 3 c)

This is functionally equivalent to:

say roundrobin((1,2,3), <a b c>).map: *.Slip;

but many times more efficient.
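One detail worth knowing: with lists of unequal length, roundrobin simply skips the exhausted ones, and :slip keeps that behaviour. A small sketch:

```raku
my @nums    = 1, 2, 3;
my @letters = <a b>;

say roundrobin @nums, @letters;                 # ((1 a) (2 b) (3))
say roundrobin @nums, @letters, :slip;          # (1 a 2 b 3)
say roundrobin(@nums, @letters).map(*.Slip);    # (1 a 2 b 3), the slower way
```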

Cool.chomp($needle)

By default, the .chomp method removes any logical newline from the end of a string. It is now possible to specify a specific needle as a positional argument: it is removed only if the string ends with it.

say "foobar".chomp("foo"); # foobar
say "foobar".chomp("bar"); # foo

It actually works on all Cool values, but the return value will always be a string:

say 427.chomp(7); # 42
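A typical use is stripping a known suffix only when it is actually there. A small sketch with made-up filenames:

```raku
# Remove a ".txt" extension if present; other names are left alone.
for <report.txt notes.md> -> $name {
    say $name.chomp('.txt');
}
# report
# notes.md

say 'barbar'.chomp('bar');   # bar (only one occurrence is removed)
```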

DateTime.posix

DateTime values have better than millisecond precision, yet the .posix method always returned an integer value. Now it can also return a Num including the fractional part of the second, by specifying the :real named argument.

given DateTime.now {
    say .posix;        # 1671733988
    say .posix(:real); # 1671733988.4723697
}

Additional meaning to existing arguments

Day from end of month

The day parameter to Date.new and DateTime.new (whether named or positional) can now be specified as either a Whatever to indicate the last day of the month, or as a Callable indicating number of days from the end of the month.

say Date.new(2022,12,*);   # 2022-12-31
say Date.new(2022,12,*-6); # 2022-12-25
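Combined with .day-of-week, this turns "last Friday of the month" into a short computation. A sketch with a hypothetical helper (1 is Monday, 7 is Sunday):

```raku
# Hypothetical helper: the last given weekday of a month, starting from
# the new "last day of the month" form and stepping back 0..6 days.
sub last-weekday-of-month(Int $year, Int $month, Int $day-of-week --> Date) {
    my $date = Date.new($year, $month, *);    # last day of the month
    $date - ($date.day-of-week - $day-of-week) % 7
}

say last-weekday-of-month(2022, 12, 5);   # 2022-12-30 (last Friday)
say last-weekday-of-month(2023,  5, 1);   # 2023-05-29 (last Monday)
say Date.new(2024, 2, *);                 # 2024-02-29
```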

Additions in v6.e.PREVIEW

You can already access new v6.e language features by specifying use v6.e.PREVIEW at the top of your compilation unit. Several additions were made in the past year!

term nano

A nano term is now available. It returns the number of nanoseconds since midnight UTC on 1 January 1970. It is similar to the time term, but with a billion times finer resolution. It is intended for very accurate timekeeping / logging.

use v6.e.PREVIEW;
say time; # 1671801948
say nano; # 1671801948827918628

With current 64-bit native unsigned integer precision, this should be enough until roughly the year 2554, so for more than another 500 years 🙂

prefix //

You can now use // as a prefix as well as an infix. It will return whatever the .defined method returns on the given argument.

use v6.e.PREVIEW;
my $foo;
say //$foo; # False
$foo = 42;
say //$foo; # True

Basically //$foo is syntactic sugar for $foo.defined.

snip() and Any.snip

The new snip subroutine and method allow one to cut up a list into sublists according to the given specification. The specification consists of one or more smartmatch targets. Each value of the list is smartmatched against the given target: as soon as that returns False, all the values before that one are produced as a List.

use v6.e.PREVIEW;
say (2,5,13,9,6,20).snip(* < 10);
# ((2 5) (13 9 6 20))

Multiple targets can also be specified.

say (2,5,13,9,6,20).snip(* < 10, * < 20);
# ((2 5) (13 9 6) (20))

The argument can also be an Iterable. To split a list consisting of integers and strings into sublists of just integers and just strings, you can do:

say (2,"a","b",5,8,"c").snip(|(Int,Str) xx *);
# ((2) (a b) (5 8) (c))

Inspired by Haskell’s span function.
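Since each target picks up where the previous one stopped matching, sorted data can be split into bands in a single call. A sketch with made-up exam scores; the thresholds are illustrative, and the list must be sorted for the bands to make sense:

```raku
use v6.e.PREVIEW;

# Split sorted scores into bands: below 60, 60s, 70s, 80s, and the rest.
my @scores = 55, 62, 71, 78, 85, 93;
say @scores.snip(* < 60, * < 70, * < 80, * < 90);
# ((55) (62) (71 78) (85) (93))
```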

Any.snitch

The new .snitch method is a debugging tool that will show its invocant with note by default, and return the invocant. So you can insert a .snitch in a sequence of method calls and see what’s happening “half-way” as it were.

$ raku -e 'use v6.e.PREVIEW;\
say (^10).snitch.map(* + 1).snitch.map(* * 2)'
^10
(1 2 3 4 5 6 7 8 9 10)
(2 4 6 8 10 12 14 16 18 20)

You can also insert your own “reporter” in there: the .snitch method takes a Callable. An easy example of this is using dd for snitching:

$ raku -e 'use v6.e.PREVIEW;\
say (^10).snitch(&dd).map(*+1).snitch(&dd).map(* * 2)'
^10
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).Seq
(2 4 6 8 10 12 14 16 18 20)
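The reporter does not have to print anything. As a sketch, this collects what passes through into an array instead (the @log name is just for illustration), which can be handy in tests:

```raku
use v6.e.PREVIEW;

# The block receives whatever .snitch was called on; .snitch then
# passes that value on unchanged to the next method in the chain.
my @log;
my @result = (1..5).map(* * 2).snitch({ @log.push: "after map: {.List}" }).grep(* > 4);

say @result;   # [6 8 10]
say @log;      # [after map: 2 4 6 8 10]
```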

Any.skip(produce,skip,…)

You can now specify more than one argument to the .skip method. Before, you could only specify a single (optional) argument.

my @a = <a b c d e f g h i j>;
say @a.skip;       # (b c d e f g h i j)
say @a.skip(3);    # (d e f g h i j)
say @a.skip(*-3);  # (h i j)

On v6.e.PREVIEW, you can now specify any number of arguments in the order: produce, skip, produce, etc. Some examples:

use v6.e.PREVIEW;
my @a = <a b c d e f g h i j>;
# produce 2, skip 5, produce rest
say @a.skip(2, 5);        # (a b h i j)
# produce 0, skip 3, then produce 2, skip rest
say @a.skip(0, 3, 2);     # (d e)
# same, but be explicit about skipping rest
say @a.skip(0, 3, 2, *);  # (d e)

In fact, any Iterable can now be specified as the argument to .skip.

my @b = 3,5;
# produce 3, skip 5, then produce rest
say @a.skip(@b);           # (a b c i j)
# produce 1, then skip 2, repeatedly until the end
say @a.skip(|(1,2) xx *);  # (a d g j)
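To make the alternation explicit, here is a hypothetical helper (not part of Rakudo, and not how Rakudo implements it) that spells out the produce/skip rules for a finite number of counts on a plain list, reproducing the results above:

```raku
# Hypothetical helper, not part of Rakudo: counts alternate between
# "produce this many" and "skip this many", starting with produce.
sub alt-skip(@list, *@counts) {
    my @out;
    my $pos       = 0;
    my $producing = True;
    for @counts -> $n {
        last if $pos >= @list.elems;
        # A Whatever count stands for "the rest of the list".
        my $take = $n ~~ Whatever
          ?? @list.elems - $pos
          !! min($n, @list.elems - $pos);
        @out.append: @list[$pos ..^ $pos + $take] if $producing;
        $pos      += $take;
        $producing = !$producing;
    }
    # When the counts run out, the next step (produce or skip) covers the rest.
    @out.append: @list[$pos ..^ @list.elems] if $producing;
    @out.List
}

my @a = <a b c d e f g h i j>;
say alt-skip(@a, 2, 5);        # (a b h i j)
say alt-skip(@a, 0, 3, 2);     # (d e)
say alt-skip(@a, 0, 3, 2, *);  # (d e)
say alt-skip(@a, 3, 5);        # (a b c i j)
```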

Cool.comb(Pair)

On v6.e.PREVIEW, the .comb method will also accept a Pair as an argument to give it .rotor-like capabilities. For instance, to produce trigrams of a string, one can now do:

use v6.e.PREVIEW;
say "foobar".comb(3 => -2);  # (foo oob oba bar)

This is the functional equivalent of "foobar".comb.rotor(3 => -2)>>.join, but about 10x as fast.
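As a small sketch of where that comes in handy, counting overlapping trigrams of an arbitrary word:

```raku
use v6.e.PREVIEW;

# Overlapping trigrams: take 3 characters, then step back 2.
my @trigrams = "banana".comb(3 => -2);
say @trigrams;            # [ban ana nan ana]
say @trigrams.Bag<ana>;   # 2
```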

Changed semantics on Int.roll|pick

To pick a number from 0 to N-1, one no longer has to specify a range, but can use just the integer value as the invocant:

use v6.e.PREVIEW;
say (^10).roll;     # 5
say 10.roll;        # 7
say (^10).pick(*);  # (2 0 6 9 4 1 5 7 8 3)
say 10.pick(*);     # (4 6 1 0 2 9 8 3 5 7)

Of course, all of these values are examples, as each run will, most likely, produce different results.
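Two sketches of the new forms whose results can still be checked despite the randomness: a six-sided die (adding 1, since N.roll starts at 0) and a shuffled order for 52 card indices:

```raku
use v6.e.PREVIEW;

# A six-sided die: 6.roll gives 0 .. 5, so add 1.
my $die = 6.roll + 1;
say $die;                   # some value from 1 to 6

# A shuffled deck order: every index 0 .. 51 exactly once.
my @order = 52.pick(*);
say @order.elems;           # 52
say @order.unique.elems;    # 52 (no index repeats)
```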

More interesting stuff

There were some more new things and changes in the past year. I'll just mention them very succinctly here:

New methods on CompUnit::Repository::Staging

.deploy, .remove-artifacts and .self-destruct.

:!precompile flag on CompUnit::Repository::Installation.install

Install module but precompile on first loading rather than at installation.

New methods on Label

.file and .line methods, returning the file and line where the Label was created.

.Failure coercer

Convert a Cool object or an Exception to a Failure. Mainly intended to reduce binary size of hot paths that do some error checking.

Cool.Order coercer

Coerce the given value to an Int, then convert to Less if less than 0, to Same if 0, and More if more than 0.

Allow semi-colon

Now allow for the semi-colon in my :($a,$b) = 42,666 because the left-hand side is really a Signature rather than a List.

Summary

I guess we’ve seen one big change in the past year, namely having experimental support for RakuAST become available. And many smaller goodies and tweaks and features.

Now that RakuAST has become “mainstream” as it were, we can think of having certain optimizations, such as making sprintf with a fixed format string about 30x as fast! Exciting times ahead!

Hopefully you will all be able to enjoy the Holiday Season with sufficient R&R. The next Raku Advent Blog is only 340 days away!

Day 24: He’s making a list… (part 2)

In our last edition, we learned about some of the work that Santa’s elves put into automating how they make their lists. What you probably didn’t know is that the elves stay on top of the latest and greatest technology. Being well-known avid Raku programmers, the elves were excited to hear about RakuAST and decided to see how they might be able to use it. One of the elves decided to rework the list formatting code to use RakuAST. What follows is the story of how she upgraded their current technology to use RakuAST.

Background

The current code that the elves had is fairly straightforward (check out part one for a full explanation):

sub format-list(
  +@items,
  :$language = 'en',
  :$type = 'and',
  :$length = 'standard'
) {
    state %cache;
    my $code = "$language/$type/$length";

    # Get a formatter, generate if it’s not been requested before
    my &formatter = %cache{$code} //=
      generate-list-formatter($language, $type, $length);

    formatter @items;
}

sub generate-list-formatter($language, $type, $length --> Sub) {
    # Get CLDR information
    use Intl::CLDR;
    my $format = cldr{$language}.list-format{$type}{$length};
    my ($start, $middle, $end, $two) =
      $format<start middle end two>.map: *.substr(3,*-3).raku;

    # Generate code
    my $code = q:s:to/FORMATCODE/;
        sub format-list(+@items) {
            if @items > 2 {
                @items[0]
                  ~ $start
                  ~ @items[1..*-2].join($middle)
                  ~ $end
                  ~ @items[*-1]
            }
            elsif @items == 2 {
                @items[0] ~ $two ~ @items[1]
            }
            elsif @items == 1 {
                @items[0]
            }
            else {
                ''
            }
        }
    FORMATCODE

    # compile and return
    use MONKEY-SEE-NO-EVAL;
    EVAL $code
}

While the caching technique is rudimentary and technically not thread-safe, it works (a different elf will probably revisit the code to make it so). Now, when creating all the lists for, say, children in Georgia, the data for Georgian list formatters in CLDR will only need to be accessed a single time. For the next half a million or so calls, the code will be run practically as fast as if it had been hard coded (since, in effect, it has been).
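For the curious, here is one way such a revisit could look: a sketch that guards the cache with a Lock. The make-formatter sub is a hypothetical stand-in for generate-list-formatter, so that the example runs on its own:

```raku
# Hypothetical stand-in for generate-list-formatter: joins items with a word.
sub make-formatter(Str $word) {
    -> +@items { @items.join(" $word ") }
}

my %cache;
my $cache-lock = Lock.new;

sub cached-formatter(Str $key) {
    # Only one thread at a time may look up or fill the cache.
    $cache-lock.protect: {
        %cache{$key} //= make-formatter($key)
    }
}

say cached-formatter('and')(<bread butter jam>);   # bread and butter and jam

# Many threads asking at once still end up sharing a single formatter.
my @formatters = await (^10).map: { start cached-formatter('or') };
say so @formatters.map({ $_ === @formatters[0] }).all;   # True
```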

The problem is how the generate-list-formatter code works. The code block uses a heredoc-style :to string, and it’s interpolated. There are numerous ways to make that work, but all of them require proper escaping. That’s… risky.

Another elf, having seen the performance improvements that this new EVAL code brought, wanted to find a way to avoid the risky string evaluation. She had heard about the new RakuAST and decided to give it a whirl. While it initially looked more daunting, she quickly realized that RakuAST was very powerful.

What is RakuAST

RakuAST is an object-based representation of Raku’s abstract syntax tree, or roughly what you might get if you parsed Raku code into its individual elements. For instance, a string literal might be written as 'foo' in code, but once parsed, it becomes a string literal node. That node, by the way, can be created by using RakuAST::StrLiteral.new(…). Remember how the elf had to worry about how the string might be interpolated? By creating the string literal directly via a RakuAST node, that whole process is safely bypassed. No RakuAST::StrLiteral node can be created that will result in a string injection!

Every single construct in the Raku language has an associated RakuAST node. When creating nodes, you might frequently pass in another node, which means you can build up code objects in a piece-by-piece fashion, and again, without ever worrying about string interpolation, escaping, or injection attacks.

So let’s see how the elf eventually created the safer RakuAST version of the formatter method.

The elf works her AST off

To ease her transition into RakuAST, the elf decided to go from the simplest to the most complex part of the code. The simplest is the value for the final else block:

my $none = RakuAST::StrLiteral.new(''); 

Okay. That was easy. Now she wanted to tackle the single element value. In the original code, that was @items[0], which she expressed as the equivalent @list.head. Although we don’t normally think of it as such, a method call like .head is a special postfix operator. Operators are applied by creating a RakuAST::Apply___fix node, where ___ is the type of operator. Depending on the node, there are different arguments. In the case of RakuAST::ApplyPostfix, the arguments are operand (the list) and postfix, which is the actual operator. These aren’t as simple as typing in some plain text, but when looking at the code the elf came up with, it’s quite clear what’s going on:

my $operand = RakuAST::Var::Lexical.new('@list');
my $postfix = RakuAST::Call::Method.new(
  name => RakuAST::Name.from-identifier('head')
);
my $one = RakuAST::ApplyPostfix.new(:$operand, :$postfix);

The operand isn’t a literal, but a variable. Specifically, it’s a lexical variable, so we create a node that will reference it. The method call operator also needs a name, so we create that as well.

This involves a lot of assignment statements. Sometimes that can be helpful, but for something this simple, the elf decided it was easier to write it as one “line”:

my $one = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('head')
  )
);

Alright, so the first two cases are done. How might she create the result for when the list has two items? Almost exactly like the last time, except now she’d provide an argument. While you might think it would be as simple as adding args => RakuAST::StrLiteral.new($two-infix), it’s actually a tiny bit more complicated, because in Raku argument lists are handled somewhat specially, so we actually need a RakuAST::ArgList node. So the equivalent of @list.join($two-infix) is:

my $two = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('join'),
    args => RakuAST::ArgList.new(
      RakuAST::StrLiteral.new($two-infix)
    )
  )
); 	 

The RakuAST::ArgList takes in a list of arguments — be they positional or named (named applied by way of a RakuAST::FatComma).

Finally, the elf decided to tackle what likely would be the most complicated bit: the code for 3 or more items. This code makes multiple method calls (including a chained one), as well as combining everything with a chained infix operator.

The method calls were fairly straightforward, but she wondered how the multiple ~ operators would be handled. As it turns out, they would need to be set up as if written (($a ~ $b) ~ $c) ~ $d, and so on, and the elf didn’t really like the idea of indenting her code that much. She also thought about just calling join on a list she could build, but she already knew how to do method calls, so she decided to try something cool: a reduction operator (think [~] $a, $b, $c, $d for the previous example). This uses the RakuAST::Term::Reduce node, which takes a simple list of arguments. For the * - 2 syntax, to avoid getting too crazy, she treated it as if it had been written as the functionally identical @list - 2.

Because that reduction has so many elements, she ended up breaking it into pieces: the initial item, the special first infix, the second to penultimate items joined with the common infix, the special final infix, and the final item. For a list like [1,2,3,4,5] in English, that amounts to '1' (initial item), ', ' (first infix), '2, 3, 4' (second to penultimate items, joined with ', '), ', and ' (final infix) and '5' (final item). In other languages, the first and repeated infixes may be different, and in others, all three may be identical.

# @list.head
my $more-first-item = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('head')
  )
);

# @list[1 .. * - 2].join($more-middle-infix)
my $more-mid-items = RakuAST::ApplyPostfix.new(
  # @list[1 .. @list - 2]
  operand => RakuAST::ApplyPostfix.new(
    operand => RakuAST::Var::Lexical.new('@list'),
    postfix => RakuAST::Postcircumfix::ArrayIndex.new(
      # (1 .. @list - 2)
      RakuAST::SemiList.new(
        RakuAST::ApplyInfix.new(
          left => RakuAST::IntLiteral.new(1),
          infix => RakuAST::Infix.new('..'),
          # @list - 2
          right => RakuAST::ApplyInfix.new(
            left => RakuAST::Var::Lexical.new('@list'),
            infix => RakuAST::Infix.new('-'),
            right => RakuAST::IntLiteral.new(2)
          )
        )
      )
    )
  ),
  # .join($more-middle-infix)
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('join'),
    args => RakuAST::ArgList.new(
      RakuAST::StrLiteral.new($more-middle-infix)
    )
  )
);
 
# @list.tail
my $more-final-item = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('tail')
  )
);
 	 
# [~] ...
my $more = RakuAST::Term::Reduce.new(
  infix => RakuAST::Infix.new('~'),
  args => RakuAST::ArgList.new(
    $more-first-item,
    RakuAST::StrLiteral.new($more-first-infix),
    $more-mid-items,
    RakuAST::StrLiteral.new($more-final-infix),
    $more-final-item,
  )
);
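In plain Raku, the two things the elf relied on here can be checked quickly: that a [~] reduction gives the same result as the left-nested chain of ~, and that 1 .. @list - 2 selects the same items as 1 .. * - 2. A small sketch:

```raku
# A [~] reduction equals the left-nested chain of ~ operators.
my ($w, $x, $y, $z) = <w x y z>;
say [~] $w, $x, $y, $z;       # wxyz
say (($w ~ $x) ~ $y) ~ $z;    # wxyz

# Both slices pick the second to penultimate items.
my @list = 1, 2, 3, 4, 5;
say @list[1 .. @list - 2];    # (2 3 4)
say @list[1 .. * - 2];        # (2 3 4)
```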

As one can note, as RakuAST code starts getting more complex, it can be extremely helpful to store interim pieces into variables. For complex programs, some RakuAST users will create functions that do some of the verbose stuff for them. For instance, one might get tired of the code for an infix, and write a sub like

sub rast-infix($left, $infix, $right) {
    RakuAST::ApplyInfix.new:
      left => $left,
      infix => RakuAST::Infix.new($infix),
      right => $right
}

to enable code like rast-infix($value, '+', $value) which ends up being much less bulky. Depending on what they’re doing, they might make a sub just for adding two values, or maybe making a list more compactly.

In any case, the hard working elf had now programmatically defined all of the formatter code. All that was left was for her to piece together the number logic and she’d be done. That logic was, in practice, quite simple:

if @list > 2 { $more }
elsif @list == 2 { $two }
elsif @list == 1 { $one }
else { $none } 

In practice, there was still a bit of a learning curve. Why? As it turns out, the conditions of the [els]if statements need to be wrapped up as expression statements. That’s easy enough: she could just use RakuAST::Statement::Expression. Her conditions ended up being coded as:

# @list > 2
my $more-than-two = RakuAST::Statement::Expression.new(
  expression => RakuAST::ApplyInfix.new(
    left => RakuAST::Var::Lexical.new('@list'),
    infix => RakuAST::Infix.new('>'),
    right => RakuAST::IntLiteral.new(2)
  )
);
 	 
# @list == 2
my $exactly-two = RakuAST::Statement::Expression.new(
  expression => RakuAST::ApplyInfix.new(
    left => RakuAST::Var::Lexical.new('@list'),
    infix => RakuAST::Infix.new('=='),
    right => RakuAST::IntLiteral.new(2)
  )
);
 	 
# @list == 1
my $exactly-one = RakuAST::Statement::Expression.new(
  expression => RakuAST::ApplyInfix.new(
    left => RakuAST::Var::Lexical.new('@list'),
    infix => RakuAST::Infix.new('=='),
    right => RakuAST::IntLiteral.new(1)
  )
);	 

That was simple enough. But now she realized that the then branches were not just the simple code she had made, but actually a sort of block! She would need to wrap each of them in a RakuAST::Block. A block has a required RakuAST::Blockoid element, which in turn has a required RakuAST::StatementList element, which contains a list of statements, the simplest of which is the RakuAST::Statement::Expression she had already seen. She decided to try out the technique of writing a helper sub to do this:

sub wrap-in-block($expression) {
    RakuAST::Block.new(
      body => RakuAST::Blockoid.new(
        RakuAST::StatementList.new(
          RakuAST::Statement::Expression.new(:$expression)
        )
      )
    )
}
 	 
$more = wrap-in-block $more;
$two  = wrap-in-block $two;
$one  = wrap-in-block $one;
$none = wrap-in-block $none; 

Phew, that was a pretty easy way to handle some otherwise very verbose coding. Who knew Raku hid away so much complex stuff in such simple syntax?! Now that she had both the conditions and the then blocks finished, she was ready to assemble the full conditional:

my $if = RakuAST::Statement::If.new(
  condition => $more-than-two,
  then => $more,
  elsifs => [
    RakuAST::Statement::Elsif.new(
      condition => $exactly-two,
      then => $two
    ),
    RakuAST::Statement::Elsif.new(
      condition => $exactly-one,
      then => $one
    )
  ],
  else => $none
); 

All that was left was for her to wrap it up into something callable, and she’d be ready to go! She decided to put it into a PointyBlock, since that’s a sort of anonymous function that still takes arguments. Her fully wrapped code block ended up as:

my $code = RakuAST::PointyBlock.new(
  signature => RakuAST::Signature.new(
    parameters => (
      RakuAST::Parameter.new(
        target => RakuAST::ParameterTarget::Var.new('@list'),
        slurpy => RakuAST::Parameter::Slurpy::SingleArgument
      ),
    ),
  ),
  body => RakuAST::Blockoid.new(
    RakuAST::StatementList.new(
      RakuAST::Statement::Expression.new(
        expression => $if
      )
    )
  )
); 

Working with RakuAST, she really got a feel for how things worked internally in Raku. It was easy to see that a runnable code block like a pointy block consisted of a signature and a body. That signature had a list of parameters, and the body a list of statements. Seems obvious, but it can be enlightening to see it spread out like she had it.

The final step was for her to actually evaluate this (now much safer!) code. For that, nothing changed. In fact, the entire rest of her sub was simply:

sub generate-list-formatter($language, $type, $length) {
    use Intl::CLDR;
    my $pattern = cldr{$language}.list-patterns{$type}{$length};
    my $two-infix = $pattern.two.substr: 3, *-3;
    my $more-first-infix = $pattern.start.substr: 3, *-3;
    my $more-middle-infix = $pattern.middle.substr: 3, *-3;
    my $more-final-infix = $pattern.end.substr: 3, *-3;
 	 
    ...
 	 
    use MONKEY-SEE-NO-EVAL;
    EVAL $code
}

Was her code faster than the older method? Not necessarily. It didn’t require a parse phase, which probably saved a bit, but once compiled, the speed would be the same.

So why would she bother doing all this extra work when some string manipulation could have produced the same result? A number of reasons. To begin, she learned the innards of RakuAST, which helped her learn the innards of Raku a bit better. But for us non-elf programmers, RakuAST is important for many other reasons. For instance, at every stage of this process, everything was fully introspectable! If your mind jumped to writing optimizers, besides being a coding masochist, you’ve actually thought about something that will likely come about.

Macros are another big feature that’s coming to Raku, and they will rely heavily on RakuAST. Rather than just doing text replacement in the code like macros in many other languages, Raku macros will operate on RakuAST nodes. This means an errant quote will never cause problems, and it will likely enable far more complex macro development. DSL developers can also integrate seamlessly with Raku by compiling down to RakuAST.

The future

So what is the status of RakuAST? When can you use it? As of today, you will need to build the most recent main branch of Rakudo to use it. Then, in your code, include the statement use experimental :rakuast;. Yours truly will be updating a number of his formatting modules to use RakuAST very shortly, which will make them far more maintainable and make it easier to add new features. For more updates on the progress of RakuAST, check out the Rakudo Weekly, where Elizabeth Mattijsen gives regular updates on RakuAST and all things Raku.

Day 23: Sigils followup: semantics and language design

Until a few days ago, I’d intended for this post to be an update on the Raku persistent data structures I’m developing. And I have included a (very brief) status update at the end of this post. But something more pressing has come to my attention: Someone on the Internet was wrong — and that someone was me.

(Image: xkcd #386, “Duty Calls”)

Specifically, in my post about sigils the other day, I significantly misdescribed the semantics that Raku applies to sigiled-variables.

Considering that the post was about sigils, the final third focused on Raku’s sigils, and much of that section discussed the semantics of those sigils – being wrong about the semantics of Raku’s sigils isn’t exactly a trivial mistake. Oops!

In partial mitigation, I’ll mention one thing: no one pointed out my incorrect description of the relevant semantics, even though the post generated over two hundred comments of discussion, most of it thoughtful. Now, it could be that no one read all the way to Part 3 of a 7,000-word post (an understandable choice!). But, considering the well-known popularity of correcting people on the Internet, I view the lack of any correction as some evidence that my misunderstanding wasn’t obvious to others either. In fact, I only discovered the issue when I decided, while replying to a comment on that post, to write an oddly-designed Raku class to illustrate the semantics I’d described; much to my surprise, it showed that I’d gotten those semantics wrong.

Clearly, that calls for a followup post, which you’re now reading.

My goal for this post is, first of all, to explain what I got wrong about Raku’s semantics, how I made that error, and why neither I nor anyone else noticed. Then we’ll turn to some broader lessons about language design, both in Raku and in programming languages generally. Finally, with the benefit of a correct understanding of Raku’s semantics, we’ll reevaluate Raku’s sigils, and the expressive power they provide.

What I got wrong – and what I got right

In that post, I said that the @ sigil can only be used for types that implement the Positional (“array-like”) role; that the % sigil can only be used for types that implement the Associative (“hash-like”) role; and that the & sigil can only be used for types that implement the Callable (“function-like”) role. All of that is right (and pretty much straight from the language docs).

Where I went wrong was when I described the requirements that a type must satisfy in order to implement those roles. I described the Positional role as requiring an iterable, ordered collection that can be indexed positionally (e.g., with @foo[5]); I described the Associative role as requiring an iterable, unordered collection of Pairs that can be indexed associatively (e.g., with %foo<key>); and I described the Callable role as requiring a type to support being called as a function (e.g., with &foo()).

That, however, was an overstatement. The requirements for implementing those three roles are actually: absolutely nothing. That’s right, they’re entirely “marker roles”, the Raku equivalent of Rust’s marker traits.

Oh sure, the Raku docs provide lists of methods that you should implement, but those are just suggestions. There’s absolutely nothing stopping us from writing classes that are Associative, Positional, or Callable, or – why not? – all three if we want to. Or, for that matter, since Raku supports runtime composition, the following is perfectly valid:

  my @pos := 'foo' but Positional;
  my %asc := 90000 but Associative;
  my &cal := False but Callable;

Yep, we can have a Positional string, an Associative number, and a Callable boolean.
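Runtime mixins are not the only route; a class can claim the role too. The sketch below uses a hypothetical class, NotReallyAList, that declares does Positional without defining a single array-like method, and binds it to an @-sigiled variable next to a mixin like the ones above. Given the marker-role semantics described here, both bindings succeed, and the mixed-in string is still an ordinary Str.

```raku
# Hypothetical class: it claims to be Positional but defines no
# indexing, iteration, or element-counting methods of its own.
class NotReallyAList does Positional { }

# Binding to an @-sigiled variable only checks for the Positional role.
my @odd := NotReallyAList.new;

# A runtime mixin: a plain Str that has been marked Positional.
my @pos := 'foo' but Positional;

say @odd.^name;          # NotReallyAList
say @pos ~~ Positional;  # True
say @pos ~~ Str;         # True
say @pos.uc;             # FOO  (it still behaves like a string)
```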

How did we miss that?

So, here’s the thing: I’ve written quite a bit of Raku code while operating under the mistaken belief that those roles had the type constraints I described – which are quite a bit stricter than “none at all”. And I don’t think I’m alone in that; in fact, the most frequent comment I got on the previous post was surprise/confusion that @ and % weren’t constrained to concrete Arrays and Hashes (a sentiment I’ve heard before). And I don’t think any of us were crazy to think those sorts of things – when you first start out in Raku, the vast majority (maybe all) of the @– and %-sigiled things you see are Arrays and Hashes. And I don’t believe I’ve ever seen an @-sigiled variable in Raku that wasn’t an ordered collection of some sort. So maybe people thinking that the type constraints are stricter makes a certain amount of sense.

But that, in turn, just raises two more questions: First, given the unconstrained nature of those sigils, why haven’t I seen some Positional strings in the wild? After all, relying on programmer discipline instead of tool enforcement is usually a recipe for quick and painful disaster. And, second, given that @- and %-sigiled variables are nearly always array-like and hash-like in practice, why doesn’t Raku enforce a tighter type constraint?

Good defaults > programmer discipline

Let’s address those questions in order: Why haven’t I seen @-sigiled strings or %-sigiled numbers? Because Raku isn’t relying on programmer discipline to prevent those things; it’s relying on programmer laziness – a much stronger force. Writing my @pos := 'foo' but Positional seems very easy, but it has three different elements that would dissuade a new Rakoon from writing it: the := bind operator (most programmers are more familiar with assignment, and = is overwhelmingly more common in Raku code examples); the but operator (runtime composition is relatively uncommon in the wider programming world, and it’s not a tool Raku code turns to all that often) and Positional (roles in general aren’t really a Raku 101 topic, and Positional/Associative/Callable even less so – after all, all the built-in types that should implement those roles already do so).

Let’s contrast that line with the version that a new Rakoon would be more likely to write – indeed, the version that every Rakoon must have written over and over: my @pos = 'foo'. That removes all three of the syntactic stumbling blocks from the preceding code. More importantly, it works. Because the @-sigil provides a default Array container, that line creates the Array ['foo'] – which is much more likely to be what the user wanted in the first place.
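A short sketch makes the contrast concrete: assigning a string to an @-sigiled variable fills the default Array container, while binding the same string directly (without but Positional) fails the type check at runtime. The try block and the variable names are just for illustration.

```raku
# Assignment: the @ sigil supplies a fresh Array container for the value.
my @pos = 'foo';
say @pos.^name;   # Array
say @pos.elems;   # 1

# Binding skips the container, so the value itself must do Positional.
my $str = 'foo';
my $bound = try {
    my @bad := $str;   # a plain Str is not Positional, so this dies
    True
};
my $error = $!;

say $bound.defined;        # False
say $error ~~ Exception;   # True
```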

Of course, that’s just one example, but the general pattern holds: Raku very rarely prohibits users from doing something (even something as bone-headed as a Positional string) but it’s simultaneously good at making the default/easiest path one that avoids those issues. If there’s an easy-but-less-rigorous option available, then no amount of “programmer discipline” will prevent everyone from taking it. But when the safer/saner thing is also by far the easier thing, then we’re not relying on programmer discipline. We’re removing the temptation entirely.

And then by the time someone has written enough Raku that :=, but, and Positional wouldn’t give them any pause, they probably have the “@ means array-like, but maybe not an Array” concept so deeply ingrained that they wouldn’t consider creating a wacky Positional string.

Being stricter

What about the second question we posed earlier: Why doesn’t Raku enforce a tighter type constraint? It certainly could: Raku has the language machinery to really tighten down the requirements for a role. It would be straightforward to mandate that any type implementing the Positional role must also implement the methods for positional indexing. And, since Raku already has an Iterable role, requiring Positional types to be iterable would also be trivial. So why not?

Well, because – even if the vast majority of Positional types should allow indexing and should be iterable, there will be some that have good reasons not to be. And Raku could turn the “why not?” question around and ask “why?”

Providing guarantees versus communicating intent

All of this brings a question into focus – a question that goes right to the heart of Raku’s design philosophy and is an important one for any language designer to consider.

That question is: Is your language more interested in providing guarantees or in communicating intent?

Guarantees are great

When I’m not writing Raku (or long blog posts), the programming language I spend the most time with is Rust. And Rust is very firmly on the providing guarantees side of that issue. And it’s genuinely great. There’s something just absolutely incredible and freeing about having the Rust compiler and a strong static type system at your back, of knowing that you just absolutely, 100% don’t need to worry about certain categories of bugs or errors. With that guarantee, you can drop those considerations from your mental cache altogether (you know, to free up space for the things that are cognitively complex in Rust – which isn’t a tiny list). So, yes, I saw the appeal when primarily writing Rust and I see it again every time I return to the language.

Indeed, I think Rust’s guarantees are 100% the right choice (for Rust). I believe that the strength of those guarantees was a great fit for Rust’s original use case (working on Firefox) and is a huge part of why Facebook, Microsoft, Amazon, and Google have all embraced Rust: when you’re collaborating on a team with the scope of a huge open-source project or a big tech company, guarantees become even more valuable. When some people leave, new ones join, and there’s no longer a way to get everyone on the same page, it’s great to have a language that says “you don’t have to trust their code, just trust me”.

But the thing about guarantees is that they have to be absolute. If something is “90% guaranteed”, then it’s not guaranteed.

Coding as a collaborative, asynchronous communication

Guarantees-versus-communication is one trade-off where Raku makes the other choice, in a big way. Raku is vastly more interested in helping programmers to communicate their intent than in enforcing rules strictly enough to make guarantees. If Rust’s fundamental metaphor for code is the deductive proof (each step depends on the correctness of the previous ones, so we’d better be as sure as possible that they’re right), Raku’s fundamental metaphor is, unsurprisingly, more linguistic. Raku’s metaphor for coding is an asynchronous conversation between friends: an email exchange, maybe, or, better yet, a series of letters.

How is writing code like emailing a friend? Well, we talked last time about the three-way conversation between author, reader, and compiler, but that’s a bit of a simplification. Most of the time, we’re simultaneously reading previously-written code and writing additional code, which turns the three-way conversation into a four-way one. True, the “previous author”, “current reader/author”, and “future reader” might all be you, but the fact that you’re talking to yourself doesn’t make it any less of a conversation: either way, the goal is to understand the previous author’s meaning as well as possible, decide what you want to add to the conversation, and then express yourself as clearly as possible – subject to the constraint that the compiler also needs to understand your code.

A few words on that last point. From inside a code-as-proof metaphor, a strict compiler is a clear win. Being confident in the correctness of anything is hard enough, but it’s vastly harder as you increase the possibility space. But from a code-as-communication metaphor, there’s a real drawback to compilers (or formatters) that limit your ability to say the same thing in multiple ways. What shirt you wear stops being an expressive choice if you’re required to wear a uniform. In the same way, when there’s exactly one way to do something, then doing it that way doesn’t communicate anything. But when there’s more than one way to do it, then suddenly it makes sense to ask, “Okay, but why did they do it that way?”. This is deeply evident in Raku: there are multiple ways to write code that does the same thing, but those different ways don’t say the same thing – they allow you to place the emphasis in different points, depending on where you’d like to draw the reader’s attention. Raku’s large “vocabulary” plays the same role as increasing your vocabulary in a natural language: it makes it easier to pick just the right word.

When code is communication, rules become suggestions

When emailing a friend, neither of you can set “rules” that the other person must follow. You can make an argument for why they shouldn’t do something, you can express clearly and unequivocally that doing that would be a mistake, but you can’t stop them. You are friends – equals – and neither the email’s author nor its reader can overrule the other.

And the same is true of Raku: Raku makes it very difficult (frequently impossible) for the author of some code to 100% prevent someone from using their code in a particular way. Raku provides many ways to express – with all the intensity of an ALL CAPS EMAIL – that doing something is a really, really bad idea. But if you are determined to misuse code and knowledgeable enough, there’s pretty much no stopping you.

Coming from Rust, this took me a while to notice, because (at least in intro materials) Raku presents certain things as absolute rules (“private attributes cannot be accessed outside the class!”) when, in reality, they turn out to be strongly worded suggestions (“…unless you’re messing with the Meta Object Protocol in ways that you really shouldn’t”). From a Rust perspective, that just wouldn’t fly: private implementations should be private. But it fits perfectly with Raku’s overall design philosophy.

Communicating through sigils

Applying this design philosophy to sigils, I’ve come around to believing that making Positional, Associative, and Callable marker roles was entirely the correct choice. After all, marker roles are entirely about communicating through code; even in Rust, the entire purpose of marker traits is to communicate some property that the Rust compiler can’t verify.

This is a perfect fit for sigils. What does @ mean? It means that the variable is Positional. Okay, what does Positional mean? It means “array-like”… Okay. What does “array-like” mean? Well, that’s up to you to decide, as part of the collaborative dialogue (trialogue?) with the past and future authors.

That doesn’t mean you’re on your own, crafting meaning from the void: Raku keeps us on the same general page by ensuring that every Rakoon has extensive experience with Arrays, which creates a shared understanding for what “array-like” means. And the language documentation provides clear explanations of how to make your custom types behave like Raku’s Array. But – as I now realize – Raku isn’t going to stomp its foot and say that @-sigiled variables must behave a particular way. If it makes sense – in your code base, in the context of your multilateral conversation – to have an @-sigiled variable that is neither ordered nor iterable, then you can.
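For the conventional case, here is a minimal sketch of a hypothetical, read-only Countdown type that opts into array-like behavior. It marks itself Positional and Iterable, then implements AT-POS, EXISTS-POS, elems, and iterator, which are the hooks that the [ ] subscript, the :exists adverb, and iteration call on custom types. Nothing in the language forces it to provide any of them.

```raku
# Hypothetical read-only sequence from, from-1, ..., 0 that behaves array-like.
class Countdown does Positional does Iterable {
    has Int $.from is required;

    # Number of elements: from, from-1, ..., 0
    method elems(--> Int) { $!from + 1 }

    # Called by @c[$i]:exists
    method EXISTS-POS(Int $i --> Bool) { 0 <= $i <= $!from }

    # Called by @c[$i]; out-of-range indices produce a Failure
    method AT-POS(Int $i) {
        fail "Index $i is out of range" unless self.EXISTS-POS($i);
        $!from - $i
    }

    # Called when something iterates over the object
    method iterator { ($!from ... 0).iterator }
}

my @c := Countdown.new(from => 3);
say @c[0];            # 3
say @c[3];            # 0
say (@c[1]:exists);   # True
say (@c[9]:exists);   # False
say @c.elems;         # 4
```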

So, I’m disappointed that I was mistaken about Raku’s semantics when I wrote my previous post. And I’m especially sorry if anyone was confused by the uncorrected version of that post. But I’m really glad to realize Raku’s actual semantics for sigils, because they fit perfectly with Raku as a whole. Moreover, these semantics not only fit better with Raku’s design, they make Raku’s sigils even better suited for their primary purpose: helping someone writing code to clearly and concisely communicate their intent to someone reading that code.

In keeping with my earlier post, I’ll include a table with the semantics of the three sigils we discussed:

Sigil Meaning
@ Someone intentionally marked the variable Positional
% Someone intentionally marked the variable Associative
& Someone intentionally marked the variable Callable

These semantics are perfect because, in the end, that’s what @, %, &, and $ really are: signs of what someone else intended. Little, semantically dense, magic signs.

Day 22: He’s making a list… (part 1)

If there’s anything that Santa and his elves ought to know, it’s how to make a list. After all, they’re reading lists that children send in, and Santa maintains his very famous list. Another thing we know is that Santa and his elves are quite multilingual.

So one day one of the elves decided that, rather than hand-typing a list of gifts based on the data they received (which required elves who spoke all the world’s languages), they’d take advantage of the power of Unicode’s CLDR (Common Locale Data Repository), one of Unicode’s lesser-known projects. As luck would have it, Raku has a module providing access to the data, called Intl::CLDR. The elf decided that he could probably use some of its data to automate their list formatting.

He began by installing Intl::CLDR and played around with it in the terminal. The module was designed to allow some degree of exploration in a REPL, so the elf did the following after reading the provided README:

# Repl response
use Intl::CLDR; # Nil
my $english = cldr<en> # [CLDR::Language: characters,context-transforms,
# dates,delimiters,grammar,layout,list-patterns,
# locale-display-names,numbers,posix,units]

The module loaded up the data for English and the object returned had a neat gist that provides information about the elements it contains. For a variety of reasons, Intl::CLDR objects can be referenced either as attributes or as keys. Most of the time, the attribute reference is faster in performance, but the key reference is more flexible (because let’s be honest, $english{$foo} looks nicer than $english."$foo"(), and it also enables listy assignment via e.g. $english<grammar numbers>).
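To illustrate the attribute-or-key idea, here is a sketch of a hypothetical LanguageInfo class (an assumption for illustration, not how Intl::CLDR is actually implemented) that marks itself Associative and forwards key lookups to its accessors through AT-KEY, so $obj.name, $obj<name>, $obj{$field}, and a two-key slice all work.

```raku
# Hypothetical class whose fields can be read as methods or as keys.
class LanguageInfo does Associative {
    has Str $.name;
    has Str $.direction;

    # $obj<key> and $obj{$key} call AT-KEY; forward to the accessor,
    # but only for known fields so $obj<new> cannot call arbitrary methods.
    method AT-KEY(Str() $key) {
        die "No such field: $key" unless $key eq any(<name direction>);
        self."$key"()
    }
}

my $es    = LanguageInfo.new(name => 'Español', direction => 'ltr');
my $field = 'direction';

say $es.name;              # Español
say $es<name>;             # Español
say $es{$field};           # ltr
say $es<name direction>;   # (Español ltr)
```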

In any case, the elf saw that one of the data points is list-patterns, so he explored further:

# Repl response
$english.list-patterns; # [CLDR::ListPatterns: and,or,unit]
$english.list-patterns.and; # [CLDR::ListPattern: narrow,short,standard]
$english.list-patterns.and.standard; # [CLDR::ListPatternWidth: end,middle,start,two]
$english.list-patterns.and.standard.start; # {0}, {1}
$english.list-patterns.and.standard.middle; # {0}, {1}
$english.list-patterns.and.standard.end; # {0}, and {1}
$english.list-patterns.and.standard.two; # {0} and {1}

Aha! He found the data he needed.

List patterns are catalogued by their function (and-ing them, or-ing them, and a unit one designed for formatting conjoined units such as 2ft 1in or similar). Each pattern has three different lengths (widths). Standard is what one would use most of the time, but if space is a concern, some languages might allow for even slimmer formatting. Lastly, each of those widths has four forms. The two form combines, well, two elements. The other three are used together to join three or more: start combines the first and second elements, end combines the penultimate and final elements, and middle joins the elements in between.
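To make the four forms concrete, here is a sketch that fills the {0} and {1} placeholders with subst. The %english-and hash is a hypothetical, hand-copied stand-in for the English and/standard data shown above, and the right-to-left nesting for three or more items (end first, then middle, then start) is a swapped-in approach roughly modeled on how the CLDR specification composes list patterns, not the elf’s code.

```raku
# Hypothetical stand-in for the English "and"/"standard" list patterns.
my %english-and =
    start  => '{0}, {1}',
    middle => '{0}, {1}',
    end    => '{0}, and {1}',
    two    => '{0} and {1}';

# Put two values into a pattern's {0} and {1} placeholders.
sub fill-pattern(Str $pattern, $first, $second --> Str) {
    $pattern.subst('{0}', $first).subst('{1}', $second)
}

sub format-with-patterns(%patterns, +@items --> Str) {
    if    @items == 0 { '' }
    elsif @items == 1 { ~@items[0] }
    elsif @items == 2 { fill-pattern %patterns<two>, @items[0], @items[1] }
    else {
        # Join the last two items with "end" ...
        my $result = fill-pattern %patterns<end>, @items[*-2], @items[*-1];
        # ... wrap each earlier inner item around that with "middle" ...
        for @items[1 .. *-3].reverse -> $item {
            $result = fill-pattern %patterns<middle>, $item, $result;
        }
        # ... and attach the first item with "start".
        fill-pattern %patterns<start>, @items[0], $result
    }
}

say format-with-patterns(%english-and, <a b>);       # a and b
say format-with-patterns(%english-and, <a b c d>);   # a, b, c, and d
```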

He then wondered what this might look like for other languages. Thankfully, testing this out in the repl was easy enough:

my &and-pattern = { cldr{$^language}.list-patterns.and.standard<start middle end two>.join: "\t" }
# Repl response (RTL corrected, s/\t/' '+/)
and-pattern 'es' # {0}, {1} {0}, {1} {0} y {1} {0} y {1}
and-pattern 'ar' # ‮{0} و{1} {0} و{1} {0} و{1} {0} و{1}
and-pattern 'ko' # {0}, {1} {0}, {1} {0} 및 {1} {0} 및 {1}
and-pattern 'my' # {0} - {1} {0} - {1} {0}နှင့် {1} {0}နှင့် {1}
and-pattern 'th' # {0} {1} {0} {1} {0} และ{1} {0}และ{1}

He quickly saw that there was quite a bit of variation! Thank goodness someone else had already catalogued all of this for him. So he went about trying to create a simple formatting routine. To begin, he created a very detailed signature and then imported the modules he’d need.

#| Lengths for list format. Valid values are 'standard', 'short', and 'narrow'.
subset ListFormatLength of Str where any(<standard short narrow>);
#| Types for list format. Valid values are 'and', 'or', and 'unit'.
subset ListFormatType of Str where any(<and or unit>);
use User::Language;    # obtains default languages for a system
use Intl::LanguageTag; # use standardized language tags
use Intl::CLDR;        # accesses international data
#| Formats a list of items in an internationally-aware manner
sub format-list(
    +@items,                                   #= The items to be formatted into a list
    LanguageTag() :$language = user-language,  #= The language to use for formatting
    ListFormatLength :$length = 'standard',    #= The formatting width
    ListFormatType :$type = 'and'              #= The type of list to create
) {
    ...
}

That’s a bit of a big bite, but it’s worth taking a look at. First, the elf opted to use declarator POD wherever possible. This can really help out people who might want to use his eventual module in an IDE, for autogenerating documentation, or for curious users in the REPL. (If you type in ListFormatLength.WHY, the text “Lengths for list format … and ‘narrow’” will be returned.) For those unaware of declarator POD, you can use either #| to apply a comment to the following symbol declaration (in the example, for the subsets and the sub itself), or #= to apply it to the preceding symbol declaration (most common with attributes).
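Declarator POD is easy to try out on a small scale. The sketch below documents a hypothetical repeat-word sub with a leading #| comment and a trailing #= comment placed after its body, then reads both back through .WHY, using the leading and trailing accessors of the returned declarator block.

```raku
#| Repeats a word a number of times.
sub repeat-word(Str $word, Int $times) {
    ($word xx $times).join(' ')
}
#= Returns the copies joined by single spaces.

say &repeat-word.WHY.leading;    # Repeats a word a number of times.
say &repeat-word.WHY.trailing;   # Returns the copies joined by single spaces.
say repeat-word('ho', 3);        # ho ho ho
```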

Next, he imported the modules he would need. User::Language detects the system language, and he used it to provide sane defaults. Intl::LanguageTag is one of the most fundamental modules in the international ecosystem. While he wouldn’t strictly need it (as we’ll see, he ultimately only uses language tags in string form), it helps to ensure that at least a plausible language tag is passed.

If you’re wondering what the +@items means, it applies DWIM logic (the “single argument rule”) to the positional arguments. If one calls format-list @foo, presumably the list is @foo, and so @items will be set to the contents of @foo. On the other hand, if someone calls format-list $foo, $bar, $xyz, presumably the list isn’t $foo, but all three items. Because there is more than one argument, Raku treats each positional argument as one item; only a lone Iterable argument is unpacked into the list. The extra () in LanguageTag() makes it a coercion type: the parameter will take either a LanguageTag or anything that can be coerced into one (like a string).
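Both signature features can be checked in isolation. In this sketch, the hypothetical count-items sub shows the single argument rule behind +@items, and as-int uses Int() as a stand-in for LanguageTag(), since coercion types work the same way for any target type.

```raku
# The single argument rule: one Iterable argument supplies the items;
# otherwise each positional argument is one item.
sub count-items(+@items) { @items.elems }

say count-items(<a b c>);          # 3
say count-items('a', 'b', 'c');    # 3
say count-items('a');              # 1
say count-items([1, 2], [3, 4]);   # 2

# A coercion type: Int() accepts an Int or anything that can become one.
sub as-int(Int() $n) { $n }

say as-int('41').^name;            # Int
say as-int('41') + 1;              # 42
```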

Okay, so with that housekeeping stuff out of the way, he got to coding the actual formatting, which is devilishly simple:

my $format = cldr{$language}.list-patterns{$type}{$length};
my ($start, $middle, $end, $two) = $format<start middle end two>;
if @items > 2 { ... }
elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
elsif @items == 1 { @items.head }
else { '' }

He paused here to check whether things would work, so he added the following tests to his script and ran it:

# output
format-list <>, :language<en>; # ''
format-list <a>, :language<en>; # 'a'
format-list <a b>, :language<en>; # 'a{0} and {1}b'

While the simplest two cases were easy, the first one to use CLDR data didn’t work quite as expected. The elf realized he’d need to actually replace the {0} and {1} with the items. While technically he should use subst or similar, after going through the CLDR data, he realized that all of the patterns begin with {0} and end with {1}. So he cheated and changed the initial assignment lines to:

my $format = cldr{$language}.list-patterns{$type}{$length};
my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3);
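The substr call deserves a second look, because its second argument is a WhateverCode. For .substr, a Callable length is given the number of characters remaining after the starting index, so *-3 keeps everything except the trailing {1}. The pattern strings below are copied from the English REPL output above; the loop just brackets each result so the spaces are visible.

```raku
# The English "and"/"standard" patterns for start, end, and two, as shown above.
my @patterns = '{0}, {1}', '{0}, and {1}', '{0} and {1}';

# Skip the leading "{0}" (3 chars); of what remains, drop the last 3 ("{1}").
my @connectors = @patterns.map: *.substr(3, *-3);

# Bracket each connector so the surrounding spaces are visible.
say '[' ~ $_ ~ ']' for @connectors;
# [, ]
# [, and ]
# [ and ]
```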

Now his two-item case worked well. For the three-or-more condition, though, he had to think a bit harder about how to combine things. There are actually quite a few different ways to do it! The simplest way for him was to take the first item, then the $start combining text, then join the second through penultimate items with $middle, and then finish off with the $end and final item:

if @items > 2 {
    ~ @items[0]
    ~ $start
    ~ @items[1..*-2].join($middle)
    ~ $end
    ~ @items[*-1]
}
elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
elsif @items == 1 { @items.head }
else { '' }

Et voilà! His formatting function was ready for prime-time!

# output
format-list <>, :language<en>; # ''
format-list <a>, :language<en>; # 'a'
format-list <a b>, :language<en>; # 'a and b'
format-list <a b c>, :language<en>; # 'a, b, and c'
format-list <a b c d>, :language<en>; # 'a, b, c, and d'

Perfect! Except for one small problem. When they actually started using this, the computer systems melted some of the snow away because they overheated. Every single time they called the function, the CLDR database needed to be queried and the strings needed to be clipped. The elf had to come up with something a bit more efficient.

He searched high and wide for a solution, and eventually found himself in the dangerous lands of Here Be Dragons™, otherwise known in Raku as EVAL. He knew that EVAL could potentially be dangerous, but that for his purposes, he could avoid those pitfalls. What he would do is query CLDR just once, and then produce a compilable code block that would do the simple logic based on the number of items in the list. The string values could probably be hard coded, sparing some variable look ups too.

There be dragons here 🐉🦋

EVAL should be used with great caution. All it takes is one errant unescaped string being accepted from an unknown source and your system could be compromised. This is why it requires you to affirmatively type use MONKEY-SEE-NO-EVAL in a scope that needs EVAL. However, in situations like this, where we control all inputs going in, things are much safer. In tomorrow’s article, we’ll discuss ways to do this in an even safer manner, although it adds a small degree of complexity.

Back to the regularly scheduled program

To begin, the elf imagined his formatting function.

sub format-list(+@items) {
if @items > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
elsif @items == 1 { @items[0] }
else { '' }
}

That was … really simple! But he needed this in string form. One way to do that would be to just use straight string interpolation, but he decided to use Raku’s equivalent of a heredoc, q:to. For those unfamiliar, in Raku, quotation marks are actually just syntactic sugar for entering the Q (for quoting) sublanguage. With quotation marks, you only get a few options: ' ' means no escaping except for \\ (and an escaped delimiter), and " " means interpolating blocks, $-sigiled variables, and backslash escapes. If we manually enter the Q-language (using q or Q), we get a LOT more options. If you’re more interested in those, you can check out Elizabeth Mattijsen’s 2014 Advent Calendar post on the topic. Our little elf decided to use the q:s:to option, which let him keep his code as is, with only scalar variables interpolated. (The rest of his code only used @-sigiled variables, which :s leaves alone, so he didn’t need to escape anything!)

my $format = cldr{$language}.list-patterns{$type}{$length};
my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3);
my $code = q:s:to/FORMATCODE/;
sub format-list(+@items) {
if @items > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
elsif @items == 1 { @items[0] }
else { '' }
}
FORMATCODE
use MONKEY-SEE-NO-EVAL;
EVAL $code;

The only small catch is that he’d need a slightly different version of the text from CLDR. If the connector text (and, surrounded by spaces) were placed verbatim where $two is, that line would end up being @items[0] ~ and ~ @items[1], which would cause a compile error. Luckily, Raku has a method to help out here! The .raku method returns a Raku code representation of almost any object. For instance:

# REPL output
'abc'.raku # "abc"
"abc".raku # "abc"
<a b c>.raku # ("a", "b", "c")

So he just changed his initial assignment line to chain one more method (.raku):

my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3,*-3).raku;
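Here is a self-contained sketch of why the .raku step matters. It uses a hypothetical, already-trimmed connector ' and ', builds the same code fragment with q:s:to both with and without .raku, and then EVALs the quoted version (which needs use MONKEY-SEE-NO-EVAL, as noted earlier).

```raku
use MONKEY-SEE-NO-EVAL;   # required before EVAL can be called on a string

# Hypothetical connector, already trimmed from its CLDR pattern.
my $two = ' and ';

# :s interpolates only $-variables, so @items[...] stays literal code.
my $broken = q:s:to/CODE/;
@items[0] ~ $two ~ @items[1]
CODE

# .raku turns the connector into a quoted Raku string literal.
my $quoted = $two.raku;
my $fixed = q:s:to/CODE/;
@items[0] ~ $quoted ~ @items[1]
CODE

print $broken;   # @items[0] ~  and  ~ @items[1]
print $fixed;    # @items[0] ~ " and " ~ @items[1]

# EVAL can see the surrounding lexical @items.
my @items = <apples bananas>;
say EVAL $fixed;   # apples and bananas
```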

Now his code worked. His last step was to find a way to reuse the generated code to benefit from this initial extra work. He made a very rudimentary caching setup (rudimentary because it’s not theoretically thread-safe, but since values are only ever added, and would be produced identically, that’s not a huge problem). This is what he came up with (declarator POD and type information removed):

sub format-list(+@items, :$language = 'en', :$type = 'and', :$length = 'standard') {
    state %cache;
    my $code = "$language/$type/$length";
    # Get a formatter, generating it if it's not been requested before
    my &formatter = %cache{$code} //= generate-list-formatter($language, $type, $length);
    formatter @items;
}
sub generate-list-formatter($language, $type, $length --> Sub) {
    # Get CLDR information
    my $format = cldr{$language}.list-format{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3,*-3).raku;
    # Generate code
    my $code = q:s:to/FORMATCODE/;
        sub format-list(+@items) {
            if    @items > 2  { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
            elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
            elsif @items == 1 { @items[0] }
            else              { '' }
        }
        FORMATCODE
    # Compile and return
    use MONKEY-SEE-NO-EVAL;
    EVAL $code;
}
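The caching idea (a state hash filled in with //= on a cache miss) can be tried without CLDR. This is a sketch under assumptions: build-joiner and join-with are hypothetical stand-ins for generate-list-formatter and format-list, and $builds only counts how often a new formatter is built.

```raku
my $builds = 0;

# Hypothetical stand-in for generate-list-formatter
sub build-joiner(Str $sep --> Callable) {
    $builds++;
    -> +@items { @items.join($sep) }
}

sub join-with(+@items, :$sep = ', ') {
    state %cache;
    # Build and store a joiner only on a cache miss
    my &joiner = %cache{$sep} //= build-joiner($sep);
    joiner @items;
}

say join-with <a b c>;             # a, b, c
say join-with <x y>, :sep(' & ');  # x & y
say join-with <d e>;               # d, e  (reuses the cached ', ' joiner)
say $builds;                       # 2
```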

And there he was! His function was all finished. He wrapped it up into a module and sent it off to the other elves for testing:

format-list <apples bananas kiwis>, :language<en>; # apples, bananas, and kiwis
format-list <apples bananas>, :language<en>, :type<or>; # apples or bananas
format-list <manzanas plátanos>, :language<es>; # manzanas y plátanos
format-list <انارها زردآلو تاریخ>, :language<fa>; # انارها، زردآلو، و تاریخ

Hooray!

Shortly thereafter, though, another elf took up his work and decided to go even crazier! Stay tuned for more antics from Santa’s elves, and for how they took his lists to another level.

Day 21: Raku and I: Journey begin …

It has been ages since I last blogged about Raku. The only times I blogged about it were when I took part in The Weekly Challenge. Even that changed recently: I finally found time to contribute to the weekly fun challenges again, but still no blogging.

I would say it is all about my mental state, since I have so much to talk about. Recently, a very dear friend and senior member of the Raku Community asked me whether I would be interested in contributing to the Raku Advent Calendar 2022. So, as you rightly guessed, I now had a compelling reason to get back to blogging.

But then, you know …

I always have too many things on my plate, so getting something new done is always tricky. However, I had made up my mind: no matter what, I would give it my best shot.

Here I am …

So what am I going to talk about then?

Those who know me personally are aware that I am a Perl guy by nature. Having said that, I recently started playing with other languages, thanks to the vibrant group of Team PWC. In this blog post, I would like to talk about some of my contributions to The Weekly Challenge in my newfound love, the Raku language.

1: Prime Sum

You are given a number $N. Write a script to find the minimum number of prime numbers required, whose summation gives you $N.


With the power of Raku’s built-in features, it is a piece of cake, if you know what you are doing. For me, the official Raku Documentation is the answer to all my questions. And if I can’t find what I am looking for, I ask my friends on various social platforms. 9 out of 10 times, I get the answer instantly.

So in this case, all the hard work is done by is-prime. I am a big fan of method chaining, as you can see below. For the sake of the reader: I go through the range 2..$sum and grep every number that is-prime.

Isn’t it beautiful? For me, it is.

sub find-prime-upto(Int $sum) {
    return (2..$sum).grep: { .is-prime };
}

Now that the handy subroutine is ready, we can solve the task as below:

For my Perl friends new to Raku, the only thing that might trouble you is the use of [+], right?

That is the reduction metaoperator []: it applies the operator inside the brackets across a list of values, so [+] 2, 3, 7 computes 2 + 3 + 7 and returns 12.
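A few quick reductions, as a sketch: any infix operator can go inside the brackets, including named ones like max. The @primes list here is illustrative, built the same way as find-prime-upto does.

```raku
my @primes = (2..20).grep(*.is-prime);   # (2 3 5 7 11 13 17 19)

say [+] 2, 3, 7;      # 12
say [+] @primes;      # 77
say [*] 1..5;         # 120  (1 * 2 * 3 * 4 * 5)
say [max] 4, 9, 1;    # 9
```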

sub prime-sum(Int $sum) {

    my @prime = find-prime-upto($sum);
    my @prime-sum = Empty;
    for 1..$sum -> $i {
        for @prime.combinations: $i -> $j {
            my $_sum = [+] $j;
            @prime-sum.push: $j if $_sum == $sum;
        }
    }

    return @prime-sum;
}
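prime-sum collects every matching combination of distinct primes. If you only want the combinations with the minimum number of primes, one option is to stop at the first size that produces a match. This is a hedged variant sketch, not the author’s solution: min-prime-sum is a hypothetical name, and like prime-sum it does not reuse a prime.

```raku
sub min-prime-sum(Int $sum where $sum > 0) {
    my @prime = (2..$sum).grep(*.is-prime);
    for 1..@prime.elems -> $size {
        # All combinations of exactly $size distinct primes adding up to $sum
        my @hits = @prime.combinations($size).grep(*.sum == $sum);
        # The first size with any hit is the minimum
        return @hits if @hits;
    }
    return [];
}

say min-prime-sum(12);   # [(5 7)]
say min-prime-sum(6);    # []
```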

Now glue everything together as below:

use v6.d;

sub MAIN(Int $SUM where $SUM > 0) {
    prime-sum($SUM).join("\n").say;
}

We’re not done yet, as a unit test is nice to have.

use Test;

is-deeply prime-sum(6).<>,  [],                  "prime sum = 6";
is-deeply prime-sum(9).<>,  [(2, 7),],           "prime sum = 9";
is-deeply prime-sum(12).<>, [(5, 7), (2, 3, 7)], "prime sum = 12";

done-testing;

2: Fibonacci Sum

Write a script to find out all possible combinations of Fibonacci Numbers required to get $N on addition.

You are NOT allowed to repeat a number. Print 0 if none found.


You may find the solution below somewhat similar to the above work, but there is something new for Perl fans. In Perl, we get the last element of a list with $list[-1], but in Raku it is slightly different: we write @list[*-1], where *-1 means “the number of elements minus one”, as you can see below.

One more thing: if you look closely, the parameter check ($num > 0) is done in the signature itself, which we don’t have in Perl.

Raku rocks !!!

sub fibonacci-series-upto(Int $num where $num > 0) {
    my @fibonacci = (1, 2);
    while @fibonacci.[*-1] + @fibonacci.[*-2] <= $num {
        @fibonacci.push: @fibonacci.[*-1] + @fibonacci.[*-2];
    }

    return @fibonacci;
}
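The two features mentioned above can be tried in isolation. In this sketch, @fib is a short illustrative list and positive is a hypothetical sub; a call that fails the where clause throws, which try turns into Nil.

```raku
my @fib = 1, 2, 3, 5, 8;
say @fib[*-1];                 # 8   (Perl: $fib[-1])
say @fib[*-2];                 # 5
say @fib[*-1] + @fib[*-2];     # 13

# The where clause is checked every time the sub is called.
sub positive(Int $n where $n > 0) { $n }
say positive(3);                          # 3
say (try positive(-3)) // 'rejected';     # rejected
```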

Now we are ready to solve the task as below:

Did you notice something special here?

Yes, .combinations. Again, it’s all built in, with no need to import any library. It generates all possible combinations of a given size.
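As a quick illustration (the four-element @fib list is just an example), here are all pairs from a list and the ones that add up to 6:

```raku
my @fib = 1, 2, 3, 5;

say @fib.combinations(2);
# ((1 2) (1 3) (1 5) (2 3) (2 5) (3 5))

say @fib.combinations(2).grep(*.sum == 6);   # ((1 5))
```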

sub fibonacci-sum(Int $sum where $sum > 0) {

    my @fibonacci     = fibonacci-series-upto($sum);
    my @fibonacci_sum = Empty;
    for 1 .. $sum -> $i {
        last if $i > @fibonacci.elems;
        for @fibonacci.combinations: $i -> $comb {
            my $_sum = [+] $comb;
            @fibonacci_sum.push: $comb if $_sum == $sum;
        }
    }

    return |@fibonacci_sum;
}

Final application.

use v6.d;

sub MAIN(Int :$N where $N > 0) {
    fibonacci-sum($N).join("\n").say;
}

Time for some unit tests too.

use Test;

is-deeply fibonacci-sum(6), ((1,5), (1,2,3)), "fibonacci sum = 6";
is-deeply fibonacci-sum(9), ((1,8), (1,3,5)), "fibonacci sum = 9";

done-testing;

3: Count Set Bits

You are given a positive number $N.

Write a script to count the total number of set bits of the binary representations of all numbers from 1 to $N and return $total_count_set_bit % 1000000007.


For this task, Raku has most of the functions built in, so there is nothing to invent.

As you can see, all the work is done in a one-liner: (1..$n).map( -> $i { $c += [+] $i.base(2).comb; });

.map() works the same as in Perl. Here each element is assigned to $i, which is then converted to base 2 (binary form) with .base(2) and split into individual digits with .comb. Finally, [+] adds those digits up; since each digit is 0 or 1, the sum is the number of set bits.

How can you not fall in love with Raku?

sub count-set-bits(Int $n) {
    my $c = 0;
    (1..$n).map( -> $i { $c += [+] $i.base(2).comb; });
    return $c % 1000000007;
}

Unit test to go with it.

use Test;

is count-set-bits(4), 5, "testing example 1";
is count-set-bits(3), 4, "testing example 2";

done-testing;
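The individual steps of the one-liner can be checked on a single number. The second part is an alternative sketch, not the author’s code: count-set-bits-sum is a hypothetical name that sums the per-number counts with .sum instead of updating a variable from inside .map.

```raku
say 13.base(2);             # 1101
say 13.base(2).comb;        # (1 1 0 1)
say [+] 13.base(2).comb;    # 3

# Alternative sketch: count the '1' characters for each number, then sum.
sub count-set-bits-sum(Int $n where $n > 0) {
    (1..$n).map(*.base(2).comb('1').elems).sum % 1000000007
}

say count-set-bits-sum(4);  # 5
say count-set-bits-sum(3);  # 4
```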

4: Smallest Positive Number

You are given unsorted list of integers @N.

Write a script to find out the smallest positive number missing.


This task introduced me to something new that I wasn’t aware of earlier.

I always wanted to put a check on the elements of an input list. In this task, I check that every element of the given input list is an integer, and that the return value is an integer too. All of this is done in one line of the signature: @n where .all ~~ Int --> Int. That is the kind of power Raku puts right into our scripts.
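Here is a small sketch of that signature check in action. count-ints is a hypothetical sub used only for illustration; a list containing a non-Int fails the where clause, the call throws, and try turns it into Nil.

```raku
sub count-ints(@n where .all ~~ Int --> Int) { @n.elems }

say count-ints((3, 1, 4));                        # 3
say (try count-ints((3, 'x', 4))) // 'rejected';  # rejected: 'x' is not an Int
```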

Also, to sort a list, just use .sort; combining it with .grep makes it very powerful.

The .elems gives me the total number of elements in the list.

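sub smallest-positive-number(@n where .all ~~ Int --> Int) {

    # Sorted, de-duplicated positive values
    my @positive-numbers = @n.sort.grep({ $_ > 0 }).unique;
    return 1 unless @positive-numbers.elems;

    # Walk 1, 2, 3, ... alongside the sorted values; the first gap is the answer
    my Int $i = 0;
    (1 .. @positive-numbers.tail).map: -> $n {
        return $n if $n < @positive-numbers[$i++]
    };

    # No gap found: the answer is one past the largest value
    return @positive-numbers.tail + 1;
}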

Final application looks like below.

Did you see something new?

Well, it shows how to set a default value for a parameter.

use v6.d;

sub MAIN(:@N where .all ~~ Int = (2, 3, 7, 6, 8, -1, -10, 15)) {
    say smallest-positive-number(@N);
}
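The same pattern (a where constraint followed by a default value) works in any sub, not just MAIN. A minimal sketch, with smallest as a hypothetical name:

```raku
# :@n is an optional named parameter: constrained to Ints, with a default list.
sub smallest(:@n where .all ~~ Int = (4, 2, 9) --> Int) { @n.min }

say smallest();               # 2  (uses the default)
say smallest(n => (7, 5));    # 5
```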

Time for unit test again.

use Test;

is smallest-positive-number((5, 2, -2, 0)), 1, "testing (5, 2, -2, 0)";
is smallest-positive-number((1, 8, -1)),    2, "testing (1, 8, -1)";
is smallest-positive-number((2, 0, -1)),    1, "testing (2, 0, -1)";

done-testing;

CONCLUSION

Learning Raku is an ongoing journey and I am loving it. To be honest, I haven’t shared everything here. If you are interested, you can check out the rest in my collections.

Enjoy the break and stay safe.

Day 20: Sigils are an underappreciated programming technology

Sigils – those odd, non-alphabetic prefix characters that many programmers associate with Bash scripting; the $ in echo $USER – have a bit of a bad reputation. Some programmers view them as “old fashioned”, perhaps because sigils are used in several languages that first gained popularity last millennium (e.g. BASIC, Bash, Perl, and PHP). Other programmers just view sigils as rather pointless, as “just a way of encoding type information” in variable names – basically a glorified version of systems Hungarian notation (which isn’t even the good kind of Hungarian notation).

Maybe sigils served a purpose in the bad old days, these critics say, but modern IDEs and editors give us all the type information we could want, and these tools made sigils obsolete. Now that we have VS Code, we don’t have any reason to take the risk that someone might use sigils to write code that bears a suspicious resemblance to line noise, or perhaps to an extremely angry comic strip character.

A cropped panel of a newspaper comic showing one character's head.  A speech bubble from that head has the text 'Awww… ?%$X☹#©!!!'
This is a family-friendly post, I swear!

Or so they say. But I disagree – as do many of the hackers whose perspectives and insights I value most.

This post represents my attempt to convince you that sigils are a powerful tool for writing clear, expressive code. No, strike that, I’ll go further – this post is my argument that sigils are a powerful tool for clear communication in general; sigils being useful for programming is just an application of the more general rule.

To investigate this claim, we’ll first look at three non-programming situations where sigils let us communicate more clearly and expressively. Second, we’ll use these examples to dig into how sigils work: Where does their expressive power come from? And what makes particular sigils good or bad at their job?

Once we’ve wrestled with sigils in general, we’ll turn to the specific case of programming-language sigils. We’ll investigate whether the general power of sigils carries over to the project of writing clear, expressive code, where our goal is to express ourselves in ways that our computer and our readers can both understand. We’ll also consider the unique challenges – and extra powers – that are relevant to programming-language sigils. And we’ll examine some sigils in action, to judge how helpful they really are.

By the time you reach the end of this post, I believe that you’ll have a better understanding of how sigils work, a new mental tool to apply to your communication (programming and non-programming), and – just possibly – a bit more appreciation for the languages that seem like they’re swearing at you.

Three non-programming sigils that aid clarity

1. Chatting about sigils

If you’ve ever talked with programmers online, you’ve probably seen someone say something like

We had that problem at $day_job: our code was a mess, but everyone thought $framework would magically fix it.

I don’t mean that you’ve seen someone express that sentiment (though, let’s face it, you probably have). Instead, I mean that you’ve probably seen someone use terms like $day_job or $framework. And, even if you haven’t, you can probably tell what they mean.

In both cases, the $ sigil marks a placeholder word – in $day_job, probably because the speaker doesn’t want to reveal where they work; in $framework, to avoid turning the conversation into a debate about the merits of a particular framework. Of course, neither sigil is indispensable; you could reword the sentence to use phrases like “at my day job” or “a new framework”.

So using a sigil saves us a couple of words, which you might view as a trivial benefit. But, in the context of synchronous communication, it really isn’t trivial – anything that reduces the time that an entire team spends looking at a $boss is typing... message (see what I did there?) is a clear win.

The more important benefit of the sigil is increased clarity: “a new framework” communicates that the speaker is talking about some indefinite framework, no more. If you want to know which framework, it’d be natural to ask. In contrast, $framework communicates that the speaker has consciously decided to omit the name of the framework. Someone might still ask, but they’d be aware that they’re asking for a detail that was deliberately withheld.

2. Now that you mention it…

And, of course, if sigils can communicate something to humans, they can also communicate something to computers. Which brings us to the way that everyone uses sigils these days (even if they don’t know the term): mentions and/or hashtags.


xkcd comic 1306.  The comic depicts a graph with 'time' on the horizontal axis and 'odds that a word I type will start with some weird symbol' on the vertical axis.  A line oscillates up and down, similar to a sine wave, with three peaks.  The peaks are labeled '$QBASIC', '$Bash @$Perl', and '+Google @twitter #hashtags'.

Every time a programmer @s someone on GitHub – for that matter, every time someone describes themselves as #blessed or tags a post as #nofilter – they’re using a sigil.

Maybe you have a reaction to that?

While we’re on the subject of hashtags and @mentions, let’s briefly digress to something that isn’t technically a sigil but that’s closely related: reactions – those emoji-style responses to SMS messages, GitHub issues, and Slack chats. Love them or hate them, I’m sure you’ve seen them.

They’re not sigils (they’re used alone, not as a prefix to a word). But they’re worth talking about here because – like sigils – they take advantage of an inherent element of human nature: we’re visual, and think in symbols. GitHub could trivially replace all of their reactions with words. In fact, they list the word equivalent of each symbol: 👍 ⇒ +1; 👎 ⇒ -1; 😀 ⇒ smile; 😕 ⇒ confused; ❤️ ⇒ heart; and 🎉 ⇒ tada. So why does GitHub use symbols instead of the equivalent words?

Because when you react to a project milestone with 🎉, that’s a better way to communicate excitement/congratulations than using words – and much better than saying “tada”. Similarly, symbols like 👍 and 👎 are direct ways to communicate approval or disapproval to someone who understands the symbol. Of course, some cultures ascribe a very different meaning to a thumbs up gesture; I’m not at all claiming that 👍 magically avoids ambiguity.

But if someone does understand 👍, then they understand it directly as a single symbol. They’re not translating 👍 into the words “I approve and/or agree” any more than a math-literate person translates “a ÷ (b × c)” into the words “the quotient of ‘a’ divided by the product of ‘b’ and ‘c’”. In both cases, they’re reasoning with the symbols directly; no translation needed. (See also the APL-family programming languages, which take the insight about the power of symbolic reasoning to its (il)logical extreme.)

3. In email, tags > folders – but sigils are my secret weapon

So far, we’ve discussed two non-programming examples where you’ve probably already encountered the power of sigils and symbolic communication. For our third example, let’s look at a context that won’t be as familiar to you: the system I use to manage my email inbox.

For the past decade, I essentially didn’t organize my emails. Instead, I followed the recommendation of a 2011 study of email re-finding: dump everything into a single Archive and get better at searching. Maybe that was correct at the time or maybe I just wanted an excuse to hit the Archive All button. But by last year, that system had clearly stopped working – between my different projects, committees, mailing lists, and patchlists, I simply get too much relevant email to be able to effectively search a single Archive.

So I turned to Thunderbird filters. But unlike many people, I’m not using filters to make emails skip my inbox; I like to see the stream of incoming messages. Instead, I use filters to programmatically apply labels to incoming emails (e.g., emails from/to the Raku Steering Council are labeled “RSC”). And when I archive emails, other filters move the email to the correct folder based on their labels.

But this left me with a decision: should my labels have folder semantics (each email is in exactly one folder) or tag semantics (emails can have any number of tags, including zero)? The issue is a fairly contentious one – it’s been debated for years, but that post still generated 140+ comments of debate. The merits of the two approaches aren’t relevant here; I’ll just say that I eventually decided to use some of both.

Specifically, I decided to give four labels folder semantics: Work, Life, List, and Bulk. Every email in my inbox should be automatically assigned exactly one of these labels – if it has more or fewer than one, something has gone wrong and I need to fix my filtering rules. And when an email is archived, it should be moved to a folder that corresponds to one of these folder-labels.

But every label other than those four gets tag semantics: they’re optional, and emails can have any number of these tag-labels. Examples include Raku, RSC, TPRC, Family, Rust, conference, guix, and blogs.

So far, so good – but also irrelevant to sigils. How do sigils come into this picture?

Well, I wound up with two different types of labels (folder-labels and tag-labels), each with different semantics. Further, I need to be able to quickly distinguish between the two types of labels so that I can notice emails that don’t have exactly one folder-label.

This is a job for sigils. I added the ⋄ sigil to my folder-semantic labels (⋄Work, ⋄Life, ⋄List, and ⋄Bulk) and the • sigil to my tag-semantic labels (e.g., •Raku or •Family). Now, if I see an email that’s labeled ⋄Work, •Raku, •rainbow-butterfly, •RSC, I can instantly see that it has just one folder-semantic label. But if I saw one with •Family, •Parents, •conference, I’d know that it was missing its folder-semantic label.

Why do the sigils in ⋄Work and •Raku matter?

Using ⋄Work and •Raku solves a problem – one without similarly good, non-sigil solutions. When we were talking about $day_job, the sigil was helpful but not essential: saying “my day job” would work maybe 90% as well.

But here, there isn’t a similar 90% solution. We can’t leave the folder-labels and tag-labels undifferentiated; that’s a recipe for confusion. What about the use-your-words approach we employed with $day_job? The equivalent substitution here would be to use labels like primary_Work in place of ⋄Work. But even though primary_Work does communicate that label has folder semantics, the substitution costs far more than the 10% we estimated for $day_job.

Indeed, primary_Work is a poor substitute for ⋄Work for exactly the same reason that “confused” is a poor substitute for 😕: in both cases, switching from a symbol to text dramatically ups the cognitive load involved and correspondingly slows reading. Put differently, ⋄Work is (mentally) pronounced “work”, not “primary work”.

⋄Work conveys information symbolically, which makes understanding that info easier and faster. In turn, faster understanding means that reading ⋄Work, •Raku, •rainbow-butterfly, •RSC in a glance is practical, but reading primary_Work, secondary_Raku, secondary_rainbow-butterfly, secondary_RSC isn’t. There’s far more than a 10% improvement in the readability of ⋄Work, and that difference is 100% sigils.

Learning from non-programming sigils

We’ve looked at 3½ examples where sigils helped in non-programming contexts: chat messages, hashtags, and email labels (plus ½ credit for emoji reactions). In each case, using sigils lets us communicate more clearly. But can we generalize from these examples into an explanation for sigils’ power?

I think so. Sigils are powerful because they use semantically dense symbols to quickly and easily communicate meaningful, low-context information to the reader. That was a mouthful, so let’s unpack it, one term at a time.

1. Sigils use semantically dense symbols

The defining feature of sigils is that they’re symbolic: ⋄Work uses a sigil; primary_Work doesn’t. The symbolic nature of sigils is key to their power: because they encode an entire phrase’s worth of information into a single glyph, they have a much higher semantic density. Put differently, they let you say more, with less.

Recognizing the value of code that packs a lot of meaning into a small package isn’t a novel insight, of course. And neither is the observation that symbols are extraordinarily good at concise communication. Hillel Wayne made a similar point a couple of years ago:


A tweet by @hillelogram with the text 'I think sigils (like the dollar sign) in programming are underrated. We recognize they're bad for readability and you should use more descriptive names, but we also use 'i' as an iterator variable name, so there's something more legible to us about terse names when we can get away with them'

Indeed, APL programmers have been practically shouting this insight from the rooftops for over 30 years: Ken Iverson, the designer of APL, opened his famous 1979 Turing Award talk, Notation as a Tool of Thought, with exactly this point:

The quantity of meaning compressed into small space by algebraic signs, is another circumstance that facilitates the reasonings we are accustomed to carry on by their aid.

But those weren’t Iverson’s words: he was quoting a book published in 1826 by Charles Babbage (the “father of the computer”, if that’s even a meaningful title). And then, just to complete the cycle of quotation – and drive home the point that a focus on semantic density is widespread – Paul Graham quoted Iverson (quoting Babbage) in his 2002 essay Succinctness is Power.

I might not value succinctness quite as highly as Graham’s essay did, but it’s hard to deny that sigils’ expressive concision provides quite a bit of power. And, indeed, we can see evidence of that expressive power in one of the non-programming sigils we discussed: The hashtag is so expressive that it’s even starting to make its way into spoken language.

2. Sigils can be used quickly and easily

Given the immense power of symbols to create expressive, semantically dense code, why shouldn’t we use them for everything? Should everyone be programming APL? Is there such a thing as code that’s too semantically dense?

A page of APL code.  The code contains almost entirely symbols and is impossible to read for anyone unfamiliar with APL
Why, yes, yes it is.

Despite my admiration for APL, I think it gets the balance wrong. I appreciate symbols, but I also like words (I know, you’re shocked, shocked to learn that about the person responsible for the ~2.5k words you’ve just read). So a few words in defense of words: although symbols offer tremendous semantic density, they sacrifice flexibility; symbols are best when they play a supporting role to words. They’re the punctuation, while words are, well, the words.

Another downside to fully embracing symbols can be seen in APL’s overwhelming number of symbols. The problem with APL’s symbolic abundance isn’t the learning curve – that takes time, but veteran APLers have long mastered the vocabulary. Instead, the problem with APL’s extremely large symbol vocabulary is that it crowds out user-created vocabulary.

This leaves little space for users to grow their language or to solve specific problems with specific languages; that is, it discourages DSLs. And, indeed, some of the best APL programmers aren’t fans of DSLs. I respect this view but respectfully disagree.

So, if we don’t want to fully embrace symbols, APL-style, where should we draw the line with sigils?

Well, we want our sigils to be both memorable and quickly recognizable. This will both help new users learn them faster and allow experienced users to read sigils without expending any cognitive effort.

Or, put slightly differently, good sigils should be easy to use – but easy in the very specific sense from Rich Hickey’s Simple Made Easy talk. Hickey distinguishes “simple” (an objective measure of how intertwined/“complected” something is) from “easy” (a subjective measure of how familiar and practiced/“near to hand” something is).

Hickey distinguishes between simple and easy to argue in favor of simplicity. In general, I agree. But in the specific case of sigils (or symbols more broadly) I think that making them easy – in the “near to hand” sense – is crucially important. Being easy matters because sigils derive much of their power from their ability to communicate to experienced readers almost for free.

For example, when I see @codesections, I perceive that I’ve been mentioned without devoting any conscious thought to the @. The @ communicates to me in the same way that the capital letters in “the White House” communicate that we’re talking about the U.S. president without me ever thinking “oh, capital W means a proper noun”. But sigils can only get that meaning-for-free effect when the sigils are very near to hand indeed.

One way to make sigils easy is to make them visually distinct. The sigils we’ve seen ($, #, @, ⋄, and •) pass this bar. In contrast, using 😀, 😃, and 😄 as sigils would never be easy.

Additionally, sigils are easier to use if users read them frequently. So there probably shouldn’t be many sigils and every sigil should be used often. For an example of this done well, consider social media’s use of # and @ – just two sigils, both used daily. The same goes for my ⋄ and • sigils: they’re applied to all my emails, so I use both every day.

Finally, sigils will be easier to use if users practice using them, so sigils should be convenient to type. Of course, since “easy” is a subjective question, “convenient to type” depends on the sigil’s users. So ⋄ and • were good sigils for their target user – me – because the compose key lets me type them painlessly. But they’d be the wrong sigils to choose for a user group that finds typing non-ASCII characters difficult. That description probably applies to enough programmers that, at least right now, programming sigils are probably better off sticking to ASCII … even though that is a bit ☹.

When sigils follow those rules – they’re visually distinct, few in number, and read and used frequently – they’re able to communicate practically for free. Which is pretty powerful.

3. Sigils must communicate meaningful, low-context information

Earlier, we said that sigils have high semantic density because they’re short. But, of course, that implicitly assumes that they convey some useful meaning. If they don’t mean anything, then they really are low-semantic-density characters that just get in the way – that is, line noise.

Going back to our starting example, if someone said

We had that problem at $day_job: our $code was a $mess, but everyone thought $framework would $magically fix it.

…then that’s not using sigils, it’s just being confusing.

Moreover, sigils should do more than communicate some useful info; they should communicate that information in a low-context way. By low-context, I mean that the reader should be able to grasp the sigil’s full meaning using purely local reasoning. When a word has a good sigil – like ⋄Work – you can look at the sigiled-word in isolation and fully grasp its meaning.

In contrast, consider GitHub’s #-sigiled issues and pull requests (e.g., #1066). I view these as miserable sigils. But the problem isn’t that #1066 fails to communicate useful information: it communicates that 1066 refers to an issue or PR in the current repo (and not, say, a year). And, by cross referencing that number with a list of the issues and PRs, you can learn what it was about. But that info requires cross referencing with data outside the immediate context – that is, data that’s not locally available.

Relying on external context tanks the sigil’s usefulness because humans really struggle to remember more than a few things at a time – a fact of which programmers are frequently and forcibly reminded whenever we try to exceed that threshold. So we really don’t want sigils that require the reader to keep additional, non-local context in their short-term memory.

A good sigil should avoid that; its meaning should be immediately and locally clear.

Programming language sigils

Sigils for programming languages are a lot like sigils in other contexts. They use a single symbol to say a lot, concisely; to do so, they must convey low-context information that’s easy to understand. But programming differs from other communication contexts in one fundamental respect, and this difference means that programming sigils face unique challenges and opportunities.

The key difference between programming and other forms of communication is that code always has two audiences: human readers and the computer. Some programmers aim to write for people to read, and only incidentally for the computer; others might be happier writing in unadorned hexadecimal numbers, with no human readers at all. Nevertheless, as Donald Knuth observed, programs must always be written both for computers to execute and for humans to understand.

This duality applies just as strongly to sigils: when an author puts a sigil in their code, they’re simultaneously communicating to the compiler and to readers of that code. And since the human readers depend on their ability to accurately model the computer, the semantics given to sigils must at all costs avoid giving inconsistent messages to those two audiences.

But there’s a subtle-but-crucial nuance: often, the information that a sigil is communicating to the reader is about information that it conveyed to the computer. As an analogy, think back to the hashtag and @mention sigils. Using # tells readers that the following word is a hashtag – but the following word only is a hashtag because the # told computers to treat it that way. The # didn’t communicate inconsistently; it communicated something to the computer that caused a change and simultaneously communicated that change to readers. This same pattern plays out in many programming sigils and is a key source of their expressive power.

The three-way conversation between the author, computer, and reader works in the other direction as well: just as the code author is communicating with both reader and computer, the computer can communicate with author and reader. (Sadly, this symmetry doesn’t extend to the reader communicating back in time to the author, though that would greatly simplify software maintenance.)

One consequence of the computer → author communication is that the computer can intervene if the author tries to use an invalid sigil. Thus, even the weakest version of sigils wouldn’t reduce down to Hungarian notation – at worst, it would be compiler verified Hungarian notation.

But encoding a variable’s type isn’t a good use for sigils, anyway, due to the computer → reader communication. Specifically, the computer can communicate type info to the reader, and the sigil-skeptics are correct that IDEs/editors have gotten pretty good at doing so. But what those skeptics seem to miss is that sigils can communicate far more meaningful sorts of information anyway. Using a sigil just to denote the type would be a waste of a perfectly good sigil, so IDE-supplied type info is entirely irrelephant.

If a sigil shouldn’t convey type info, what should a sigil communicate? Well, that question is unanswerable in the abstract – it depends on the needs of the particular programming language that supplies the sigil. I’ll discuss the language I’m most familiar with, Raku – which you must have expected would eventually make an appearance on this Raku Advent Calendar post.

How Raku uses sigils

This isn’t the time or place to introduce you to Raku, but in case you’re not familiar with it, I’ll just say that you should be able to follow everything below without any Raku-specific knowledge. Well, and that Raku is, by far, my favorite programming language; I believe it’s pretty much the ideal language for writing free software – in large part because Raku provides the expressive power needed for a small team to keep up with a much larger big-tech team. And some of that power comes from Raku’s sigils.

1. Raku without sigils

Before we look at Raku’s sigils, I’ll mention that you can write Raku without any sigils at all; they’re a feature, not a requirement. For example, the JavaScript code const foo = 42 can be written in Raku as my \foo = 42. Rakoons choose to use these sigils because they – we – believe that they make for clearer code. Let’s take a look at why.
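A minimal sketch of what that looks like, putting a sigilless binding next to an ordinary $-sigiled variable; the names foo and $bar are purely illustrative:

```raku
# A sigilless name: \foo binds the name directly to the value,
# much like JavaScript's const foo = 42.
my \foo = 42;
say foo + 1;    # 43

# The same value in a $-sigiled variable, which is a reassignable container.
my $bar = 42;
$bar = 43;
say $bar;       # 43
```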

2. Raku with sigils

Raku provides sigils that let you communicate (to the computer and the reader) what interface a variable supports – that is, how you can use that variable. Raku has four sigils; here’s what each communicates:

  • @ says “Use me with an array-like interface”
  • % says “Use me with a hash-like interface”
  • & says “Use me with a function-like interface”
  • $ says “I won’t tell you what interface you can use, but treat me as a single item”

What does it mean to treat a variable as a single entity? Well, imagine I’ve got a grocery list with five ingredients on it. Saying that I’ve got one thing (a list) is true, but saying that I’ve got five things (the foods) is also true from a certain point of view. Using $ versus @ or % expresses this difference in Raku. Thus, if I use @grocery-list with a for loop, the body of the loop will be executed five times. But if I use $grocery-list, the loop will get executed just once (with the list as its argument).
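The grocery-list difference can be sketched in a few lines; the list contents and counter names are illustrative:

```raku
my @grocery-list = <eggs milk flour sugar butter>;
my $grocery-list = [<eggs milk flour sugar butter>];

my $at-iterations = 0;
$at-iterations++ for @grocery-list;       # the body runs once per food

my $dollar-iterations = 0;
$dollar-iterations++ for $grocery-list;   # the body runs once, for the whole list

say $at-iterations;       # 5
say $dollar-iterations;   # 1
```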

This matters for more than just for loops; Raku has many places where it can operate on either an entire collection or on each item in a collection (this often comes up with variadic functions). The sigil tells Raku how to behave in those cases: it treats $grocery-list as one item, but operates on each food in @grocery-list. We can temporarily opt into the other behavior if needed, but the sigil provides a reminder to keep us and Raku on the same page.
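For the variadic case, here is a minimal sketch using a hypothetical count-args sub with a slurpy *@ parameter. The @ and $ contextualizers shown at the end are one way to temporarily opt into the other behavior:

```raku
# A slurpy (variadic) parameter flattens its arguments, unless they are items.
sub count-args(*@args) { @args.elems }

my @grocery-list = <eggs milk flour sugar butter>;
my $grocery-list = [<eggs milk flour sugar butter>];

say count-args(@grocery-list);     # 5: operates on each food
say count-args($grocery-list);     # 1: the whole list is one item

# Temporarily opting into the other behavior:
say count-args(@$grocery-list);    # 5: treat the item as a list of foods
say count-args($(@grocery-list));  # 1: treat the array as a single item
```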

Bonus powers and defaults

Raku’s sigils have a couple of extra powers that aren’t directly part of their interface – they’re just bonus perks.

The first of these perks is interpolation. In Raku, every sigiled variable is eligible for interpolation in the right sort of string. The exact details depend on the sigil and aren’t worth getting into here (they’re mostly based on how the characters are typically used in strings – it’s kind of nice that “daniel@codesections.com” doesn’t interpolate by default). You can selectively enable/disable interpolation for specific sigils or temporarily enable interpolation in strings that normally don’t allow it with \qq[ ] (like JavaScript’s ${ }). Between this, its Unicode support, and its rationalized regex DSL system, I’m prepared to confidently claim that Raku’s text manipulation facilities significantly outdo those of any language I’ve tried.
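A short sketch of those interpolation rules in action; the variable names are illustrative, and the comments describe only the cases shown:

```raku
my $name  = 'Raku';
my @langs = <Raku Perl PHP>;

my $double = "Hello, $name!";             # $ interpolates in double quotes
my $single = 'Hello, $name!';             # single quotes don't interpolate
my $opt-in = 'Hello, \qq[$name]!';        # \qq[ ] opts in temporarily
my $array  = "Langs: @langs[]";           # @ interpolates only with a subscript
my $email  = "daniel@codesections.com";   # no subscript after @, so no interpolation

.say for $double, $single, $opt-in, $array, $email;
```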

The second perk is a bit of syntax sugar that only applies to &-sigiled variables but that’s responsible for a fair bit of Raku’s distinctive look. We already said that you can invoke &-sigiled variables with syntax like &foo(). But due to sugar associated with &, you can also invoke them by omitting both the & and the parentheses. Thus, in Raku code, you typically only see a & when someone is doing something with a function other than calling it (such as passing it to a higher-order function). I’ve previously blogged about how you can write Raku with extra parens to give it a syntax and semantics surprisingly close to Lisp’s, so it’s only fair to point out that, thanks to this & perk, it’s possible to write Raku with basically no parentheses at all.
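A minimal sketch of the three call styles, plus & used to pass the function itself; double-it is an illustrative name:

```raku
sub double-it($n) { $n * 2 }

say &double-it(21);               # 42: explicit & sigil plus parentheses
say double-it(21);                # 42: & omitted
say double-it 21;                 # 42: & and parentheses both omitted
say (1, 2, 3).map(&double-it);    # (2 4 6): & refers to the function itself
```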

Finally, the @ and % sigils provide a default type. I’ve mentioned a few times that @ does not mean that a variable is an Array, just that it provides an array-like interface. New Rakoons sometimes get confused about this, maybe because many @-sigiled variables you see (especially starting out) happen to be Arrays, and many of the %-sigiled variables happen to be Hashes. That’s not too surprising; ordered-mutable-list and key-value-hashmap are both useful, general abstractions – there’s a reason JavaScript was able to survive so many years with just arrays and objects.

To support the common use case of @-sigiled variables being Arrays and %-sigiled variables being Hashes, Raku provides them as default types when you declare a variable with @ or %. That is, when you assign into an uninitialized @-sigiled variable, Raku provides a default Array (and the same for % and Hash). So we can write my @a = 1, 2 to create an Array with 1 and 2; or we can write my %h = k => "v" to create a Hash. But this is just a default – you’re entirely free to bind any type that provides the correct interface.
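A short sketch contrasting assignment (which gets the default type) with binding (which accepts any type with the right interface); Range and Set are just two examples of such types:

```raku
my @a = 1, 2;                 # assignment: Raku supplies a default Array
my %h = k => "v";             # assignment: Raku supplies a default Hash
my @r := 1..5;                # binding: any Positional type works, here a Range
my %s := set <apple pear>;    # binding: any Associative type works, here a Set

say @a.^name;   # Array
say %h.^name;   # Hash
say @r.^name;   # Range
say %s.^name;   # Set
```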

At this point, we’ve covered the powers that the four Raku sigils provide. Here’s a table with a summary before we move on:

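| Sigil     | @                          | %                                | &                | $                |
|-----------|----------------------------|----------------------------------|------------------|------------------|
| Interface | Positional                 | Associative                      | Callable         | Scalar           |
| Default   | Array                      | Hash                             | (none)           | (none)           |
| Iteration | One at a time; fixed order | One pair at a time; random order | Entire container | Entire container |
| Guarantee | Positional indexing        | Associative indexing             | Invokable        | (none)           |

3. Raku sigils in practice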

To really judge their power, let’s look at Raku’s sigils in a more practical setting. Imagine that we’re working with FiveThirtyEight’s World Cup predictions – specifically, their pre-game prediction of the final:

[Figure: FiveThirtyEight’s pre-game prediction for the World Cup final, giving Argentina a 53% chance of victory and France a 47% chance.]

Where would we start? Our data consists of two key–value pairs. Let’s represent them with two of Raku’s Pairs: Argentina => .53 and France => .47. Next, we’ll want to store these pairs into a variable. But what sigil should we use?

Well, we could use no sigil at all, but that would sacrifice all the perks that come with sigils in Raku. No thanks. Or we could use a &-sigiled variable by writing my &f = {(Argentina => .53, France => .47)}. But this would be a pretty odd – ok, bizarre – choice. By using & we get a function that takes no arguments and, when invoked, returns the two pairs – which seems strictly inferior to working with the two pairs directly. I mention the possibility of using & only to emphasize that it’s our choice: we choose the sigil (and thus the semantics).

With those two out of the way, let’s consider the three viable options: @, %, and $.

Using the @ sigil creates an Array containing two Pairs; we could do that with my @win-predictions = Argentina => .53, France => .47. This keeps the pairs in order, so it might be a good choice if we care about order (maybe we’re planning to display teams ranked by win chance?). The @-sigiled Array also lets us iterate through the teams one at a time.
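A minimal sketch of the @ version, iterating in order and ranking the teams by predicted win chance (the ranking is just one illustration of why order might matter):

```raku
my @win-predictions = Argentina => .53, France => .47;

# Iteration visits one Pair at a time, in insertion order.
for @win-predictions -> $pair {
    say "{$pair.key}: {$pair.value * 100}%";
}

# Ranking teams by predicted win chance, highest first.
say @win-predictions.sort(-*.value).map(*.key);   # (Argentina France)
```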

Alternately, using the % sigil (my %win-predictions = Argentina => .53, France => .47) gives us a Hash with team names as our keys and predicted odds as our values, which lets us access a team’s odds by providing their name: e.g., %win-predictions<France> returns .47. This might be the way to go if we’ll need to access an arbitrary team’s odds of winning (maybe we’re building a search box where you can enter a name to see that team’s predicted odds). The %-sigiled Hash still lets us iterate through the teams one at a time, but this time in a random order.
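The % version, sketched the same way: lookup by team name, and an explicit sort when a stable order is wanted, since Hash iteration order isn’t something to rely on:

```raku
my %win-predictions = Argentina => .53, France => .47;

say %win-predictions<France>;    # 0.47: look up one team's odds by name

# Iteration visits one Pair at a time, in no particular order,
# so sort explicitly when order matters.
for %win-predictions.sort(*.key) -> $pair {
    say "{$pair.key} => {$pair.value}";
}
```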

What about the $ sigil? Well, we actually have a few options. $ tells Raku (and the reader) that we’re treating the predictions as a single item (not a collection). This means that my $win-predictions = Argentina => .53, France => .47 isn’t the syntax we want – since $-sigiled variables are always a single item, that would assign the pair Argentina => .53 to $win-predictions and discard the second pair. (If we did this, Raku helpfully warns that we might not have meant to.)

To store both Pairs in $win-predictions, we’ll need to group them in some way. For example, we could group them with square brackets, which creates an Array. Or we could group them with curly brackets, which creates a Hash. These two options would look like my $win-predictions = [Argentina => .53, France => .47] and my $win-predictions = {Argentina => .53, France => .47}, respectively.

But hold on, if we end up storing an Array or Hash in our $-sigiled variable, how is using $ different from using the @ or % sigils?

It’s different in that the $ communicates to Raku and to readers that we’re treating the entire Array/Hash as a single item – and that Raku should too. This has a few effects, most notably that iterating over a $-sigiled Array/Hash will take the entire container at once, rather than one Pair at a time.
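A minimal sketch of that difference with the Hash-in-a-$ option; the counters are illustrative, and %$ is one way to opt back into per-Pair iteration:

```raku
my $win-predictions = {Argentina => .53, France => .47};

# With $, the whole Hash is one item: the loop body runs once.
my $whole-match = 0;
for $win-predictions -> $match { $whole-match++ }

# The % contextualizer temporarily opts into per-Pair iteration.
my $pairs-seen = 0;
for %$win-predictions -> $pair { $pairs-seen++ }

say $whole-match;               # 1
say $pairs-seen;                # 2
say $win-predictions<France>;   # 0.47: indexing still works through the $ variable
```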

This “item” semantics might best fit our mental model if we’re thinking of “matches” as single entities (instead of collections of team–odds pairs). Looking at them this way makes a lot of sense – after all, the statement “France had a 47% chance to win” doesn’t mean much without knowing that we’re talking about their match against Argentina. If we do use a $-sigiled variable, then we’ll still need to decide between using an array or a hash. The considerations here are basically the same as in our choice between @ and %: do we care more about preserving order or about indexing by team name?

In sum, we can pick between three sigils. Choosing @ communicates that we’re using an ordered array of Pairs; choosing % communicates that we’re focused on the key–value relationship; and choosing $ communicates that we’re treating the match as a single item.

And, crucially, our choice of sigil communicates that entirely locally: every time a reader (which could be us in a few weeks!) sees win-predictions with a certain sigil, it tells the reader whether they’re dealing with an ordered collection, a collection that associates keys with values, or a conceptually single item. There’s never a need to scroll up to where the variable was defined – and, as the functional programmers keep reminding us, it’s far easier to understand code when we can do so without relying on any remote context.

Finally, it’s important to note that the information we get from the sigil is not the variable’s type: my @pos = 1, 2 and my $scalar = [1, 2] both create Arrays, and if you (or your IDE) ask @pos or $scalar for their type, they’ll both honestly report that they’re Arrays. And, as we discussed, @- and %-sigiled variables aren’t guaranteed to be Arrays and Hashes. The questions “what type is this variable” and “what interface does this variable provide” are orthogonal: answering one doesn’t answer the other. So Raku’s sigils definitely aren’t “just a way of encoding type information that could be displayed by an IDE” – they’re a way to create and document a variable’s interface.
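A short sketch of that point: both variables report the same type, and the difference the sigil makes shows up in the container (inspected here with .VAR):

```raku
my @pos    = 1, 2;
my $scalar = [1, 2];

# Both hold an Array, and both report that type honestly...
say @pos.^name;          # Array
say $scalar.^name;       # Array

# ...but $ wraps its value in a Scalar item container.
say @pos.VAR.^name;      # Array
say $scalar.VAR.^name;   # Scalar
```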

At least in my book, that’s quite a bit of information for a single character to communicate. I’m more than happy to conclude that Raku’s sigils communicate meaningful, low-context information. All in all, I believe we’ve seen that Raku’s sigils can be pretty powerful – and that’s without even mentioning Raku’s nine “twigils” (secondary sigils)!

Conclusion and next steps

If you’ve made it this far, thanks – you’re clearly my kind of weird! …Unless you skipped down to the conclusion in the naive hope that I’d offer some sort of cogent tl;dr ಠ_ಠ

In this post, we’ve talked about sigils generally and seen how – even in non-programming contexts – they can be a powerful tool to concisely communicate with your reader. We also looked at how Raku uses sigils and saw that sigils let Rakoons communicate what interface our variables have both to Raku and to readers of our code.

I hope that I’ve convinced you that sigils, used thoughtfully, can make communication easier and code more readable. In my ideal world, I also hope to have tempted you into taking a closer look at Raku. Realistically, though, this post wasn’t really optimized for doing so – as cool as sigils are, they might not make my list of Top 10 Raku Features (number 7 will shock you!).

But even if they’re not my favorite Raku feature, I wanted to talk about sigils – and specifically Raku’s sigils – because they are a strength of the language. And yet, people sometimes mention sigils as the reason that they or others avoid a language – which strikes me as entirely backwards.

So, whether it’s Raku, Bash, Perl, PHP, or any of the other languages that use sigils, I hope you’ll never again pass on a language because it uses a few more $s than some others. Sigils can be a powerful tool. According to Wikipedia, the word “sigil” derives from the Latin for “little sign with magical power”. And, yeah, “magical power” seems about right to me.

Just, you know, not top-ten-level magic.


Day 19: A few modules to ease working with databases in Raku applications

There’s no single big Raku application I work on regularly at the moment, but there are plenty of smaller ones that I need to do a bit of work on now and then. Nearly all of them involve using a database for persistence; I tend to reach for Postgres. This year I put together a few Raku modules to ease my work with databases on these projects. All of them are available for installation via zef; maybe some will make nice Christmas gifts for others using Raku and databases together.

Just let me develop this thing locally!

How should I run an application I’m developing on my local machine when it uses a database? I could use the system Postgres, creating a database, user, and so forth. Of course, this isn’t a task I do every day, and not even every month, so I have to go and look up the syntax every time. Then what if the Postgres version in the deployment environment is different from the one on my system? Not likely to be an issue often in such a mature product, but maybe some day it’d be a tripping hazard.

Thankfully, container technology has made it super easy to spin up most things at the version of one’s choice. Postgres containers are readily available. That still leaves a little scripting and plumbing to do to get the container up, create the database and user required, and have the environment variables injected before running the application. I’ve done it a bunch of times, in a bunch of ways. It’s easy, but boring.

What’s less boring? Writing a Raku module to take away the repetitive work, of course! Thus Dev::ContainerizedService. Now I need only write a devenv.raku like this:

#!/usr/bin/env raku
use Dev::ContainerizedService;

service 'postgres', :tag<13.0>, -> (:$conninfo, *%) {
    env 'DB_CONN_INFO', $conninfo;
}

This suffices to have a Postgres 13.0 docker container pulled (if needed), then started up, with a database and user created. These are then injected into the environment for the application; in this case, my application expected a Postgres connection string in %*ENV<DB_CONN_INFO>.
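On the application side, the connection string simply arrives as an environment variable. A minimal sketch, assuming only the DB_CONN_INFO name used above; db-conninfo is a hypothetical helper, and a real application would pass its result on to something like DB::Pg.new(:$conninfo):

```raku
# Hypothetical helper: fetch the connection string injected by devenv.raku,
# failing loudly if the application was started without it.
sub db-conninfo(%env = %*ENV --> Str) {
    %env<DB_CONN_INFO> // die 'DB_CONN_INFO is not set';
}

# Usage, with an explicit hash standing in for the real environment:
say db-conninfo(%( DB_CONN_INFO => 'host=localhost dbname=test' ));
```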

I can then run my application via this script:

./devenv.raku run raku -I. service.raku

The Postgres instance is run on the host network, and a free port is chosen, which is just great for when I’ve got a couple of different projects that I’m working on at once. If I’m using cro and the development runner that restarts the service on changes, that’d be:

./devenv.raku run cro run

By default, it creates a throwaway database each time I run it. If I want to have the database persisted between runs, I need to add a project name and specify that it should store the data persistently:

#!/usr/bin/env raku
use Dev::ContainerizedService;

project 'my-app';
store;

service 'postgres', :tag<13.0>, -> (:$conninfo, *%) {
    env 'DB_CONN_INFO', $conninfo;
}

Sometimes, for debugging reasons, I’ll want a psql shell to poke at the database. That’s available with:

./devenv.raku tool postgres client

See the docs to learn about some of the more advanced features, and how to add support for further services (I’ve done Postgres and Redis, since those are what I had immediate use for).

I want integration tests that hit the database!

Unit tests that stub out the database, perhaps using something like Test::Mock, are all well and good, but do I really think my data access code is somehow going to be perfect? Of course not; it needs testing too.

Of course, that means some plumbing. Setting up a test database. Doing the same in the CI environment. I’ve done it a dozen times before. It’s easy. It’s boring. Why can’t I have a Raku module to take away the tedium?

Well, I could if I wrote it. Thus Test::ContainerizedService, the testy descendant of Dev::ContainerizedService. It’s actually a small wrapper around the core of Dev::ContainerizedService, meaning that one only has to add support for a database or queue in Dev::ContainerizedService, and then it’s available in Test::ContainerizedService too.

Using it looks like this:

use Test;
use Test::ContainerizedService;
use DB::Pg;

# Either receive a formed connection string:
test-service 'postgres', :tag<14.4>, -> (:$conninfo, *%) {
    my $pg = DB::Pg.new(:$conninfo);
    # And now there's a connection to a throwaway test database
}

In a nutshell, wrap the tests up in a test-service block, which does what is needed to get a Postgres container up and running, and then passes in the connection information. If docker is not available, the tests are skipped instead.

What about migrations?

The two previous gifts are ready for unwrapping this Christmas. I’ve also been working on one that is probably only unwrappable for the adventurous at this point: DB::Migration::Declare.

It’s not the first Raku effort at database migrations – that is, the idea of having an ordered, append-only list of database changes that together bring the database schema up to the current state. The Red ORM has some work in that direction, for those using Red. There’s also a module where one writes the SQL DDL up and down steps, and it applies them. I’ve used it, it works. But inspired by the benefits and shortcomings of Knex.js migrations, which I’ve been using quite a bit this year at a customer, I decided to set about building something sort of similar in Raku.

The idea is relatively simple: use a Raku DSL for specifying the migrations, and have the SQL to put the changes into effect generated. Supposing we want a database table to track the tallest skyscrapers, we could write this:

use DB::Migration::Declare;

migration 'Setup', {
    create-table 'skyscrapers', {
        add-column 'id', integer(), :increments, :primary;
        add-column 'name', text(), :!null, :unique;
        add-column 'height', integer(), :!null;
    }
}

Assuming it’s in a file migrations.raku alongside the application entrypoint script, we could add this code:

use DB::Migration::Declare::Applicator;
use DB::Migration::Declare::Database::Postgres;
use DB::Pg;

my $conn = DB::Pg.new(:conninfo(%*ENV<DB_CONN_INFO>));

my $applicator = DB::Migration::Declare::Applicator.new:
        schema-id => 'my-project',
        source => $*PROGRAM.parent.add('migrations.raku'),
        database => DB::Migration::Declare::Database::Postgres.new,
        connection => $conn;
my $status = $applicator.to-latest;
note "Applied $status.migrations.elems() migration(s)";

At application startup, it will check if the migration we wrote has been applied to the database yet, and if not, translate it to SQL and apply it.

If a little later we realized that we also wanted to know what country each skyscraper was in, we could write a second migration after this first one:

migration 'Add countries', {
    create-table 'countries', {
        add-column 'id', integer(), :increments, :primary;
        add-column 'name', varchar(255), :!null, :unique;
    }

    alter-table 'skyscrapers',{
        add-column 'country', integer();
        foreign-key table => 'countries', from => 'country', to => 'id';
    }
}

On starting up our application, it would detect that the latest migration had not yet been applied and do so.

DB::Migration::Declare doesn’t just produce schema change SQL from Raku code, however. It also maintains a model of the current state of the database. Thus if my previous migration had a typo like this:

    alter-table 'skyskrapers',{
        add-column 'country', integer();
        foreign-key table => 'countries', from => 'country', to => 'id';
    }

It would detect it and let me know, before it even gets so far as trying to build the SQL:

Migration at migrations.raku:11 has problems:
  Cannot alter non-existent table 'skyskrapers'

It detects a range of such mistakes – not only typos, but also semantic issues such as trying to drop a table that was already dropped, or adding a duplicate primary key.
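The module’s own checking code isn’t shown here, but the general idea of validating migrations against a model of the schema can be illustrated. What follows is a hypothetical, much-simplified sketch, not DB::Migration::Declare’s implementation: find-problems and the Pair-based step format are inventions for illustration. It replays each step against a set of known tables and collects problems instead of producing SQL:

```raku
# Hypothetical sketch, NOT DB::Migration::Declare's code.
# Each step is a Pair of operation => table name.
sub find-problems(@steps) {
    my $tables = SetHash.new;   # tables that exist in the modelled schema
    my @problems;
    for @steps -> $step {
        my ($op, $table) = $step.kv;
        given $op {
            when 'create-table' {
                if $tables{$table} {
                    @problems.push: "Cannot create already-existing table '$table'";
                }
                else {
                    $tables{$table} = True;
                }
            }
            when 'alter-table' {
                @problems.push: "Cannot alter non-existent table '$table'"
                    unless $tables{$table};
            }
            when 'drop-table' {
                if $tables{$table} {
                    $tables{$table}:delete;
                }
                else {
                    @problems.push: "Cannot drop non-existent table '$table'";
                }
            }
        }
    }
    @problems
}

my @steps = 'create-table' => 'skyscrapers',
            'create-table' => 'countries',
            'alter-table'  => 'skyskrapers';   # the typo from above
say find-problems(@steps).join("\n");
```

Modelling the schema this way means mistakes surface before any SQL is generated, which matches the behavior described above, though the real module tracks far more (columns, keys, and so on).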

Development environment, test environment, and migrations

Not quite gold, frankincense, and myrrh, but my hope is that somebody else might find these useful too. Cheers!

Day 18: Something else

Santa was absent-mindedly going through the Rakudo commits of the past weeks, after hearing about the new 2022.12 release of the Rakudo compiler, and noticed that there had been no commits since that release. Had all the elves been too busy doing other stuff in the Holiday Season, he wondered. But, in other years, the Raku core elves had always been very busy in December. He recalled December 2015 with a bit of a smile on his face: my, my, had the elves been busy then!

A little worried, he asked Lizzybel to come in again. “So, why is nobody working on Rakudo anymore”, he asked. “Ah, that!”, Lizzybel said. “Not to worry, we changed the default branch of Rakudo to ‘main’”, she said. “Why would you do that?”, Santa asked, showing a bit of grumpiness. “Was the old default branch not good enough?”. Lizzybel feared a bit of a long discussion (again), and said: “It’s the new default on GitHub, so us Raku core elves thought it would be a good idea to follow that, as many tools now assume ‘main’ as the default branch”.

“Hmmrph”, said Santa, while he switched to the ‘main’ branch. “Wow! More than 780 commits since the 2022.12 release, how is that possible?”, he exclaimed. “Don’t the elves have anything better to do in this time of the year?”, he said, while raising his voice a bit. Lizzybel noticed his cheeks turning a little redder than usual.

“Ah that!”, said Lizzybel again.

RakuAST

And she continued, again. Remember the RakuAST project that was started by the main MoarVM elf about two and a half years ago? It’s been in off-and-on development since then, and now the core elves deemed it ready enough to make that work available in this new ‘main’ branch, so that other core and non-core elves could easily try out some of the new features that it provides. “So, it’s now done, this RakuAST project?”, said Santa, with a little glimmer of hope in his eyes. “Ah, no, you could say that the project is now more than halfway done”, Lizzybel said, hoping it would be enough for Santa. “Remind me again what that project was all about?”, Santa said, destroying Lizzybel’s hope for a quick exit.

While sitting down, Lizzybel said: “An AST can be thought of as a document object model for a programming language. The goal of RakuAST is to provide an AST that is part of the Raku language specification, and thus can be relied upon by the language user. Such an AST is a prerequisite for a useful implementation of macros that actually solve practical problems, but also offers further powerful opportunities for the module developer. RakuAST will also become the initial internal representation of Raku programs used by Rakudo itself. That in turn gives an opportunity to improve the compiler.” “I bet you had ChatGPT type that out for you to memorize”, Santa said with a twinkle in his eye.

“Eh, no, actually, this is from the MoarVM elf’s grant proposal in 2020”, confessed Lizzybel. “Ok, so tell me what are the deliverables of that project? I don’t have all day to look through grant proposals, you know”, said Santa.

Lizzybel peeked at her elfpad, took a deep breath and said: “Well, firstly: class and role implementations defining a document object model for the Raku language and its sub-languages, constructable and introspectable from within the Raku language. Secondly, the generation of QAST, the backend-independent intermediate representation, from RakuAST nodes, such that one can execute an AST. Thirdly, tests that cover the running of RakuAST nodes. And lastly, integration of RakuAST into the compilation process”. “Interesting”, said Santa, “and how much of that is done already?” “Enough to make more than 60% of the Rakudo test files pass completely, and more than 40% of the Raku test files pass completely if you use RakuAST for compiling your Raku code”, said Lizzybel.

Use it now

Santa continued what was now feeling like an interrogation. “So what use is RakuAST now?” “Well, it allows module developers to start playing with RakuAST features”, said Lizzybel. “But are you sure that RakuAST is stable enough for module developers to depend upon?”, Santa said, frowning. “No, the core elves are not sure enough about it yet, so that is why module developers will need to add a use experimental :rakuast to their code.” “Is there any documentation of these RakuAST classes?” “No, not really, but there are test files in the t/12-rakuast subdirectory. And there is a proof-of-concept of a module that converts sprintf format strings into executable code in the new Formatter class, that will be up to 30x faster”, Lizzybel blurted out.

“Ok, that’s a start”, said Santa, with a lighter shade of red on his cheeks.

Then Santa was distracted by the snow outside again and mumbled: “Are the reindeer prepared now?”

Day 17: How to clarify which parts of the documentation change

Using Pod::To::HTML2, a new custom FormatCode, D<> (D for deprecation), can be made to help with the Raku Documentation process. The new FormatCode should show a span of documentation that is deprecated in some way. This happens a lot when Rakudo is upgraded. However, people using older versions of Rakudo need to understand what has changed, as well as what has been added. So it is not a good idea to delete older information, but it is not efficient to re-generate the entire Documentation suite for each new version of Rakudo.

Perhaps it would be good for a span of words to be highlighted in some way, and then for a deprecation string to appear when a mouse hovers over it.

For example D<function moveover( $x, $y, $z) { … } | Not expected to work in Rakudo-H > would be used to cover the function definition, and the deprecation string is after the |.

First install the module using zef install Raku::Pod::Render, which will install Pod::To::HTML2. You will need at least version 4.2.0. A default directory is created with some distribution plugins. To see examples of the distribution plugins, type Rakudoc-to-html Example in an empty directory. Then serve the file Samples.html using some HTML serving system.

However, this article is about making a bespoke plugin to implement a new Formatting Code. Pod::To::HTML2 treats specified local sub-directories as containing plugin information, provided that the directory name does not contain the character _ anywhere after its first character.
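To make the naming rule concrete, here is a hypothetical helper (looks-like-plugin-dir is not part of Pod::To::HTML2) that applies the rule, as described above, to a directory name:

```raku
# Hypothetical sketch of the naming rule: a plugin directory name may start
# with '_', but must not contain '_' anywhere after its first character.
sub looks-like-plugin-dir(Str:D $name where .chars > 0 --> Bool) {
    !$name.substr(1).contains('_')
}

say looks-like-plugin-dir('deprecate-span');   # True
say looks-like-plugin-dir('asset_files');      # False
```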

Pod::To::HTML2 is a sub-class of ProcessedPod, so below I shall mention instances of ProcessedPod, though possibly I should be saying instances of Pod::To::HTML2.

Let’s start with an empty directory ‘test’ (this article is written for Ubuntu Linux; apologies to those on other systems that differ significantly).

Now we enter the directory and create a Rakudoc file (e.g. ‘test-d.rakudoc’) with the following text:

    =begin pod

    This is some text to test a new format code. For example, D<function moveover( $x, $y, $z) { ... } | Not expected to work in Rakudo-H >
    should have a highlight and a deprecation string.

    =end pod


Now if you run Rakudoc-to-html test-d.rakudoc in the test/ directory you will get an html file test-d.html together with a directory asset_files containing some CSS files and the icon images. Note how ‘asset_files’ has a ‘_’ in it so that it will not be interpreted in the future as a plugin.

The file test-d.html can be served to a browser. I have the excellent Comma IDE, which allows a project-root-directory file to be served to a browser simply by opening it in that browser. I am sure everyone reading this article will have some favourite way of serving a file.

At this point the D FormatCode is not known to the renderer, so it triggers the unknown-name template.

To create a plugin, we need to:

  • tell the renderer that a custom Block is available. However, the Pod-Block for a FormatCode already exists, so we only need to provide a template for D. (I wrote about this in case you want to experiment with new Custom Blocks).
  • tell the renderer what HTML needs to be created for the FormatCode-D, that is provide a template.
  • provide Pod::To::HTML2 with a name for the CSS to be associated with the HTML containers, which we need to get the highlighting effect.

We create a sub-directory of test/ called deprecate-span. The name is not too important, but note that it contains a ‘-’ rather than a ‘_’ (a name without any ‘-’ is also possible).

Inside deprecate-span we create a file called config.raku. The name is important and a plugin must have a config.raku file. A config.raku is a Raku program that ends with a hash value. The following is a possible minimal version:

%(
    :custom-raku(), # this key is mandatory, so we need it to exist and have Nil value
    :template-raku<deprecate-template.raku>,
    #:add-css<deprecate-span.css>,
)


You will see that this is a hash in Raku idiom. One could call it RakuON by analogy with JSON. But you will also see that because it is normal Raku code, we can include comments as well. I have also commented-out the CSS line, as we will discuss CSS below.

The template is provided by ‘deprecate-template.raku’.

Although multiple templating engines, such as RakuClosure and Mustache, can also be used with ProcessedPod, I have not yet had enough time to develop the HTML2 plugins to use more than one. So I will use the default RakuClosureTemplates system here.

Basically all RakuClosure templates are contained in a Raku program that returns a Hash (like config.raku). The keys of the Hash are the names of the Pod-Block. The values for the keys are closures, viz., a sub that accepts two Hash parameters (conventionally %prm and %tml). The first (%prm) contains all the parameters passed by ProcessedPod to the template, and the second (%tml) contains all the templates known to ProcessedPod. So any template can call any template. (Currently, circularity is not detected). The sub must return a Str, which is inserted into the final html file. Plugins create a template Hash whose keys (new templates) are added to the default keys.
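A stand-alone imitation of that convention may help. This is a hypothetical sketch that does not load ProcessedPod: the template names escaped and para are invented, and the point is only that each template is a sub taking (%prm, %tml) and returning a Str, with %tml letting one template call another:

```raku
# Hypothetical imitation of the RakuClosure convention; not ProcessedPod itself.
my %templates = %(
    escaped => sub (%prm, %tml) {
        %prm<text>.subst('&', '&amp;', :g).subst('<', '&lt;', :g).subst('>', '&gt;', :g)
    },
    para => sub (%prm, %tml) {
        # One template calling another through %tml.
        '<p>' ~ %tml<escaped>(%( text => %prm<contents> ), %tml) ~ '</p>'
    },
);

say %templates<para>(%( contents => 'Rakoons & friends' ), %templates);
# <p>Rakoons &amp; friends</p>
```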

The ProcessedPod renderer passes (at least) two parameters to a template for a FormatCode in the %prm hash. These are contents, which is the first part of the FormatCode, and meta, which is the part after the |.

So we create a file called deprecate-template.raku with the following contents:

%(
    format-d => sub (%prm, %tml) { # note that the format letter is lower case
        '<span class="raku-deprecation" title="' ~ %prm<meta> ~ '">'
        ~ %prm<contents>
        ~ '</span>'
    },
)
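Because a template is just a closure, it can be sanity-checked outside the renderer. A minimal sketch, assuming only the contents and meta parameters described above: it copies the format-d closure and calls it with a hand-built %prm, roughly the way ProcessedPod would:

```raku
my %templates = %(
    format-d => sub (%prm, %tml) {
        '<span class="raku-deprecation" title="' ~ %prm<meta> ~ '">'
        ~ %prm<contents>
        ~ '</span>'
    },
);

my $html = %templates<format-d>(
    %( contents => 'function moveover( $x, $y, $z) { ... }',
       meta     => 'Not expected to work in Rakudo-H' ),
    %templates,
);
say $html;
```

Note that this template inserts %prm<meta> into the title attribute without escaping it, so a deprecation string containing a double quote would break the generated markup.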


We also have to tell Pod::To::HTML2 that there is a new plugin, so we run

Rakudoc-to-html --add-plugins='deprecate-span' test-d.rakudoc

(add-plugins can take a space delimited list of plugins)

Now we have the correct text without an error, and if we put a mouse over the word ‘function’, we will get the deprecation string. In order to highlight the span so that the user can be prompted to hover a mouse over the text, we need to have some CSS.

By way of example, put the following CSS in the file deprecate-span.css (remember the HTML class raku-deprecation was included in the template):

.raku-deprecation {
	background-color: bisque;
	border-block-color: blue;
	border-width: 1px;
	border-style: dashed;
	border-radius: 5px;
}


We need to uncomment the :add-css line in config.raku. deprecate-span.css is assumed to be a valid CSS file and it will be transferred to test/asset_files/css/ by Pod::To::HTML2. Pod::To::HTML2 also creates the stylesheet reference in the HTML header so that it is served too.

Run the command again

Rakudoc-to-html --add-plugins='deprecate-span' test-d.rakudoc

and the CSS has been added. Obviously, a lot more CSS tricks can be played, but I just wanted to show some CSS.

There is much more to the plugin process, including the ability to add jQuery and images. To copy the distributed plugins into your local test directory so you can examine them, run the following in that directory.

Rakudoc-to-html get-local


Rendered from newplugin at 2022-12-12T22:11:57Z