Day 24: He’s making a list… (part 2)

In our last edition, we learned about some of the work that Santa’s elves put into automating how they make their lists. What you probably didn’t know is that the elves stay on top of the latest and greatest technology. Being well-known avid Raku programmers, the elves were excited to hear about RakuAST and decided to see how they might be able to use it. One of the elves decided to rework the list formatting code to use RakuAST. What follows is the story of how she upgraded their current technology to use RakuAST.

Background

The current code that the elves had is fairly straight forward (check out part one for a full explanation)

sub format-list(
  +@items,
  :$language 'en',
  :$type = 'and',
  :$length = 'standard'
) {
    state %formatters;
    my $code = "$language/$type/$length";
 	 
    # Get a formatter, generate if it's not been requested before
    my &formatter = %cache{$code} // %cache{$code} =
      generate-list-formatter($language, $type, $length);
 	 
    formatter @items;
}
 	 
sub generate-list-formatter($language, $type, $length --> Sub ) {
    # Get CLDR information
    my $format = cldr{$language}.list-format{$type}{$length};
    my ($start, $middle, $end, $two) =
      $format<start middle end two>.map: *.substr(3,*-3).raku;
 	 
    # Generate code
    my $code = q:s:to/FORMATCODE/;
        sub format-list(+@items) {
            if @items > 2 {
                @items[0]
                  ~ $start
                  ~ @items[1..*-2].join($middle)
                  ~ $end
                  ~ @items[*-1]
            }
            elsif @items == 2 {
                @items[0] ~ $two ~ @items[1]
            }
            elsif @items == 1 {
                @items[0]
            }
            else {
                ''
            }
        }
    FORMATCODE
 	 
    # compile and return
    use MONKEY-SEE-NO-EVAL;
    EVAL $code
}

While the caching technique is rudimentary and technically not thread-safe, it works (a different elf will probably revisit the code to make it so). Now, when creating all the lists for, say, children in Georgia, the data for Georgian list formatters in CLDR will only need to be accessed a single time. For the next half a million or so calls, the code will be run practically as fast as if it had been hard coded (since, in effect, it has been).

The problem is how the generate-list-formatter code works. The code block uses a heredoc-style :to string, but it’s interpolated. There are numerous ways to accomplish this but all of them require having to use proper escapes. That’s…. risky.

Another elf, having seen the performance improvements that this new EVAL code brought, wanted to find a way to avoid the risky string evaluation. She had heard about the new RakuAST and decided to give it a whirl. While it initially looked more daunting, she quickly realized that RakuAST was very powerful.

What is RakuAST

RakuAST is an object-based representation of Raku’s abstract syntax tree, or roughly what you might get if you parsed Raku’s code into its individual elements. For instance, a string literal might be represented as 'foo' in code, but once parsed, becomes a string literal. That string literal, by the way, can be created by using RakuAST::StrLiteral.new(…). Remember how the elf had to worry about how the string might be interpolated? By creating a the string literal directly via a RakuAST node, that whole process is safely bypassed. No RakuAST::StrLiteral node can be created that will result in a string injection!

Every single construct in the Raku language has an associated RakuAST node. When creating nodes, you might frequently pass in another node, which means you can build up code objects in a piece-by-piece fashion, and again, without ever worrying about string interpolation, escaping, or injection attacks.

So let’s see how the elf eventually created the safer RakuAST version of the formatter method.

The elf works her AST off

To ease her transition into RakuAST, the elf decided to go from the simplest to the most complex part of the code. The simplest is the value for the final else block:

my $none = RakuAST::StrLiteral.new(''); 

Okay. That was easy. Now she wanted to tackle the single element value. In the original code, that was @list.head. Although we don’t normally think of it as such, . is a special infix for method calling. Operators can be used creating an RakuAST::Apply___fix node, where ___ is the type of operator. Depending on the node, there are different arguments. In the case of RakuAST::ApplyPostfix, the arguments are operand (the list), and postfix which is the actual operator. These aren’t as simple as typing in some plain text, but when looking at the code the elf came up with, it’s quite clear what’s going on:

my $operand = RakuAST::Var::Lexical.new('@list');
my $postfix = RakuAST::Call::Method.new(
  name => RakuAST::Name.from-identifier('head')
);
my $one = RakuAST::ApplyPostfix.new(:$operand, :$postfix) 

The operand isn’t a literal, but a variable. Specifically, it’s a lexical variable, so we create a node that will reference it. The call method operator needs a name as well, so we do that as well.

This involves a lot of assignment statements. Sometimes that can be helpful, but for something this simple, the elf decided it was easier to write it as one “line”:

my $one = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('head')
  )
);

Alright, so the first two cases are done. How might she create the result for when the list has two items? Almost exactly like the last time, except now she’d provide an argument. While you might think it would be as simple as adding args => RakuAST::StrLiteral($two-infix), it’s actually a tiny bit more complicated because in Raku, argument lists are handled somewhat specially, so we actually need a RakuAST::ArgList node. So the equivalent of @list.join($two-infix) is

my $two = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('join'),
    args => RakuAST::ArgList.new(
      RakuAST::StrLiteral.new($two-infix)
    )
  )
); 	 

The RakuAST::ArgList takes in a list of arguments — be they positional or named (named applied by way of a RakuAST::FatComma).

Finally, the elf decided to tackle what likely would be the most complicated bit: the code for 3 or more items. This code makes multiple method calls (including a chained one), as well as combining everything with a chained infix operator.

The method calls were fairly straightforward, but she thought about what the multiple ~ operators would be handled. As it turns out, it would actually require being set up as if (($a ~ $b) ~ $c) ~ $d, etc., and the elf didn’t really like the idea of having ultimately intending her code that much. She also thought about just using join on a list that she could make, but she already knew how to do method calls, so she thought she’d try something cool: reduction operators (think [~] $a, $b, $c, $d for the previous). This uses the RakuAST::Term::Reduce node that takes a simple list of arguments. For the * - 2 syntax, to avoid getting too crazy, she treated it as if it had been written as the functionally identical @list - 2.

Becaused that reduction bit has some many elements, she ending up breaking things into pieces: the initial item, the special first infix, a merged set of the second to penultimate items joined with the common infix, the special final infix, and the final item. For a list like [1,2,3,4,5] in English, that amounts to 1 (initial item), , (first infix), 2, 3, 4 (second to penultimate, joined with , ), , and (final infix) and 5 (final item). In other languages, the first and repeated infixes may be different, and in others, all three may be identical.

# @list.head
my $more-first-item = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('head')
  )
);

# @list[1, * - 2].join($more-middle-infix)
my $more-mid-items = RakuAST::ApplyPostfix.new(
  # @list[1, @list - 2
  operand => RakuAST::ApplyPostfix.new(
    operand => RakuAST::Var::Lexical.new('@list'),
    postfix => RakuAST::Postcircumfix::ArrayIndex.new(
      # (1 .. @list - 2)
      RakuAST::SemiList.new(
        RakuAST::ApplyInfix.new(
          left => RakuAST::IntLiteral.new(1),
          infix => RakuAST::Infix.new('..'),
          # @list - 2
          right => RakuAST::ApplyInfix.new(
            left => RakuAST::Var::Lexical.new('@list'),
            infix => RakuAST::Infix.new('-'),
            right => RakuAST::IntLiteral.new(2)
          )
        )
      )
    )
  ),
  # .join($more-middle-infix)
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('join'),
    args => RakuAST::ArgList.new(
      RakuAST::StrLiteral.new($more-middle-infix)
    )
  )
);
 
# @list.tail
my $more-final-item = RakuAST::ApplyPostfix.new(
  operand => RakuAST::Var::Lexical.new('@list'),
  postfix => RakuAST::Call::Method.new(
    name => RakuAST::Name.from-identifier('tail')
  )
);
 	 
# [~] ...
my $more = RakuAST::Term::Reduce.new(
  infix => RakuAST::Infix.new('~'),
  args => RakuAST::ArgList.new(
    $more-first-item,
    RakuAST::StrLiteral.new($more-first-infix),
    $more-mid-items,
    RakuAST::StrLiteral.new($more-final-infix),
    $more-final-item,
  )
);

As one can note, as RakuAST code starts getting more complex, it can be extremely helpful to store interim pieces into variables. For complex programs, some RakuAST users will create functions that do some of the verbose stuff for them. For instance, one might get tired of the code for an infix, and write a sub like

sub rast-infix($left, $infix, $right) {
    RakuAST::ApplyInfix.new:
      left => $left,
      infix => RakuAST::Infix.new($infix),
      right => $right
}

to enable code like rast-infix($value, '+', $value) which ends up being much less bulky. Depending on what they’re doing, they might make a sub just for adding two values, or maybe making a list more compactly.

In any case, the hard working elf had now programmatically defined all of the formatter code. All that was left was for her to piece together the number logic and she’d be done. That logic was, in practice, quite simple:

if @list > 2 { $more }
elsif @list == 2 { $two }
elsif @list == 1 { $one }
else { $none } 

In practice, there was still a bit of a learning curve. Why? As it turns out, the [els]if statements are actually officially expressions, and need to be wrapped up in an expression block. That’s easy enough, she could just use RakuAST::Statement::Expression. Her conditions end up being coded as

# @list > 2
my $more-than-two = RakuAST::Statement::Expression.new(
  expression => RakuAST::ApplyInfix.new(
    left => RakuAST::Var::Lexical.new('@list'),
    infix => RakuAST::Infix.new('>'),
    right => RakuAST::IntLiteral.new(2)
  )
);
 	 
# @list == 2
my $exactly-two = RakuAST::Statement::Expression.new(
  expression => RakuAST::ApplyInfix.new(
    left => RakuAST::Var::Lexical.new('@list'),
    infix => RakuAST::Infix.new('=='),
    right => RakuAST::IntLiteral.new(2)
  )
);
 	 
# @list == 1
my $exactly-one = RakuAST::Statement::Expression.new(
  expression => RakuAST::ApplyInfix.new(
    left => RakuAST::Var::Lexical.new('@list'),
    infix => RakuAST::Infix.new('=='),
    right => RakuAST::IntLiteral.new(1)
  )
);	 

That was simple enough. But now sure realized that the then statements were not just the simple code she had made, but were actually a sort of block! She would need to wrap them with a RakuAST::Block. A block has a required RakuAST::Blockoid element, which in turn has a required RakuAST::Statement::List element, and this in turn will contain a list of statements, the simplest of which is a RakuAST::Statement::Expression that she had already seen. She decided to try out the technique of writing a helper sub to do this:

sub wrap-in-block($expression) {
    RakuAST::Block.new(
      body => RakuAST::Blockoid.new(
        RakuAST::StatementList.new(
          RakuAST::Statement::Expression.new(:$expression)
        )
      )
    )
}
 	 
$more = wrap-in-block $more;
$two  = wrap-in-block $two;
$one  = wrap-in-block $one;
$none = wrap-in-block $none; 

Phew, that was a pretty easy way to handle some otherwise very verbose coding. Who knew Raku hid away so much complex stuff in such simple syntax?! Now that she had both the if and then statements finished, she was ready to finish the full conditional:

my $if = RakuAST::Statement::If.new(
  condition => $more-than-two,
  then => $more,
  elsifs => [
    RakuAST::Statement::Elsif.new(
      condition => $exactly-two,
      then => $two
    ),
    RakuAST::Statement::Elsif.new(
      condition => $exactly-one,
      then => $one
    )
  ],
  else => $none
); 

All that was left was for her to wrap it up into a Routine and she’d be ready to go! She decided to put it into a PointyBlock, since that’s a sort of anonymous function that still takes arguments. Her fully-wrapped code block ended up as:

my $code = RakuAST::PointyBlock.new(
  signature => RakuAST::Signature.new(
    parameters => (
      RakuAST::Parameter.new(
        target => RakuAST::ParameterTarget::Var.new('@list'),
 	slurpy => RakuAST::Parameter::Slurpy::SingleArgument
      ),
    ),
  ),
  body => RakuAST::Blockoid.new(
    RakuAST::StatementList.new(
      RakuAST::Statement::Expression.new(
        expression => $if
      )
    )
  )
); 

Working with RakuAST, she really got a feel for how things worked internally in Raku. It was easy to see that a runnable code block like a pointy block consisted of a signature and a body. That signature had a list of parameters, and the body a list of statements. Seems obvious, but it can be enlightening to see it spread out like she had it.

The final step was for her actually evaluate this (now much safer!) code. For that, nothing changed. In fact, the entire rest of her block was simply

sub generate-list-formatter($language, $type, $length) {
    use Intl::CLDR;
    my $pattern = cldr{$lang}.list-patterns{$type}{$length};
    my $two-infix = $pattern.two.substr: 3, *-3;
    my $more-first-infix = $pattern.start.substr: 3, *-3;
    my $more-middle-infix = $pattern.middle.substr: 3, *-3;
    my $more-final-infix = $pattern.end.substr: 3, *-3;
 	 
    ...
 	 
    use MONKEY-SEE-NO-EVAL;
    EVAL $code
}

Was her code necessarily faster than the older method? Not necessarily. It didn’t require a parse phase, which probably saved a bit, but once compiled, the speed would be the same.

So why would she bother doing all this extra work when some string manipulation could have produced the same result? A number of reasons. To begin, she learned the innards of RakuAST, which helped her learn the innards of Raku a bit better. But for us non-elf programmers, RakuAST is important for many other reasons. For instance, at every stage of this process, everything was fully introspectable! If your mind jumped to writing optimizers, besides being a coding masochist, you’ve actually thought about something that will likely come about.

Macros is another big feature that’s coming in Raku and will rely heavily on RakuAST. Rather than just do text replacement in the code like macros in many other languages, macros will run off of RakuAST nodes. This means an errant quote will never cause problems, and likely enable far more complex macro development. DSL developers can seamlessly integrate with Raku by just compiling down to RakuAST.

The future

So what is the status of RakuAST? When can you use it? As of today, you will need to build the most recent main branch of Rakudo to use it. Then, in your code, include the statement use experimental :rakuast;. Yours truly will be updating a number of his formatting modules to use RakuAST very shortly which will make them far more maintainable and thus easier to add new features. For more updates on the progress of RakuAST, check out the Rakudo Weekly, where Elizabeth Mattijsen gives regular updates on RakuAST and all things Raku.

3 thoughts on “Day 24: He’s making a list… (part 2)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: