RFC 5, by Michael J. Mathews: Multiline comments

This is the first proposed RFC related to documentation. It asks for a feature common to most modern programming languages: multiline comments.

The problem of not having multi-line comments is quite obvious: if you need to comment a large chunk of code, you need to manually insert a # symbol at the beginning of every line (in Raku). This can be incredibly tedious if you do not have, for instance, a text editor to do this with a shortcut or similar. This practice is very common in large code bases. For that reason, Michael refers to C++ and Java as

popular languages that were designed from the outset as being useful for large projects, implementing both single line and multiline comments

In those languages you can type comments as follows:

// single line of code

/*
 Several lines of code
*/

But, in addition, in Java you have a special multiline comment syntax¹ for writing documentation:

/**
* Here you can write the doc!
*
*/

A lot of people proposed POD as a solution to this problem, but Michael lists some drawbacks:

  • “it’s not intuitive”: given that POD is only used by Perl, people coming from different languages will face some struggles learning an entire new syntax.

From my point of view, this is not as big a problem, since POD6 syntax is quite simple and well documented. In addition, it is quite intuitive for newcomers: if you want a header, you use =head1, if you want italics, you use I<> and so on.

  • “it’s not documentation”: this one is still true. The main problem is that when you want to comment out a big chunk of code, that’s probably not documentation, so using =begin pod ... =end pod is a little weird.
  • “it doesn’t encourage consistency”: another problem of POD is that you can use arbitrary terms in its syntax:
    =begin ARBITRARYTEXT 
    ... 
    =end ARBITRARYTEXT

    While this behavior gives us a lot of freedom, it also complicates consistency across different projects and users.

After some discussion, Perl chose POD for implementing multiline comments. Nonetheless, Michael’s proposal was taken into account and Raku supports multiline comments similar to those of C++ and Java, but with a slightly different syntax:

#`[
Raku is a large-project-friendly
language too!
]
say ":D";

And as a curiosity, Raku has embedded comments, that is:

if #`( embedded comment ) True {
    say "Raku is awesome";
}
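To see the commenting styles side by side, here is a small sketch. The variable $total and the numbers are made up for illustration; the point is only which lines actually run.

```raku
# Single-line comment: everything after '#' on this line is ignored.
my $total = 0;

#`{
    Multiline comment: the bracket that follows the backtick decides
    which closing bracket ends the comment.
}
$total += 1;

=begin comment
A Pod comment block is skipped as well, so it can also be used to
switch off a chunk of code:
$total += 100;
=end comment

$total += #`( embedded comment, ignored in mid-expression ) 2;

say $total;   # 3
```

Only the two additions outside the comments are executed, so the Pod route and the #` route end up doing the same job here.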

In the end, as a modern, 100-year language, Raku gives you more than one way to do it, so choose whatever fits you best!


  1. It’s not really a multiline comment because you also need to type the * symbol at the beginning of every line.

RFC 225: Superpositions (aka Junctions)

Damian Conway is one of those names in the Perl and Raku world that almost doesn’t need explaining. He is one of the most prolific contributors to CPAN and was foundational in the design of Raku (then Perl 6). One of his more interesting proposals came in RFC 225 on Superpositions, which suggested making the features of his Perl module Quantum::Superpositions available in the core of the language.

What is a Superposition?¹

In the quantum world, there are measurable things that can exist in multiple states — simultaneously — until the point in time in which we measure them. For computer scientists, perhaps the most salient application of this is in qubits which, as a core element of quantum computing, threaten to destroy encryption as we know it, if quantum supremacy is borne out.

At the end of the day, though, for us it means being able to treat multiple values as if they were a single value so long as we never actually need there to be only one, at which point we get a single state from them.

The Perl Implementation

In the original implementation, Dr. Conway added two new operators, all and any. These converted a list of values into a single scalar value. How was this different from using a list or array? Consider the following Perl/Raku code:

my @foo = (0, 1, 2, 3, 4, 5);

We can easily access each of the values by using array notation:

print @foo[0]; # 0
print @foo[1]; # 1
print @foo[2]; # 2

But what if we wanted to do stuff to this list of numbers? That’s a bit trickier. Functional programmers would probably say “But you have map!”. That’s true, of course. If I wanted to double everything, I could say

@foo = map { $_ * 2}  @foo; # Perl
@foo = map { $_ * 2}, @foo; # Raku

But it could also be nice if I could just say

@foo *= 2;

This is where the superposition can be helpful. Now imagine we have another array and wanted to add it to our now doubled set of values in @foo

my @bar = (0,20,40,60,80,100);
@foobar = @foo + @bar;          # (12); wait what?  Recall that arrays in numeric context are the number of elements, or 6 here.

Your instinctive reaction might be to say that we’d want to end up with (0,22,44,66,88,110) which is simple enough to handle in a basic map or for loop (using the zip operator, Raku can do this simply as @foo Z+ @bar). But remember what a superposition means: anything done happens to all the values, so each value in @foo needs to be handled with each value in @bar, which requires at least two loops if done via map or for (the cross operator in Raku can do this simply as @foo X+ @bar). We actually want (0, 2, 4, 6, 8, 10, 20, 22, 24, 26, 28, 30, 40, 42, 44, 46, 48, 50 … ). More difficult, then, would be to somehow compare this value:

@foobar > 10;

There is no map method we can attach to @foobar to check its values against 10; we’d need to instead map the > 10 into @foobar. But by using superpositions, we can painlessly do all of the above without a single use of map, for, or anything else that generates line noise:

use Quantum::Superpositions;
my $foo = any (0, 1, 2, 3, 4, 5);    # superposition of 0..5
$foo *= 2;                           # superposition of 0,2,4,6,8,10
my $bar = any (0,20,40,60,80,100);
my $foobar = $foo + $bar;            # superposition of 0,2,4,6,8,10,20,22,24,26…
$foobar > 10;                        # True
$foobar > 200;                       # False
$foobar < 50;                        # True
$foobar < 0;                         # False

In fact, comparison operators are where the power of superpositions really shines. Instead of checking whether a string response is in an array of acceptable responses, or using a hash, we can compare the response directly against a superposition of the acceptable values.

The Raku proposal

In the original proposal, there were two types of superpositions possible: all and any. These were proposed to work exactly as described above (creating a single scalar value out of a list of values), with their most useful feature being evident when interpreted in a boolean context. For example:

my $numbers = all 1,3,5,7,9;
say "Tiny"  if $numbers < 10;     # Tiny
say "Prime" if $numbers.is-prime; # (no output)

For those wishing to obtain the values, he proposed using the sub eigenstates, which would retrieve the states without forcing the superposition to collapse to a single one.

The rest of the RFC argues why superpositions should not be left in module space, as even Dr. Conway’s work had limitations that he himself readily admitted: namely, interacting with everything that assumes a single value for a scalar, and (auto)threading. It should be fairly obvious why the former would make it difficult for the Quantum::Superpositions module to work perfectly outside of core, because “the use of superpositions changes the nature of subroutine and operator invocations that have superpositions as arguments”.² As for the latter, if we had a superposition of a million values, doing each operation one by one on computers with multiple processors seems silly: it should be possible to take advantage of the multiple processors. While this seems like an obvious proposition today, we must recall that multicore processors were simply not common in the consumer market when the proposal was made. (Intel’s Pentium D chips didn’t arrive until 2005, IBM’s PowerPC970 MP in 2002.) By placing superpositions in core, things can just work as intended and, in the rare event that a module author cares about receiving superimposed values, they could provide special support.

The Raku implementation

For the most part, RFC 225 was well received and expanded in scope. The most obvious change is the name: in the final implementation, Raku calls these superimposed values junctions. On a practical level, two additional keywords were added, none and one, which provide more options to those using junctions.³ A wildly different (and useful) option was added in the form of operator syntax for creating junctions. Instead of using any 1,2,3, one can also write 1 | 2 | 3, and in lieu of all 1,2,3 it’s possible to write 1 & 2 & 3. Different situations might give rise to using one or the other form, which aids the Perl & Raku philosophy of TIMTOWTDI.
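A short sketch of the four junction types and the two operator spellings may help here. The @primes list and the variable names are mine, chosen just for illustration; .so is used to collapse each junction to a plain Bool.

```raku
my @primes = 2, 3, 5, 7;

# Function-style constructors; .so collapses the junction to a Bool.
my $has-five  = (5 == any(@primes)).so;    # at least one state equals 5
my $all-small = (all(@primes) < 10).so;    # every state is below 10
my $one-even  = (one(@primes) %% 2).so;    # exactly one state is even
my $none-big  = (none(@primes) > 7).so;    # no state exceeds 7
say ($has-five, $all-small, $one-even, $none-big);   # (True True True True)

# Operator spellings of any and all.
say ('exit' eq 'quit' | 'exit').so;        # True
say (5 > 1 & 2 & 3).so;                    # True

# The earlier Perl example, written with junctions.
my $foo    = any(0, 1, 2, 3, 4, 5) * 2;
my $foobar = $foo + any(0, 20, 40, 60, 80, 100);
say ($foobar > 10).so;                     # True
say ($foobar > 200).so;                    # False
```

Note that no loop appears anywhere: every operator is applied to all the states, and the boolean context at the end decides the answer.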

One feature that did not make the cut was the ability to introspect the current states. As late as 2009, it seems it was still planned (based on this issue), but at some point, it was taken out, probably because the way that junctions work means that any methods called on them ought to be fully passed through to their superimposed values, so it would be weird to have a single method that didn’t. Nonetheless, by abusing some of the multithreading that Raku does with junctions, it’s still possible if one really wants to do it:

sub eigenstates(Mu $j) {
    my @states;
    -> Any $s { @states.push: $s }.($j);
    @states;
}
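For anyone who wants to try it, a usage sketch follows. The sub is repeated so the snippet runs on its own, and the results are sorted because I would rather not promise anything about the order in which the states come back.

```raku
# Same trick as above: the Any parameter cannot bind a Junction,
# so the block is called once per state instead.
sub eigenstates(Mu $j) {
    my @states;
    -> Any $s { @states.push: $s }.($j);
    @states;
}

my @simple = eigenstates(1 | 2 | 3).sort;
say @simple;                                  # [1 2 3]

# Arithmetic on junctions produces nested junctions; they get unpacked too.
my @sums = eigenstates(any(1, 2) + any(10, 20)).sort;
say @sums;                                    # [11 12 21 22]
```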

Conclusion

Junctions, despite their internal complexity and rarity in programming languages, are so well thought out and integrated into common Raku coding styles that most people use them without a second thought. Who hasn’t written a signature with a parameter like $foo where Int|Rat or @bar where .all < 256? Who prefers

if $command eq 'quit' || $command eq 'exit'

to these versions? (because TIMTOWTDI)

if $command eq 'quit'|'exit'
if $command eq any <quit exit>
if $command eq @bye.any

None of these are implemented with syntactical sugar for conditionals, though it may seem otherwise. Instead, at their core, is a junction. Dr. Conway’s RFC 225 is a prime example of a modest proposal that is so simultaneously both crazy and natural that, while it fundamentally changed how we wrote code, we haven’t even realized it.


  1. I am not a physicist, much less a quantum one. I probably made mistakes here. /me is not sorry.
  2. Maybe there’s a super convoluted way to still pull it off, but to my knowledge, he’s the only person who wrote an entire regex to parse Perl itself in order to add a few new keywords, so if he deems it not possible… I’m gonna go with it’s not possible.
  3. Perhaps in the future others could be designed, such as at-least-half. The sky’s the limit after all in Raku.

Cover image by Sharon Hahn Darlin, licensed under CC-BY 2.0

RFC 168, by Johan Vromans: Built-in functions should be functions

RFC 168 was proposed on 27 August 2000 and frozen on 20 September 2000. It was a generalization of RFC 26: Named operators versus functions (proposed on 4 August 2000, frozen on 28 August 2000), also by Johan Vromans.

Johan’s proposal was to completely obliterate the difference between built-in functions, such as abs, and functions defined by the user. In Perl, abs can be called both as a prefix operator (without parentheses), as well as a function taking a single argument.

You see, Perl has this concept of built-in functions that are slightly different from “normal” subroutines for performance reasons. In Perl, as in Raku, the actual name of a subroutine is prefixed with an ‘&’. In Perl, you can take a reference to a subroutine with ‘\’, but that doesn’t work for built-in functions.

Nowadays, in Raku, the difference between a subroutine taking a single positional argument, and a built-in prefix operator whose name is acceptable as an identifier, is already minimal. Well, actually absent. Suppose we want to define a prefix operator foo that has the same semantics as abs:

sub foo(Numeric:D $value) {
    $value < 0 ?? -$value !! $value
}

say abs -42;  # 42
say foo -42;  # 42

say abs(-42); # 42
say foo(-42); # 42

You can’t really see a difference, now can you? Well, the reason is simple: in Raku, there is no real difference between the foo subroutine, and the abs prefix operator. They’re both just subroutines: just look at the definition of the abs function for Real numbers.

But how does that function for infix operators? Surely those aren’t subroutines as well in Raku? How could they be? Something like “+” is not a valid identifier, so how could you define a subroutine with that name?

The genius in the process from the RFC to the implementation in Raku has really been the idea to give a subroutine that represents an infix operator a specially formatted name. In the case of the infix + operator, the subroutine is known by the name infix:<+>. And if you look at its definition, you’ll see that it is actually quite simple: the left hand side of the infix operator becomes the first positional argument, and the right hand side the second positional argument. So something like:

say 42 + 666;

is really just syntactic sugar for:

say infix:<+>(42, 666);
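The same naming scheme works for operators you define yourself. As a sketch, here is a made-up infix operator called avg (not part of Raku, purely an example) that can be used both as an operator and as an ordinary subroutine:

```raku
# A user-defined infix operator is just a sub with a special name.
sub infix:<avg>(Numeric $left, Numeric $right) {
    ($left + $right) / 2
}

say 10 avg 20;             # 15, operator form
say infix:<avg>(10, 20);   # 15, plain subroutine call
```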

Does this apply to all built-in operators in Raku? Well, almost. Some operators, such as ||, or, && and and are short-circuiting. This means that the value on the right hand side, might not be evaluated if the left hand side has a certain value.

A simple example using the say function (which always returns True):

say "foo" or say "bar"; # foo

Because the infix or operator sees that its left hand side is already True, it will not bother to evaluate the right hand side, and thus will not print “bar”. There is currently no way in Raku to mimic this short-circuiting behaviour in “ordinary” subroutines. But this will change when macros finally also become first-class citizens in Raku land, which is expected to happen in the coming year as part of Jonathan Worthington’s work on the RakuAST grant.
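To make the difference visible, here is a sketch with two hypothetical helpers, eager-or and lazy-or. An ordinary sub gets its arguments fully evaluated before it runs; the closest you can get without macros is to have the caller wrap the right hand side in a block, which changes the calling syntax and therefore is not a true replacement for the operator.

```raku
my @log;
# Records that it was evaluated, then returns the given value.
sub noisy(Str $label, $value) { @log.push: $label; $value }

# Ordinary sub: both arguments are evaluated before the body runs.
sub eager-or($left, $right) { $left || $right }

# Workaround: the caller passes a block, which is only called when needed.
sub lazy-or($left, &right)  { $left || right() }

eager-or(noisy('left', True), noisy('right', True));
my @eager-log = @log;

@log = ();
lazy-or(noisy('left', True), { noisy('right', True) });
my @lazy-log = @log;

say @eager-log;   # [left right]
say @lazy-log;    # [left]
```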

Going back to the original RFC, it also mentions:

In particular, it is desired that every built-in
- can be overridden by a user defined subroutine;
- can have a reference taken;
- has a useful prototype.

So, let’s check those points:

can be overridden by a user defined subroutine

OK, so infix operators have a special name. So what happens if I declare a subroutine with that name? Well, let’s try:

sub infix:<+>(\a, \b) { a + b }
say 42 + 666;

Hmmm… that doesn’t show anything, that just hangs! Well, yeah, because we basically have a case of a subroutine here calling itself without ever returning!

This code example eats about 1GB of memory per second, so don’t do that too long unless you have a lot of memory available!

The easiest fix would be to not use the infix ‘+‘ operator in our version:

sub infix:<+>(\a, \b) { sum a, b }
say 42 + 666;  # 708

But what if we want to refer to original infix:<+> logic? It’s just a subroutine after all! But where does that subroutine live? Well, in the core of course! And for looking up things in the core, you use the CORE:: PseudoStash:

sub infix:<+>(\a, \b) {
    say "plussing";
    CORE::<&infix:<+>>(a, b)
}
say 42 + 666; # plussing\n708

You look in the CORE:: pseudostash for the full name of the infix operator: CORE::<&infix:<+>> will then give you the subroutine object of the core’s infix + operator, and you can call that as a subroutine with two parameters.
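Since subroutines are lexically scoped in Raku, such an override only applies to the block it is declared in. A small sketch, with the @seen array added by me just to record the calls:

```raku
my @seen;

{
    # This override is only visible inside the enclosing block.
    sub infix:<+>(\a, \b) {
        @seen.push: "{a} + {b}";       # string interpolation, no addition here
        CORE::<&infix:<+>>(a, b)       # delegate to the core operator
    }
    say 42 + 666;    # 708, and the call is recorded in @seen
}

say 1 + 2;           # 3, the core operator is back in charge
say @seen;           # [42 + 666]
```

So overriding a built-in does not leak into the rest of your program unless you want it to.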

So that part of the RFC has been implemented!

can have a reference taken

For the infix + operator, that would be &infix:<+>, as basically is shown in the example above. You could actually store that in a variable, and use that later in an expression:

my $foo = &infix:<+>;
say $foo(42,666);  # 708

Note that contrary to Perl, you do not need to take a reference in Raku. Since everything in Raku is an object, &foo and &infix:<+> are just objects as well. You can just use them as they are. So literally this part of the RFC could never be implemented because Raku does not have references. But for the use case, which is obtaining something that can be called, the RFC has also been implemented.

has a useful prototype

Perl’s prototypes basically morphed into Raku’s Signatures. But that’s at least one blog post all by itself. So for now, we just say that the “prototypes” of Perl in 2000 turned into signatures in Raku. And since you can ask for a subroutine’s signature:

sub foo(\a, \b) { }
say &foo.signature;  # (\a, \b)

You can also do that for the infix + operator:

say &infix:<+>.signature;  # ($?, $?, *%)

Hmmm… that looks different? Well, yes, it does a bit, but what is mainly different is that both positional parameters are optional. And that any named parameters will also be accepted. As to why that is, that’s really the topic of a yet another blog post about meta-operators. Which we’ll also leave for another time.
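Signatures are objects too, so they can be inspected piece by piece. A sketch with a made-up greet subroutine:

```raku
# A made-up sub with one required positional and one optional named parameter.
sub greet(Str $name, Int :$times = 1) { "Hello, $name! " x $times }

my $sig = &greet.signature;
say $sig.arity;                               # 1, the number of required positionals
say $sig.params.map(*.name);                  # ($name $times)
say $sig.params.grep(*.named).map(*.name);    # ($times)
```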

Conclusion

RFCs 168 and 26 have been implemented completely, although maybe not in the way the original RFCs envisioned. In a way that nowadays just feels very natural. Which allows us to build further, on the shoulders of giants!

RFC 145, by Eric J. Roode: Brace-matching for Perl Regular Expressions

Problem and proposal

RFC 145 calls for a new regex mechanism to assist in matching paired characters like parentheses, ensuring that they are balanced. There are many “paired characters” in more or less daily use: (), [], {}, <>, «», "", '', depending on your locale even »«, or in the fancy world of Unicode additionally ⟦⟧ and many, many more. In this article I will take up the RFC’s title and call all of them “braces”.

For example, consider the string ([b - (a + 1)] * 7). We might wish to extract all the subformulas

  • [b - (a + 1)] * 7,
  • b - (a + 1),
  • a + 1

from it using a global match, all of which are surrounded by a matching pair of braces. The reader is invited to try to write such a regex now.

The RFC author Eric Roode notes that this was still quite difficult in Perl in the year 2000. The task splits into two parts:

  1. Determining for an opening bracket what is its closing counterpart.
  2. Keeping track of the nesting levels and matching braces at each level.

The first subtask becomes hairy in a regex when there are multiple options for the opening bracket. The second subtask is hard for a more profound reason which goes by the name of “Dyck language“. The Dyck language is the set of all strings of properly paired parentheses (with hypothetical contents between them erased). It is the prototypical example of a language in the computer-science sense which is not regular but still context-free, meaning that it somehow needs a stack to keep track of nesting levels. Of course, regexes are more powerful than computer-scientific regular expressions but this fact may still justify why this is a difficult thing to do. Eric Roode recognized the gap between how easy this very common task in parsing structured data should be and how easy it is and wrote an RFC.
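To illustrate why a stack is needed, here is a sketch of a hand-written checker that does no regex matching at all. The %closer table and the balanced sub are my own names; the table handles subtask 1 and the stack handles subtask 2.

```raku
# Subtask 1: a table from opening brace to its closing counterpart.
constant %closer = '(' => ')', '[' => ']', '{' => '}';

# Subtask 2: a stack of the closing braces we still expect to see.
sub balanced(Str $text --> Bool) {
    my @expected;
    for $text.comb -> $char {
        if %closer{$char}:exists {
            @expected.push: %closer{$char};
        }
        elsif $char eq any(%closer.values) {
            # A closer must match the most recently opened brace.
            return False unless @expected and @expected.pop eq $char;
        }
    }
    return @expected.elems == 0;    # nothing may be left open
}

say balanced('([b - (a + 1)] * 7)');   # True
say balanced('([b - (a + 1)) * 7]');   # False
say balanced('((a + 1)');              # False
```

A finite automaton has no such unbounded memory, which is the informal reason why the Dyck language is not regular.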

He proposed a pragma use matchpairs to solve subtask № 1 by providing a map from opening to closing braces. Pragmas are activated in a lexical scope and influence all regex matches in it. For subtask № 2 two new regex metacharacters were proposed, \m and \M for matching and remembering corresponding braces. Using these hooks, the nesting level business is offloaded onto the regex engine.

Spec and solution

RFC 145 is marked “developing”, meaning that it was not fully addressed in the Perl 6 (now Raku) specification. (Apocalypse 5 on pattern matching includes a response to RFC 145.) But there have been related improvements which I am going to use in this section to show how the problem posed in the beginning might be handled in Raku today.

The idea of using a pragma to set up a table of valid braces and then using “brace” regex metacharacters was not implemented, but the regex language was to be redesigned anyway and the designers extrapolated from brace matching and created a new regex operator for nesting structures, the tilde. This operator is used like this:

anon regex { '(' ~ ')' <body> }

and it achieves two things: it transposes body and closing brace so that the two delimiters are close to each other, even when <body> is long, and it sets up error reporting for when the closing brace was not found.

We can use this new feature to slightly improve the regex structure and get error reporting for free, but it does not keep track of nesting levels of the parentheses and it does not compute the closing brace for us when there are multiple options for the opening one.
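As a rough sketch of the tilde in action, consider the toy grammar below. The grammar name Bracketed and the error text are my own; the FAILGOAL method is the hook that the tilde calls when the closing delimiter is missing, and overriding it here is just one way to make the failure visible.

```raku
grammar Bracketed {
    token TOP  { '[' ~ ']' <word> }     # reads as: '[' <word> ']'
    token word { <[a..z]>+ }

    # Called by the tilde when the goal (the closing bracket) is not found.
    method FAILGOAL($goal, $dba?) {
        die "Cannot find $goal near position {self.pos}";
    }
}

my $match = Bracketed.parse('[cat]');
say ~$match<word>;                      # cat

my $message = '';
{
    Bracketed.parse('[cat');            # closing bracket is missing
    CATCH { default { $message = .message } }
}
say $message;                           # the error raised by FAILGOAL
```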

To compute the closing brace, it would suffice to have a way to capture the opening brace and pass it to a function whose return value is dynamically interpolated into the regex. This is now easy in Raku regexes and grammars:

grammar Formula {
    # Registry of understood braces.
    constant %braces =
        '(' => ')',
        '[' => ']',
        '{' => '}',
    ;

    # A parametric token which matches the closing brace
    # corresponding to its argument.
    token closing ($opening) {
        "%braces{$opening}"
    }

    rule braced {
        $<opening>=@(%braces.keys) ~ <closing($<opening>)>
          [ <expr> {} ]
    }

    rule expr {
        [ <:Letter>+ || <:Number>+ || <braced> ]+ % <[+*/-]>
    }
}

The crucial part is rule braced.¹ We capture the opening brace and then later ask for its corresponding closing brace from a lookup in the %braces map.² The @(%braces.keys) interpolation of a list invokes longest-token matching, so it will DWIM when multiple braces with overlapping prefixes are present.

Notice that the mutually recursive use of the <expr> and <braced> rules ensures correct nesting of braces without needing a dedicated gear for this in the regex engine. It falls out of Raku’s improved regex structuring and reusing facilities. It is time for a test:

grammar Formula { … }
sub braced-subexprs ($expr) { … }

braced-subexprs Q|([b - (a + 1)] * 7)|;
-- ([b - (a + 1)] * 7) ---------------------------------------------------------
Braces: ( * ) ||| Subexpr: a + 1
Braces: [ * ] ||| Subexpr: b - (a + 1)
Braces: ( * ) ||| Subexpr: [b - (a + 1)] * 7

Summary

In summary, brace matching is obviously useful in parsing structured data. It was proposed by Eric Roode to make this simple in Perl 6 / Raku. Although the feature was not implemented in the proposed form, the task has indeed become easier to accomplish and the code much easier to read, notably due to the new regex syntax and grammar support.

Encore!

If, like me, you are slightly bothered by the static brace table but are fine with heuristics, then the Unicode Consortium may be an unexpected ally. The Unicode Bidi_Mirroring_Glyph property gives hints about bidirectional writing, that is putting text on the screen when multiple scripts are involved, some of which write left-to-right and others right-to-left. Raku has built-in support for Unicode properties and we can use this one to let the Unicode Consortium pick closing braces for us:

    sub unicode-mirror ($_) {
        join '', .comb.reverse.map: {
            .uniprop('Bidi_Mirroring_Glyph')
                or .self
        }
    }

    token closing ($opening) {
        "{ unicode-mirror($opening) }"
    }

    regex braced {
        :sigspace
        $<opening>=<:Symbol + :Punctuation>+ ~ <closing($<opening>)>
        [ <expr> {} ]
    }

The &unicode-mirror heuristic splits the argument into characters, reverses their order and then either picks its mirroring glyph, if one is defined, or leaves the character as-is, then reassembles them into a string. This function successfully turns <{ into }>, for example.

braced was tweaked in two regards: it accepts any sequence of symbols and punctuation as opening braces now and it has been turned into a regex for full backtracking power when it is too greedy in consuming opening braces.

With these tweaks, we can go nuts and have the grammar do free association and match everything that “looks like a brace pair”:

-- ([b - (a + 1)] * 7) ---------------------------------------------------------
Braces: ( * ) ||| Subexpr: a + 1
Braces: [ * ] ||| Subexpr: b - (a + 1)
Braces: ( * ) ||| Subexpr: [b - (a + 1)] * 7

-- (=^123^=) -------------------------------------------------------------------
Braces: (=^ * ^=) ||| Subexpr: 123

-- <<<123>> --------------------------------------------------------------------
FAILED

-- >123< -----------------------------------------------------------------------
Braces: > * < ||| Subexpr: 123

-- >123> -----------------------------------------------------------------------
FAILED

-- <{ (a + <b>) / !c! / e * »~d~« }> -------------------------------------------
Braces: < * > ||| Subexpr: b
Braces: ( * ) ||| Subexpr: a + <b>
Braces: ! * ! ||| Subexpr: c
Braces: »~ * ~« ||| Subexpr: d
Braces: <{ * }> ||| Subexpr: (a + <b>) / !c! / e * »~d~«

Footnotes

The function used to report braced subexpressions is this:

sub braced-subexprs ($expr) {
    # Get all submatches of the C<braced> subrule.
    class BracedCollector {
        has @.braced-subexprs;

        method braced ($/) {
            push @!braced-subexprs, $/
        }

        method braced-subexprs {
            @!braced-subexprs.unique(as => *.pos)
        }
    }

    say "-- $expr ", '-' x (76 - $expr.chars);

    my BracedCollector $collect .= new;
    say "FAILED" and return
        unless Formula.parse($expr, :rule<expr>, :actions($collect));

    for $collect.braced-subexprs -> $/ {
        say "Braces: $<opening> * $<closing> ||| Subexpr: $<expr>";
    }
}

¹ In case you are wondering about the use of an empty block in [ <expr> {} ], this is due to an implementation detail in Rakudo’s regex engine which does not make the capture $<opening> available to a later subrule closing unless it is forced to. The empty block is one way to force it; cf. RT#111518 and DOC#3478.

² The essential feature of interpolating back the return value of a function call closing $<op> which may depend on previous captures was added, to the best of my knowledge, also around the year 2000 (so about the time this RFC was posted), to Perl 5.6, in this case with the spelling (??{closing $+{op}}).

RFC 137: Perl OO should not be fundamentally changed.

Now, as you have read the title and already stopped laughing… Er, not all of you yet? Ok, I’ll give you another minute…

Good, let’s be serious now. RFC 137 was written by Damian Conway. Yes, the title too. No, I’m serious! Check it yourself! And then also read the other RFCs from the language-objects category. Turns out, it was the common intention back then: don’t break things more than necessary, keep everything as backward compatible as possible.

A familiar stance, isn’t it?

I chose this RFC not for its title but because it might be considered the source of the river we now call the Raku OO model.

Let’s look closer into the RFC text. To be frank, this is only the second time I’m reading it. Gosh, three weeks ago I had no idea Perl 6 started with RFCs! My long-time belief was that the synopses came first. And since even they are now pretty much outdated in many places, studying RFCs is like studying the first stone tools of Homo habilis: there is not much in common with what we use nowadays, but how great it is to see the ways the human mind adapts and improves its own ideas! So, let’s get back to our archeological excavation and start examining our sample.

The first prominent feature we find states that:

It ain’t broken. Don’t fix it.

Here and below all quotes are from the RFC body.

And you know what? Back then it was my consideration too! All I really wanted was class and method keywords, private declarations… And that’s basically all. Perl with classes, what else?

Heh, young and stupid, aha…

The way I see it now? It’s still not broken. But neither is it good. And once you try it in Raku – there is no way back.

Perl’s current OO model has a number of well-known deficiencies: lack of (easy) encapsulation, poor support for hierarchical method calls (especially constructors and destructors), limited (single) dispatch mechanism, poor compile-time checking. More fundamentally, many people find that setting up reliable OO class hierarchies requires too much low-level coding.

Fairly long list, isn’t it? The good thing: these days there is the Moose family of toolkits, which offers solutions for many of the listed problems. Not all of them though. But the very existence of these toolkits says a lot about Perl’s flexibility, which was really something outstanding back then, in the late 90s. Even though Moose didn’t exist yet in 2000, there were already a few OO toolkits implementing different approaches.

The bad thing: all of them are external solutions. Yes, CPAN. Yes, easy to install. Still…

This is one of the aspects where Raku absolutely shines: every single issue from the list above has been taken care of. And even some problems beyond the listed ones. Eventually, this is what made Raku almost totally backward incompatible with Perl. But do I regret it? I certainly do not!

Later in RFC text Damian writes:

The non-prescriptive, non-proscriptive nature of Perl’s OO model makes it possible to construct an enormous range of OO systems within the one language

And you know what? You can do it in Raku too! Yes, you can create your own OO model from scratch by utilizing Raku’s powerful meta-object protocol (MOP) capabilities. Obviously, I won’t be discussing this matter in this article. But I can give you a hint:

say EXPORTHOW::<class>.^name; # Perl6::Metamodel::ClassHOW

Congratulations! We just found the class which is responsible for Raku’s class keyword. It is possible to implement your own keyword like, say, myclass (sorry for the banality) and make it possible to have declarations like:

myclass Foo { }

And to make sure that type objects declared with myclass behave differently from those declared with class. How much different is totally up to one’s imagination and demands.
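Declaring a new keyword is more than fits here, but the MOP itself is easy to poke at. The sketch below uses a made-up Point class for introspection and then assembles a class called Dynamic at run time through the same metaclass; both names are mine, and this shows only a tiny corner of what the MOP can do.

```raku
class Point {
    has $.x;
    has $.y;
    method norm { sqrt($!x ** 2 + $!y ** 2) }
}

# Introspection: ask the metaobject about the class.
say Point.HOW.^name;                    # Perl6::Metamodel::ClassHOW
say Point.^attributes.map(*.name);      # ($!x $!y)
say Point.^can('norm').so;              # True

# Construction: build a class by hand instead of using the class keyword.
my \Dynamic = Metamodel::ClassHOW.new_type(name => 'Dynamic');
Dynamic.^add_method('greet', method () { 'hello from ' ~ self.^name });
Dynamic.^compose;

say Dynamic.new.greet;                  # hello from Dynamic
```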

Don’t like it? Create your own slang and extend Raku grammar with it! Why not have something like:

myclass Foo;
   private Int attr1;
   public Str attr2;
end

It’s possible. The only remark to make about it: the power of Raku makes this kind of trick unnecessary most of the time.

Yet the ability to declare your own class-like keyword which provides some kind of specialized functionality proves to be extremely useful for creating an ORM with an exceptional level of flexibility and ease of use!

A polite cough from the audience reminds me that it’s time to change subject. My apologies, the MOP is my all-time fad!

Of course, the RFC is not about criticism but about proposals. Let’s see why I think that this one is the source of many things we have in the Raku language today.

A private keyword that lexically scopes hash keys to the current package

As Perl’s OO wasn’t planned to change drastically in the v5 to v6 transition, it was supposed to remain based upon blessed hashes. This is not true for Raku. Why and how are questions not for an article, but rather for a small book. Anyway, the idea of private class members is here, even though not the keyword private itself:

class Foo {
    has $!private;
    has $.public;
    method !only-mine { }
    method for-anyone { self!only-mine }
}
my $foo = Foo.new;
$foo.for-anyone;
$foo.only-mine; # Error
say $foo.public;
say $foo.private; # Error

Instead of over-verbose public/private declarations, Raku uses concise twigil notation, using . for publics and ! for privates. But what’s even more interesting is that the only difference between the two declared attributes is that $.public automatically receives an accompanying method public which is the accessor for the attribute. Because, as a matter of fact, in Raku all attributes are actually private! Simplifying a bit, it can be said that the only thing which a class exposes to the world is its public methods.
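A sketch of what that means in practice, using a made-up Account class: the private attribute is reachable only through methods the class chooses to write, while the twigil with a dot gets its accessor for free.

```raku
class Account {
    has $!balance = 0;      # private: no accessor is generated
    has $.owner;            # read-only accessor method .owner
    has $.note is rw;       # accessor that can also be assigned to

    method deposit($amount) { $!balance += $amount; self }
    method balance { $!balance }    # hand-written, read-only view of $!balance
}

# The default constructor only initializes attributes declared with '.',
# so the balance argument is silently ignored here.
my $account = Account.new(owner => 'Ada', balance => 1_000_000);
$account.deposit(50);
$account.note = 'first deposit';

say $account.balance;   # 50
say $account.owner;     # Ada
say $account.note;      # first deposit
```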

A new special subroutine name — SETUP — to separate construction from initialization.

In fact, Raku’s object construction model is based upon three special methods: BUILDALL, BUILD, and TWEAK. The latter is the evolution of SETUP method idea.

But! (There is often a but in Raku.) The three are not the kind of methods we are used to. They are submethods, which are the sole property of the class or role that declares them. What it means in practice is:

class Foo {
    submethod foo { say "foo!" } 
    method bar { say "bar!" }
}
class Bar is Foo { }
Foo.foo; # foo!
Bar.bar; # bar!
Bar.foo; # Error: no such method

Submethods are a kind of tool ideally suited for performing tasks totally specific to the class/role. Precisely like the construction/destruction tasks.
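A small sketch of why this matters for construction, using two hypothetical classes Base and Derived: because a submethod is never inherited, each class in the hierarchy can have its own TWEAK, and each one runs exactly once, parent first.

```raku
my @log;

class Base {
    submethod TWEAK { @log.push: 'Base' }
}

class Derived is Base {
    submethod TWEAK { @log.push: 'Derived' }
}

Derived.new;
say @log;   # [Base Derived]
```

Had these been ordinary methods, Derived's TWEAK would simply have overridden Base's, and the parent's initialization would have been lost unless called explicitly.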

  • Changes to the semantics of bless so that, after associating an object with a class, the class’s SETUP methods are automatically called on the object. An additional trailing @ parameter for bless, to allow arguments to be passed to SETUP methods.
  • Changes to the semantics of DESTROY, so that all inherited destructors are, by default, automatically called when an object is destroyed.

Yes and no. Back when the RFC was written, the low-level architecture of Perl 6 had not even started to be discussed. Thus many ideas were based on the Perl 5 design, in which the core is written in C and bundled with a bunch of modules. Correspondingly, bless is the core thing doing some kind of magic to the data provided as an argument. It seemed natural to teach bless a few additional tricks to get the desired outcome.

In Raku everything is different in this area. First of all, the language specification doesn’t imply the exact way of how construction/destruction stages are to be implemented, it only demands the constructor/destructor methods to be supported and invoked in specific order with specific parameters. So, the “Yes” in the beginning of the previous paragraph is related to the fact that the automatic invocation does take place and the arguments are passed from a call to the method new to the construction submethods:

class Foo {
    has $.foo;
    # A submethod, so it belongs to Foo only and is not inherited
    submethod TWEAK(:$foo) {
        $!foo = $foo * 10 if $foo < 10;
    }
}
say Foo.new(foo => 5).foo; # 50

But the “No” above is related to the fact that there is no low-level bless subroutine responsible for how things are done. For example, the way the Rakudo compiler implements the specification of object construction basically boils down to something like this incomplete pseudo-code:

method new(*%attrinit) {
    self.bless(|%attrinit)
}
method bless(*%attrinit) {
    nqp::create(self).BUILDALL(Empty, %attrinit)
}

So bless is no more than just an ordinary pre-defined method. If necessary, you can override it in your class:

class Foo {
    has $.foo;
    method bless(:$foo, *%c) {
        nextwith(|%c, foo => $foo * 2) 
    } 
}
say Foo.new(foo => 4).foo; # 8

The code will work because this part of the construction logic is partially implemented by class Mu from which all Raku classes indirectly inherit by default. So, if no special care is taken, when you do Foo.new it means the method new from Mu is invoked.

Besides, in Rakudo’s scenario all the “magic” of object initialization eventually happens within the Mu::BUILDALL method, which uses the MOP to determine all the steps to be done for you to get an instance of Foo properly set up and ready for use.

Of all the above steps, it’s only the nqp::create call which is served by the low-level core executable (virtual or bytecode machine in Rakudo terms), and its purpose is to allocate and initialize memory for an object representation.

Pre- and post-condition specifiers, which associate code blocks with particular subroutine/method names.

This idea was never developed. Instead Raku has gotten something way more powerful! A concept which applies not only to routines but to any object. And don’t forget: in Raku everything is an object! The concept I’m talking about is the trait.

A trait is a routine which gets invoked at code compilation time and gets the object it is applied to as its argument. It’s also possible to pass your own arguments to the trait, which is able to use the full power of the MOP to set up or even alter the object the way the user needs it to. For example:

class Foo {
    has $.foo is rw is default(42);
}

In this snippet is default(42) is a simple example of passing an argument to a trait. It specifies the value the attribute gets initially and every time Nil is assigned to it.
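The behavior is easy to observe. A short sketch, with a hypothetical class Counter standing in for Foo:

```raku
class Counter {
    has $.value is rw is default(42);
}

my $c = Counter.new;
say $c.value;      # 42  (initial value comes from the default)

$c.value = 7;
say $c.value;      # 7

$c.value = Nil;    # assigning Nil resets the attribute to its default
say $c.value;      # 42
```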

is rw makes the attribute writable because, by default, all attributes in Raku are read-only. Remember I wrote earlier that everything is an object? Attributes are no exception! RO or RW status of an attribute is determined by… well, by an attribute value on an Attribute object! Thus, our is rw trait is actually as simple as:

multi sub trait_mod:<is>(Attribute:D $attr, :rw($)!) {
    $attr.set_rw();
    warn "useless use of 'is rw' on $attr.name()" unless $attr.has_accessor;
}

Just ignore all the syntax used here and consider the use of the set_rw() method. That’s basically all that is needed to make an attribute writable.

If we now get back to the pre- and post-condition specifiers mentioned in the RFC, they also can be implemented with traits using method wrap of Method objects.
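As a rough sketch of that idea: the trait name non-negative and the sub root below are invented for this example, and this is only one possible way to express pre- and post-conditions, not something built into Raku. The trait receives the Routine object at compile time and wraps it with the checks:

```raku
# Hypothetical trait: rejects negative arguments and negative results.
multi sub trait_mod:<is>(Routine $r, :$non-negative!) {
    $r.wrap(-> $x {
        # pre-condition: checked before the original routine runs
        die "pre-condition failed: $x is negative" if $x < 0;
        my $result = callsame;   # call the wrapped routine with the same argument
        # post-condition: checked on the value the routine returned
        die "post-condition failed: $result is negative" if $result < 0;
        $result
    });
}

sub root($x) is non-negative { $x.sqrt }

say root(16);                            # 4
say (try root(-4)) // "rejected: $!";    # the pre-condition throws, so the fallback is printed
```

The routine body stays free of checking code; the conditions live in the trait and can be reused on any routine taking a single numeric argument.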

I will now skip the next few items in the RFC list we’re walking through. Some were not implemented, some are just self-evident. Multi-dispatch itself is worth an article, and I hope somebody will pick it up for this advent calendar.

Let’s just fast-forward directly to the last one:

A new pragma — delegation — that would modify the dispatch mechanism to automatically delegate specific method calls to specified attributes of an object.

This is just another example where traits came to the rescue. There is no delegation pragma in Raku, but there is a trait named handles (already mentioned in a previous article):

class Book {
    has Str  $.title;
    has Str  $.author;
    has Str  $.language;
    has Cool $.publication;
}
 
class Product {
    has Book $.book handles('title', 'author', 'language', year => 'publication');
}

I chose it since it’s another example of a very elegant solution where no core intervention is needed to implement some advanced functionality. In a few words, the only thing handles does is install new methods on the class to which its attribute belongs. Of course, it takes into account some edge cases and tries to optimize things where possible. But, otherwise, there is no magic in it. There is nothing in it which you wouldn’t be able to do yourself!
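To illustrate the "you could do it yourself" claim, here is a sketch with a trimmed-down Book and a hypothetical ProductByHand class whose forwarding methods are written manually. It mirrors the visible effect of handles, not Rakudo's actual implementation:

```raku
class Book {
    has Str  $.title;
    has Cool $.publication;
}

# Delegation through the trait
class Product {
    has Book $.book handles('title', year => 'publication');
}

# The same visible behavior with hand-written forwarding methods
class ProductByHand {
    has Book $.book;
    method title { $!book.title }
    method year  { $!book.publication }
}

my $book = Book.new(title => 'Dune', publication => 1965);
say Product.new(:$book).title;         # Dune
say Product.new(:$book).year;          # 1965
say ProductByHand.new(:$book).year;    # 1965
```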

This is what I’d like to conclude this article with. Years ago the word magic was kind of trendy among Perl developers. “Here we do some magic” – and then something really unexpected was happening. It was a lot of fun!

However, recently I found an interesting definition of what magic is: a kind of action a person performs to achieve a result which doesn’t logically follow from the action itself.

Sorry for the perhaps clumsy translation, but I hope it reflects the idea behind the definition. It also makes a good point: magic, while fun, is not good for production.

In Raku the magic is eliminated. Instead, Raku brought in such a level of uniformity among different levels of code that often the first impression of: wow, this is magical! – is soon replaced with: wow, it’s so logical!

But you know what? When you put everything together and look at the language as a whole, it creates even bigger magic which can easily enchant you once and forever.

 

RFC 112 by Richard Proctor: Assignment within a regex

Richard wanted to

Provide a simple way of naming and picking out information from a regex without having to count the brackets.

I can say without hesitation that Raku (and before its rename, Perl 6) has achieved this goal — but all the details are different than proposed.

The reason is two-fold.

For one, Richard assumed a pretty straight-forward extension to Perl 5’s regex syntax, and his proposed syntax, (?$hours=..) for a named capture made sense. Instead, Raku regex syntax is pretty much a new thing, where all non-alphanumeric characters are potentially meta characters, and thus either used or reserved for one purpose or another. This made easier syntax than (?$name=regex) available for named captures.

The second is even more profound: The Raku designers realized that regexes could only be truly powerful if reuse was built in from the ground up. And the best way to make that happen was to make them first class.

I want to dwell on that point a bit: consider the power of functions (and closures) as first-class citizens in modern programming languages. Lisp has shown us what you can do with them, and now basically every programming language has got them. Dynamic languages like Perl, Ruby, Javascript and Python were pretty early adopters, modern statically typed languages like C# and F# also got them; even Java caught up eventually. Java didn’t even have functions, just methods, and now it’s got closures that you can pass around.

In my humble opinion, raising regexes to the level of first-class citizens and introducing a concise call syntax gave regexes a similar boost.

In the old days, it was common wisdom that you cannot parse XML (or other arbitrarily nested languages) with regexes, because they are not a regular language in the computer science sense. Perl 5 has some workarounds for that, but they are so clunky and verbose that I haven’t even seen them recommended much, and my general impression is that if you use them, it’s just for the lack of good alternatives.

Not so in Raku: <subrule> in a regex calls another regex called subrule, and so you have recursion (and, relevant to the discussion of RFC 112, named captures). This recursion moves regexes from regular into context-free language territory in the Chomsky Hierarchy. But more than recursion, the named regexes allow much easier reuse, testing in isolation and all that other wonderful stuff that first-classiness gave to functions. It also moved the sentiment towards parsing XML and other languages with regexes from “are you serious?” to “sure, it’s the best tool”.

The call syntax <subrule> implies a named capture, and it turns out that’s convenient enough that explicit named captures (not tied to a call) are actually pretty rare in real-world parsers. An explicit syntax for that exists though, it’s $<capturename>=[...].

Which brings us to the second interesting bit: RFC 112 doesn’t just talk about named captures, but implies that they are directly stored into variables of the same name.

This is problematic for a variety of reasons:

  • Scoping. A regex can be declared and used at two very different parts of the program. Forcing the variable to be in scope would make the capture syntax a source of variables with an unnecessarily large scope (dare I say global?), which is a clear anti-pattern.
  • Quantifiers. In RFC 112 syntax, what would have happened with a regex like (?$char=.)+ matching the string abc? What’s in $char? The sigil implies a scalar, so… maybe the last capture, c? And throw away all the other matches? Doesn’t sound too good. Or maybe (?@char=.) would have an array with all captures, but then, when writing a regex, you’d have to know if anybody later wants to use that inside a quantifier. Not stellar either.
  • Composition. Binding matches to a variable assumes the regex is used as a top-level construct, and not as part of a larger thing.
  • Recursion. Do I even need to elaborate? Probably not.

The solution that’s implemented in Raku now is more suited to the world of first-class regexes: for each regex match there’s a Match object. The top-level match object is stored in the variable $/, so accessing a named capture key is $/<key>, and there’s even a short-hand for that, $<key>. Just two characters longer than the originally proposed $key.

This solution shines though when used in the context of composition, for example. Since the named capture corresponds to a regex match, it’s also a Match object, and so we arrive at a tree of matches (all alike). Or rephrased: a regex match already is a syntax tree.
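A brief sketch of these pieces working together. The regex names hours, minutes, clock and char, and the sample strings, are made up for illustration:

```raku
my regex hours   { \d\d }
my regex minutes { \d\d }
my regex clock   { <hours> ':' <minutes> }   # calling a regex also names the capture

if '12:34' ~~ / <clock> / {
    say ~$<clock><hours>;       # 12   ($<key> is short for $/<key>)
    say ~$/<clock><minutes>;    # 34
    say $<clock>.^name;         # Match (every node of the tree is a Match)
}

# Explicit named captures that are not tied to a regex call
if '2020-08' ~~ / $<year>=[\d+] '-' $<month>=[\d+] / {
    say ~$<year>;               # 2020
}

# A quantified capture keeps every match instead of only the last one
my regex char { . }
if 'abc' ~~ / <char>+ / {
    say $<char>.elems;          # 3
    say $<char>.join(',');      # a,b,c
}
```

The last example answers the quantifier question from the list above: nothing is thrown away, and the regex author does not need a different sigil to make that happen.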


I think this RFC is a good example of how a real pain point and problem was identified, and a solution proposed. Aspects of this solution have survived the language design process, but most details haven’t, because the language changed much more than merely being Perl 5 plus a few extensions through RFCs.

I have been participating in the Perl 6 project since around the year 2007, and have watched some of these transformations; for regexes, the majority of the consolidation and redesign work had already been done. I watched the implementations become more powerful, and even helped a little here and there.

Living in this process was a magical experience, just as magical as the result is now.

RFC1: Threads

RFC 1, by Bryan C. Warnock: Threads

It might or might not be the case that the need for a real multithreaded architecture in Perl was the real motive behind the creation of what was initially called simply Perl, then Perl 6, and eventually Raku.

It was probably the late 90s or early 2000s when we had a contract with a big company that needed to download stuff from the web really fast. We needed those threads, and they finally arrived in Perl 5.8.8. However, our threads were very basic, didn’t need any kind of communication, just the bare parallel thing, and underneath them, operating system processes were used; there were no real threads at the Perl VM level. And they were sorely needed. Which is why RFC1 read:

Implementation of Threads in Perl

It was originally proposed on August 1st (hence the 20th anniversary thing), and finally frozen a couple of months later, on September 28th.

It basically proposes a way to implement low-level threads, including new namespaces (global, for sharing variables among threads) as well as the Threads class, with this example:

use Threads;
# the main thread has all four above in its arena

my $thread2 = Threads->new(\&start_thread2);
#...

sub start_thread2 { ... }

The main thread is implicit, and gets all other modules into its namespace; the second one inherits from the main thread. It makes sense, in general, except it’s a very low-level mechanism to use threads, and in fact it looks more like a way to handle processes than what we nowadays call threads. There’s another RFC for those, which are called “lightweight threads”, which was started a few weeks later and frozen pretty much at the same time. It contains the graphic simile:

Perl → Swiss-army chain saw; Perl with threads → juggling chain saws

It’s difficult to see what’s the difference between them, except for the explicit sharing of variables and the fact that it uses Thread instead of Threads as the main class.

Eventually, that was the class name chosen for threads in Raku: Thread. This uses new to create a thread, but you then have to issue a .run to actually run it. Alternatively, you can simply use .start to create and run a thread immediately.

#!/usr/bin/env raku
constant $interval = 100000;
my @threads = (^10).map: -> $i {
    Thread.start(
        name => "Checking primes from {$i * $interval } to { ($i+1)*$interval}",
        sub {
            for ($i * $interval)..^(($i+1)*$interval) -> $n {
                next if ( $n %% 2 ) | ( $n %% 3 ) | ($n %% 5 );
                say "Prime $n found in $*THREAD" if $n.is-prime;
            }
        },
    );
}
.finish for @threads;

This is taken pretty much directly from the example in the Thread manual page, and shows the differences between Raku and what its early inceptions looked like. It uses a map to start 10 threads (using a Range); every thread will work on a range of numbers to check if there’s a prime in them. After sieving out a few easy ones (multiples of 2, 3 and 5), it will simply check, using the is-prime method, if the number is prime, and it will print the number and the thread it’s in. The $*THREAD variable allows for easy introspection of the thread one is in, which will make this print something like this:

...
Prime 76579 found in Thread<4>(Checking primes from 0 to 100000)
Prime 994997 found in Thread<13>(Checking primes from 900000 to 1000000)
Prime 655043 found in Thread<10>(Checking primes from 600000 to 700000)
Prime 483991 found in Thread<8>(Checking primes from 400000 to 500000)
Prime 169283 found in Thread<5>(Checking primes from 100000 to 200000)
Prime 995009 found in Thread<13>(Checking primes from 900000 to 1000000)
Prime 761533 found in Thread<11>(Checking primes from 700000 to 800000)
...

Every thread has, by design, specialized in a specific range; thread number 13 gets from 900K to 1000K, for instance. Working with threads is much more efficient, but each piece of work needs to be explicitly pinned to a specific thread to do this. This is why low-level thread access is not really the best way to create a concurrent program. Working with higher-level APIs makes a lot more sense.
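For comparison, here is a minimal sketch of the same kind of job using the higher-level start/await API, where a scheduler decides which thread runs which block. The chunk size and the choice to count primes rather than print them are arbitrary choices for this example:

```raku
constant $chunk = 1000;

# Each start block returns a Promise; no thread is named or managed by hand.
my @promises = (^4).map: -> $i {
    start {
        (($i * $chunk) ..^ (($i + 1) * $chunk)).grep(*.is-prime).elems
    }
};

my @counts = await @promises;   # wait for every block and collect its result
say @counts.sum;                # number of primes below 4000

# Or let a parallel sequence split the work up on its own
say (^4000).race.grep(*.is-prime).elems;
```

Both forms return values instead of printing from inside the thread, which makes the results easy to combine once all the work is done.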

However, in 2000 it was enough to have the insight that a thread engine was needed for a modern, 100-year language like Raku. And Bryan C. Warnock, who became famous because of Warnock’s Dilemma, had, if not the insight of the original idea, at least the laziness, impatience and hubris to put it down in what eventually became the first RFC for Raku, 20 years ago today.

The origin of Warnock’s dilemma, according to Wikipedia, is pretty much in the same month, and actually originated in the bootstrap (for perl6) mailing list. And it is totally related to the fact that the response to that RFC was underwhelming, which indicates that either no one cared, or it was just perfect. I tend to think the latter, so thanks, Bryan, for this.

Threads on the world

20 years ago tomorrow

20 years ago, on the first of August, the inception of a language started to, well, incept.

Actually, it started a bit earlier than that. Perl was in need of change, so it was decided that the community itself should propose what the language needed to do to go forward one step, from Perl 5 to Perl 6. A call for requests for change was made; every one should include possible changes to Perl, as well as, if possible, an implementation proposal, laying out how to proceed. The procedure didn’t lack criticism, but in general it was received with such enthusiasm that August 1st already saw the first RFC, pretty much at the same time as some instructions from Larry Wall on how to actually proceed.

The rest is history. It looks a bit like sacred history, since those RFCs were picked up (and apart) by Larry Wall’s apocalypses, explained later by Damian Conway’s exegeses, and roasted in the synopses, which eventually became the roast repository, the actual specification of the language.

Which is now called Raku. But that’s another story.

To celebrate this part of the history and the people that brought us where we are now, starting tomorrow, we’ll publish 20 articles, one a day, that will focus on one or a few RFCs and show what they eventually became in today’s Raku. So come back every day for a piece of Raku, of history, and of Raku history!