Day 22: He’s making a list… (part 1)

If there’s anything that Santa and his elves ought to know, it’s how to make a list. After all, they’re reading lists that children send in, and Santa maintains his very famous list. Another thing we know is that Santa and his elves are quite multilingual.

So one day, one of the elves decided that, rather than typing out a list of gifts by hand from the data they received (which would require elves who spoke all of the world’s languages), they’d take advantage of the power of Unicode’s CLDR (Common Locale Data Repository). It’s one of Unicode’s lesser-known projects. As luck would have it, Raku has a module providing access to the data, called Intl::CLDR. The elf figured he could probably use some of its data to automate their list formatting.

He began by installing Intl::CLDR and playing around with it in the terminal. The module was designed to allow some degree of exploration in a REPL, so after reading the provided README, the elf did the following:

                         # Repl response
use Intl::CLDR;          # Nil
my $english = cldr<en>   # [CLDR::Language: characters,context-transforms,
                         #  dates,delimiters,grammar,layout,list-patterns,
                         #  locale-display-names,numbers,posix,units]

The module loaded up the data for English, and the returned object has a neat gist that lists the elements it contains. For a variety of reasons, Intl::CLDR objects can be referenced either as attributes or as keys. Most of the time, attribute access is faster, but key access is more flexible: let’s be honest, $english{$foo} looks nicer than $english."$foo"(), and keys also allow slices that fetch several elements at once, e.g. $english<grammar numbers>.

In any case, the elf saw that one of the data points is list-patterns, so he explored further:
                                            # Repl response
$english.list-patterns;                     # [CLDR::ListPatterns: and,or,unit]
$english.list-patterns.and;                 # [CLDR::ListPattern: narrow,short,standard]
$english.list-patterns.and.standard;        # [CLDR::ListPatternWidth: end,middle,start,two]
$english.list-patterns.and.standard.start;  # {0}, {1}
$english.list-patterns.and.standard.middle; # {0}, {1}
$english.list-patterns.and.standard.end;    # {0}, and {1}
$english.list-patterns.and.standard.two;    # {0} and {1}

Aha! He found the data he needed.

List patterns are catalogued by their function: and-ing items, or-ing items, and a unit type designed for formatting conjoined units such as 2ft 1in. Each pattern comes in three widths: standard, short, and narrow. Standard is what one would use most of the time, but if space is a concern, some languages allow even slimmer formatting. Lastly, each of those widths has four forms. The two form combines, well, two elements. The other three are used together to join three or more: start combines the first and second elements, end combines the penultimate and final elements, and middle is used for every join in between, from the second element through the penultimate one.
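To make the four forms concrete, here is a small sketch that applies the English standard “and” patterns from the REPL session above by hand. It hard-codes those patterns instead of querying Intl::CLDR, and the helper names fill and join-list are hypothetical. It builds the result from the right, roughly following the CLDR specification (UTS #35): end joins the last two items, middle attaches each inner item in front of the text so far, and start attaches the first item.

```raku
# English "and" patterns at standard width, copied from the REPL session
# (hard-coded so the sketch runs without Intl::CLDR).
my %pattern =
    'start'  => '{0}, {1}',
    'middle' => '{0}, {1}',
    'end'    => '{0}, and {1}',
    'two'    => '{0} and {1}';

# Fill one pattern: {0} becomes the left text, {1} the right text.
# Assumes the items themselves never contain the text "{0}" or "{1}".
sub fill(Str $pattern, $left, $right --> Str) {
    $pattern.subst('{0}', $left).subst('{1}', $right)
}

# Hypothetical helper that joins any number of items with the patterns above.
sub join-list(+@items --> Str) {
    given @items.elems {
        when 0 { '' }
        when 1 { ~@items[0] }
        when 2 { fill %pattern<two>, @items[0], @items[1] }
        default {
            # "end" joins the last two items ...
            my $result = fill %pattern<end>, @items[*-2], @items[*-1];
            # ... "middle" prepends each inner item, working right to left ...
            for @items[1 .. *-3].reverse -> $item {
                $result = fill %pattern<middle>, $item, $result;
            }
            # ... and "start" prepends the first item.
            fill %pattern<start>, @items[0], $result;
        }
    }
}

say join-list(<a b>);        # a and b
say join-list(<a b c d>);    # a, b, c, and d
```

Building from the right means every pattern only ever sees two pieces of text, which is all a {0}/{1} pattern can express.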

He then wondered what this might look like for other languages. Thankfully, testing this out in the REPL was easy enough:

my &and-pattern = { cldr{$^language}.list-patterns.and.standard<start middle end two>.join: "\t" }

                  # Repl response (RTL corrected, s/\t/' '+/)
and-pattern 'es'  # {0}, {1}    {0}, {1}    {0} y {1}    {0} y {1}
and-pattern 'ar'  # ‮{0} و{1}     {0} و{1}    {0} و{1}    {0} و{1}
and-pattern 'ko'  # {0}, {1}    {0}, {1}    {0} 및 {1}    {0} 및 {1}
and-pattern 'my'  # {0} - {1}   {0} - {1}   {0}နှင့် {1}    {0}နှင့် {1}
and-pattern 'th'  # {0} {1}     {0} {1}     {0} และ{1}   {0}และ{1}

He quickly saw that there was quite a bit of variation! Thank goodness someone else had already catalogued all of this for him. So he went about trying to create a simple formatting routine. To begin, he created a very detailed signature and then imported the modules he’d need.

#| Lengths for list format.  Valid values are 'standard', 'short', and 'narrow'.
subset ListFormatLength of Str where any(<standard short narrow>);

#| Types for list format.  Valid values are 'and', 'or', and 'unit'.
subset ListFormatType of Str where any(<and or unit>);

use User::Language;     # obtains default languages for a system
use Intl::LanguageTag;  # use standardized language tags
use Intl::CLDR;         # accesses international data

#| Formats a list of items in an internationally-aware manner
sub format-list(
                     +@items,                    #= The items to be formatted into a list
    LanguageTag()    :$language = user-language, #= The language to use for formatting
    ListFormatLength :$length   = 'standard',    #= The formatting width
    ListFormatType   :$type     = 'and'          #= The type of list to create
) {
    ...
    ...
    ...
}

That’s a bit of a big bite, but it’s worth taking a look at. First, the elf opted to use declarator POD wherever possible. This can really help out people who might want to use his eventual module in an IDE, for autogenerating documentation, or for curious users in the REPL. (If you type in ListFormatLength.WHY, the text “Lengths for list format … and ‘narrow’” will be returned.) For those unaware of declarator POD, you can use either #| to attach a comment to the following declaration (in the example, the subsets and the sub itself), or #= to attach it to the preceding declaration (most common with attributes, and used here for the parameters).
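Here is a small sketch of both declarator forms, plus a quick check of a subset that restricts a Str to a few values. The any(…) junction makes a single string match if it equals any one of the allowed values. The greet sub is a made-up example, not part of the elf’s module.

```raku
#| Lengths for list format.  Valid values are 'standard', 'short', and 'narrow'.
subset ListFormatLength of Str where any(<standard short narrow>);

say 'short' ~~ ListFormatLength;   # True
say 'wide'  ~~ ListFormatLength;   # False

#| Greets someone by name (a made-up example)
sub greet(
    Str $name   #= Who to greet
) {
    "Hello, $name!"
}

say &greet.WHY;                       # Greets someone by name (a made-up example)
say &greet.signature.params[0].WHY;   # the trailing #= comment attached to $name
```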

Next, he imported two more modules that will be useful. User::Language detects the system language, and he used it to provide sane defaults. Intl::LanguageTag is one of the most fundamental modules in the international ecosystem. While he wouldn’t strictly need it (as we’ll see, he ultimately uses the language tags only in string form), it helps ensure that at least a plausible language tag is passed.

If you’re wondering what the +@items means, it applies a piece of DWIM logic, known as the single-argument rule, to the positional arguments. If one does format-list @foo, presumably the list is @foo, and so @items will be set to the contents of @foo. On the other hand, if someone does format-list $foo, $bar, $xyz, presumably the list isn’t $foo, but all three items. So when exactly one argument is passed and it is Iterable, Raku uses it as the list; otherwise, each positional argument becomes one item. The extra () in LanguageTag() makes it a coercion type: the parameter will take either a LanguageTag or anything that can be coerced into one (like a string).
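A minimal sketch of both behaviours, using a hypothetical count-items helper that just reports how many items the +@ parameter received, and a hypothetical double sub with an Int() coercion type:

```raku
# Hypothetical helper: reports how many items the +@ parameter received.
sub count-items(+@items) { @items.elems }

my @fruit = <apples bananas kiwis>;
say count-items(@fruit);            # 3 (one Iterable argument: it is the list)
say count-items('a', 'b', 'c');     # 3 (several arguments: one item each)
say count-items('apples');          # 1 (one non-Iterable argument: a list of one)
say count-items(@fruit, 'pears');   # 2 (two arguments, so @fruit is not flattened)

# A coercion type such as Int() accepts anything that can become an Int.
sub double(Int() $n) { $n * 2 }
say double('21');                   # 42
```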

Okay, so with that housekeeping stuff out of the way, he got to coding the actual formatting, which is devilishly simple:

    my $format = cldr{$language}.list-patterns{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>;

    if    @items  > 2 { ...                          }
    elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
    elsif @items == 1 { @items.head                  }
    else              { ''                           }

He paused here to check whether things worked so far. So he added the following tests and ran his script:

                                  # output
format-list <>,    :language<en>; # ''
format-list <a>,   :language<en>; # 'a'
format-list <a b>, :language<en>; # 'a{0} and {1}b'

While the two simplest cases were easy, the first one to use CLDR data didn’t work quite as expected. The elf realized he’d need to actually replace the {0} and {1} with the items. While technically he should use subst or similar, after going through the CLDR data, he realized that all of the patterns begin with {0} and end with {1}. So he cheated and changed the initial assignment lines to:

    my $format = cldr{$language}.list-patterns{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3);
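A quick way to see what the substr cheat produces, using the four English patterns from the REPL session (hard-coded here, so no Intl::CLDR is needed). The loop first checks the assumption the cheat relies on, namely that every pattern starts with {0} and ends with {1}.

```raku
# The English standard "and" patterns: start, middle, end, two.
my @patterns = '{0}, {1}', '{0}, {1}', '{0}, and {1}', '{0} and {1}';

# Guard the assumption behind the shortcut before relying on it.
for @patterns {
    die "Unexpected pattern: $_" unless .starts-with('{0}') && .ends-with('{1}');
}

# Skip the 3-character "{0}" and keep everything except the final "{1}".
my @joiners = @patterns.map: *.substr(3, *-3);
say @joiners.raku;   # [", ", ", ", ", and ", " and "]
```

A Callable as substr’s second argument receives the number of characters remaining after the start position, so *-3 drops exactly the trailing {1}.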

Now his two-item case worked well. For the three-or-more case, though, he had to think a bit harder about how to combine things. There are actually quite a few different ways to do it! The simplest way for him was to take the first item, then the $start combining text, then the second through penultimate items joined with $middle, and finally the $end text and the final item:

    if @items > 2 {
        ~ @items[0]
        ~ $start
        ~ @items[1..*-2].join($middle)
        ~ $end
        ~ @items[*-1]
    }
    elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
    elsif @items == 1 { @items.head                  }
    else              { ''                           }

Et voilà! His formatting function was ready for prime time!

                                      # output
format-list <>,        :language<en>; # ''
format-list <a>,       :language<en>; # 'a'
format-list <a b>,     :language<en>; # 'a and b'
format-list <a b c>,   :language<en>; # 'a, b, and c'
format-list <a b c d>, :language<en>; # 'a, b, c, and d'

Perfect! Except for one small problem. When they actually started using this, the computers overheated and melted some of the snow away. Every single time they called the function, the CLDR database needed to be queried and the strings clipped. The elf had to come up with something a bit more efficient.

He searched high and low for a solution, and eventually found himself in the dangerous lands of Here Be Dragons™, otherwise known in Raku as EVAL. He knew that EVAL could be dangerous, but that for his purposes, he could avoid those pitfalls. His plan was to query CLDR just once, and then produce a compilable code block that would do the simple logic based on the number of items in the list. The string values could be hard-coded into that block, sparing some variable lookups too.

There be dragons here 🐉🦋

EVAL should be used with great caution. All it takes is one errant unescaped string from an unknown source, and your system could be compromised. This is why Raku requires you to affirmatively write use MONKEY-SEE-NO-EVAL in any scope that calls EVAL on a non-literal string. However, in situations like this, where we control all of the inputs, things are much safer. In tomorrow’s article, we’ll discuss ways to do this in an even safer manner, although it adds a small degree of complexity.

Back to the regularly scheduled program

To begin, the elf imagined his formatting function.

sub format-list(+@items) {
    if    @items  > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
    elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
    elsif @items == 1 { @items[0] }
    else              { '' }
}

That was … really simple! But he needed this as a string. One way to do that would be to use straight string interpolation, but he decided to use Raku’s heredoc syntax, q:to. For those unfamiliar, in Raku, quotation marks are actually just syntactic sugar for entering the Q (quoting) sublanguage. With quotation marks, you only get a few options: ' ' processes no escapes other than \\ and \', while " " interpolates variables, blocks in braces, and backslash escapes such as \n. If we enter the Q-language manually (using q or Q), we get a LOT more options. If you’re interested in those, check out Elizabeth Mattijsen’s 2014 Advent Calendar post on the topic. Our little elf decided to use q:s:to so that he could keep his code as is, with only the $-sigiled (scalar) variables interpolated. (The rest of his code only used @-sigiled variables and plain blocks, neither of which :s interpolates, so he didn’t need to escape anything!)

my $format = cldr{$language}.list-patterns{$type}{$length};
my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3);

my $code = q:s:to/FORMATCODE/;
    sub format-list(+@items) {
        if    @items  > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
        elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
        elsif @items == 1 { @items[0] }
        else              { '' }
    }
    FORMATCODE

use MONKEY-SEE-NO-EVAL;
EVAL $code;
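Here is a tiny sketch of what q:s does and doesn’t interpolate. It uses a hard-coded joiner instead of CLDR data, so everything in it is illustrative.

```raku
my $two = ' and ';   # the raw joiner text, without any quoting

# Under :s only $-sigiled variables interpolate; @items and braces stay literal.
my $code = q:s:to/END/;
    @items[0] ~ $two ~ @items[1]
    END

print $code;   # @items[0] ~  and  ~ @items[1]
```

The generated text is no longer valid Raku, because the bare word and now sits between two ~ operators.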

The only small catch is that he’d need a slightly different version of the text from CLDR. If the joining text “ and ” were placed verbatim where $two is, that line would end up as @items[0] ~  and  ~ @items[1], which would cause a compile error. Luckily, Raku has a method that helps here! Calling .raku gives a Raku source-code representation of almost any object. For instance:

              # REPL output
'abc'.raku    # "abc"
"abc".raku    # "abc"
<a b c>.raku  # ("a", "b", "c")

So he just changed his initial assignment line to chain one more method (.raku):

my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3).raku;
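Putting the pieces together on a small scale, the sketch below quotes the English “two” joiner with .raku, generates an anonymous sub with q:s:to, and compiles it with EVAL. The pattern is hard-coded rather than fetched from Intl::CLDR, and join-two is a hypothetical name. The last lines show why .raku is a reasonable guard here: its output reads back through EVAL as the original string, even when that string contains quotes.

```raku
use MONKEY-SEE-NO-EVAL;   # EVAL on a non-literal string needs this pragma

# Trim the pattern as in the text, then quote it as Raku source code.
my $two = '{0} and {1}'.substr(3, *-3).raku;   # " and " with the quotes included

# Only $two interpolates; the rest of the line stays as code.
my $code = q:s:to/END/;
    sub (+@items) { @items[0] ~ $two ~ @items[1] }
    END

my &join-two = EVAL $code;
say $code;                       # sub (+@items) { @items[0] ~ " and " ~ @items[1] }
say join-two <apples bananas>;   # apples and bananas

# .raku output round-trips through EVAL, even for text containing quotes.
my $tricky = 'say "hi"; "';
say EVAL($tricky.raku) eq $tricky;   # True
```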

Now his code worked. His last step was to find a way to reuse it, so that he would benefit from this extra up-front work. He made a very rudimentary caching setup. It’s rudimentary because it isn’t thread-safe: values are only ever added, and the same key always produces an identical formatter, so the risk is small, though truly concurrent writes to a plain Hash would still be best guarded by a lock. This is what he came up with (declarator POD and type information removed):

sub format-list (+@items, :$language = 'en', :$type = 'and', :$length = 'standard') {
    state %formatters;
    my $key = "$language/$type/$length";

    # Get a formatter, generating it if it's not been requested before
    my &formatter = %formatters{$key} //= generate-list-formatter($language, $type, $length);

    formatter @items;
}

sub generate-list-formatter($language, $type, $length --> Sub) {
    # Get CLDR information
    my $format = cldr{$language}.list-patterns{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3).raku;

    # Generate code
    my $code = q:s:to/FORMATCODE/;
        sub format-list(+@items) {
            if    @items  > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
            elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
            elsif @items == 1 { @items[0] }
            else              { '' }
        }
        FORMATCODE

    # compile and return
    use MONKEY-SEE-NO-EVAL;
    EVAL $code;
}
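The cache logic can be checked without Intl::CLDR. In this sketch, make-formatter is a hypothetical stand-in for generate-list-formatter that only knows the two-item joiners for “and” and “or”, and a counter records how often it runs. Because //= only evaluates its right-hand side when the cached entry is undefined, each type is built just once.

```raku
my $builds = 0;   # counts how often a formatter is generated

# Hypothetical stand-in for generate-list-formatter: no CLDR, two joiners only.
sub make-formatter(Str $type --> Callable) {
    $builds++;
    my $two = $type eq 'or' ?? ' or ' !! ' and ';
    sub (+@items) { @items.join($two) }   # simplified: always uses the "two" joiner
}

sub format-list(+@items, :$type = 'and') {
    state %cache;
    my &formatter = %cache{$type} //= make-formatter($type);
    formatter @items;
}

say format-list <apples bananas>;              # apples and bananas
say format-list <apples bananas>, :type<or>;   # apples or bananas
say format-list <kiwis pears>;                 # kiwis and pears
say $builds;                                   # 2
```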

And there he was! His function was all finished. He wrapped it up into a module and sent it off to the other elves for testing:

format-list <apples bananas kiwis>, :language<en>;      # apples, bananas, and kiwis
format-list <apples bananas>, :language<en>, :type<or>; # apples or bananas
format-list <manzanas plátanos>, :language<es>;         # manzanas y plátanos
format-list <انارها زردآلو تاریخ>, :language<fa>;       # انارها، زردآلو، و تاریخ

Hooray!

Shortly thereafter, though, another elf took up his work and decided to go even crazier! Stay tuned for more antics from Santa’s elves, and to see how they took his lists to another level.
