Day 9 – Monadic programming examples

Introduction

This document (notebook) provides examples of monadic pipelines for computational workflows in Raku. It expands on the blog post “Monad laws in Raku”, [AA2], (notebook), by including practical, real-life examples.

Context

As mentioned in [AA2], here is a list of the applications of monadic programming we consider:

  1. Graceful failure handling
  2. Rapid specification of computational workflows
  3. Algebraic structure of written code

Remark: Those applications are discussed in [AAv5] (and its future Raku version.)

As a tools maker for Data Science (DS) and Machine Learning (ML), [AA3],
I am very interested in Point 1; but as a “simple data scientist” I am mostly interested in Point 2.
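
As a quick illustration of Point 1, here is a minimal sketch (not tied to any of the packages below) of graceful failure handling with andthen: as soon as a step yields Nil, the remaining steps are skipped:

sub reciprocal($x) { $x == 0 ?? Nil !! 1 / $x }

sub pipeline($x) {
    reciprocal($x)      # may fail, yielding Nil
    andthen $_ + 1      # runs only if the previous step produced a defined value
    andthen $_ * 10
}

say pipeline(2);   # 15 : 1/2 = 0.5, then 1.5, then 15
say pipeline(0);   # (empty) : the first step failed, the rest were skipped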

That said, a large part of my Raku programming has been dedicated to rapid and reliable code generation for DS and ML by leveraging the algebraic structure of corresponding software monads, i.e. Point 3. (See [AAv2, AAv3, AAv4].) For me, first and foremost, monadic programming pipelines are just convenient interfaces to computational workflows. Often I make software packages that allow “easy”, linear workflows that can have very involved computational steps and multiple tuning options.

Dictionary

  • Monadic programming
    A method for organizing computations as a series of steps, where each step generates a value along with additional information about the computation, such as possible failures, non-determinism, or side effects. See [Wk1].
  • Monadic pipeline
    Chaining of operations with a certain syntax. Monad laws apply loosely (or strongly) to that chaining.
  • Uniform Function Call Syntax (UFCS)
    A feature that allows both free functions and member functions to be called using the same object.function() method call syntax.
  • Method-like call
    Same as UFCS. A Raku example: [3, 4, 5].&f1.$f2. (See the sketch after this list.)
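
Here is a minimal sketch of these chaining styles, using two hypothetical subs:

sub double(@x) { @x »*» 2 }
sub total(@x)  { @x.sum   }

# Feed operator pipeline
[3, 4, 5] ==> double() ==> total() ==> say();              # 24

# andthen pipeline ($_ is the previous pipeline value)
[3, 4, 5] andthen .&double andthen .&total andthen .say;   # 24

# Method-like calls over a sub and a callable held in a scalar
my $f = &total;
say [3, 4, 5].&double.$f;                                  # 24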

Setup

Here are the packages loaded and used in this document (notebook):

use Data::Generators;
use Data::Reshapers;
use Data::TypeSystem;
use Data::Translators;

use DSL::Translators;
use DSL::Examples;
use LLM::Functions;

use ML::SparseMatrixRecommender;
use ML::TriesWithFrequencies;

use Hilite::Simple;


Prefix trees

Here is a list of steps:

  • Make a prefix tree (trie) with frequencies by splitting the words of @words1 into characters
  • Merge the trie with another trie made over @words2
  • Convert the node frequencies into probabilities
  • Shrink the trie (i.e. find the “prefixes”)
  • Show the tree-form of the trie

Let us make a small trie of pet names (used by Raku or Perl fans):

my @words1 = random-pet-name(*)».lc.grep(/ ^ perl /);
my @words2 = random-pet-name(*)».lc.grep(/ ^ [ra [k|c] | camel ] /);

Here we make a trie (prefix tree) for those pet names using the feed operator and the functions of “ML::TriesWithFrequencies”, [AAp5]:

@words1 ==>
trie-create-by-split ==>
trie-merge(@words2.&trie-create-by-split) ==>
trie-node-probabilities ==>
trie-shrink ==>
trie-say

TRIEROOT => 1
├─camel => 0.10526315789473684
│ ├─ia => 0.5
│ └─o => 0.5
├─perl => 0.2631578947368421
│ ├─a => 0.2
│ ├─e => 0.2
│ └─ita => 0.2
└─ra => 0.631578947368421
  ├─c => 0.75
  │ ├─er => 0.2222222222222222
  │ ├─he => 0.5555555555555556
  │ │ ├─al => 0.2
  │ │ └─l => 0.8
  │ │   └─  => 0.5
  │ │     ├─(ray ray) => 0.5
  │ │     └─ray => 0.5
  │ ├─ie => 0.1111111111111111
  │ └─ket => 0.1111111111111111
  └─k => 0.25
    ├─i => 0.3333333333333333
    └─sha => 0.6666666666666666

Using andthen and the Trie class methods (but skipping node-probabilities calculation in order to see the counts):

@words1
andthen .&trie-create-by-split
andthen .merge( @words2.&trie-create-by-split )
# andthen .node-probabilities
andthen .shrink
andthen .form

TRIEROOT => 19
├─camel => 2
│ ├─ia => 1
│ └─o => 1
├─perl => 5
│ ├─a => 1
│ ├─e => 1
│ └─ita => 1
└─ra => 12
  ├─c => 9
  │ ├─er => 2
  │ ├─he => 5
  │ │ ├─al => 1
  │ │ └─l => 4
  │ │   └─  => 2
  │ │     ├─(ray ray) => 1
  │ │     └─ray => 1
  │ ├─ie => 1
  │ └─ket => 1
  └─k => 3
    ├─i => 1
    └─sha => 2


Data wrangling

One appealing way to show that monadic pipelines result in clean and readable code is to demonstrate their use in Raku through data wrangling operations. (See the “data packages” loaded above.) Here we get the Titanic dataset, show its structure, and show a sample of its rows:

#% html
my @dsTitanic = get-titanic-dataset();
my @field-names = <id passengerClass passengerSex passengerAge passengerSurvival>;

say deduce-type(@dsTitanic);

@dsTitanic.pick(6) 
==> to-html(:@field-names)

Vector(Assoc(Atom((Str)), Atom((Str)), 5), 1309)

+------+----------------+--------------+--------------+-------------------+
|  id  | passengerClass | passengerSex | passengerAge | passengerSurvival |
+------+----------------+--------------+--------------+-------------------+
|  960 | 3rd            | male         |           30 | died              |
|  183 | 1st            | female       |           30 | survived          |
| 1043 | 3rd            | female       |           -1 | survived          |
|  165 | 1st            | male         |           40 | survived          |
|  891 | 3rd            | male         |           20 | died              |
|  806 | 3rd            | male         |           -1 | survived          |
+------+----------------+--------------+--------------+-------------------+

Here is an andthen data wrangling monadic pipeline, the lines of which have the following interpretations:

  • Initial pipeline value (the dataset)
  • Rename columns
  • Filter rows (keep rows with age greater than or equal to 10)
  • Group by the values of the columns “sex” and “survival”
  • Show the structure of the pipeline value
  • Give the sizes of each group as a result

@dsTitanic 
andthen rename-columns($_,  {passengerAge => 'age', passengerSex => 'sex', passengerSurvival => 'survival'})
andthen $_.grep(*<age> ≥ 10).List
andthen group-by($_, <sex survival>)
andthen {say "Dataset type: ", deduce-type($_); $_}($_)
andthen $_».elems

Dataset type: Struct([female.died, female.survived, male.died, male.survived], [Array, Array, Array, Array])


{female.died => 88, female.survived => 272, male.died => 512, male.survived => 118}

Remark: The andthen pipeline corresponds to the R pipeline in the next section.

A similar result can be obtained via cross-tabulation, using a pipeline with the feed operator (==>):

@dsTitanic
==> { .grep(*<passengerAge> ≥ 10) }()
==> { cross-tabulate($_, 'passengerSex', 'passengerSurvival') }()
==> to-pretty-table()

+--------+----------+------+
|        | survived | died |
+--------+----------+------+
| female |   272    |  88  |
| male   |   118    | 512  |
+--------+----------+------+

Tries with frequencies can also be used for computing these kinds of (deep) contingency tensors (not just shallow tables):

@dsTitanic
andthen $_.map(*<passengerSurvival passengerSex passengerClass>)
andthen .&trie-create
andthen .form

TRIEROOT => 1309
├─died => 809
│ ├─female => 127
│ │ ├─1st => 5
│ │ ├─2nd => 12
│ │ └─3rd => 110
│ └─male => 682
│   ├─1st => 118
│   ├─2nd => 146
│   └─3rd => 418
└─survived => 500
  ├─female => 339
  │ ├─1st => 139
  │ ├─2nd => 94
  │ └─3rd => 106
  └─male => 161
    ├─1st => 61
    ├─2nd => 25
    └─3rd => 75

Remark: This application of tries with frequencies can be leveraged in making mosaic plots. (See this MosaicPlot implementation in Wolfram Language, [AAp8].)


Data wrangling code with multiple languages and packages

Let us demonstrate the “rapid specification of computational workflows” application by generating data wrangling code from natural language commands. Here is a natural language workflow spec (each row corresponds to a pipeline segment):

my $commands = q:to/END/;
use dataset dfTitanic;
rename columns passengerAge as age, passengerSex as sex, passengerClass as class;
filter by age ≥ 10;
group by 'class' and 'sex';
counts;
END

Grammar-based interpreters

Here is a table with the generated code for different programming languages according to the spec above (using “DSL::English::DataQueryWorkflows”, [AAp3]):

#% html
my @tbl = <Python R Raku WL>.map({ %( language => $_, code => ToDSLCode($commands, format=>'code', target => $_) ) });
to-html(@tbl, field-names => <language code>, align => 'left').subst("\n", '<br>', :g)

Executing the Raku pipeline (by replacing dfTitanic with @dsTitanic first):

my $obj = @dsTitanic;
$obj = rename-columns( $obj, %("passengerAge" => "age", "passengerSex" => "sex", "passengerClass" => "class") ) ;
$obj = $obj.grep({ $_{"age"} >= 10 }).Array ;
$obj = group-by($obj, ("class", "sex")) ;
$obj = $obj>>.elems

{1st.female => 132, 1st.male => 149, 2nd.female => 96, 2nd.male => 149, 3rd.female => 132, 3rd.male => 332}

That is not monadic, of course — see the monadic version above.


LLM generated (via DSL examples)

Here we define an LLM-examples function for translation of natural language commands into code using DSL examples (provided by “DSL::Examples”, [AAp6]):

my sub llm-pipeline-segment($lang, $workflow-name = 'DataReshaping') { llm-example-function(dsl-examples(){$lang}{$workflow-name}) };

Here is the LLM translated code:

my $code = llm-pipeline-segment('Raku', 'DataReshaping')($commands)

use Data::Reshapers; use Data::Summarizers; use Data::TypeSystem
my $obj = @dfTitanic;
$obj = rename-columns($obj, %(passengerAge => 'age', passengerSex => 'sex', passengerClass => 'class'));
$obj = $obj.grep({ $_{'age'} >= 10 }).Array;
$obj = group-by($obj, ('class', 'sex'));
$obj = $obj>>.elems;

Here the translated code is turned into monadic code by string manipulation:

# Turn each assignment "$obj = f($obj, ...);" into a feed block "==> { f($_, ...) }()"
my $code-mon = $code.subst(/ $<lhs>=('$' \w+) \h+ '=' \h+ (\S*)? $<lhs> (<-[;]>*) ';'/, {"==> \{{$0}\$_{$1} \}()"} ):g;
# Turn a final "say/note $obj>>.elems" into a feed block as well
$code-mon .= subst(/ $<printer>=[note|say] \h* $<lhs>=('$' \w+) ['>>'|»] '.elems' /, {"==> \{$<printer> \$_>>.elems\}()"}):g;

use Data::Reshapers; use Data::Summarizers; use Data::TypeSystem
my $obj = @dfTitanic;
==> {rename-columns($_, %(passengerAge => 'age', passengerSex => 'sex', passengerClass => 'class')) }()
==> {$_.grep({ $_{'age'} >= 10 }).Array }()
==> {group-by($_, ('class', 'sex')) }()
==> {$_>>.elems }()

Remark: Arguably, the string manipulation shown above provides insight into how and why monadic pipelines make imperative code simpler.


Recommendation pipeline

Here is a computational specification for creating a recommender and obtaining a profile recommendation:

my $spec = q:to/END/;
create from @dsTitanic; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsTitanic on "id";
echo the pipeline value;
END

Here is the Raku code for that spec, given as an HTML snippet with syntax highlighting:

#%html
ToDSLCode($spec, default-targets-spec => 'Raku', format => 'code')
andthen .subst('.', "\n.", :g)
andthen hilite($_)

my $obj = ML::SparseMatrixRecommender
.new
.create-from-wide-form(@dsTitanic)
.apply-term-weight-functions(global-weight-func => "IDF", local-weight-func => "None", normalizer-func => "Cosine")
.recommend-by-profile(["passengerSex:male", "passengerClass:1st"])
.join-across(@dsTitanic, on => "id" )
.echo-value()

Here we execute a slightly modified version of the pipeline (based on “ML::SparseMatrixRecommender”, [AAp7]):

my $obj = ML::SparseMatrixRecommender.new
.create-from-wide-form(@dsTitanic)
.apply-term-weight-functions("IDF", "None", "Cosine")
.recommend-by-profile(["passengerSex:male", "passengerClass:1st"])
.join-across(@dsTitanic, on => "id" )
.echo-value(as => {to-pretty-table($_, )} )

+----------------+-----+--------------+-------------------+----------+--------------+
| passengerClass |  id | passengerAge | passengerSurvival |  score   | passengerSex |
+----------------+-----+--------------+-------------------+----------+--------------+
|      1st       |  10 |      70      |        died       | 1.000000 |     male     |
|      1st       | 101 |      50      |      survived     | 1.000000 |     male     |
|      1st       | 102 |      40      |        died       | 1.000000 |     male     |
|      1st       | 107 |      -1      |        died       | 1.000000 |     male     |
|      1st       |  11 |      50      |        died       | 1.000000 |     male     |
|      1st       | 110 |      40      |      survived     | 1.000000 |     male     |
|      1st       | 111 |      30      |        died       | 1.000000 |     male     |
|      1st       | 115 |      20      |        died       | 1.000000 |     male     |
|      1st       | 116 |      60      |        died       | 1.000000 |     male     |
|      1st       | 119 |      -1      |        died       | 1.000000 |     male     |
|      1st       | 120 |      50      |      survived     | 1.000000 |     male     |
|      1st       | 121 |      40      |      survived     | 1.000000 |     male     |
+----------------+-----+--------------+-------------------+----------+--------------+


Functional parsers (multi-operation pipelines)

It can be said that the package “FunctionalParsers”, [AAp4], implements multi-operator monadic pipelines for the creation of parsers and interpreters. “FunctionalParsers” achieves that using special infix implementations.

use FunctionalParsers :ALL;
my &p1 = {1} ⨀ symbol('one');
my &p2 = {2} ⨀ symbol('two');
my &p3 = {3} ⨀ symbol('three');
my &p4 = {4} ⨀ symbol('four');
my &pH = {10**2} ⨀ symbol('hundred');
my &pT = {10**3} ⨀ symbol('thousand');
my &pM = {10**6} ⨀ symbol('million');
sink my &pNoun = symbol('things') ⨁ symbol('objects');

Here is a parser — all three monad operations (⨁, ⨂, ⨀) are used:

# Parse sentences that have (1) a digit part, (2) a multiplier part, and (3) a noun
my &p = (&p1 ⨁ &p2 ⨁ &p3 ⨁ &p4) ⨂ (&pT ⨁ &pH ⨁ &pM) ⨂ &pNoun;

# Interpreter:
# (1) flatten the parsed elements
# (2) multiply the first two elements and make a sentence with the third element
sink &p = { "{$_[0] * $_[1]} $_[2]"} ⨀ {.flat} ⨀ &p 

Here the parser is applied to different sentences:

['three million things', 'one hundred objects', 'five thousand things']
andthen .map({ &p($_.words.List).head.tail })
andthen (.say for |$_)

3000000 things
100 objects
Nil

The last sentence is not parsed because the parser &p knows only the digits from 1 to 4.
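
For instance, the parser’s coverage can be extended with one more digit parser; here is a minimal sketch reusing the definitions above:

my &p5 = {5} ⨀ symbol('five');
my &pExt = (&p1 ⨁ &p2 ⨁ &p3 ⨁ &p4 ⨁ &p5) ⨂ (&pT ⨁ &pH ⨁ &pM) ⨂ &pNoun;
&pExt = { "{$_[0] * $_[1]} $_[2]" } ⨀ {.flat} ⨀ &pExt;

say &pExt('five thousand things'.words.List).head.tail;   # 5000 things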


References

Articles, blog posts

[Wk1] Wikipedia entry: Monad (functional programming), URL: https://en.wikipedia.org/wiki/Monad_(functional_programming) .

[Wk2] Wikipedia entry: Monad transformer, URL: https://en.wikipedia.org/wiki/Monad_transformer .

[H1] Haskell.org article: Monad laws, URL: https://wiki.haskell.org/Monad_laws.

[SH2] Sheng Liang, Paul Hudak, Mark Jones, “Monad transformers and modular interpreters”, (1995), Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. New York, NY: ACM. pp. 333–343. doi:10.1145/199448.199528.

[PW1] Philip Wadler, “The essence of functional programming”, (1992), 19’th Annual Symposium on Principles of Programming Languages, Albuquerque, New Mexico, January 1992.

[RW1] Hadley Wickham et al., dplyr: A Grammar of Data Manipulation, (2014), tidyverse at GitHub, URL: https://github.com/tidyverse/dplyr .
(See also, http://dplyr.tidyverse.org .)

[AA1] Anton Antonov, “Monad code generation and extension”, (2017), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, “Monad laws in Raku”, (2025), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “Day 2 – Doing Data Science with Raku”, (2025), Raku Advent Calendar at WordPress.

Packages, paclets

[AAp1] Anton Antonov, MonadMakers, Wolfram Language paclet, (2023), Wolfram Language Paclet Repository.

[AAp2] Anton Antonov, StateMonadCodeGenerator, R package, (2019-2024),
GitHub/@antononcube.

[AAp3] Anton Antonov, DSL::English::DataQueryWorkflows, Raku package, (2020-2024),
GitHub/@antononcube.

[AAp4] Anton Antonov, FunctionalParsers, Raku package, (2023),
GitHub/@antononcube.

[AAp5] Anton Antonov, ML::TriesWithFrequencies, Raku package, (2021-2024),
GitHub/@antononcube.

[AAp6] Anton Antonov, DSL::Examples, Raku package, (2024-2025),
GitHub/@antononcube.

[AAp7] Anton Antonov, ML::SparseMatrixRecommender, Raku package, (2025),
GitHub/@antononcube.

[AAp8] Anton Antonov, MosaicPlot, Wolfram Language paclet, (2023), Wolfram Language Paclet Repository.

Videos

[AAv1] Anton Antonov, Monadic Programming: With Application to Data Analysis, Machine Learning and Language Processing, (2017), Wolfram Technology Conference 2017 presentation. YouTube/WolframResearch.

[AAv2] Anton Antonov, Raku for Prediction, (2021), The Raku Conference 2021.

[AAv3] Anton Antonov, Simplified Machine Learning Workflows Overview, (2022), Wolfram Technology Conference 2022 presentation. YouTube/WolframResearch.

[AAv4] Anton Antonov, Simplified Machine Learning Workflows Overview (Raku-centric), (2022), Wolfram Technology Conference 2022 presentation. YouTube/@AAA4prediction.

[AAv5] Anton Antonov, Applications of Monadic Programming, Part 1, Questions & Answers, (2025), YouTube/@AAA4prediction.

Day 8 – HARC The Herald Angels Sing

Rudolph had long wanted to write a website – he longed to share his hobbies and opinions with all the children, so that they wouldn’t just think of him as a first-class pilot and navigator. He knew about Raku and he had skim-read some information about Cro and Humming-Bird. But, being quite lazy, he wanted something very, very easy that he could use to whip up a site in a few lines.

He had overheard Dasher and Vixen talking behind the bike shed about a new Raku web authoring tool – HARC – and that sounded more in tune with his thinking.

HARC! the herald angels sing,
“Glory to the newborn King:
peace on earth, and mercy mild,
God and sinners reconciled!”

First Footing

He gave it a go, following the Getting Started info in the Raku Air::Examples module, for Air is the glue that puts the A in HARC.

zef install --/test cro Air
git clone https://github.com/librasteve/Air-Examples.git
cd Air-Examples/bin
raku 00-nano.raku

He pointed a browser at localhost:3000 and his nose lit up!

Pawing the Snow

He made a copy of 00-nano.raku and renamed it 20-rudolph.raku, then he added his name in the obvious place:

#!/usr/bin/env raku

use Air::Functional :BASE;
use Air::Base;

my $nano =
    site
        page
            main
                p "Yo rudi!";

$nano.serve;

His hooves typed raku 20-rudolph.raku and here’s what he saw in his browser

Oh my sweet Santa Claus, that’s a whole webpage (a whole website actually) in 5 lines of code. That’s going to save a lot of effort and bring back the -Ofun to web development.

Editor’s Note: Rudolph feels that HTML template systems such as Cro template or Template::Mustache or Template6 (there are many more on raku.land) are a very good idea when there is a big project with many unskilled young elves who can update web templates with little knowledge of real coding languages. However, this does not apply to an experienced reindeer like him who wants all the power of a fully featured programming language and to avoid faffing around with all those angle brackets.

Walking On

His hooves began to clack away on the keyboard:

#!/usr/bin/env raku

use Air::Functional :BASE;
use Air::Base;

my $rudi = site page main [
    section [
        h2 'About Me';
        p 'Hello! I\'m Rudolph, a curious builder who loves working on small tools, playful experiments, and simple things that make life easier. I enjoy long walks, warm drinks, and the feeling of figuring something out after staring at it way too long.';
    ];
    section [
        h2 'Projects';
        ul [
            li [ strong 'ChimeBox:', ' a tiny notification app that whispers instead of buzzes.' ];
            li [ strong 'TrailMapper:', ' a map tool for discovering quiet paths around my city.' ];
            li [ strong 'CookieCrunch:', ' a deliberately pointless game about collecting virtual cookies.' ];
        ];
    ];
    section [
        h2 'Contact';
        p 'If you\'d like to say hello, send a message via ', em 'rudolph@example.com';
    ];
];

$rudi.serve;

Hmmm – a neat way to set out the content right there in functional style Raku source (using Air::Functional).

All the content is done, but the style is a bit so-so…

A Rising Trot

Let’s change those section tags to article tags (for our Rudi has checked the Pico CSS preset semantic tags) and add a splash of colour in the footer:

#!/usr/bin/env raku

use Air::Functional :BASE;
use Air::Base;

my $rudi = site page [
    header [
        h1 'Rudolph';
        p 'Developer • Tinkerer • Occasional Cookie Enthusiast';
    ];
    main [
        article [
            h2 'About Me';
            p 'Hello! I\'m Rudolph, a curious builder who loves working on small tools, playful experiments, and simple things that make life easier. I enjoy long walks, warm drinks, and the feeling of figuring something out after staring at it way too long.';
        ];
        article [
            h2 'Projects';
            ul [
                li [ strong 'ChimeBox:', ' a tiny notification app that whispers instead of buzzes.' ];
                li [ strong 'TrailMapper:', ' a map tool for discovering quiet paths around my city.' ];
                li [ strong 'CookieCrunch:', ' a deliberately pointless game about collecting virtual cookies.' ];
            ];
        ];
        article [
            h2 'Contact';
            p 'If you\'d like to say hello, send a message via ', b 'rudolph@example.com';
        ];
    ];
    footer
        p [ safe '&copy; 2025'; b 'Rudolph.', ' All rights reserved.' ];
];

$rudi.serve;

Very dashing splash of red:

Editor’s Note: Pico CSS already defines a coherent set of styles for all the tags used so far … so no need to decorate our content code with e.g. Tailwind (unless you want to).

But his new header was pretty so-so…

At a Gallop

It was a short job to curry the Air::Base header routine with one of his own:

my &rude-header = &header.assuming(
    :style( q:to/END/;
        background: #b30000;
        color: white;
        padding: 2rem;
        text-align: center;
    END
    ),
);
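
Presumably the curried header then simply replaces the original header call, along these lines (hypothetical usage):

my $rudi = site page [
    rude-header [
        h1 'Rudolph';
        p 'Developer • Tinkerer • Occasional Cookie Enthusiast';
    ];
    # ... main and footer as before ...
];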

And voila:

Stable End

Rudolph looked on with quiet satisfaction at his work and fired up his holly-wood churchwarden pipe.

Rudolph, bright and cozy,
amid the tinsel light,
puffs upon a churchwarden pipe
that glows like Yuletide night.

Find out if things can get better next time…

~librasteve


Day 7 – Allowing for fewer dollars

Lizzybel had been taking a bit of vacation from all of the busy-ness in the corridors of North Pole Grand Central.

While doing a small visit to the corridors, she ran into Nanunanu, one of the IT elves.  Nanunanu was a bit worried, because they had not seen Lizzybel for a while. “Don’t worry”, said Lizzybel. “I’m just recharging my batteries a bit while doing some other stuff that I had neglected for a while. But I have been following developments from a distance, to stay at least a bit in the loop”, Lizzybel said with a bit of a grin. “Ah, ok”, said Nanunanu, “anything particular that caught your eye?”.

“Now that you mention it: it looks like quite a few potential users of the Raku Programming Language are put off by the use of sigils in variable declarations, specifically the $“, said Lizzybel while taking out her phone and showing a HackerNews comment to Nanunanu.

“What a silly reason to not want to look deeper into Raku”.  Nanunanu agreed and went on their way because busy, busy, busy!

While going home, Lizzybel was thinking: “On the other hand, it was clear that this was about first impressions. And first impressions are important. So because of this first impression, the Raku Programming Language was potentially missing out on a significant number of new users! What a pity!”

A constant

When back home, Lizzybel thought: “But the Raku Programming Language is no stranger to sigilless constants”

my constant answer = 42;

“is but an example”. “And with a little trick, you could even make sigilless variables”, she was mumbling to herself:

my \answer = my $;
answer = 42;
say answer;  # 42

But that is really yucky. And would not help with a first impression of the Raku Programming Language, at all!

Emojional

Then she was reminded of a playful module she’d made several years ago: Slang::Emoji. It allowed one to define and use variables whose name was a single character emoji, such as 👍 or 🏳️‍🌈:

use Slang::Emoji;
my 👍 = 42;
say 👍;  # 42

To make this possible, she remembered that she had actually sneaked a special token into the grammar of the Raku Programming Language to be able to do this: sigilless-variable. Maybe that token could be used to create sigilless variables in Raku as well?

Nogil

Turns out there had already been a Raku slang for sigilless variables, by Martin Tourneboeuf. But sadly that had bitrotted. “Why not use that namespace?”, Lizzybel thought to herself. “Indeed, why not?”. The initial iteration of transmogrifying the Slang::Emoji module into Slang::Nogil looked simple enough. Just replace <.:So> with <.ident>+, and add a check that we’re actually in a definition ($*IN_SPEC), and voila: Slang::Nogil 1.1.

use Slang::Nogil;
my answer = 42;
say answer;  # 42

And fortunately, Martin Tourneboeuf was happy with the result.

All was good, but then some issues started to become clear(er).

Nogil vs Emoji

Because both Slang::Emoji and Slang::Nogil mix in a new version of the “sigilless-variable” token, one module was trampling on the other.  Lizzybel realized that a solution in which a mixed-in token would just re-dispatch to the original token would be the best. But alas, after about two days of hacking, it turned out to be still impossible to do so in a transparent manner.

So the next best thing was to integrate the Slang::Emoji functionality into Slang::Nogil: an emoji could be considered to be a sigilless identifier after all, could it not?

The result was Slang::Nogil 1.2.

Not enough testing: more trouble

Lizzybel had only tested the most simple cases. But not something like my Int answer = 42. Which fails with a “Two terms in a row” error. Or something even worse, that would affect a lot of code in the wild: my sub a() { }, which also would fail in the same way.

Clearly the “sigilless-variable” approach would either require a more general approach in the Raku grammar, or would involve some serious ad-hoc workaround hacking in the Slang::Nogil module.

Because it was nearly Xmas, Lizzybel opted for the ad-hoc workaround hacking approach for now. “At least people would be able to play around with the use of sigilless variables in Raku, which some people would consider a nice Xmas present” was Lizzybel’s line of thought.

And after some hacking, Slang::Nogil 1.3 saw the light of day.

Not always available, or?

Nanunanu found out about the latest update of Slang::Nogil and enthusiastically sent a private message to Lizzybel on IRC: “Very nice, I always wanted to be able to not have to use sigils for variables with limited scopes. And now I can! But I would still always need to load the Slang::Nogil module in my code, no?”.

Lizzybel answered: “Yes, at the moment you would have to. But fortunately, you can automate that as well with the RAKUDO_OPT environment variable. Just put RAKUDO_OPT=-MSlang::Nogil in your environment, and you don’t need to think about it anymore!”. It was silent on the other end. But that was just because Nanunanu was also busy with something else.

After a few minutes Nanunanu answered: “That’s pretty cool, didn’t know you could do that :-). But of course it would be nicer still if it was just part of Raku, wouldn’t it?”.

A good question

“Should this be part of Raku, perhaps in the next language level?”, wondered Lizzybel. “And should this only apply to variable definitions? Or also to signatures, so you would be able to do something like for <a b c d> -> letter { say letter }. Or would that affect error reporting on common errors too much? Or would we be able to change the grammar and error reporting in such a way that sigilless identifiers in signatures would not be a problem after all?”.

“Perhaps it is time for a language problem solving issue. And an associated Rakudo Pull Request“, thought Lizzybel. “But not now, as I’m still recharging my batteries”.

Day 6 – Robust code generation combining grammars and LLMs

Introduction

This document (notebook) discusses different combinations of Grammar-Based Parser-Interpreters (GBPI) and Large Language Models (LLMs) to generate executable code from Natural Language Computational Specifications (NLCS). We have the soft assumption that the NLCS adhere to a certain relatively small Domain Specific Language (DSL) or use terminology from that DSL. Another assumption is that the target software packages are not necessarily well known by the LLMs, i.e. direct LLM requests for code using them would produce meaningless results.

We want to do such combinations because:

  • GBPI are fast, precise, but with a narrow DSL scope
  • LLMs can be unreliable and slow, but with a wide DSL scope

Because GBPI and LLMs are complementary technologies with similar and overlapping goals, the possible combinations are many. We concentrate on two of the most straightforward designs: (1) a judged parallel race of method executions, and (2) using LLMs as a fallback method if grammar parsing fails. We show asynchronous programming implementations for both designs using the package LLM::Graph.

The Machine Learning (ML) package “ML::SparseMatrixRecommender” is used to demonstrate that the generated code is executable.

The rest of the document is structured as follows:

  • Initial grammar-LLM combinations
    • Assumptions, straightforward designs, and trade-offs
  • Comprehensive combinations enumeration (attempt)
    • Tabular and morphological analysis breakdown
  • Three methods for parsing ML DSL specs into Raku code
    • One grammar-based, two LLM-based
  • Parallel execution with an LLM judge
    • Straightforward, but computationally wasteful and expensive
  • Grammar-to-LLM fallback mechanism
    • The easiest and most robust solution
  • Concluding comments and observations

TL;DR

  • Combining grammars and LLMs produces robust translators.
  • Three translators with different faithfulness and coverage are demonstrated and used.
  • Two of the simplest, yet effective, combinations are implemented and demonstrated.
    • Parallel race and grammar-to-LLM fallback.
  • Asynchronous implementations with LLM-graphs are a very good fit!
    • Just look at the LLM-graph plots (and be done reading.)

Initial Combinations and Associated Assumptions

The goal is to combine the core features of Raku with LLMs in order to achieve robust parsing and interpretation of computational workflow specifications.

Here are some example combinations of these approaches:

  1. A few methods, both grammar-based and LLM-based, are initiated in parallel. Whichever method produces a correct result first is selected as the answer.
    • This approach assumes that when the grammar-based methods are effective, they will finish more quickly than the LLM-based methods.
  2. The grammar method is invoked first; if it fails, an LLM method (or a sequence of LLM methods) is employed.
  3. LLMs are utilized at the grammar-rule level to provide matching objects that the grammar can work with.
  4. If the grammar method fails, an LLM normalizer for user commands is invoked to generate specifications that the grammar can parse.
  5. It is important to distinguish between declarative specifications and those that prescribe specific steps.
    • For a workflow given as a list of steps the grammar parser may successfully parse most steps, but LLMs may be required for a few exceptions.

The main trade-off in these approaches is as follows:

  • Grammar methods are challenging to develop but can be very fast and precise.
    • Precision can be guaranteed and rigorously tested.
  • LLM methods are quicker to develop but tend to be slower and can be unreliable, particularly for less popular workflows, programming languages, and packages.

Also, combinations based on LLM tools (aka LLM external function calling) are not considered because LLM-tools invocation is too unpredictable and unreliable.


Comprehensive breakdown (attempt)

This section has a “concise” table that expands the combinations list above into the main combinatorial strategies for Raku core features × LLMs for robust parsing and interpretation of workflow specifications. The table is not an exhaustive list of such combinations, but illustrates their diversity and, hopefully, can give ideas for future developments.

A few summary points (on the table’s content):

  • Grammar (Raku regex/grammar)
    • Pros: fast, deterministic, validated, reproducible
    • Cons: hard to design for large domains, brittle for natural language inputs
  • LLMs
    • Pros: fast to prototype, excellent at normalization/paraphrasing, flexible
    • Cons: slow, occasionally wrong, hallucination risk, inconsistent output formats
  • Conclusion:
    • The most robust systems combine grammar precision with LLM adaptability, typically by putting grammars first and using LLMs for repair, normalization, expansions, or semantic interpretation (i.e. “fallback”.)

Table: Combination Patterns for Parsing Workflow Specifications

  • Parallel Race: Grammar + LLM
    Launch Raku grammar parsing and one or more LLM interpreters in parallel; whichever yields a valid parse first is accepted.
    • Pros: fast when grammar succeeds; robust fallback; reduces latency unpredictability of LLMs
    • Cons: requires orchestration; needs a validator for LLM output
  • Grammar-First, LLM-Fallback
    Try the grammar parser first; if it fails, invoke LLM-based parsing or normalization.
    • Pros: deterministic preference for grammar; testable correctness when grammar succeeds
    • Cons: LLM fallback may produce inconsistent structures
  • LLM-Assisted Grammar (Rule-Level)
    Individual grammar rules delegate to an LLM for ambiguous or context-heavy matching; the LLM supplies tokens or AST fragments.
    • Pros: handles complexity without inflating the grammar; modular LLM usage
    • Cons: LLM behavior may break rule determinism; harder to reproduce
  • LLM Normalizer → Grammar Parser
    When the grammar fails, the LLM rewrites/normalizes the input into a canonical form; the grammar is applied again.
    • Pros: grammar remains simple; leverages LLM skill at paraphrasing
    • Cons: quality depends on the normalizer; feedback loops possible
  • Hybrid Declarative vs Procedural Parsing
    The grammar extracts structural/declarative parts; the LLM interprets procedural/stepwise parts, or vice versa.
    • Pros: each subsystem tackles what it is best at; reduces grammar complexity
    • Cons: harder to maintain global consistency; requires AST stitching logic
  • Grammar-Generated Tests for LLM
    The grammar is used to generate examples and counterexamples; LLM outputs are validated against grammar constraints.
    • Pros: powerful for verifying LLM outputs; reduces hallucinations
    • Cons: grammar must encode constraints richly; validation pipeline required
  • LLM as Adaptive Heuristic for Grammar Ambiguities
    When the grammar yields multiple parses, the LLM chooses or ranks the “most plausible” AST.
    • Pros: improves disambiguation; good for underspecified workflows
    • Cons: LLM can pick syntactically impossible interpretations
  • LLM as Semantic Phase After Grammar
    The grammar creates an AST; the LLM interprets semantics, fills in missing steps, or resolves vague operations.
    • Pros: clean separation of syntax vs semantics; grammar ensures correctness
    • Cons: semantic interpretation may drift from syntax
  • Self-Healing Parse Loop
    Grammar fails → LLM proposes corrections → grammar retries → if still failing, the LLM creates the full AST.
    • Pros: iterative and robust; captures user intent progressively
    • Cons: more expensive; risk of oscillation
  • Grammar Embedding Inside Prompt Templates
    Raku grammar definitions are serialized into the prompt, guiding the LLM to conform to the grammar (soft constraints).
    • Pros: faster than grammar execution in some cases; encourages consistent structure
    • Cons: weak guarantees; the LLM may ignore the grammar
  • LLM-Driven Grammar Induction or Refinement
    The LLM suggests new grammar rules or transformations; a developer approves; the Raku grammar evolves over time.
    • Pros: faster grammar evolution; useful for new workflow languages
    • Cons: requires human QA; risk of regressing accuracy
  • Raku Regex Engine as LLM Guardrail
    Raku regex or token rules are used to validate or filter LLM results before accepting them.
    • Pros: lightweight constraints; useful for quick prototyping
    • Cons: regex too weak for complex syntax

Diversity reasons

  • The diversity of combinations in the table above arises because Raku grammars and LLMs occupy fundamentally different but highly complementary positions in the parsing spectrum.
  • Raku grammars provide determinism, speed, verifiability, and structural guarantees, but they require upfront design and struggle with ambiguity, informal input, and evolving specifications.
  • LLMs, in contrast, excel at normalization, semantic interpretation, ambiguity resolution, and adapting to fluid or poorly defined languages, yet they lack determinism, may hallucinate, and are slower.
  • When these two technologies meet, every architectural choice — who handles syntax, who handles semantics, who runs first, who validates whom, who repairs or refines — defines a distinct strategy.
  • Hence, the design space naturally expands into many valid hybrid patterns rather than a single “best” pipeline.
  • That said, the fallback pattern implemented below can be considered the “best option” from certain development perspectives because it is simple, effective, and has fast execution times.

See the related Morphological Analysis table, which corresponds to this taxonomy mind-map:


Setup

Here are the packages used in this document (notebook):

use DSL::Translators;
use DSL::Examples;
use ML::NLPTemplateEngine;
use ML::FindTextualAnswer;

use LLM::Functions;
use LLM::Graph;

Here are the LLM model access configurations:

my $conf41-mini = llm-configuration('ChatGPT', model => 'gpt-4.1-mini', temperature => 0.45);
my $conf41 = llm-configuration('ChatGPT', model => 'gpt-4.1', temperature => 0.45);
my $conf51 = llm-configuration('ChatGPT', model => 'gpt-5.1', reasoning-effort => 'none');
my $conf-gemini20-flash = llm-configuration('Gemini', model => 'gemini-2.0-flash');

Three DSL translations

This section demonstrates the use of three different translation methods:

  1. Grammar-based parser-interpreter of computational workflows
  2. LLM-based translator using few-shot learning with relevant DSL examples
  3. Natural Language Processing (NLP) interpreter using code templates and LLMs to fill in the corresponding parameters

The translators are ordered according to their faithfulness, most faithful first.
It can be said that, at the same time, the translators are ordered according to their coverage: the last one has the widest coverage.

Grammar-based

Here a recommender pipeline specified with natural language commands is translated into Raku code of the package “ML::SparseMatrixRecommender” using a sub of the package “DSL::Translators”:

'
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value
'
==> ToDSLCode(to => 'Raku', format => 'CODE')
==> {.subst('.', "\n."):g}()

# my $obj = ML::SparseMatrixRecommender
# .new
# .create-from-wide-form(@dsData)
# .apply-term-weight-functions(global-weight-func => "IDF", local-weight-func => "None", normalizer-func => "Cosine")
# .recommend-by-profile(["passengerSex:male", "passengerClass:1st"])
# .join-across(@dsData, on => "id" )
# .echo-value()

More details on the grammar-based approach are given in previously published presentations.

Via LLM examples

LLM translations can be done using a set of from-to rules. This is the so-called few-shot learning of LLMs. The package “DSL::Examples” has a collection of such examples for different computational workflows. (Mostly ML at this point.) The examples are hierarchically organized by programming language and workflow name; see the resource file “dsl-examples.json”, or execute dsl-examples.
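
As a minimal illustration of such from-to rules, llm-example-function (of “LLM::Functions”) can be given a toy set of pairs directly; the rules below are hypothetical, and an LLM configuration can presumably be passed with the e adverb:

my &to-pig-latin = llm-example-function([
    'hello' => 'ello-hay',
    'world' => 'orld-way',
]);

say &to-pig-latin('monad');   # expected: something like "onad-may"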

Here is a table that shows the known DSL translation examples in “DSL::Examples”:

#% html
dsl-examples().map({ $_.key X ($_.value.keys Z $_.value.values».elems) }).flat(1).map({ <language workflow examples-count> Z=> $_.flat })».Hash.sort(*<language workflow>).Array
==> to-dataset()
==> to-html(field-names => <language workflow examples-count>)

+----------+----------+----------------+
| language | workflow | examples-count |
+----------+----------+----------------+
| Python   | LSAMon   |             15 |
| Python   | QRMon    |             23 |
| Python   | SMRMon   |             20 |
| R        | LSAMon   |             17 |
| R        | QRMon    |             26 |
| R        | SMRMon   |             20 |
| Raku     | SMRMon   |             20 |
| WL       | ClCon    |             20 |
| WL       | LSAMon   |             17 |
| WL       | QRMon    |             27 |
| WL       | SMRMon   |             20 |
+----------+----------+----------------+

Here is the definition of an LLM translation function that uses examples:

my &llm-pipeline-segment = llm-example-function(dsl-examples()<Raku><SMRMon>);

Here is a recommender pipeline specified with natural language commands:

my $spec = q:to/END/;
new recommender;
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
classify by profile passengerSex:female, and passengerClass:1st on the tag passengerSurvival;
echo value
END

sink my @commands = $spec.lines;

Translate to Raku code line-by-line:

@commands
.map({ .&llm-pipeline-segment })
.map({ .subst(/:i Output \h* ':'?/, :g).trim })
.join("\n.")

# ML::SparseMatrixRecommender.new;
# .create(@dsData)
# .apply-term-weight-functions('IDF', 'None', 'Cosine')
# .recommend-by-profile({'passengerSex.male' => 1, 'passengerClass.1st' => 1})
# .join-across(@dsData, on => 'id')
# .echo-value()
# .classify-by-profile('passengerSurvival', {'passengerSex.female' => True, 'passengerClass.1st' => True})
# .echo-value()

Or translate by just calling the function over the whole $spec:

&llm-pipeline-segment($spec)

# ML::SparseMatrixRecommender.new;  
# create(@dsData);  
# apply-term-weight-functions('IDF', 'None', 'Cosine');  
# recommend-by-profile({'passengerSex.male' => 1, 'passengerClass.1st' => 1});  
# join-across(@dsData, on => 'id');  
# echo-value();  
# classify-by-profile('passengerSurvival', [{'passengerSex.female' => 1, 'passengerClass.1st' => 1}]);  
# echo-value()

Remark: The latter call is faster, but it needs additional processing for “monadic” workflows.

By NLP Template Engine

Here a “free text” recommender pipeline specification is translated to Raku code using the sub concretize of the package “ML::NLPTemplateEngine”:

'create a recommender with dfTitanic; apply the LSI functions IDF, None, Cosine; recommend by profile 1st and male'
==> concretize(lang => "Raku", e => $conf41-mini)

# my $smrObj = ML::SparseMatrixRecommender.new
# .create-from-wide-form(dfTitanic, item-column-name='id', :add-tag-types-to-column-names, tag-value-separator=':')
# .apply-term-weight-functions('IDF', 'None', 'Cosine')
# .recommend-by-profile(["1st"], 12, :!normalize)
# .join-across(dfTitanic)
# .echo-value();

The package “ML::NLPTemplateEngine” uses a Question Answering System (QAS) implemented in “ML::FindTextualAnswer”. A QAS can be implemented in different ways, with different conceptual and computational complexity. Currently, “ML::FindTextualAnswer” has only an LLM-based implementation of QAS.

For more details on the NLP template engine approach, see the presentations [AAv1] and [AAv2].


Parallel race (judged): Grammar + LLM

In this section we implement the first, most obvious, and conceptually simplest combination of grammar-based and LLM-based translations:

  • All translators, grammar-based and LLM-based, are run in parallel
  • An LLM judge selects the one that adheres best to the given specification

The implementation of this strategy with an LLM graph (say, by using “LLM::Graph”) is straightforward.

Here is such an LLM graph that:

  • Runs all three translation methods above
  • There is a judge that picks which one of the methods produced the best result
  • The judge’s output is used to make a Markdown report

my %rules =
    dsl-grammar => { 
        eval-function => sub ($spec, $lang = 'Raku') { ToDSLCode($spec, to => $lang, format => 'CODE') }
    },

    llm-examples => { 
        llm-function => 
            sub ($spec, $lang = 'Raku', $split = False) { 
                my &llm-pipeline-segment = llm-example-function(dsl-examples(){$lang}<SMRMon>);
                return do if $split {
                    note 'with spec splitting...';
                    my @commands = $spec.lines;
                    @commands.map({ .&llm-pipeline-segment }).map({ .subst(/:i Output \h* ':'?/, :g).trim }).join("\n.")
                } else {
                    note 'no spec splitting...';
                    &llm-pipeline-segment($spec).subst(";\n", "\n."):g
                }
            },
    },

    nlp-template-engine => {
        llm-function => sub ($spec, $lang = 'Raku') { concretize($spec, :$lang) }
    },

    judge => sub ($spec, $lang, $dsl-grammar, $llm-examples, $nlp-template-engine) {
            [
                "Choose the generated code that most fully adheres to the spec:\n",
                $spec,
                "\nfrom the following $lang generation results:\n\n",
                "1) DSL-grammar:\n$dsl-grammar\n",
                "2) LLM-examples:\n$llm-examples\n",
                "3) NLP-template-engine:\n$nlp-template-engine\n",
                "and copy it:"
            ].join("\n\n")
    },
    
    report => {
            eval-function => sub ($spec, $lang, $dsl-grammar, $llm-examples, $nlp-template-engine, $judge) {
                [
                    '# Best generated code',
                    "Three $lang code generations were submitted for the spec:",
                    '```text',
                    $spec,
                    '```',
                    'Here are the results:',
                    to-html( ['dsl-grammar', 'llm-examples', 'nlp-template-engine'].map({ [ name => $_, code => ::('$' ~ $_)] })».Hash.Array, field-names => <name code> ).subst("\n", '<br/>'):g,
                    '## Judgement',
                    $judge.contains('```') ?? $judge !! "```$lang\n" ~ $judge ~ "\n```"
                ].join("\n\n")
            }
    }        
;

my $gBestCode = LLM::Graph.new(%rules)

# LLM::Graph(size => 5, nodes => dsl-grammar, judge, llm-examples, nlp-template-engine, report)

Here is a recommender workflow specification:

my $spec = q:to/END/;
make a brand new recommender with the data @dsData;
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
END

$gBestCode.eval(:$spec, lang => 'Raku', :split)

#    with spec splitting...
#   LLM::Graph(size => 5, nodes => dsl-grammar, judge, llm-examples, nlp-template-engine, report)

Here the LLM-graph result — which is a Markdown report — is rendered:

#% markdown
$gBestCode.nodes<report><result>

Best generated code

Three Raku code generations were submitted for the spec:

make a brand new recommender with the data @dsData;
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;



Here are the results:

name: dsl-grammar
code: (empty)

name: llm-examples
code:
ML::SparseMatrixRecommender.new(@dsData)
.apply-term-weight-functions('IDF', 'None', 'Cosine')
.recommend-by-profile({'passengerSex.male' => 1, 'passengerClass.1st' => 1})
.join-across(@dsData, on => 'id')
.echo-value()

name: nlp-template-engine
code:
my $smrObj = ML::SparseMatrixRecommender.new
.create-from-wide-form(["passengerSex:male", "passengerClass:1st"]set, item-column-name='id', :add-tag-types-to-column-names, tag-value-separator=':')
.apply-term-weight-functions('IDF', 'None', 'Cosine')
.recommend-by-profile(["passengerSex:male", "passengerClass:1st"], 12, :!normalize)
.join-across(["passengerSex:male", "passengerClass:1st"]set)
.echo-value();

Judgement

ML::SparseMatrixRecommender.new(@dsData)
.apply-term-weight-functions('IDF', 'None', 'Cosine')
.recommend-by-profile({'passengerSex.male' => 1, 'passengerClass.1st' => 1})
.join-across(@dsData, on => 'id')
.echo-value()

LLM-graph visualization

Here is a visualization of the LLM graph defined and run above:

#% html
$gBestCode.dot(engine => 'dot', :9graph-size, node-width => 1.7, node-color => 'grey', edge-color => 'grey', edge-width => 0.4, theme => 'default'):svg

For details on making LLM-graphs and their visualizations, see the blog posts [AA3] and [AA5].


Fallback: DSL-grammar to LLM-examples

Instead of having the DSL-grammar and LLM computations run in parallel, we can make an LLM-graph in which the LLM computations are invoked only if the DSL-grammar parsing-and-interpretation fails. In this section we make such a graph.

Before making the graph, let us also generalize it to work with other ML workflows, not just recommendations. The function ToDSLCode (of the package “DSL::Translators”) has an ML workflow classifier based on prefix trees; see [AA1].

Let us make an LLM function with similar functionality, i.e. an LLM function that classifies a natural language computation specification into the workflow labels used by “DSL::Examples”. Here is such a function, using the sub llm-classify provided by “ML::FindTextualAnswer”:

# Natural language labels to be understood by LLMs
my @mlLabels = 'Classification', 'Latent Semantic Analysis', 'Quantile Regression', 'Recommendations';

# Map natural language labels to workflow names in "DSL::Examples"
my %toMonNames = @mlLabels Z=> <ClCon LSAMon QRMon SMRMon>; 

# Change the result of &llm-classify into workflow names
my &llm-ml-workflow = -> $spec { my $res = llm-classify($spec, @mlLabels, request => 'which of these workflows characterizes it'); %toMonNames{$res} // $res };

# Example invocation
&llm-ml-workflow($spec)

# SMRMon

In addition, we have to specify a pipeline “separator” for the different programming languages:

my %langSeparator = Python => "\n.", Raku => "\n.", R => "%>%\n", WL => "⟹\n";

Here is the LLM-graph:

my %rules =
    dsl-grammar => { 
        eval-function => sub ($spec, $lang = 'Raku') { 
            my $res = ToDSLCode($spec, to => $lang, format => 'CODE'); 
            my $checkStr = 'my $obj = ML::SparseMatrixRecommender.new';
            return do with $res.match(/ $checkStr /):g { 
                $/.list.elems > 1 ?? $res.subst($checkStr) !! $res 
            }
        }
    },

    workflow-name => {
        llm-function => sub ($spec) { &llm-ml-workflow($spec) }
    },

    llm-examples => { 
        llm-function => 
            sub ($spec, $workflow-name, $lang = 'Raku', $split = False) {
                my &llm-pipeline-segment = llm-example-function(dsl-examples(){$lang}{$workflow-name});
                return do if $split {
                    my @commands = $spec.lines;
                    @commands.map({ .&llm-pipeline-segment }).map({ .subst(/:i Output \h* ':'?/, :g).trim }).join(%langSeparator{$lang})
                } else {
                    &llm-pipeline-segment($spec).subst(";\n", %langSeparator{$lang}):g
                }
            },
        test-function => sub ($dsl-grammar) { !($dsl-grammar ~~ Str:D && $dsl-grammar.trim.chars) }
    },
    
    code => {
            eval-function => sub ($dsl-grammar, $llm-examples) {
                $dsl-grammar ~~ Str:D && $dsl-grammar.trim ?? $dsl-grammar !! $llm-examples
            }
    }   
;

my $gRobust = LLM::Graph.new(%rules):!async

# LLM::Graph(size => 4, nodes => code, dsl-grammar, llm-examples, workflow-name)

Here the LLM graph is run over a spec that can be parsed by DSL-grammar (notice the very short computation time):

my $spec = q:to/END/;
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
END

$gRobust.eval(:$spec, lang => 'Raku', :!split)

# LLM::Graph(size => 4, nodes => code, dsl-grammar, llm-examples, workflow-name)

Here is the obtained result:

$gRobust.nodes<code><result>

# my $obj = ML::SparseMatrixRecommender.new.create-from-wide-form(@dsData).apply-term-weight-functions(global-weight-func => "IDF", local-weight-func => "None", normalizer-func => "Cosine").recommend-by-profile(["passengerSex:male", "passengerClass:1st"]).join-across(@dsData, on => "id" ).echo-value()

Here is a spec that cannot be parsed by the DSL-grammar interpreter — note that there is just a small language change in the first line:

my $spec = q:to/END/;
new recommender with @dsData, please; 
also apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
END

$gRobust.eval(:$spec, lang => 'Raku', :!split)

#    Cannot parse the command; error in rule recommender-object-phrase:sym<English> at line 1; target 'new recommender with @dsData, please' position 16; parsed 'new recommender', un-parsed 'with @dsData, please' .
#
#    LLM::Graph(size => 4, nodes => code, dsl-grammar, llm-examples, workflow-name)

Nevertheless, we obtain a correct result via LLM-examples:

$gRobust.nodes<code><result>

# ML::SparseMatrixRecommender.new(@dsData)
# .apply-term-weight-functions('IDF', 'None', 'Cosine')
# .recommend-by-profile({'passengerSex.male' => 1, 'passengerClass.1st' => 1})
# .join-across(@dsData, on => 'id')
# .echo-value();

Here is the corresponding graph plot:

#% html
$gRobust.dot(engine => 'dot', :9graph-size, node-width => 1.7, node-color => 'grey', edge-color => 'grey', edge-width => 0.4, theme => 'default'):svg

Let us specify another workflow — for ML-classification with Wolfram Language — and run the graph:

my $spec = q:to/END/;
use the dataset @dsData;
split the data into training and testing parts with 0.8 ratio;
make a nearest neighbors classifier;
show classifier accuracy, precision, and recall;
echo the pipeline value;
END

$gRobust.eval(:$spec, lang => 'WL', :split)

# LLM::Graph(size => 4, nodes => code, dsl-grammar, llm-examples, workflow-name)

$gRobust.nodes<code><result>

#  ClConUnit[dsData]⟹
#    ClConSplitData[0.8]⟹
#    ClConMakeClassifier["NearestNeighbors"]⟹
#    Function[{v,c},ClConUnit[v,c]⟹ClConClassifierMeasurements[{"Accuracy","Precision","Recall"}]⟹ClConEchoValue]⟹
#    ClConEchoValue


Concluding comments and observations

  • Using LLM graphs gives the ability to impose desired orchestration and collaboration between deterministic programs and LLMs.
    • By contrast, the “inversion of control” of LLM-tools is “capricious.”
  • LLM-graphs are both a generalization of LLM-tools, and a lower level infrastructural functionality than LLM-tools.
  • The LLM-graph for the parallel-race translation is very similar to the LLM-graph for comprehensive document summarization described in [AA4].
  • The expectation that DSL examples would provide both fast and faithful results is mostly confirmed in ≈20 experiments.
  • Using the NLP template engine is also fast because LLMs are harnessed through QAS.
  • The DSL examples translation had to be complemented with a workflow classifier.
    • Such classifiers are also part of the implementations of the other two approaches.
    • The grammar-based one uses a deterministic classifier, [AA1].
    • The NLP template engine uses an LLM classifier.
  • An interesting extension of the current work is a grammar-LLM combination in which, when the grammar fails, the LLM “normalizes” the specs until the grammar can parse them; see the sketch after this list.
    • Currently, LLM-graph does not support graphs with cycles, hence this approach “can wait” (or be implemented by other means.)
  • Multiple DSL examples can be efficiently derived by random sentence generation with the different grammars.
    • Similar to the DSL commands classifier making approach taken in [AA1].
  • LLMs can be also used to improve and extend the DSL grammars.
    • And it is interesting to consider automating that process, instead of doing it via human supervision.
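
Here is a minimal sketch of that normalize-and-retry extension, assuming the subs ToDSLCode (from “DSL::Translators”) and llm-synthesize (from “LLM::Functions”); the retry bound and the normalization prompt are illustrative only:

use DSL::Translators;
use LLM::Functions;

# Hypothetical grammar-first loop with an LLM spec-normalizer on failure
sub robust-parse(Str $spec, Int :$max-tries = 3) {
    my $current = $spec;
    for ^$max-tries {
        # Deterministic grammar-based translation attempt
        my $res = try ToDSLCode($current, to => 'Raku', format => 'CODE');
        return $res if $res ~~ Str:D && $res.trim;
        # Ask an LLM to rewrite the spec into canonical DSL commands, then retry
        $current = llm-synthesize([
            'Rewrite this specification as short canonical DSL commands:',
            $current
        ]);
    }
    return Nil;
}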

References

Articles, blog posts

[AA1] Anton Antonov, “Fast and compact classifier of DSL commands”, (2022), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Grammar based random sentences generation, Part 1”, (2023), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “LLM::Graph”, (2025), RakuForPrediction at WordPress.

[AA4] Anton Antonov, “Agentic-AI for text summarization”, (2025), RakuForPrediction at WordPress.

[AA5] Anton Antonov, “LLM::Graph plots interpretation guide”, (2025), RakuForPrediction at WordPress.

Packages

[AAp1] Anton Antonov, DSL::Translators, Raku package, (2020-2025), GitHub/antononcube.

[AAp2] Anton Antonov, ML::FindTextualAnswer, Raku package, (2023-2025), GitHub/antononcube.

[AAp3] Anton Antonov, ML::NLPTemplateEngine, Raku package, (2023-2025), GitHub/antononcube.

[AAp4] Anton Antonov, DSL::Examples, Raku package, (2024-2025), GitHub/antononcube.

[AAp5] Anton Antonov, LLM::Graph, Raku package, (2025), GitHub/antononcube.

[AAp6] Anton Antonov, ML::SparseMatrixRecommender, Raku package, (2025), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “NLP Template Engine, Part 1”, (2021), YouTube/@AAA4prediction.

[AAv2] Anton Antonov, “Natural Language Processing Template Engine”, (2023), YouTube/@WolframResearch.

[WRIv1] Wolfram Research, Inc., “Live CEOing Ep 886: Design Review of LLMGraph”, (2025), YouTube/@WolframResearch.

Day 5 – Tools for Gnome::Gtk4

A short while ago, Santa Claus came to me for a short visit to drink a cup of tea together. That was very pleasant, but I had the idea that there was something more, knowing that he is always busy, especially these days. After some time, he came forward and said that the elves had found my distribution GnomeTools and were eager to use it. It looks so promising. However, they had problems finding any documentation about it. I had to admit that there was still a lot missing. My excuse was that the package needed a lot more classes and might also change here and there. Santa Claus said that that shouldn’t be a problem as long as I keep the version below 1 ;-). Santa said that he would like to see some examples so that his elves could do something with the classes.

Ok, here we go then …

Very short example

For the people working a lot with scripting languages, it would be nice to show some information in a kind of message dialog. This next example will show a simple way to do just that. The dialog window will disappear after pressing the OK button.

use GnomeTools::Gtk::MessageDialog;

my Str $message = Q:q:to/EOM/;
I would heavily recommend continuing
using the Raku language whatever
your plans are in the near or distant
future, because it will bring you
fortune and happiness!

EOM


my GnomeTools::Gtk::MessageDialog $message-dialog;
$message-dialog .= new(:$message);

And the result …

A Dialog

A more elaborate example is a dialog window that shows more than just text. We start by getting the necessary ingredients and a simple class with callback methods. These methods are called from the native routines. The say-hello() method reads the text entry and copies this string into the status field. Both fields are in the dialog, which we will see later.

use GnomeTools::Gtk::Dialog;

use Gnome::Gtk4::Label:api<2>;
use Gnome::Gtk4::Entry:api<2>;

class helper {
  method say-hello (
    GnomeTools::Gtk::Dialog :$dialog,
    Gnome::Gtk4::Entry :$entry
  ) {
    $dialog.set-status("hello <b>$entry.get-text()\</b>");
  }
}

Then we create a header text for the dialog and a text entry field with a placeholder text instructing you what to do.

After that, the dialog is created with a title and a header. A statusbar is added below the button row.

In this dialog window, we add content. It always consists of a text line on the left and a widget to its right. In this case, a text entry. You can add as many rows as you like.

Then, buttons are added. One ‘Hello’ button which, when clicked, calls the say-hello() routine defined earlier. The other is to tear down the dialog window.

my Str $dialog-header = Q:a:to/EOHEADER/;
This is a small test to show a dialog with an entry
and a few buttons. The <b>Hello</b> button shows some
text in the statusbar when pressed. The <b>Cancel</b>
button stops the program.

EOHEADER


with my Gnome::Gtk4::Entry $entry .= new-entry {
  .set-placeholder-text(
    'Text shows up after pressing Hello'
  );
  .set-size-request( 400, -1);
}

with my GnomeTools::Gtk::Dialog $dialog .= new(
  :$dialog-header, :dialog-title('Test Dialog'),
  :add-statusbar
) {
  .add-content( 'Please enter your name', $entry);
  .add-button(
    helper.new, 'say-hello', 'Hello', :$dialog, :$entry
  );
  .add-button( $dialog, 'destroy-dialog', 'Cancel');
  .show-dialog;
}

And when I run the program, it shows up as

Add CSS

We can use CSS to pimp up the display. To do this, we must import GnomeTools::Gtk::Theming at the start of the program.
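
That import is a single line:

use GnomeTools::Gtk::Theming;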

Then add some code in the block where the dialog window is created.

with my GnomeTools::Gtk::Dialog $dialog .= new(
  :$dialog-header, :dialog-title('Test Dialog'),
  :add-statusbar
) {
  # … content and buttons …

  my Str $css-text = Q:q:to/EOCSS/;
.dialog-tool {
background-color: #afafaf;
}

.dialog-header {
color:rgb(59, 1, 65);
padding-left: 15px;
padding-right: 15px;
}

.dialog-content label {
color: #004060;
}

.dialog-button label {
color:rgb(15, 165, 240);
}

.statusbar-tool {
background-color:rgb(84, 10, 85);
border-width: 5px;
border-style: groove;
border-color:rgb(144, 0, 255);
}

.statusbar-tool > label {
color:rgb(0, 0, 90);
}

.dialog-entry {
border-width: 5px;
border-style: inset;
border-color:rgb(144, 0, 255);
color:rgb(255, 141, 141);
}

EOCSS


  my GnomeTools::Gtk::Theming $theme .= new(:$css-text);
  $theme.add-css-class( $entry, 'dialog-entry');

Nice, isn’t it! The CSS classes dialog-tool, dialog-header, dialog-content, and dialog-button are defined in the GnomeTools::Gtk::Dialog class. The CSS class statusbar-tool is defined in the GnomeTools::Gtk::Statusbar class. In the program, we added the class dialog-entry. Note, however, that the CSS used by Gnome is not exactly the same as one might be used to. You can find good information here.

Note that the call to .show-dialog() does not return until the cancel button is clicked, which destroys the dialog window. So, for the shell-scripting elves, we could add a line at the end, like so:

  .show-dialog;
}

say $entry.get-text

This gives us the possibility to return information to a shell script. For example, when this program is stored in the file xt/dialog.rakutest:

echo Hello `raku xt/dialog.rakutest`

Installing and other info

You can install this distribution with zef

> zef install GnomeTools

Reference information on Gnome::* modules can be found here.

Day 4 – Gift yourself a merry little PDF journal

I wanted to give myself the Xmas gift of a 2026 pdf journal this year.

I had grand plans to create a fully fledged library for doing this, but ironically enough I just wasn’t that organised!

But courtesy of the comprehensive PDF API by dwarring https://raku.land/zef:dwarring/PDF::API6 and with help from tbrowder’s Date::Names for human dates https://raku.land/zef:tbrowder/Date::Names and, of course, all the folk who have brought raku/rakulang to us, here’s my script.

First, let’s import all the libraries we need – you’ll need to use zef to install what you don’t have:

use PDF::API6;
use PDF::Content::Color :rgb;
use PDF::Annot::Link;
use PDF::Destination :Fit;
use PDF::Content::FontObj;
use Date::Names;

We’re going to do a lot of hardcoding towards A4, and just work for 2026 in this case, so developing this further will certainly involve more variables! But create a pdf, set it to ‘A4’, and create our start date, which has to be the first of the first due to all the counting I’m doing on my hands and toes.

my $pdf = PDF::API6.new();
$pdf.media-box = 'A4';
my $start_date = Date.new(2026, 1, 1);
my PDF::Page $page;
my PDF::Content::FontObj $font = $pdf.core-font('Helvetica-Bold');

We’re also going to add an index to the pdf, so let’s set up an array for that and a reusable title variable

my $bookmarks = [];
my $bookmark-title = '';

We’re going to create a pdf with all the pages we need. At first I was going to create page content as I went, but for linking to other pages, the page must exist.

So here we create a big, blank pdf that has 12 pages at the front for the months of the year and 365 pages following for every day of the year. And by using days-in-year we’ll be ready when we update to do years involving leap years.

for 1..(12 + $start_date.days-in-year) {
    $page = $pdf.add-page;
};

This is a handy sub taken from the very full documentation for PDF::Api which will make our links tidier, and you’ll see used in the following loop

sub dest(|c) { :Dest($pdf.destination(|c)) }

Let’s start filling the pages!

We’ll do the first 12 pages, which are our months of 2026. Use the handy raku loop shortcut of creating our $month variable. And we know our months are pages 1-12 of our blank pdf.

We want a variable we can change that starts as the start date, but we don’t want to change the actual start date. And also we’ll be using nice date names as we’ll be writing dates on our PDF for real people – me!

It’s worth mentioning that the co-ordinates on pdfs are (x, y) – with x being how far from the left and y being how far from the bottom you are plotting.

So (0,0) on a pdf is the bottom left.

my $current_date = $start_date;
my $nice_date_name = Date::Names.new;

for 1..12 ->$month  {
    $page = $pdf.page($month); # Access the first 12 pages in turn
    my $text = ($nice_date_name.mon($current_date.month)); # Set text ie 'January', 'February'...
    $bookmark-title = $nice_date_name.mon($current_date.month); # Also remember our document outline
    $bookmarks.push( %(( :Title($bookmark-title)), dest(:page($page))));


    $page.gfx.text: { 
        .font = $font, 20; # We set our font earlier as a variable, but hardcoding the size here
        .text-position = 250, 800; # And we're targeting the top middle of the A4 pdf page
        .say($text); # Boom! We've written on our page, but not done yet...
    }

So, one page at a time we’re writing the month at the top of the pdf page – all in memory so far. But it might be nice to decorate a bit further.

We’ll continue the loop by putting in the day number of the month, along with the day name and a helpful horizontal line for writing notes on.

But we’re going to make that link to the page for that particular day too, so we can make overview notes here, but jump to the full page for more detailed notes.

(It should also be noted that I tried this in the kindle app on Android and the links didn’t work, but they worked fine in Samsung Notes and my pdf app on my PC – I think this is actually a limitation of kindle rather than me.)

my $height = 750; # So, this is nearish the top in 'points' on an A4 pdf
for 1..$current_date.days-in-month ->$day { # days-in-month - built into raku and very handy!
    my $text = $day ~ " (" ~ $nice_date_name.dow($current_date.day-of-week, 3) ~ ")";
    # So our text will look like '1 (Mon)', '2 (Tue)' or whatever…
    $page.gfx.text: {
    .font = $font, 20; # We're creating text as before
    # We're going to draw a line under this text, so a little positional adjustment
    .text-position = 20, $height + 2;
    # This text is going to be a link rather than plain text!
    my PDF::Annot::Link $link = $pdf.annotation(
    :border-style({:width(0)}),
    :page($page),
    :text($text),
    # Using our sub for the intricacies, let's make the target the main page for this date
    # But adjusting it by 12 because our first 12 pages in the pdf are the months of the year.
    # And this is why we created the whole pdf at the beginning - linking to a page that doesn't
    # exist will break.
    |dest(:page( $current_date.day-of-year + 12 )),
    );
};

So we’ve done two types of text – plain text and linking text – now let’s draw our lines under the date we just wrote.

    $page.gfx.paint: :fill, :stroke, { # Painting rather than texting
        .StrokeColor = rgb(.211, .211, .211);
        .LineWidth = 2.5;
        .current-point = (20, $height); # Lines go from somewhere...
        .LineTo(575, $height); # ...to somewhere
    };

    $height = $height - 24; # We want our next date and the line to be lower
    $current_date++; # And we're in a loop for this month so let's go to the next day
};
};

So, we’ve now gone through our first 12 pages, putting the month name at the top and writing all the days vertically down the page, with a horizontal line for writing notes on. All very profesh!

Next we want to go through the daily pages of the journal. For these we’re just going to create a blank page with the day’s date on the top.

Obviously, there’s scope to go through the line drawing again, or perhaps look into drawing a border with a rectangle? Or maybe print a random quotation from somewhere on it, or use a pdf template you’ve drawn in some other app and just want to print the days date on.
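
For instance, a simple page border could be sketched with the same paint pattern used above; the Rectangle operator and the 20-point margin are assumptions to verify against the PDF::Content documentation:

    $page.gfx.paint: :stroke, {
        .StrokeColor = rgb(.211, .211, .211);
        .LineWidth = 1.5;
        # x, y, width, height; an A4 page is 595 x 842 points
        .Rectangle(20, 20, 555, 802);
    };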

We’re doing this for the joy of being able to personalise our digital stationery – you can of course spend not very much money buying something very nice off Etsy, but it’s nice to have so many options.

$current_date = $start_date; # Let's remember to reset to the first day of the year

for 13..($start_date.days-in-year + 12) ->$day  { # Adjust for our 12 monthly pages at the beginning
    $page = $pdf.page($day);
    my $text = (
          $nice_date_name.dow($current_date.day-of-week)
        ~ ' ' ~
        $current_date.day
        ~ ' ' ~
        $nice_date_name.mon($current_date.month)
    ); # So, 'Monday 12 January', 'Friday 29 May' or whatever

    # Remembering we're doing our outline...
    $bookmark-title = $nice_date_name.dow($current_date.day-of-week) ~ ' ' ~ $current_date.day ~ ' ' ~ $nice_date_name.mon($current_date.month);
    $bookmarks.push( %(( :Title($bookmark-title)), dest(:page($page))));
    
    $page.gfx.text: {
        .font = $font, 20;
        .text-position = 200, 800;
        my PDF::Annot::Link $link = $pdf.annotation(
            :border-style({:width(0)}),
            :page($page),
            :text($text),
            # So we're creating a link as before and we want to link to one of the first 12 pages
            # that corresponds to the month we're in
            |dest(:page($current_date.month)),
        );
    };

    $current_date++; # Move onto the next day
};

Then the last things we need to do are save our outline to the pdf and save our pdf for wrapping up and putting under the digital tree.

$pdf.outlines.kids = $bookmarks;
$pdf.save-as($start_date.year ~ '.pdf');

And we are ready to be organised next year with a PDF journal that has handy linking and lots of opportunities for improvement. Perhaps it might be nice to add more pages into the daily part – todo, to sketch, notes. Or perhaps another section that contains all the months on one page so you can jump from there to the individual months?

There’s plenty of scope for making it more flexible – accepting a year and a page size at the very least – but for now, I have what I need.

Day 3 – Christmas Crunching Part I

“Advent is here”, the buzz was all around, the elves were getting nervy and the reindeer pawed the ground.

Rudolph (for it was he) stood in quiet contemplation as the elves increased their pace and the din grew ever louder. As with every other Christmas, he was wondering how to get the job done – checking and rechecking all the flight system and navigation data.

He cracked open his laptop – Linux, of course, for that is the leading nordic OS – and opened a Command Line Terminal session. Being a Raku fan of old, he had caught wind of a new feature in the App::Crag module – inbuilt support for LLM::DWIM (that famous LLM CLI module by the awesome Brian Duggan) – and his hooves started to clack away on the keys.

Light Speed

The first challenge was to work out the total distance to travel on Christmas Day and then to know what speed Santa’s sleigh would need to average in order to get around the entire Earth in just 24 hours.


Wowee – 3.5% of the speed of light (c), eh?

Also – Rudolph was quite impressed with the new built-in App::Crag LLM features … boy, those Raku guys knew how to jump on a bandwagon. A command-line calculator that can source the value of just about anything right there, convert to units of measurement for dimensional analysis, and assign to temporary variables so that unit math is comprehensible and you can backtrack and amend any mistakes or changes along the way. No need to look up planetary stats, physics constants … or even do a text LLM query for advice on formulae.

[Kids – do not even think about using App::Crag to cheat on your end of term exams]

Lorentz Contraction

And that made him wonder about the Lorentz contraction: would they still be able to fit all the gifts into the sleigh? [editor note: Rudolph is genius level for a reindeer, ok!]

Amazing – the sleigh would only contract by 7.4214mm – just a sliver and space for gifts aplenty.

Rudolph nodded sagely and lit his meerschaum; it would be alright on the night after all.

Rudolph’s calm and cozy,
no rush, no need to roam—
he’s happily puffing his meerschaum pipe
by the stables’ frosty dome.

Find out if Christmas can go on after all in the next thrilling instalment…

~librasteve


Credits

Some of the App::Crag features in play tonight were:

  • ?<some random LLM query>
  • ^<25 mph> – a standard crag unit
  • ?^<speed of a diving swallow in mph> – put them together to get units
  • 25km – a shortcut if you have simple SI prefixes and units
  • $answer = 42s – crag is just vanilla Raku with no strict applied
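
Putting those together, Rudolph’s sleigh-speed one-liner might look something like this; treat the exact syntax as an assumption drawn from the feature list above, and check the App::Crag docs:

crag '$d = ?^<circumference of the Earth in km>; $t = ^<24 hours>; say ($d / $t).in: <mph>'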

Check out the crag-of-the-day for more – but beware, this is kinda strangely addictive.

Day 2 – Doing Data Science with Raku

Introduction

This document provides an overview of Raku packages, as well as related documents and presentations, for doing Data Science using Raku.

This simple mnemonic for what Data Science (DS) is can be kept in mind while reading this document:

Data Science = Programming + Statistics + Curiosity

Remark: By definition, anytime we deal with data we do Statistics.

We are mostly interested in DS workflows — the Raku facilitation of using
Large Language Models (LLMs) is seen here as:

  • An (excellent) supplement to standard, non-LLM DS workflows facilitation
  • A device to use — and solve — Unsupervised Machine Learning (ML) tasks

(And because of our strong curiosity drive, we are not at all shy about using LLMs to do DS.)

What is needed to do Data Science?

Here is a wordier and almost technical definition of DS:

Data Science is the process of exploring and summarizing data, uncovering hidden patterns, building predictive models, and creating clear visualizations to reveal insights. It is the analytical work that analysts, researchers, or scientists do over raw data in order to understand it and utilize those insights.

Remark: “Utilize insights” would mean “machine learning” to many.

This is the general workflow (or loop) for doing DS:

Assume you have a general-purpose language which is very good at dealing with text, and a package ecosystem with a well-maintained part dedicated to doing various Web development tasks and workflows. (I.e. trying to re-live Perl’s glory days.) What new components must the ecosystem of that programming language be endowed with in order to make it useful for doing DS?

The list below gives such components. They are ranked by importance (most important first), but all are important — i.e. each is “un-skippable” or indispensable.

  • Data import and export
  • Data wrangling facilitation
  • Statistics for data exploration
  • Machine Learning algorithms (both unsupervised and supervised)
  • Data visualization facilitation
  • Interactive computing environment(s)
  • Literate programming

Additional, really nice to have, but not indispensable components are:

  • Data generation and retrieval
  • Interfaces to other programming languages and ecosystems
  • Interactive interfaces to parameterized workflows (i.e. dashboards)
  • LLM utilization facilitation

Just an overview of packages

This document contains overlapping lists of Raku packages that are used for
performing various types of workflows in DS and related utilization of LLMs.

Remark: The original version of this document, written a year ago, had mostly the purpose of proclaiming (and showing off) Raku’s tooling for DS, ML, and LLMs.

At least half a dozen packages for performing ML or data wrangling in Raku have not been included for three reasons:

  1. Those packages cannot be installed.
    • Mostly, because of external (third party) dependencies.
  2. When tried or experimented with, the packages do not provide faithful or complete results.
    • I.e. precision and recall are not good.
  3. The functionalities in those packages are too awkward to use in computational workflows.
    • It is understandable to have ecosystem packages in an incomplete or narrow state of development.
    • But many of those packages stay permanently in those states.
    • Additionally, the authors have not shown or documented how the functionalities are used in longer computational chains or real-world use cases.

The examples given below are for illustration purposes only, and by no means exhaustive. We refer to related blog posts, videos, and package READMEs for more details.

Remark: The packages mentioned in this document can be installed with the script “raku-data-science-install.sh”.

How to read it?

There are three ways to read this document:

  • Just look at (or maybe download) the mind map in the next section.
    • And the section “Machine Learning & Statistics”.
  • Just browse or read the summary list in the next section and skim over the rest of the sections.
  • Read all sections and read or browse the linked articles and notebooks.

Actually, it is assumed that many readers would read one random section of this document; hence, most of the sections are mostly self-contained.


Summary of Data Science components and status in Raku

The list below summarizes how Raku covers the Data Science (DS) components listed above. Each component-item has sub-items for its “previous” state (pre-2021), current state (2025), an essential-or-not mark, a current-state 1-to-5 star rating, and references. There are also a corresponding table and mind-map.

Remark: Current-state star ratings are, of course, subjective. But I do compare Raku’s DS ecosystem with those of Python, R, and Wolfram Language, and try to be intellectually honest about it.

Remark: The mind-map’s PDF file has “live” hyperlinks.


Code generation

For a few years I used Raku to “only” make parser-interpreters for Data Science (DS) and Machine Learning (ML) workflows specified with natural language commands. This is the “Raku for prediction” or “clothes have no emperor” approach; see [AA2]. At some point I decided that Raku has to have its own, useful DS and ML packages. (This document proclaims the consequences of that decision.)

Consider the problem:

Develop conversational agents for Machine Learning workflows that generate correct and executable code using natural language specifications.

The problem is simplified with the observation that the most frequently used ML workflows are in the ML subdomains of:

  • Classification
  • Latent Semantic Analysis
  • Regression
  • Recommendations

In the broader field of DS we also add Data Wrangling.

Each of these ML or DS sub-fields has its own Domain Specific Language (DSL).

There is a set of Raku packages that facilitate the creation of DS workflows in other programming languages. (Julia, Python, R, Wolfram Language.)

The grammar-based ones have the “DSL::” prefix — see, for example, “DSL::English::*” at raku.land.

The LLM-based packages are “ML::NLPTemplateEngine” and “DSL::Examples”.

Examples

Here is an example of using the Command Line Interface (CLI) script of “ML::NLPTemplateEngine”:

concretize --l=Python make a quantile regression pipeline over dfTemperature using 24 knots and interpolation order 2

# qrObj = (Regressionizer(dfTemperature)
# .echo_data_summary()
# .quantile_regression(knots = 24, probs = [{0.25, 0.5, 0.75}], order = 2)
# .plot(date_plot = False)
# .errors_plot(relative_errors = False, date_plot = False))


Data wrangling

Most data scientists spend most of their time doing data acquisition and data wrangling. Not Data Science, or AI, or whatever “really learned” work. (For a more elaborate rant, see “Introduction to data wrangling with Raku”, [AA2].)

Data wrangling, summarization, and generation are done with the packages:

Example datasets retrieval is done with the package:

Generation of data wrangling workflows code is done with the package:

the functionalities of which are summarized in this diagram:

Examples

Data wrangling with “Data::Reshapers”:

use Data::Reshapers;
my @dsTitanic = get-titanic-dataset();
cross-tabulate(@dsTitanic, <passengerSex>, <passengerSurvival>)

# {female => {died => 127, survived => 339}, male => {died => 682, survived => 161}}

Data wrangling code generation via CLI:

dsl-translation -l=Raku "use @dsTitanic; group by passengerSex; show the counts"

# $obj = @dsTitanic ;
# $obj = group-by($obj, "passengerSex") ;
# say "counts: ", $obj>>.elems


Exploratory Data Analysis

At this point Raku is fully equipped to do Exploratory Data Analysis (EDA) over small to moderate size datasets. (E.g. less than 100,000 rows.) See [AA4, AAv5].
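
As a quick, hedged illustration, assuming the “Data::Summarizers” package for per-column summaries together with “Data::Reshapers”:

use Data::Reshapers;
use Data::Summarizers;

# Get a well-known example dataset and summarize it
my @dsTitanic = get-titanic-dataset();
say dimensions(@dsTitanic);   # number of rows and columns
records-summary(@dsTitanic);  # per-column summary statistics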

Here are EDA stages and related Raku packages:


Machine Learning & Statistics

The topics of Machine Learning (ML) and Statistics are too big to be given more than an outline in this document. The curious or studious readers can check out, read, and re-run the notebooks [AAn2, AAn3, AAn4].

Here are Raku packages for doing ML and Statistics:

Remark: Again, the mind-map’s PDF file has “live” hyperlinks.


Recommender systems and sparse matrices

I make Recommender Systems (RS) often during Exploratory Data Analysis (EDA). For me, RS are “first order regression.” I also specialize in the making of RS. I prefer using RS based on Sparse Matrix Algebra (SMA) because of the fast computations, easy interpretation, and reuse in other Data Science (DS) or Machine Learning (ML) workflows. I call RS based on SMA Sparse Matrix Recommenders (SMRs), and I have implemented SMR packages in Python, R, Raku, and Wolfram Language (WL) (aka Mathematica.)

Remark: The main reason I did not publish the original version of this document a year ago is that Raku did not have SMA and SMR packages.

Remark: The making of LLM-based RS is supported in Raku via Retrieval Augmented Generation (RAG); see “Raku RAG demo”, [AAv9].

I implemented a Raku recommender without SMA, “ML::StreamsBlendingRecommender”, but it is too slow for “serious” datasets. It is still useful, though; see [AAv1].

SMA is a “must have” for many computational workflows. Since I really like having matrices (sparse or not) with named rows and columns, I have implemented packages for sparse matrices with named rows and columns in Python, Raku, and WL.

Remark: Having data frames and matrices with named rows and columns is a central feature of R. Since I find that really useful from both DS-analytical and software-engineering-architectural points of view, I made corresponding implementations in other programming languages.

After implementing the SMA package “Math::SparseMatrix” I implemented (with some delay) the SMR package, “ML::SparseMatrixRecommender”. (The latter one is a very recent addition to Raku’s ecosystem, just in time for this document’s publication.)

Examples

Here is an example of using Raku to generate code for one of my SMR packages:

dsl-translation -t=Python "
create from dsData;
apply LSI functions IDF, None, Cosine;
recommend by profile for passengerSex:male, and passengerClass:1st;"

# obj = (SparseMatrixRecommender()
# .create_from_wide_form(data = dsData)
# .apply_term_weight_functions(global_weight_func = "IDF", local_weight_func = "None", normalizer_func = "Cosine")
# .recommend_by_profile( profile = ["passengerSex:male", "passengerClass:1st"]))
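
For comparison, a direct Raku pipeline with “ML::SparseMatrixRecommender” might look roughly like the sketch below; the kebab-case method names are assumed analogs of the generated Python above, so consult the package README for the actual API:

use ML::SparseMatrixRecommender;

# Hypothetical kebab-case analogs of the generated Python pipeline;
# @dsData is a wide-form dataset, e.g. from get-titanic-dataset()
my $smr = ML::SparseMatrixRecommender.new
    .create-from-wide-form(@dsData)
    .apply-term-weight-functions('IDF', 'None', 'Cosine')
    .recommend-by-profile(<passengerSex:male passengerClass:1st>);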


Literate programming

“Literate Programming (LP)” tooling is very important for doing Data Science (DS). At this point Raku has four LP solutions (three of them are “notebook solutions”):

The Jupyter Raku-kernel packages “Jupyter::Kernel” and “Jupyter::Chatbook” provide cells for rendering the output of LaTeX, HTML, Markdown, or Mermaid-JS code or specifications; see [AAv2].

The package “Text::CodeProcessing” can be used to “weave” (or “execute”) computational documents that are Markdown-, Org-mode-, or Pod6 files; see [AAv2].

“RakuMode” is a Wolfram Language (WL) paclet for using Raku in WL notebooks.
(See the next section for the “opposite way” — using WL in Raku sessions.)

Remark: WL is also known as “Mathematica”.

The package “Markdown::Grammar” can be used in notebook conversion workflows; see [AA1, AAv1].

Remark: This document itself is a “computational document” — it has executable Raku and Shell code cells. The published version of this document was obtained by “executing it” with the command:

file-code-chunks-eval Raku-for-Data-Science-and-LLMs.md


Interconnections

A nice complement to Raku’s DS and LLM functionalities is the ability to easily connect to other computational systems like Python, R, or Wolfram Language (WL).

The package “Proc::ZMQed” allows the connection to Python, R, and WL via ZeroMQ; see [AAv3].

The package “WWW::WolframAlpha” can be used to get query answers from WolframAlpha (W|A). Raku chatbooks also have magic cells for accessing W|A; see [AA3].


Cross language workflows

The packages listed in this document, along with the related articles and videos,
support and demonstrate computational workflows that work across different programming languages.

  • Data wrangling workflows code generation is for Julia, Python, R, Raku, SQL, and Wolfram Language (WL).
  • Raku’s data wrangling functionalities adhere to the DSLs and workflows of the popular Python “pandas” and R “tidyverse”.
  • More generally, ML workflows code generators as a rule target R, Python, and WL.
    • At this point, only recommender systems Raku-code is generated.
  • The Raku DSL for interacting with LLMs is also implemented in Python and WL; see [AAv8].
    • To be clear, WL’s design of LLM functions was copied (or transferred) to Raku.

References

Articles, blog posts

[AA1] Anton Antonov, “Connecting Mathematica and Raku”, (2021), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Introduction to data wrangling with Raku”, (2021), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “Notebook transformations”, (2024), RakuForPrediction at WordPress.

[AA4] Anton Antonov, “Omni-slurping with LLMing”, (2024), RakuForPrediction at WordPress.

[AA5] Anton Antonov, “Chatbook New Magic Cells”, (2024), RakuForPrediction at WordPress.

[AA6] Anton Antonov, “Age at creation for programming languages stats”, (2024), RakuForPrediction at WordPress.

Notebooks

[AAn1] Anton Antonov, “Connecting Raku with Wolfram Language and Mathematica”, (2021), Wolfram Community.

[AAn2] Anton Antonov, “Data science over small movie dataset — Part 1”, (2025), RakuForPrediction-blog at GitHub.

[AAn3] Anton Antonov, “Data science over small movie dataset — Part 2”, (2025), RakuForPrediction-blog at GitHub.

[AAn4] Anton Antonov, “Data science over small movie dataset — Part 3”, (2025), RakuForPrediction-blog at GitHub.

Videos

[AAv1] Anton Antonov, “Markdown to Mathematica converter (Jupyter notebook example)”, (2022), YouTube/AAA4prediction.

[AAv2] Anton Antonov, “Conversion and evaluation of Raku files”, (2022), YouTube/AAA4prediction.

[AAv3] Anton Antonov, “Using Wolfram Engine in Raku sessions”, (2022), YouTube/AAA4prediction.

[AAv4] Anton Antonov, “LLaMA models running guide (Raku)”, (2024), YouTube/AAA4prediction.

[AAv5] Anton Antonov, “Conversion and evaluation of Raku files”, (2024), YouTube/AAA4prediction.

[AAv6] Anton Antonov, “Raku Literate Programming via command line pipelines”, (2024), YouTube/AAA4prediction.

[AAv7] Anton Antonov, “Exploratory Data Analysis with Raku”, (2024), YouTube/AAA4prediction.

[AAv8] Anton Antonov, “Geographics data in Raku demo”, (2024), YouTube/AAA4prediction.

[AAv9] Anton Antonov, “Raku RAG demo”, (2024), YouTube/AAA4prediction.

[AAv10] Anton Antonov, “Robust LLM pipelines (Mathematica, Python, Raku)”, (2024), YouTube/AAA4prediction.

[AAv11] Anton Antonov, “TRC 2022 Implementation of ML algorithms in Raku”, (2022), YouTube/antononcube.

Day 1 – Dancer, Dasher and Dosh (LLM-powered shell commands)

Dancer, Dasher and the other reindeer work overtime on Christmas Eve delivering billions of gifts.

Each year the DevOps elves try and make things flow a bit smoother. The team use dosh (Do-Shell) – a Raku-powered command-line utility for turning natural language into platform-friendly shell commands.

Instead of remembering all those pesky command-line utilities and arguments, the DevOps team use dosh like this:
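
For example, a request might look like this (a hypothetical invocation; the request is just free-form natural language):

shell> dosh show the five largest files in my home directory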

dosh asks your LLM of choice what to run — and returns a single shell command with an explanation and a warning if needed. It won’t execute the command without a human/elf confirming first.

Behind the scenes, dosh delegates its magic to the super-simple LLM::DWIM module and your $LLM of choice. dosh inserts the current operating system and architecture into the prompt for context. Use dosh prompt to see the current version of the prompt (v9):

You are a senior ubuntu shell engineer on linux 6.14.0-36-generic (x86_64).

Translate a natural-language request into ONE safe shell command for execution on the linux operating system.

RESPONSE FORMAT (STRICT):
Return MINIFIED JSON on a single line, with EXACT keys:
{"shell_command":"...","explanation":"...","warning":""}

RULES:
- shell_command: a single-line shell command that fully addresses the request.
- Prefer read-only substitutes (e.g., 'du -sh * | sort -h | tail -n 20') when user intent is unclear.
- NEVER include sudo unless essential; avoid destructive flags by default.
- NEVER access external services or APIs; use only local system commands instead.
- NEVER suggest a command that contains an http:// or https:// URL.
- explanation: a brief, friendly description of what the command does. 
- warning: "" if read-only; otherwise 1 short sentence describing the risk.
- Output ONLY the minified JSON. No prose. No code fences. No backticks.

Examples:
{"shell_command":"ls -la","explanation":"Lists files with details in the current directory.","warning":""}
{"shell_command":"find . -type f -size +100M -print0 | xargs -0 ls -lh","explanation":"Shows paths and sizes of files larger than 100 MB.","warning":""}
{"shell_command":"find . -type f -name '*.bak' -delete","explanation":"Deletes all .bak files under the current directory.","warning":"This permanently removes files."}

The shell_command should solve the following command_request:

$your-request-goes-here
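
Given that strict response format, consuming it on the Raku side is straightforward. Here is a minimal sketch using JSON::Fast (not necessarily how dosh itself does it):

use JSON::Fast;

# Parse the single-line minified JSON the prompt demands
my %resp = from-json(Q/{"shell_command":"ls -la","explanation":"Lists files with details in the current directory.","warning":""}/);
say %resp<shell_command>;
say "WARNING: %resp<warning>" if %resp<warning>;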

One of the junior Elves, who likes science fiction, was glad that a human/elf is always in the loop. Her testing showed:

You can install dosh from raku.land with zef:

shell> zef install dosh
shell> dosh help

Happy Christmas!

The 2025 Raku Advent Posts

(in chronological order, with comment references)