Day 23 – Optimizing Raku programs with Zig

The wind blows snow from across the glowing windows of Santa’s front office, revealing a lone elf sitting in front of a computer. She looks despondent, head in hands, palms rubbing up against eyes, mouth yawning…

Tikka has been working double shifts to finish the new package address verification mechanism. There have been some unfortunate present mixups before that should not happen again.

Now, Tikka loves Raku. She chose it for this project and almost all of Santa’s user-facing systems are written in Raku, after all.

But Tikka is currently struggling with the speed of Raku runtime. No matter how she writes the code, the software just can’t keep up with the volume of packages coming off the workshop floors, all needing address verification.

Address handling in Santa’s workshop

Here is a flowchart of the design that Tikka is in the middle of finishing. All of the Floor Station and Package Scanner work has already been done.

Only the Address Verification component remains to be completed.

Raku-only implementation

Here is her current test implementation of the CRC32 processor in Raku:

#!/usr/bin/env raku
use v6.*;

unit sub MAIN(:$runs = 5, :$volume = 100, :$bad-packages = False);

use String::CRC32;
use NativeCall;

my $address = "01101011 Hyper Drive";
my $crc32 = String::CRC32::crc32($address);
class Package {
has Str $.address is rw = $address;
has uint32 $.crc32 = $crc32;
}

# Simulating the traffic from our eventual input, a partitioned Candycane™ queue
my $package-supplier = Supplier.new;
my $package-supply = $package-supplier.Supply;
# A dummy sink that ignores the data and prints the processing duration of
# the CRC32 stage
my $output-supplier = Supplier.new;
my $output-supply = $output-supplier.Supply;
# Any address that fails the CRC32 test goes through here
my $bad-address-supplier = Supplier.new;
my $bad-address-supply = $bad-address-supplier.Supply;
# A tick begins processing a new batch
my $tick-supplier = Supplier.new;
my $package-ticker = $tick-supplier.Supply;

my $time = now;
my $bad-address = $address.succ;

my @packages = $bad-packages
?? (Package.new xx ($volume - ($volume * 0.1)),
       Package.new(:address($bad-address)) xx ($volume * 0.1)
      ).flat
!! Package.new xx $volume;
note ">>> INIT: {now - $time}s ($volume objects)";

$package-ticker.act({
$package-supplier.emit(@packages);
});

$package-supply.act(-> @items {
my $time = now;
@items.map( -> $item {
if $item.crc32 != String::CRC32::crc32($item.address) {
$bad-address-supplier.emit($item);
}
});
$output-supplier.emit([@items, now - $time]);
});

my $count = 0;
my $bad-count = 0;
# Start the train (after waiting for the react block to spin up)
start { sleep 0.001; $tick-supplier.emit(True); }
react {
whenever $output-supply -> [@itmes, $duration] {
say "RUN {++$count}: {$duration}s";
if $count == $runs {
note "<<< $bad-count packages with bad addresses found. Alert the Elves!"
              if $bad-packages;
done();
}
$tick-supplier.emit(True);
}

whenever $bad-address-supply -> $item {
$bad-count++;
# ... send to remediation queue and alert the elves!
}
}

Discussion

The above code is a reasonable attempt at managing the complexity. It simulates input via $package-ticker, which periodically emits new batches of Package/Address pairs. When deployed, this could would receive continuous batches at intervals that are not guaranteed.

Which is a problem, because we are already unable to keep up with one second intervals by the time we reach 100,000 packages per second.

> raku crc-getter.raku --volume=10000 --runs=3
>>> INIT: 0.008991746s (10000 objects)
RUN 1: 0.146596686s
RUN 2: 0.138983732s
RUN 3: 0.142380065s

> raku crc-getter.raku --volume=100000 --runs=3
INIT: 0.062402473s (100000 objects)
RUN 1: 1.360029456s
RUN 2: 1.32534014s
RUN 3: 1.353072834s

This won’t work, because at peak the elves plan to finalize and wrap 1,000,000 gifts per second!

> raku crc-getter.raku --volume=1000000 --runs=3
>>> INIT: 0.95481884s (1000000 objects)
RUN 1: 13.475302627s
RUN 2: 13.161153845s
RUN 3: 13.293998956s

No wonder Tikka is stressing out! While it will be possible to parallelize the job across several workers, there is just no way that they can — or should — build out enough infrastructure to run this job strictly in Raku.

Optimization

Tikka frowns and breathes deeply through her nose. She’s already optimized by declaring the types of the class attributes, with $!crc32 being declared as the native type uint32 but it hasn’t made much of an impact — there’s no denying that there is too much memory usage.

This is the time that a certain kind of coder — a wizard of systems, a master bit manipulator, a … C coder — could break out their devilish syntax and deliver a faster computation than the Raku native String::CRC32. Yet fewer and fewer are schooled in the arcane arts, and Tikka is long-rusty.

But this is a popular algorithm! Tikka decides to start looking for C code that produces CRC32 to use with Raku’s NativeCall when all of a sudden she remembers hearing of a different possibility, a young language with no ABI of its own: it produces C ABI directly, instead.

A language that ships with a fairly extensive, if still a but churning, standard library that just might include a CRC32 directly.

Her daughter had mentioned it often, in fact.

“But what was it called again?”, Tikka wonders, “Iguana? No, that’s not right.. Ah! Zig!”.

Zig

The more Tikka reads, the more her jaw drops. She has to peel herself away from videos discussing the current capabilities and future aims of this project. Tikka’s daughter is not happy to be woken up by her mother’s subsequent phone call… until the topic of Zig is brought up.

“I told you that you should look into it, didn’t I?,” she says in the superior tones only teenagers are truly capable of, before turning helpful. “I’ll be right there to guide you through it. This should be simple but a few things might vex you!”

Creating a shared library

The first step (after installing Zig) is:

> cd project-dir
> zig init-lib
info: Created build.zig
info: Created src/main.zig
info: Next, try `zig build --help` or `zig build test` 

This has produced a nice starting point for our library.

Let’s take a look at main.zig:

const std = @import("std");
const testing = std.testing;

export fn add(a: i32, b: i32) i32 {
    return a + b;
}

test "basic add functionality" {
    try testing.expect(add(3, 7) == 10);
}

By default, the build.zig file is set to produce a static library, but we want a shared library. Also, the name will have been set to whatever the directory is named, but Tamala — Tikka’s daughter — suggests that they want it to be crc.

“Names matter,” Tamala says knowingly. Tikka can’t hide a knowing smile of her own as she changes the following in the build.zig, from:

    const lib = b.addStaticLibrary(.{
        .name = "sleigh-package-tracker",

into:

    const lib = b.addSharedLibrary(.{
        .name = "crc",

Initial test

Tikka writes a short Raku program to test the truth of Zig’s ABI compatibility. The signature is easy for her to deduce from the Zig syntax, where u32 means uint32.

use NativeCall;
use Test;
constant LIB = "./zig-out/lib/crc" # NativeCall will automatically prepend 'lib' and
# suffix with `.so`, `.dylib`, or `.dll` depending
# on platform.

sub add(uint32, uint32) returns uint32 is native(LIB) {*}
ok add(5, 5) == 10, "Zig is in the pipe, five-by-five";

She runs zig build and then runs her test:

> zig build
> raku add-test.raku
ok 1 - Zig is in the pipe, five-by-five

“You were expecting something different?,” Tamala asks, her right eyebrow arched, face grinning. “Not to say you shouldn’t find it impressive!”

Tikka laughs, moving back to make room at the keyboard. “Show me some code!”

Using std.hash.CRC32

Once finished double-checking the namespace in the Zig stdlib documentation, it takes Tamala less than a minute to code the solution and add a test in src/main.zig.

const std = @import("std");
const testing = std.testing;

export fn hash_crc32(string: [*:0]const u8) u32 {
    return std.hash.crc.Crc32.hash(std.mem.span(string));
}

test "basic functionality" {
    const test_string: [*:0]const u8 = "abcd";
    try testing.expect(hash_crc32(test_string) == 3984772369);
}

Then Tamala runs the tests.

> zig build test --summary all
Build Summary: 1/1 steps succeeded; 1/1 tests passed
test success
└─ run test 1 passed 140ms MaxRSS:2M
   └─ zig test Debug native success 3s MaxRSS:258M

Everything’s looking good! She runs the regular build stage, with Zig’s fastest optimization strategy, ReleaseFast. She’s young and quite confident that she won’t need a debug build for this work.

> zig build --summary all -Doptimize=ReleaseFast
Build Summary: 3/3 steps succeeded
install success
└─ install crc success
   └─ zig build-lib crc ReleaseFast native success 9s MaxRSS:335M

Tikka starts reaching for the keyboard but her daughter brushes her hand away. “I’ve got something even better for you than simply using a native CRC32 implementation,” Tamala says with a twinkle in her eye.

Generating objects in Zig

After about fifteen minutes, Tamala has modified main.zig to look like this:

const std = @import("std");
const testing = std.testing;
const Crc32 = std.hash.crc.Crc32;
const span = std.mem.span;

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const allocator = gpa.allocator();
var arena = std.heap.ArenaAllocator.init(allocator);
const aa = arena.allocator();

pub const Package = extern struct {
    crc32: u32,
    address: [*:0]const u8,
};

const package_pool = std.heap.MemoryPoolExtra(Package, .{ .growable = true });
var pool = package_pool.init(allocator);

export fn hash_crc32(string: [*:0]const u8) u32 {
    return Crc32.hash(std.mem.span(string)); 
}

export fn create_package(address: [*:0]const u8, crc32: u32) *Package {
    const package = pool.create() catch @panic("No more memory to allocate for packages!");
    const address_copy: [*:0]const u8 =  aa.dupeZ(u8, std.mem.span(address)) catch @panic("Could not create address string");
    package.* = .{
        .address = address_copy, 
        .crc32 = crc32
    };
    return package;
}

export fn teardown() void {
    arena.deinit();
    pool.deinit();
}

test "basic add functionality" {
    const test_string: [*:0]const u8 = "abcd"; 
    try testing.expect(hash_crc32(test_string) == 3984772369);

    const test_address: [*:0]const u8 = "222 Moon Roof Blvd";
    const test_crc32: u32 = hash_crc32(test_address);
    const test_package: *Package = create_package(test_address, test_crc32);
    defer teardown();

    try testing.expect(test_package.crc32 == test_crc32);
    try testing.expect(std.mem.eql(u8, span(test_package.address), span(test_address)));
}

“We’re going to create our Package objects in Zig?”, Tikka asks.

“We’re going to create our Package objects in Zig,” Tamala grins.

It only takes Tikka a few minutes to make the necessary changes to the Raku script, which now looks like:

#!/usr/bin/env raku

unit sub MAIN(:$runs = 5, :$volume = 100, :$bad-packages = False);

use String::CRC32;
use NativeCall;

constant LIB = "./zig-out/lib/crc";
sub hash_crc32(Str) returns uint32 is native(LIB) {*}
class Package is repr('CStruct') {
has uint32 $.crc32;
has Str $.address is rw;
}
sub create_package(Str, uint32) returns Package is native(LIB) {*}
sub teardown() is native(LIB) {*}

my $package-supplier = Supplier.new;
my $package-supply = $package-supplier.Supply;

my $output-supplier = Supplier.new;
my $output-supply = $output-supplier.Supply;

my $bad-address-supplier = Supplier.new;
my $bad-address-supply = $bad-address-supplier.Supply;

my $ticker-supplier = Supplier.new;
my $package-ticker = $ticker-supplier.Supply;

my $interrupted = False;
my $time = now;
my Str $test-address = "01101011 Hyper Drive";
my uint32 $test-crc32 = hash_crc32($test-address);
my Str $bad-address = $test-address.succ;

my @packages = $bad-packages
?? (create_package($test-address, $test-crc32) xx ($volume - ($volume * 0.1)),
      create_package($bad-address, $test-crc32) xx ($volume * 0.1)
     ).flat
!! create_package($test-address, $test-crc32) xx $volume;
note ">>> INIT: {now - $time}s ($volume objects)";
END teardown unless $interrupted;

$package-ticker.act({
$package-supplier.emit(@packages);
});

$package-supply.act(-> @items {
my $time = now;
@items.map( -> $item {
# Uncomment for testing with failure cases
if $item.crc32 != hash_crc32($item.address) {
$bad-address-supplier.emit($item);
}
});
$output-supplier.emit([@items, now - $time]);
});

my $count = 0;
my $bad-count = 0;
# Start the train (sleep for a fraction of a second so that react can spin up)
start { sleep 0.001; $ticker-supplier.emit(True); }
react {
whenever $output-supply -> [@itmes, $duration] {
say "Batch #{++$count}: {$duration}s";
if $count == $runs {
note "<<< $bad-count packages with bad addresses found. Alert the Elves!"
             if $bad-packages;
done;
} else {
$ticker-supplier.emit(True);
}
}

whenever $bad-address-supply -> $item {
$bad-count++;
# ... send to remediation queue and alert the elves!
}

whenever signal(SIGINT) {
teardown;
$interrupted = True;
if $bad-packages {
note "<<< $bad-count packages with bad addresses found. Alert the Elves!" if $bad-packages;
}
done;
}
}

As might be expected, the only changes required were adding the hash_crc32 and create_package declarations and then adjusting the package creation and CRC32 checking code.

Speed!

> raku crc-getter-extended.raku --volume=10000 --runs=3
>>> INIT: 0.233359034s (10000 objects)
Batch #1: 0.009997515s
Batch #2: 0.005876425s
Batch #3: 0.005673926s

> raku crc-getter-extended.raku --volume=100000 --runs=3
>>> INIT: 0.072163946s (100000 objects)
Batch #1: 0.082134337s
Batch #2: 0.049445661s
Batch #3: 0.049449161s

> raku crc-getter-extended.raku --volume=1000000 --runs=3
>>> INIT: 0.699643099s (1000000 objects)
Batch #1: 0.77157517s
Batch #2: 0.469638071s
Batch #3: 0.452177919s

Tikka can barely believe what she’s reading. In comparison to the pure Raku implementation, Tamala’s solution gets comparatively blazing performance when using Zig to create the Package objects.

There is a hit in memory consumption, which makes sense as there are now the objects themselves (allocated by Zig) as well as the Raku references to them. The increase is on the order of around 20% — not massive, but not insignificant either. At least it appears to scale at the some percentage. It might be worth investigating whether there is any inefficiency going on from the Raku side with regards to the memory usage of CStruct backed classes.

Tikka throws her arms around her daughter and says, “Well, I sure am glad I was listening to all your chirping about Zig! It looks like we have a workable solution for speeding up our Raku code to handle the peak load!”

Tamala squirms, trying helplessly to dodge her mother’s love attack.


Post Scriptum

Astute readers will have noticed the bad-packages parameter in both versions of the script. This parameter ensures that a percentage of the Package objects are created with an address that won’t match its associated CRC32 hash. This defeats some optimizations that happen when the data is entirely uniform.

For the sake of brevity, the bad-packages timings have been omitted from this short story. However, you can find them in a companion blog post where you can read even more about the journey of integrating Raku and Zig.

One thought on “Day 23 – Optimizing Raku programs with Zig

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.