So, I’m a programmer and I work for a government IT (“e-gov”) department. My work here mostly consists of one-off data-integration tasks (like the one in this chronicle) and programming satellite utilities for our Citizen Relationship Management system.
the problem
So, suppose you have:
a lot of records (half a million) in a .csv file, to be entered into your database;
a database only accessible via a not-controlled-by-you API;
said API takes a little bit more than half a second per record;
some consistency checks must be done before sending the records to the API; but
the API is a “black box” and it may be more strict than your basic consistency checks;
a tight schedule (obviously)
the solution
the prototype: Text::CSV and HTTP::UserAgent
So, taking half a second per record just in the HTTP round trip is bad, very bad (around 34 hours to process the whole dataset).
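In outline, the prototype does something like this (a sketch: read-csv, the endpoint, and the payload shape are illustrative, not the real code):

```raku
use Text::CSV;
use HTTP::UserAgent;
use JSON::Fast;

# read-csv produces one hash per CSV row, lazily
sub read-csv($path) {
    my $csv = Text::CSV.new;
    my $fh  = open $path, :r;
    my @head = @($csv.getline($fh))».Str;           # header row
    # the loop stops when getline returns nothing (end of file)
    lazy gather while my $row = $csv.getline($fh) {
        take %( @head Z=> @$row».Str );
    }
}

my $endpoint = 'https://csm.example/api/records';   # illustrative
my $ua       = HTTP::UserAgent.new;
my ($degree, $batch) = 8, 64;

my @r = read-csv('data.csv');                       # still lazy here

# one POST per record, fanned out over the hyper-sequence,
# with the response codes funnelled through a supply
my $codes = supply {
    for @r.hyper(:$degree, :$batch).map(-> %row {
            $ua.post($endpoint, %( data => to-json(%row) )).code
        }) -> $code {
        emit $code;
    }
}
react whenever $codes -> $code {
    note $code unless $code == 200;
}
```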
So, explaining the code above a little bit: @r is a lazy sequence (meaning, roughly, that the while my $row loop in read-csv is executed one row at a time, in a coroutine-like fashion). Calling .hyper(:$degree, :$batch) turns the sequence into a “hyper-sequence”: it opens a thread pool with $degree threads and feeds each thread $batch items from the original sequence until it is exhausted.
Yeah, but HTTP::UserAgent does not parallelise very nicely (it just does not work)… Besides, why the react whenever supply emit? That is a mystery lost to time. Was it really needed? Probably not, but the clock is always ticking, so just move along.
Cro::HTTP to the rescue
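The idea is to swap HTTP::UserAgent for Cro’s asynchronous client while keeping the same hyper pipeline. A sketch (the base-uri is illustrative; handing Cro a Hash body with a JSON content-type makes it serialise the body for us):

```raku
use Cro::HTTP::Client;

my $client = Cro::HTTP::Client.new:
    base-uri     => 'https://csm.example/api/',   # illustrative
    content-type => 'application/json';

my @statuses = @r.hyper(:$degree, :$batch).map(-> %row {
    # .post returns a Promise; awaiting it inside the hyper worker
    # keeps roughly $degree requests in flight at a time
    my $res = await $client.post('records', body => %row);
    $res.status
});
say @statuses.Bag;                                # summary of outcomes
```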
Nice, but I ran the thing on a testing database and… oh, no… lots of 503s and eventually a 401 and the connection was lost.
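The server is clearly asking us to slow down. So: fewer workers, plus a retry with back-off when a 503 comes in. Something like this (the retry budget and the sleeps are arbitrary choices):

```raku
sub post-row(%row) {
    for 1..5 -> $attempt {                         # retry budget (arbitrary)
        my $res = try await $client.post('records', body => %row);
        return $res.status with $res;
        # only retry plain overload; anything else is a real error
        $!.rethrow unless $! ~~ X::Cro::HTTP::Error
                       && $!.response.status == 503;
        sleep 0.5 * $attempt;                      # linear back-off
    }
    503                                            # exhausted the retries
}

my @statuses = @r.hyper(degree => 4, :$batch)      # gentler than before
               .map(-> %row { post-row(%row) });
```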
Oh, it ran almost to the end of the data (and it’s fast), but… we got some 409s for records where our csv-to-json conversion is not smart enough; those records we can safely ignore. And some timeouts.
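One last tweak, then: a 409 means our csv-to-json missed something for that particular record, so we log it and skip it; a timed-out or dropped request gets retried just like a 503. Again a sketch of the final shape of post-row:

```raku
sub post-row(%row) {
    for 1..5 -> $attempt {
        my $res = try await $client.post('records', body => %row);
        return $res.status with $res;
        given $! {
            when X::Cro::HTTP::Error {
                # 409: this record will never go through; skip it
                return .response.status if .response.status == 409;
                .rethrow unless .response.status == 503;
            }
            default { }    # timeouts and dropped connections: retry too
        }
        sleep 0.5 * $attempt;
    }
    note "giving up on {to-json %row}";
    Nil
}
```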
So now the whole process runs smoothly and finishes in 20 minutes, circa 100x faster.
Import the data in production… similar results. The process is ongoing; 15 minutes in, Ops comes by (in person):
Why is the server load triple the usual, and the number of 5xx responses through the roof?
Just five more minutes; check ticket XXX, I’m closing it now…
(unintelligible noises)
And this is the story of how half a million records, which would otherwise take two whole days to import, were imported in twenty-some minutes. The whole ticket took less than a day’s work, start to finish.
related readings
If you want to read more about Raku concurrency, past Advent articles that might interest you are: