or How (not) to pound your production server
(and bring on the wrath of the Ops)
So, I’m a programmer and I work for a government IT “e-gov” department. My work here mostly consists of one-off data-integration tasks (like the one in this chronicle) and programming satellite utilities for our Citizen Relationship Management system.
So, suppose you have:
- a lot of records (half a million) in a `.csv` file, to be entered into your database;
- a database only accessible via a not-controlled-by-you API;
- said API takes a little bit more than half a second per record;
- some consistency checks must be done before sending the records to the API; but
- the API is a “black box” and it may be more strict than your basic consistency checks;
- a tight schedule (obviously).
So, spending half a second per record just on the HTTP round-trip is bad, very bad (34 hours to process the whole dataset).
Let’s try to make things move faster…
So, explaining the code above a little bit:
`@r` is a lazy sequence (this means, roughly, that the `while my $row` bit in `read-csv` is executed one row at a time, in a coroutine-like fashion). Calling `.hyper(:$degree, :$batch)` transforms the sequence into a “hyper-sequence”, basically opening a thread pool with `$degree` threads and sending each thread `$batch` items from the original sequence, until its end.
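For readers without the original snippet at hand, here is a minimal sketch of the shape being described. The names `read-csv`, `send-record` and the file name are assumptions for illustration, not the original code:

```raku
# Hypothetical sketch of the lazy-CSV-plus-.hyper pipeline described above.
sub read-csv($path) {
    # gather/take yields one row at a time, coroutine-style: the block
    # suspends at `take` until the consumer pulls the next row.
    gather for $path.IO.lines.skip(1) -> $line {   # skip the header row
        take $line.split(',');                      # naive CSV splitting
    }
}

sub send-record(@row) {
    # Stand-in for the real API call; simulates the ~0.5 s round-trip.
    sleep 0.5;
}

my $degree = 8;    # worker threads in the pool
my $batch  = 64;   # items handed to each worker at a time

my @r = lazy read-csv('records.csv');
for @r.hyper(:$degree, :$batch) -> @row {
    send-record(@row);
}
```

With `$degree` workers each spending half a second per record, throughput scales roughly with `$degree`, which is the whole point of the exercise.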
`HTTP::UserAgent` does not parallelise very nicely (it just does not work)… Besides, why the `react whenever supply emit`? It’s a mystery lost to time. Was it really needed? Probably not, but the clock is always ticking, so just move along.
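For the curious, the construct being questioned looks roughly like this in isolation (the values here are stand-ins, not the post’s actual payload):

```raku
# The react/whenever/supply/emit shape, stripped to its bones.
my $numbers = supply {
    for 1..3 -> $n {
        emit $n;                # push a value to any subscribers
    }
};

react {
    whenever $numbers -> $n {   # subscribe; run this block once per value
        say "got $n";
    }
    # react exits once every whenever-ed supply is done
}
```

It is a fine pattern for reacting to genuinely asynchronous streams; for a plain “send each row to the API” loop, `.hyper` alone already does the job.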
Cro::HTTP to the rescue
Nice, but I ran the thing on a testing database and… oh, no… lots of `503`s, then eventually a `401`, and the connection was lost.
Oh, it ran almost to the end of the data (and it’s fast), but… we are getting some `409`s for records where our `csv-to-json` is not smart enough; we can ignore those records. And some timeouts.
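A hedged sketch of how those failure modes can be handled with `Cro::HTTP::Client`. The endpoint, the payload shape, and the retry policy are assumptions; only the client API itself is real:

```raku
use Cro::HTTP::Client;

# Assumed base URI; the real API is not public.
my $client = Cro::HTTP::Client.new:
    base-uri => 'https://api.example.invalid/v1/';

sub post-with-retry(%record, :$attempts = 3) {
    for 1 .. $attempts {
        my $resp = try await $client.post: 'records',
            content-type => 'application/json',
            body         => %record;
        return $resp with $resp;
        given $! {
            when X::Cro::HTTP::Error {
                # 409: the API's stricter checks rejected the record;
                # skip it rather than retry.
                return Nil if .response.status == 409;
            }
        }
        sleep 2;   # brief back-off before retrying 503s and timeouts
    }
    Nil            # give up after $attempts tries
}
```

Combined with a bounded `$degree` on the hyper side, this also caps the number of in-flight requests, which keeps the server load (and the `5xx` count) under some control.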
So, now the whole process goes smoothly and finishes in 20 minutes, circa 100x faster.
Importing the data in production gave similar results. The process is ongoing; 15 minutes in, the Ops comes by (in person):

> Why is the server load triple the normal, and why is the number of `5xx`s through the roof?

> Just five more minutes; check ticket XXX, closing it now…
And this is the story of how half a million records, which would have taken two whole days to import, were imported in twenty-some minutes. The whole ticket took less than a day’s work, start to finish.
If you want to read more about Raku concurrency, past Advent articles that might interest you are: