Advent of Code, in Erlang: Day 14

Published Tuesday, December 14, 2021 by Bryan

Okay, Day 14 of Advent of Code 2021 was fun. I had a good time solving it last night, and had a good time again cleaning up that code to share this morning.

Fun part number one was that I decided to use a "new" Erlang feature: maps. You may have noticed a pattern in my code of using lists full of pairs, where the first member of the pair is treated as a "key", and the second member is treated as a "value". That's because I stopped writing Erlang full-time eight years ago - two years before maps made it into a release. I love learning new things, and I think this Advent of Code format is perfect for it.
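
As a tiny (and entirely hypothetical) illustration of the difference, the list-of-pairs pattern and its map equivalent look like this:

```erlang
%% The old habit: a list of pairs, searched with lists:keyfind/3.
Pairs = [{width, 10}, {height, 20}],
{width, W} = lists:keyfind(width, 1, Pairs),

%% The maps way: direct construction, direct matching.
Map = #{width => 10, height => 20},
#{height := H} = Map,
30 = W + H.
```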

parse_input(Data) ->
    {[Template], [<<>>|RuleLines]} =
        lists:splitwith(fun(L) -> L =/= <<>> end,
                        string:split(Data, <<"\n">>, all)),
    {binary_to_list(Template),
     maps:from_list([ {[PA,PB],I} || <<PA,PB," -> ",I>> <- RuleLines ])}.

Parsing the input is much like yesterday: the file starts with one line format, there is a separator, and then a different format follows. I've converted the first part, the polymer template, into a list, because that will be easier to add the insertions to later. The second part, the rules for insertion, is where I've started using maps. Yes, it looks like I'm doing the old list-of-pairs thing again, but that's just the easiest way to construct a map with a bunch of predefined values. The keys in my map are the two-letter polymer pattern to match, and each value is the element to insert at that point.

polymerize(0, _, Polymer) ->
    Polymer;
polymerize(Count, Rules, Polymer) ->
    polymerize(Count-1, Rules, polymerize_step(Rules, Polymer, [])).

polymerize_step(_, [Last], Acc) ->
    lists:reverse([Last|Acc]);
polymerize_step(Rules, [A,B|Template], Acc) ->
    #{[A,B] := I} = Rules,
    polymerize_step(Rules, [B|Template], [I,A|Acc]).

For the first half of Part 1's solution, I wrote a quick recursion to walk Count times through the Polymer, inserting the correct element (I) at each position. The first real new-maps usage is visible in the second clause of polymerize_step/3. The format is #{Key := Value} = Map, and it extracts the correct insertion rule for this point in the polymer. This match would also raise an exception if there were no rule for some pair of elements, so having completed the puzzle, I know there was no funny business in the input.
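
A quick illustration of that match syntax, with a made-up map rather than the puzzle rules:

```erlang
%% A hypothetical rules map, just for illustration.
Rules = #{"AB" => $I, "BC" => $X},

%% := in a pattern extracts the value for the given key.
#{"AB" := I} = Rules,
$I = I,

%% Matching a key that isn't present raises a badmatch exception:
%% #{"ZZ" := _} = Rules
```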

Tiny side note on a topic I've already mentioned too often: this is a great example of when it's more appropriate to write out the recursion manually, rather than use one of the fold or map functions. This recursion cares about both the head of the list and the element after it, and those functions never let you look past the head.
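
If you really wanted a fold here, you would have to materialize the pairs yourself first. A sketch of that approach, using the puzzle's example template "NNCB" and three of its example rules (this is not the code I actually used):

```erlang
Polymer = "NNCB",
Rules = #{"NN" => $C, "NC" => $B, "CB" => $H},

%% Pair each element with its successor. lists:zip/2 needs
%% equal-length lists, so drop the last and first elements.
PairList = lists:zip(lists:droplast(Polymer), tl(Polymer)),

%% Now a fold can see both halves of each pair at once ...
Inserted = lists:foldl(fun({A,B}, Acc) ->
                           #{[A,B] := I} = Rules,
                           [I,A|Acc]
                       end, [], PairList),

%% ... but the final element still has to be re-attached by hand.
"NCNBCHB" = lists:reverse([lists:last(Polymer) | Inserted]).
```

It works, but the manual recursion says the same thing more directly.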

When polymerize/3 returns, it gives me a new list, in the same format as the passed-in Polymer, that has Count rounds of polymerization applied. To answer Part 1 and claim the star, I need to find the most common and least common element, which means I need to get counting.

count_elements(Polymer) ->
    lists:foldl(fun add_count/2, #{}, Polymer).

add_count(Element, Map) ->
    maps:update_with(Element, fun(V) -> V + 1 end, 1, Map).

Wow. When I wrote the first version of this last night, I hadn't yet learned about maps:update_with/4. Updating a map based on a value stored in the map looks pretty noisy, syntax-wise, without this function. The noise is probably only relative to the amount of work done to produce the new value, which here is very small, but I'm still amazed at how well these pieces composed.
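
For comparison, here's roughly what add_count looks like without maps:update_with/4 - a sketch of the noisier alternative, not code from my module:

```erlang
%% The "noisy" version: match to see whether the key exists,
%% then use := to update it or => to insert it.
add_count(Element, Map) ->
    case Map of
        #{Element := V} -> Map#{Element := V + 1};
        _               -> Map#{Element => 1}
    end.
```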

What these two functions do is produce a map of element to the number of times that element appears in the polymer. The count_elements function folds over the Polymer, and uses add_count to increment the value for each Element in the accumulated Map. To finally answer Part 1, I just need to find the entry with the largest value, and the entry with the smallest value in this map.

{ExTemplate, ExRules} = puzzle14:parse_input(Input).
Polymer = puzzle14:polymerize(10, ExRules, ExTemplate).
CountMap = puzzle14:count_elements(Polymer).
SortedCounts = lists:keysort(2, maps:to_list(CountMap)).
[{_,Min}|_] = SortedCounts.
{_,Max} = lists:last(SortedCounts).
1588 = Max-Min.

I converted the map back to a list for easy sorting. There is a maps:fold function that I could have used to iterate through the map and find the min/max, but this works for data of this size, so I won't bother with designing more state tracking.
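
For the record, a maps:fold/3 version might look something like this (a sketch I didn't bother with; the CountMap here holds the element counts for the example's first step, "NCNBCHB"):

```erlang
CountMap = #{$N => 2, $C => 2, $B => 2, $H => 1},

%% Erlang's term order puts every number before every atom, so the
%% atom 'infinity' works as a safe initial minimum.
{Min, Max} = maps:fold(fun(_, V, {Mn, Mx}) ->
                           {min(V, Mn), max(V, Mx)}
                       end,
                       {infinity, 0},
                       CountMap),
1 = Max - Min.
```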

Part 2 is also where fun part number 2 starts. I mentioned in my first Advent of Code post that there are no bonus points for efficiency. However, this is the second puzzle I've found where some amount of efficiency is necessary if you hope to answer the puzzle at all. Instead of the twist being something like interpreting one or more of the rules differently, as has been the case in a few puzzles, the change in Part 2 is to run the polymerization 40 times instead of 10.

The code as I've written it will happily run the polymerization as many times as you want … until it runs out of memory, or you get tired of waiting. The puzzle's example notes that after 40 passes, there should be 2,192,039,569,602 "B" elements. That's nearly two terabytes of B alone, before counting space for other elements! So, building the whole polymer before counting was not going to be an option.

The puzzle description is very explicit that "inserted elements are not considered to be part of a pair until the next step", which I think is a bit of a misdirect. It made me think, at least, that I needed to finish each whole pass before starting the next. This is not true!

One key to solving this puzzle efficiently is that when you find "AB" in a polymer template, and the rule says to insert "I", you don't have to write down "AIB" and then go process "B" and whatever comes after it before coming back. You can instead immediately find the rule for what to insert between "AI", and the rule for "IB", and apply those, as long as you remember that those insertions belong to the next pass.

Taken a step further, this can be modeled as a depth-first traversal, like in Day 12. After finding "AB -> I", find "AI -> X", and then "AX -> Y", and then "AY -> Z", until you've reached a depth matching the number of passes you're supposed to make. Then back up and take the right-hand branches: "YZ -> ?", "XY -> ?", "IX -> ?", etc., going down those paths and back up, all the way back to "IB -> ?", which will require another full tree descent. At each stage, increment the count for the element inserted, and then add the counts for the elements in the original polymer template, and you have your count without ever storing the full final polymer in memory at one time …

… except. Memory is only one of the problems. Walking up and down to over two trillion leaves also takes time! I wrote this version, just in case my estimate of time was wrong. Modern CPUs can do a surprising amount in one second. Alas, they can't do that much. But, re-running the 10-pass example, and the 10-pass Part 1 real input, at least proved that the count would come out right.
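
Reconstructed from memory, that slow-but-correct version looked roughly like this, reusing add_count/2 from above (treat it as a sketch, not the exact code from that night):

```erlang
%% Depth-first count of inserted elements, with no memoization.
%% Correct, but the work doubles with every additional step.
count_dfs(0, _, _, Count) ->
    Count;
count_dfs(Steps, Rules, [A,B], Count) ->
    #{[A,B] := I} = Rules,
    count_dfs(Steps-1, Rules, [I,B],
              count_dfs(Steps-1, Rules, [A,I],
                        add_count(I, Count))).
```

Drive this over each adjacent pair of the template, then add counts for the template's own elements, and the totals match building the whole polymer, without ever holding it in memory.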

How do we get around the time pressure? Remember when I said that one bit of code would raise an exception if there weren't a rule for some pair of letters? The fact that it didn't means we know there is a finite (and relatively small!) number of combinations that will ever occur in this "tree". In the example, there are only 16 combinations (only 100 in the real input). If we run the polymerization on each of those combinations, and record the count of elements it produces at each step, we don't have to run the simulation on the polymer template at all. We can simply look at our records and say that "AB" produces so many "I", "X", "Y", etc. elements after 40 steps. If we record each combination at each step count, that's only 16 * 40 = 640 count maps (100 * 40 = 4000 for the real input) to remember …

… but. Even running 40 passes on a single starting pair is a huge amount of work. So, I took this idea one step farther, using a technique called "memoization". If we do some quick math, one pass turns one pair into two pairs, two pairs into four, four pairs into eight, etc. One more step, and we're already at sixteen pairs - but there are only sixteen unique pairs in the rules! Likely before this, but for sure after this, we'll be considering the same two-element combination multiple times at each level. There is no reason to re-compute its element production count. Instead, we'll build the element-step map as we do the traversal, so that every node in the tree can first check whether an identical node has already been computed. If a "memoized" result exists, we can just use it instead of recomputing.

init_memo(Rules) ->
    maps:from_list([ {{A,B,0}, #{}} || [A,B] <- maps:keys(Rules) ]).

polymerize_count(Steps, Rules, Polymer) ->
    polymerize_count(Steps, Rules, Polymer, init_memo(Rules), #{}).

polymerize_count(Steps, Rules, [A,B|Polymer], Memo, Count) ->
    NewMemo = polymerize_memo(Steps, Rules, [A,B], Memo),
    #{{A,B,Steps} := AddCount} = NewMemo,
    NewCount = maps:merge_with(fun(_, AC, BC) -> AC + BC end,
                               AddCount, Count),
    polymerize_count(Steps, Rules, [B|Polymer], NewMemo,
                     add_count(A, NewCount));
polymerize_count(_, _, [Last], _, Count) ->
    add_count(Last, Count).

polymerize_memo(0, _, _, Memo) ->
    Memo;
polymerize_memo(Steps, Rules, [A,B], Memo) ->
    case Memo of
        #{{A,B,Steps} := _} ->
            Memo;
        _ ->
            #{[A,B] := I} = Rules,
            NewMemo =
                polymerize_memo(Steps-1, Rules, [A,I],
                                polymerize_memo(Steps-1, Rules, [I,B], Memo)),
            #{{A,I,Steps-1} := AICount,
              {I,B,Steps-1} := IBCount} = NewMemo,
            NewCount = maps:merge_with(fun(_, AC, BC) -> AC + BC end,
                                       AICount, IBCount),
            NewMemo#{{A,B,Steps} => add_count(I, NewCount)}
    end.

So this is what my solution does. polymerize_count considers each pair of the template polymer, and asks polymerize_memo to compute the element count, while also building this memoization table. The neat thing is that the shallower levels directly benefit as well: their counts are the element-wise sums of their left and right branches - and many of those will already be memoized! The one optimization I made was to pre-populate the initial Memo with zero-step rules. This prevents constant reconstruction and editing of the memoization map when we reach leaves. It would have been even better to pre-populate with one-step rules. Next time.

CountMap40 = puzzle14:polymerize_count(40, ExRules, ExTemplate).
SortedCounts40 = lists:keysort(2, maps:to_list(CountMap40)).
[{_,Min40}|_] = SortedCounts40.
{_,Max40} = lists:last(SortedCounts40).
2188189693529 = Max40 - Min40.

That's the correct answer for the puzzle, but it doesn't do the efficiency justice. Here are a few timed runs comparing the original implementation and the new one:

timer:tc(fun() -> puzzle14:count_elements(puzzle14:polymerize(10, ExRules, ExTemplate)) end).
%% {974,#{66 => 1749,67 => 298,72 => 161,78 => 865}}
timer:tc(fun() -> puzzle14:polymerize_count(10, ExRules, ExTemplate) end).
%% {197,#{66 => 1749,67 => 298,72 => 161,78 => 865}}

timer:tc(fun() -> puzzle14:count_elements(puzzle14:polymerize(20, ExRules, ExTemplate)) end).
%% {714565,#{66 => 2009315,67 => 84643,72 => 47997,78 => 1003774}}
timer:tc(fun() -> puzzle14:polymerize_count(20, ExRules, ExTemplate) end).
%% {434,#{66 => 2009315,67 => 84643,72 => 47997,78 => 1003774}}

timer:tc(fun() -> puzzle14:count_elements(puzzle14:polymerize(21, ExRules, ExTemplate)) end).
%% {1497656, #{66 => 4036129,67 => 143254,72 => 93390,78 => 2018684}}
timer:tc(fun() -> puzzle14:polymerize_count(21, ExRules, ExTemplate) end).
%% {455,  #{66 => 4036129,67 => 143254,72 => 93390,78 => 2018684}}

timer:tc(fun() -> puzzle14:polymerize_count(40, ExRules, ExTemplate) end).
%% {1030, #{66 => 2192039569602,67 => 6597635301,72 => 3849876073, 78 => 1096047802353}}

The first number in the result of each example is the number of microseconds the function took to execute. At ten passes, we're looking at 974µs vs. 197µs. That's not a big difference for us waiting at the terminal, but then we look at 20 passes: 714 milliseconds vs. 0.4ms. The old implementation took nearly 1000 times as long to produce its result, while the new one took barely twice the time. But then, that's what we expected: ten more steps produces 2^10 times as many pairs, which is about 1000x. Adding just one more step confirms the progression: 21 passes took the old implementation 1.5 seconds, about double what 20 passes took. Meanwhile, the new implementation takes only an additional 21 microseconds, which is probably near the noise threshold on my laptop. Back-of-the-envelope math says 40 passes would multiply the original implementation's time by 2^20 over its 20-pass time. That's about a million, so we'd be looking at 714,000 seconds, or over 8 days to finish … if I had enough memory to hold it all! I'll stick with the efficient implementation that makes 40 passes in about 1ms. Here are times for the full solutions, solveA doing 10 passes with the old implementation, and solveB doing 40 passes with the new one:

timer:tc(puzzle14, solveA, []).
% {5724,_NoSpoilers}
timer:tc(puzzle14, solveB, []).
% {8697,_NoSpoilers}

So that was fun. Getting used to Erlang's maps took me some time. My original code used a lot more maps syntax, and a lot less of the maps module. The final code hardly uses the => assignment syntax at all. That syntax is probably more useful when you're only setting new values, and not updating existing ones. The full module is up on GitHub, if you want to play with the syntax yourself.

Did anyone reading this write or see an even more efficient way to manage the large working set? Let me know on Twitter (@hobbyist). Good luck with Day 15!