# Advent of Code, in Erlang: Day 21

Published Tuesday, December 21, 2021 by Bryan

I don't often rewrite everything between Part 1 and Part 2 of the Advent of Code puzzles. But I sure did for Day 21.

Part 1 started off with a simple game simulation. It's a bit more state than usual to keep track of, but the loop/recursion should be familiar to people who have done the earlier puzzles.

I went a little overboard in making nicer datastructures and breaking each piece of the game into separate functions. I could tell that the number of things to keep track of was right in that spot where I could manage it with "just" a handful of integers, but I'd be happier to have things bundled with names. So, there is a player record p and a die record d, and a play_game loop that calls out to a player_roll function that does three die_rolls for a player, and then moves the player. State in, state out - simple.

Part 1 was eerily low-difficulty, compared to other puzzles this late in the calendar. I guessed that Part 2 would have something to do with predicting which player would win, but I didn't see quantum dice coming.

I went down a few incorrect paths in figuring out how to deal with the quantum dice. It sounded like a probability problem at first, and I did not enjoy probability in college. Maybe it is a probability problem after all, and those that did enjoy it were able to solve without a second simulation. I got to work simulating, hoping that I'd recognize some pattern that would help me recall the right formula.

That started with hand-simulating one player's paths to victory. My thinking at this point was that I needed a table listing the relative likelihoods that a player, starting on a given space, won in N moves. While my text notes above may look tedious, this actually went pretty quickly, and convinced me that the search space wasn't large.

But then I looked back at my Part 1 implementation and remembered that one move consists of three rolls. My hand simulation was for only one roll. Three rolls would have twenty-seven outcomes! … or would it?

After momentary panic at the thought of a branching factor of 27, I saw the twist I was looking for. The puzzle describes a player's turn as three rolls of a die. For the purposes of counting universes, three rolls with three potential values is 27 outcomes. But for the purposes of counting score, rolls of something like 1,1,3 produce exactly the same result as rolls of 1,3,1 and 3,1,1! In fact, there are only seven unique movement results. A branching factor of seven isn't small, but it's better than twenty-seven. If games go similar to my hand simulation of one die roll, I'm looking at only five or six rounds to a win. That makes only 117,649 (=76) outcomes!

So, I set about simulating. At this point, I was still thinking probability, so I was looking for the number of games won after N rounds. The ALL_ROLL_COUNTS constant is a list of possible roll value (sum of three dice), and how many ways that roll can come up. The function below it produces a map from number of game rounds, to the number of universes that took that many rounds to win. The implementation is yet another depth-first traversal. If we find a win, we don't have to look deeper. If we're not at a win yet, we consider all roll values from this point. The key to counting universes is the RollMult * LeafMult. Whenever we consider a roll value, we have to consider that there are RollMult universes that can take that can roll that value (the second number in ALL_ROLL_COUNTS), and that there are LeafMult universes that might have arrived at this choice.

There are three interesting things here:

1. I was wrong about the depth of the tree. It's not a maximum of six moves to win. It's 10.
2. These functions are fast anyway! I'm measuring about 6ms on my laptop.
3. The universe count in these results is at least six orders of magnitude too low!

That last point sent me spinning for a bit. The example in the puzzle description says that player 1 wins in 444 trillion universes, but my results only count only a few hundren million. I tried adding the results up in different ways for a bit (i.e. maybe it's the sum of values for rolls 3-4, plus 3-5, plus 3-6, whenever the other player has fewer wins for that value). But adding a few millions is not how one reaches trillions. What was I missing?

Thinking through the N-choose-M of it all, I got the answer. Player 1 only wins after 4 rounds if Player 2 didn't win in 3. Or, to reword it, in order for one of Player 1's 4-round universes to win, we can't be in one of Player 2's 3-win universes. The number of ways a player can win in three rounds isn't independent of the other player. We have to interleave the player-universe choices.

The independent calculation isn't worthless. Knowing the maximum number of rounds to a win tells us the maximum depth of the tree: 10+10 = 20. *cough* Woah 720 = almost 8x1016! That's even larger than the 4x1014 the example says we should get, and we haven't even multiplied by the independent die orderings yet!

Luckily, 20 is only the maximum depth of the tree. Looking at the independent win counts, most branches should max out at depth 7+7=14 (a mere 678billion), and many are even less than that. So the actual number of leaves we need to reach is much smaller than a full depth-20 tree.

This loop has similar structure to the earlier ones. I felt like handling the Rest of the current universe's recursion made the subtree recursion less obvious, so I've restructured to fold across the current universe. I also didn't like the explicit handling of player 1 and player 2 in the simple simulation, so I changed them to a list that swaps back and forth depending on who's turn it is (who is "Up"). The accumulator is a 2-tuple, where the first number is player 1's win count, and the second is player 2's win count.

Hooray! Those are the numbers we should expect. This function takes about seven seconds to calculate 800 trillion universes on my laptop. Not bad. … but can we do better? Though I've written this whole series in Erlang, I haven't yet used one tool people rave about: easy parallelization.

No, I'm not going to go the Erlang-demo route, and spin up one process per universe. I'm just going to take one step down the tree, and spawn seven processes - one for each of the first seven universe-groups. Each of those is a dirac_game_worker that starts the same tree-exploring loop we used before, but with P1 already having moved once. When a worker finishes looking at its subtree, it sends its result back to the original spawning process. That process collects and sums all of the responses together.

On my dual-core laptop, parallelization nearly halves the time - from just under 7sec, to just over 3.5sec. That's about as good as you can hope for.

But can we do better still? When I was facing another huge result space on Day 14, I used "memoization" to remember results of intermediate tree nodes that I had already calculated. At first, I didn't think that would work here, because I didn't see how subuniverses would ever repeat each other. But, just as we don't care whether the roll was 1,1,3 or 3,1,1, we don't care if players got to their current position and score in 3 moves or 7. The state of the game - each player's position and score - is independent of history.

This implementation flips the universe-count tracking upside down. Instead of passing down how many universes got us to a result, we pass back up how many universes from this point in the game lead to wins for each player. Apologies for the scrunch toward the right. The important note is that the AccWins are now only wins "from here", not total wins seen so far in the whole tree, and an AccMemo is now passed along, holding recordings of game states we've already seen and computed.

No, I didn't mis-paste. Memoization cuts the runtime to 66ms - a 100x improvement. A little back-of-the-envelope math helps it make sense. We said our typical tree depth would produce something like 678 billion leaves. If each player has a potential 0..21 points, and can be in any of 10 positions, that's only 48,400 (=22*22*10*10) potential game states. We could see each state 10 million times! The details of the actual tree shape, and the overhead of using the memoization table, reduce our expected returns, but it's no surprise that the speedup is big.

I'm not going to attempt to parallelize this for even more gain. For one, a 2x speedup after a 100x speedup just doesn't feel as awesome. For another, I'm not sure I'd see a 2x speedup this time. Memoization works because later computations can be skipped if we did them earlier. If I'm computing in parallel, I either have to duplicate work to make those findings in each thread, or share findings between threads. Either of those will decrease the returns on parallelization. For an even larger dataset, it could be worth it, but not here.

In any case, lesson learned, again: it's faster to reduce duplicate work than to do it in parallel.

Whew, I got to avoid dredging up too much of my rusty probability course after all. Give me a shout on Twitter (@hobbyist) if you did use those methods. I've decided to not clean up today's code. I think it's good for folks to see that we all get confused, take wrong turns, double-back, and make mistakes in general. A polished end result does not indicate a clean path to it.