I don't often rewrite everything between Part 1 and Part 2 of the Advent of Code puzzles. But I sure did for Day 21.

Part 1 started off with a simple game simulation. It's a bit more state than usual to keep track of, but the loop/recursion should be familiar to people who have done the earlier puzzles.

I went a little overboard in making nicer data structures and
breaking each piece of the game into separate functions. I could
tell that the number of things to keep track of was right in that
spot where I *could* manage it with "just" a handful of
integers, but I'd be happier to have things bundled with names. So,
there is a player record `p` and a die record `d`, and
a `play_game` loop that calls out to a `player_roll`
function that does three `die_rolls` for a player, and then
moves the player. State in, state out - simple.
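
That structure translates to a quick sketch in Python, for readers who'd rather not read Erlang. This isn't my actual code - the function names mirror mine, and the rules (a deterministic die that rolls 1, 2, 3, … wrapping at 100; play to 1000 points; the answer is the loser's score times the roll count) come from the puzzle:

```python
def die_roll(die):
    # The deterministic die rolls 1, 2, 3, ... wrapping back to 1 after 100.
    value, times_rolled = die
    return value, (value % 100 + 1, times_rolled + 1)

def player_roll(player, die):
    # One turn: three die rolls, move the pawn, add the landing space to the score.
    pos, score = player
    move = 0
    for _ in range(3):
        value, die = die_roll(die)
        move += value
    pos = (pos + move - 1) % 10 + 1
    return (pos, score + pos), die

def play_game(p1_start, p2_start):
    # Alternate turns until someone reaches 1000.
    # State in, state out: players, die, and whose turn it is.
    players = [(p1_start, 0), (p2_start, 0)]
    die = (1, 0)  # (next value to roll, number of rolls so far)
    up = 0
    while True:
        players[up], die = player_roll(players[up], die)
        if players[up][1] >= 1000:
            return players[1 - up][1] * die[1]
        up = 1 - up
```

`play_game(4, 8)` reproduces the puzzle example's answer of 739785.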

Part 1 was eerily low-difficulty, compared to other puzzles this
late in the calendar. I guessed that Part 2 would have something to
do with predicting which player would win, but I didn't
see *quantum dice* coming.

I went down a few incorrect paths in figuring out how to deal with the quantum dice. It sounded like a probability problem at first, and I did not enjoy probability in college. Maybe it is a probability problem after all, and those who did enjoy it were able to solve it without a second simulation. I got to work simulating, hoping that I'd recognize some pattern that would help me recall the right formula.

That started with hand-simulating one player's paths to victory. My thinking at this point was that I needed a table listing the relative likelihoods that a player, starting on a given space, won in N moves. While my text notes above may look tedious, this actually went pretty quickly, and convinced me that the search space wasn't large.

But then I looked back at my Part 1 implementation and remembered
that one move consists of three rolls. My hand simulation was for
only one roll. Three rolls would have *twenty-seven* outcomes!
… or would it?

After momentary panic at the thought of a branching factor of 27, I
saw the twist I was looking for. The puzzle describes a player's
turn as three rolls of a die. For the purposes of counting
universes, three rolls with three potential values is 27
outcomes. But for the purposes of counting score, rolls of something
like 1,1,3 produce exactly the same result as rolls of 1,3,1 and
3,1,1! In fact, there are only seven unique movement results. A
branching factor of seven isn't small, but it's better
than *twenty*-seven. If games go similarly to my hand simulation
of one die roll, I'm looking at only five or six rounds to a
win. That makes only 117,649 (=7^{6}) outcomes!
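
If you want to convince yourself of those seven outcomes, a couple of lines of Python (not from my solution) does it:

```python
from collections import Counter
from itertools import product

# Collapse the 27 ordered three-roll outcomes down to their sums.
roll_counts = Counter(sum(rolls) for rolls in product((1, 2, 3), repeat=3))
print(sorted(roll_counts.items()))
# [(3, 1), (4, 3), (5, 6), (6, 7), (7, 6), (8, 3), (9, 1)]
```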

So, I set about simulating. At this point, I was still thinking
probability, so I was looking for the number of games won after N
rounds. The `ALL_ROLL_COUNTS` constant is a list of the possible
roll values (sums of three dice), and how many ways each roll can
come up. The function below it produces a map from number of game
rounds to the number of universes that took that many rounds to
win. The implementation is yet another depth-first traversal. If we
find a win, we don't have to look deeper. If we're not at a win yet,
we consider all roll values from this point. The key to counting
universes is the `RollMult * LeafMult`. Whenever we consider
a roll value, we have to consider that there are `RollMult`
universes that can roll that value (the second number
in `ALL_ROLL_COUNTS`), and that there are `LeafMult`
universes that might have arrived at this choice.
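
In Python terms (a sketch of the same idea, not my Erlang - `ALL_ROLL_COUNTS` and the multiplier names mirror the real code), the single-player version looks roughly like:

```python
from collections import defaultdict

# Roll value (sum of three dice) -> number of ways that sum can come up.
ALL_ROLL_COUNTS = [(3, 1), (4, 3), (5, 6), (6, 7), (7, 6), (8, 3), (9, 1)]

def win_rounds(start_pos, target=21):
    # Map number of rounds -> universes in which a lone player first
    # reaches `target` on exactly that round.
    wins = defaultdict(int)

    def walk(pos, score, round_no, leaf_mult):
        for roll, roll_mult in ALL_ROLL_COUNTS:
            new_pos = (pos + roll - 1) % 10 + 1
            new_score = score + new_pos
            mult = roll_mult * leaf_mult  # RollMult * LeafMult
            if new_score >= target:
                wins[round_no] += mult  # a win: no need to look deeper
            else:
                walk(new_pos, new_score, round_no + 1, mult)

    walk(start_pos, 0, 1, 1)
    return dict(wins)
```

The fastest possible win takes three rounds - two rounds top out at 19 points, since no single turn can score more than 10.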

There are three interesting things here:

- I was wrong about the depth of the tree. It's not a maximum of six moves to win. It's 10.
- These functions are fast anyway! I'm measuring about 6ms on my laptop.
- The universe count in these results is at least six orders of magnitude too low!

That last point sent me spinning for a bit. The example in the
puzzle description says that player 1 wins in 444 trillion
universes, but my results count only a few hundred million. I tried
adding the results up in different ways for a bit (e.g. maybe it's
the sum of values for rolls 3-4, plus 3-5, plus 3-6, whenever the
other player has fewer wins for that value). But *adding* a few
millions is not how one reaches trillions. What was I missing?

Thinking through the N-choose-M of it all, I got the answer. Player 1 only wins after 4 rounds if Player 2 didn't win in 3. Or, to reword it: in order for one of Player 1's 4-round universes to win, we can't be in one of Player 2's 3-round-win universes. The number of ways a player can win in three rounds isn't independent of the other player. We have to interleave the player-universe choices.

The independent calculation isn't worthless. Knowing the maximum
number of rounds to a win tells us the maximum depth of the tree:
10+10 = 20. **cough** Woah 7^{20} = almost
8x10^{16}! That's even larger than the 4x10^{14} the
example says we should get, and we haven't even multiplied by the
independent die orderings yet!

Luckily, 20 is only the *maximum* depth of the tree. Looking at
the independent win counts, most branches should max out at depth
7+7=14 (a mere 678 billion), and many are even less than that. So the
actual number of leaves we need to reach is much smaller than a full
depth-20 tree.

This loop has similar structure to the earlier ones. I felt like
handling the `Rest` of the current universe's recursion made
the subtree recursion less obvious, so I've restructured
to `fold` across the current universe. I also didn't like the
explicit handling of player 1 and player 2 in the simple simulation,
so I changed them to a list that swaps back and forth depending on
whose turn it is (who is "`Up`"). The accumulator is a
2-tuple, where the first number is player 1's win count, and the
second is player 2's win count.
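
Sketched in Python (again, not my Erlang - but the swapping player list and the 2-tuple accumulator are the same shape, and I've made the winning score a parameter so the sketch can be sanity-checked on tiny games):

```python
ALL_ROLL_COUNTS = [(3, 1), (4, 3), (5, 6), (6, 7), (7, 6), (8, 3), (9, 1)]

def dirac_wins(p1_start, p2_start, target=21):
    # Count (player 1 wins, player 2 wins) across all universes,
    # interleaving the players' turns instead of simulating them separately.
    def walk(up, other, leaf_mult):
        idx, pos, score = up  # the player whose turn it is ("Up")
        wins = [0, 0]
        for roll, roll_mult in ALL_ROLL_COUNTS:
            new_pos = (pos + roll - 1) % 10 + 1
            new_score = score + new_pos
            mult = roll_mult * leaf_mult
            if new_score >= target:
                wins[idx] += mult
            else:
                sub = walk(other, (idx, new_pos, new_score), mult)
                wins[0] += sub[0]
                wins[1] += sub[1]
        return wins

    return tuple(walk((0, p1_start, 0), (1, p2_start, 0), 1))
```

With `target=21` this is the several-second computation; with a tiny target it's instant - e.g. `dirac_wins(4, 8, target=1)` is `(27, 0)`, since player 1 always scores at least a point on the first move.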

Hooray! Those are the numbers we should expect. This function takes about seven seconds to calculate 800 trillion universes on my laptop. Not bad. … but can we do better? Though I've written this whole series in Erlang, I haven't yet used one tool people rave about: easy parallelization.

No, I'm not going to go the Erlang-demo route, and spin up one
process per universe. I'm just going to take one step down the tree,
and spawn seven processes - one for each of the first seven
universe-groups. Each of those is a `dirac_game_worker` that
starts the same tree-exploring loop we used before, but with P1
already having moved once. When a worker finishes looking at its
subtree, it sends its result back to the original spawning
process. That process collects and sums all of the responses
together.
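
The shape of that fan-out, sketched in Python (a rough analogue only - Erlang's processes become a thread pool here, and Python threads won't buy the same speedup thanks to the GIL, but the spawn-seven-then-sum structure is the same):

```python
from concurrent.futures import ThreadPoolExecutor

ALL_ROLL_COUNTS = [(3, 1), (4, 3), (5, 6), (6, 7), (7, 6), (8, 3), (9, 1)]

def walk(up, other, leaf_mult, target):
    # Same interleaved depth-first count as the serial version.
    idx, pos, score = up
    wins = [0, 0]
    for roll, roll_mult in ALL_ROLL_COUNTS:
        new_pos = (pos + roll - 1) % 10 + 1
        new_score = score + new_pos
        mult = roll_mult * leaf_mult
        if new_score >= target:
            wins[idx] += mult
        else:
            sub = walk(other, (idx, new_pos, new_score), mult, target)
            wins[0] += sub[0]
            wins[1] += sub[1]
    return wins

def dirac_game_worker(roll, roll_mult, p1_start, p2_start, target):
    # Explore only the subtree where player 1's first turn summed to `roll`.
    pos = (p1_start + roll - 1) % 10 + 1
    if pos >= target:  # player 1 won on the opening move
        return [roll_mult, 0]
    return walk((1, p2_start, 0), (0, pos, pos), roll_mult, target)

def dirac_wins_parallel(p1_start, p2_start, target=21):
    # One worker per first-roll universe-group; collect and sum the results.
    with ThreadPoolExecutor(max_workers=7) as pool:
        futures = [pool.submit(dirac_game_worker, roll, mult,
                               p1_start, p2_start, target)
                   for roll, mult in ALL_ROLL_COUNTS]
        totals = [0, 0]
        for future in futures:
            sub = future.result()
            totals[0] += sub[0]
            totals[1] += sub[1]
    return tuple(totals)
```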

On my dual-core laptop, parallelization nearly halves the time - from just under 7sec, to just over 3.5sec. That's about as good as you can hope for.

But can we do better still? When I was facing another huge result space on Day 14, I used "memoization" to remember results of intermediate tree nodes that I had already calculated. At first, I didn't think that would work here, because I didn't see how subuniverses would ever repeat each other. But, just as we don't care whether the roll was 1,1,3 or 3,1,1, we don't care if players got to their current position and score in 3 moves or 7. The state of the game - each player's position and score - is independent of history.

This implementation flips the universe-count tracking upside
down. Instead of passing down how many universes got us to a result,
we pass back up how many universes *from this point in the
game* lead to wins for each player. Apologies for the scrunch
toward the right. The important note is that the `AccWins`
are now only wins "from here", not total wins seen so far in the
whole tree, and an `AccMemo` is now passed along, holding
recordings of game states we've already seen and computed.
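
In Python, the whole upside-down version fits in a few lines, with `functools.lru_cache` standing in for the hand-threaded `AccMemo` (a sketch of the idea, not my Erlang):

```python
from functools import lru_cache

ALL_ROLL_COUNTS = [(3, 1), (4, 3), (5, 6), (6, 7), (7, 6), (8, 3), (9, 1)]

@lru_cache(maxsize=None)
def wins_from(pos_up, score_up, pos_other, score_other):
    # Return (wins for the player about to move, wins for the other player),
    # counting universes from this game state onward. History doesn't matter,
    # so identical states hit the cache no matter how they were reached.
    up_wins = other_wins = 0
    for roll, roll_mult in ALL_ROLL_COUNTS:
        new_pos = (pos_up + roll - 1) % 10 + 1
        new_score = score_up + new_pos
        if new_score >= 21:
            up_wins += roll_mult
        else:
            # The other player moves next, so their answer comes back swapped.
            o, u = wins_from(pos_other, score_other, new_pos, new_score)
            up_wins += roll_mult * u
            other_wins += roll_mult * o
    return up_wins, other_wins
```

`wins_from(4, 0, 8, 0)` returns the puzzle example's `(444356092776315, 341960390180808)` almost instantly.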

No, I didn't mis-paste. Memoization cuts the runtime to 66ms - a 100x improvement. A little back-of-the-envelope math helps it make sense. We said our typical tree depth would produce something like 678 billion leaves. If each player has a potential 0..21 points, and can be in any of 10 positions, that's only 48,400 (=22*22*10*10) potential game states. We could see each state 10 million times! The details of the actual tree shape, and the overhead of using the memoization table, reduce our expected returns, but it's no surprise that the speedup is big.

I'm not going to attempt to parallelize this for even more gain. For one, a 2x speedup after a 100x speedup just doesn't feel as awesome. For another, I'm not sure I'd see a 2x speedup this time. Memoization works because later computations can be skipped if we did them earlier. If I'm computing in parallel, I either have to duplicate work to make those findings in each thread, or share findings between threads. Either of those will decrease the returns on parallelization. For an even larger dataset, it could be worth it, but not here.

In any case, lesson learned, again: it's faster to reduce duplicate work than to do it in parallel.

Whew, I got to avoid dredging up too much of my rusty probability course after all. Give me a shout on Twitter (@hobbyist) if you did use those methods. I've decided to not clean up today's code. I think it's good for folks to see that we all get confused, take wrong turns, double-back, and make mistakes in general. A polished end result does not indicate a clean path to it.

Categories: Development AdventOfCode

Post Copyright © 2021 Bryan Fink