Advent of Code, in Erlang: Day 8

Published Saturday, December 18, 2021 by Bryan

Looking at memes in the various Advent of Code 2021 threads, there is one day that seems to have more complaints than any other: Day 8, the scrambled seven-segment LED.

The description starts with a long explanation about what seven-segment displays are, then asks you to perform a very simple task: count the number of white-space separated strings that have 2, 3, 4, or 7 characters. Those map to the 1, 7, 4, and 8 digits, respectively.

count_1478_outputs(Displays) ->
    lists:sum([ length([ O || O <- Output, is_1478(O) ])
                || {_, Output} <- Displays ]).

is_1478(O) ->
    case size(O) of
        2 -> true; % 1
        4 -> true; % 4
        3 -> true; % 7
        7 -> true; % 8
        _ -> false
    end.

This almost feels like the Day 1 puzzle - more about making sure you can parse the file, than anything.

Then you unlock Part 2, in which the requirement is to decode all of the other strings into their proper digit. And the scramble is different on every line of the file. And finally - and I think this is the one that has caused so much grumbling - the method for unscrambling is not explained. Ever read in a book or paper, "This is left as an exercise to the reader"? Here it's, "Through a little deduction, you should now be able to determine the remaining digits."

While technically each problem has a, "You figure out how to do it," aspect to it, this one does feel different. I think it's the scrambing being different on each line that does it. We no longer need a program that can do some simple substitution that we figured out. We need one that can figure out the substitutions on its own.

I think the people that will enjoy this puzzle are the ones who enjoyed the old puzzles where you were supposed to do things like match people to their houses, given only clues like, "Alyssa lives next to the green house", and "The person with the dog lives in the blue house." What you have to do here is determine what new pieces of information you can deduce from things you already know. And this is best done with pencil and paper, or a whiteboard, or at least somewhere initially separate from code.

For example, if you take only what we know so far, that a 1 digit uses two segments, and a seven uses two segments, and add to that the fact that 1 uses two of the segments that seven uses, you can deduce that the third segment in the seven, whatever it is, is the "a" segment.

  1:      7:
 ....    aaaa
.    c  .    c
.    c  .    c
 ....    ....
.    f  .    f
.    f  .    f
 ....    ....
find_one(Inputs) -> hd([_]=find_inputs_of_length(Inputs, 2)).
find_seven(Inputs) -> hd([_]=find_inputs_of_length(Inputs, 3)).
find_inputs_of_length(Inputs, Length) ->
    [ Code || Code <- Inputs, length(Code) == Length ].

One = find_one(Inputs),
Seven = find_seven(Inputs),
[A] = Seven -- One,

Each of my Inputs is a string of letters, as in the file. I converted them from binaries to another of Erlang's string formats: a list of characters. The -- operator subtracts the elements of its right-hand list argument from its left-hand list argument. Subtracting the characters in the list One from the characters in the list Seven leaves one character, that must be A in the given scramble.

My deduction code continues in this way:

  6:      1:
 aaaa    ....
b    .  .    c
b    .  .    c
 dddd    ....
e    f  .    f
e    f  .    f
 gggg    ....
find_six(Inputs, One) ->
    hd([_]=[ Code || Code <- Inputs,
                     length(Code) == 6,
                     length(One -- Code) == 1]).

Six = find_six(Inputs, One),
[C] = One -- Six,
[F] = One -- [C],

Six is the six-character code that, when subtracted from the two-character One leaves only 1 segment, and that segment must be C. Once we know C, the other segment in One is F.

  4:      9:      0:
 ....    aaaa    aaaa
b    c  b    c  b    c
b    c  b    c  b    c
 dddd    dddd    ....
.    f  .    f  e    f
.    f  .    f  e    f
 ....    gggg    gggg
find_four(Inputs) -> hd([_]=find_inputs_of_length(Inputs, 4)).
find_nine(Inputs, Four) ->
    hd([_]=[ Code || Code <- Inputs,
                     length(Code) == 6,
                     length(Four -- Code) == 0 ]).

find_zero(Inputs, Six, Nine) ->
    hd([_]=[ Code || Code <- Inputs,
                     length(Code) == 6,
                     Code =/= Six,
                     Code =/= Nine ]).

Four = find_four(Inputs),
Nine = find_nine(Inputs, Four),
Zero = find_zero(Inputs, Six, Nine),
[D] = Four -- Zero,
[B] = Four -- [C, F, D],

We already know Four is the 4-segment code. Nine is the six-segment code that, when subtracted from Four leaves no segments remaining. Once we know Nine and Six, Zero is the only six-character code left. Subtracting Zero from Four leave us with segment D. The only unknown segment left in Four is B.

  8:      9:
 aaaa    aaaa
b    c  b    c
b    c  b    c
 dddd    dddd
e    f  .    f
e    f  .    f
 gggg    gggg
find_eight(Inputs) -> hd([_]=find_inputs_of_length(Inputs, 7)).

Eight = find_eight(Inputs),
[E] = Eight -- Nine,
[G] = Eight -- [A, B, C, D, E, F],

Eight is the only 7-segment code. Subtracting Nine from Eight leaves us with the code for E. The only segment left is G.

  2:      3:      5:
 aaaa    aaaa    aaaa
.    c  .    c  b    .
.    c  .    c  b    .
 dddd    dddd    dddd
e    .  .    f  .    f
e    .  .    f  .    f
 gggg    gggg    gggg
find_two(Inputs, B, F) ->
    hd([_]=[ Code || Code <- Inputs,
                     length(Code) == 5,
                     length(Code -- [B,F]) == 5 ]).

find_three(Inputs, B, E) ->
    hd([_]=[ Code || Code <- Inputs,
                     length(Code) == 5,
                     length(Code -- [B,E]) == 5 ]).

find_five(Inputs, C, E) ->
    hd([_]=[ Code || Code <- Inputs,
                     length(Code) == 5,
                     length(Code -- [C,E]) == 5 ]).

Two = find_two(Inputs, B, F),
Three = find_three(Inputs, B, E),
Five = find_five(Inputs, C, E),

There are only three unknown digit codes left, but now that we known which segments are which, they clear up pretty easily. They're all five segment codes, and we can tell them apart by which segments are missing. Two doesn't use B or F, so the only five-digit code that doesn't decrease in size when we subtract them is the one we're looking for. The same trick works for Three, which doesn't use B or E, and for Five, which doesn't use C or E.

decode_output(Output, Segments) ->
    lists:foldl(fun(D, Acc) ->
                        N = decode_digit(D, Segments),
                        N + Acc * 10
                end,
                0,
                Output).

decode_digit(Zero, [Zero |_]) -> 0;
decode_digit(One, [_, One |_]) ->  1;
decode_digit(Two, [_, _, Two |_]) -> 2;
decode_digit(Three, [_, _, _, Three |_]) -> 3;
decode_digit(Four, [_, _, _, _, Four |_]) -> 4;
decode_digit(Five, [_, _, _, _, _, Five |_]) -> 5;
decode_digit(Six, [_, _, _, _, _, _, Six |_]) -> 6;
decode_digit(Seven, [_, _, _, _, _, _, _, Seven |_]) -> 7;
decode_digit(Eight, [_, _, _, _, _, _, _, _, Eight |_]) -> 8;
decode_digit(Nine, [_, _, _, _, _, _, _, _, _, Nine]) -> 9.

Take the decoded values we just produced, and put them in a list in numerical order: [Zero, One, Two, Three ... Nine]. Pass that list along with the list of Output strings from the end of the line to this decode_output function, and it will translate them to their correct value, returning the integer that would have been displayed, if the "wiring" were correct.

Well, with one other change: sort the segment characters in each of the Output and Segment strings first. That way "cgeb" matches "gcbe" (they'll both be "bceg"). It's the same segments, but the file is sort of double scrambled.

read_display({RawInputs, RawOutputs}) ->
    Inputs = [lists:sort(binary_to_list(I)) || I <- RawInputs],
    {Segments, _} = map_segments(Inputs),
    Outputs = [lists:sort(binary_to_list(O)) || O <- RawOutputs],
    decode_output(Outputs, Segments).

{Inputs, Outputs} = puzzle08:parse_display(<<"be cfbegad cbdgef fgaecd cgeb fdcge agebfd fecdb fabcd edb | fdgacbe cefdb cefbgd gcbe">>).
8394 = puzzle08:read_display({Inputs, Outputs}).

The full code for this solution is in my github repo. What other solutions did people come up with? Did someone try an expert system, applying rules until the system settled? Any brute force methods?