QOI Compression in Erlang

Published Monday, December 6, 2021 by Bryan

I had a nagging urge to code recently. An urge not quite strong enough to get me into one of the larger projects I have sitting around. I just wanted a few hours, not a few days, of fiddling. In a moment of serendipity, I saw a few posts about someone experimenting with a new, simple image compression format they called the "Quite Okay Image (QOI)" format. Bit twiddling? Sounds perfect!

To spoil the ending, I've put my Erlang implementation of the format on Github. The format is still under development, so this implementation is just compatible with what their project had up at the time. I have a few fun further implementation ideas that I might play with, but for now, I have some thoughts on the format to share.

I like the streamability of the format. To help debug my implementation, I wrote a little tool that would encode a stream, while decoding the compressed stream in lockstep. The tool would stop as soon as the re-decoded stream failed to match the pre-encoded stream. Dumping the encoder and decoder states, next to offsets and surrounding bytes in the stream, made it easy to zero in on errors.

Streaming is also the main place that I may play with this format a little more. My current implementation holds both the compressed stream and the uncompressed stream fully in memory. This isn't necessary, and either or both could be read or written to/from disk, network, etc. as needed.

The format is not without its weaknesses, though. One of them is that, while it's pretty quick and simple, its compression is not as good as PNG. This was the image I had open in the background when I decided to start coding:

Super Mario World Map
I'm still working on finding all the exits.

On disk as a PNG, it's … well, it depends on which program wrote the PNG last. Mac Preview saved it as 227,248 bytes. The qoiconv utility from the QOI repo writes a 191,354-byte PNG after round-tripping. wxErlang writes a 131,594-byte PNG. Obviously, these are all using different options and implementations of compression utilities, and probably different header metadata as well. Decoded, they all produce the same 1,218,816 pixel values.

What size is the QOI file? Interestingly, this also depends on which program write the image last. The qoiconv utility produces a 315,423-byte file. My erlang implementation writes 314,086 bytes. I haven't compared byte-by-byte, but my quess would be that our implementations make a few different choices about whether to use a small/medium modification chunk or a medium/large modification chunk. In either case while they're within 1% of each other, they're both 38-139% larger than the equivalent PNGs.

What would it take to improve QOI's compression ratio? One thing that stands out to me is the index implementation. The debugging tool I wrote for my implementation printed the states of the encoder and decoder when it finished, and something I noticed is that nearly half of the index slots were unused.

wx:new().
WI = wxImage:new().
wxImage:loadFile(WI, "super-mario-world-map-original.png").
ImageData = wxImage:getData(WI).
Channels = 3.
{ok, Encoder, Decoder} = eqoi:verify(Channels, ImageData).
Index = element(4, Encoder).
64 = length(tuple_to_list(Index)).
35 = length([ P || P <- tuple_to_list(Index), is_binary(P) ]).

Only 35 of 64 index slots used. There are more than 35 colors in this image, right?

{_, Pixels} = lists:foldl(
    fun(B, {[G,R], Acc}) -> {[],    [<<R,G,B>>|Acc]};
       (X, {P,     Acc}) -> {[X|P], Acc}
    end,
    {[], []},
    binary_to_list(ImageData)).
UniquePixels = lists:usort(Pixels).
114 = length(UniquePixels).

One hundred fourteen unique pixel values, but only 35 ever stored together in the index. So some colors are kicking each other out of the index instead of using empty slots. Referencing an indexed pixel requires only one byte, but 3/4 of the other color encodings require two or more bytes. If we could get more colors available in the index at once, we'd probably reduce the size of the encoded image.

Which index slot a color uses is decided by QOI's hash function. The function is currently the bitwise exclusive OR of the four color components - red, green, blue, and alpha/opacity - modulo the size of the index, 64. BXOR isn't an unusual choice for combining a bunch of random bytes into a quick and dirty hash. But the size of the index has an unfortunate effect on the outcome here: modulo 64 is the same as bitwise AND 0x3F. Put another way, we've just discarded the randomness provided by the highest two bits of each pixel component.

ShortPixels = [ <<0:2, R:6, 0:2, G:6, 0:2, B:6>>
                || <<R, G, B>> <- UniquePixels ].
UniqueShortPixels = lists:usort(ShortPixels).
102 = length(UniqueShortPixels).

So before we have even started hashing, we've coerced 12 colors into occupying the same slots as other colors. If those upper bits could have pushed those colors into other index spots, we might have stored as many as 47 colors, over 30% better!

The size of the index is derived from the desire to be able to address it in the six bits of a byte left over after the two-byte "index reference" prefix. So, we can't just increase the modulo, even though increasing the index size could be nice for some images. Instead, we need to force those upper bits to have an effect on the hashed lower bits. An easy way to do that is to use a real hash function.

pixel_hash(<<R/integer, G/integer, B/integer>>) ->
    <<_:2, Hash:6, _/binary>> = crypto:hash(sha, <<R, G, B, 255>>),
    Hash.

For example, we could grab any six bits out of a SHA1 hash of our pixel value. Using the implementation of eqoi:pixel_hash/1 above, the index after encoding our image has these stats:

c(eqoi).
{ok, EncodeSHA, _} = eqoi:verify(3, ImageData).
50 = length([ P || P <- tuple_to_list(element(4, EncodeSHA)), is_binary(P) ]).
273598 = size(eqoi:encode(3, ImageData)).

Fifty colors made it into the index together this time. That's even better than our hoped-for 47! And the encoded size is down to 273,598 bytes. The file would inclue a 14-byte header and a 4-byte footer, for a total of 273,616 bytes. That's 40,470 bytes smaller than my original implementation - a 12.8% savings. Now it's only 20-108% larger than PNG!

I know that some won't want to introduce a cryptography library into otherwise straightforward bit-twiddling code, though. Can we push the bits into the hash without SHA? How about literally shifting the high bits down?

pixel_hash(<<R/integer, G/integer, B/integer>>) ->
     (R bxor ((R band 192) bsr 2) bxor
          G bxor ((G band 192) bsr 4) bxor
          B bxor ((B band 192) bsr 6) bxor
          255 bxor ((255 band 192) bsr 3))
     rem ?INDEX_SIZE.

This implementation applies the high bits of the red component at the upper end of the index; the green, to the middle; the blue, to the low end; and the alpha, somewhere in between. How does it do?

c(eqoi).
{ok, EncodeBSR, _} = eqoi:verify(3, ImageData).
53 = length([ P || P <- tuple_to_list(element(4, EncodeBSR)), is_binary(P) ]).
265627 = size(eqoi:encode(3, ImageData)).

Unbelievably it does better than SHA, fitting yet another three colors into the index. And it's another 7971 bytes smaller. This 265,645-byte on-disk image would be only 17-102% larger than the equivalent PNG!

One more before we get on with further analysis. I said before that we couldn't increase the size of the index, because we only had six bits to address it. We can decrease it, though. If we're not using all of the slots anyway, what if we computed modulo 63 instead of modulo 64?

pixel_hash(<<R/integer, G/integer, B/integer>>) ->
    (R bxor G bxor B bxor 255) rem (?INDEX_SIZE-1).

Since 63 isn't a power of two, its modulo isn't just a bitmask. Its result does depend on the values of the high two bits of each component.

c(eqoi).
{ok, Encode63, _} = eqoi:verify(3, ImageData).
46 = length([ P || P <- tuple_to_list(element(4, Encode63)), is_binary(P) ]).
263754 = size(eqoi:encode(3, ImageData)).

Not as many colors in the index as either SHA or our bit-shifting, but still only one shy of the bump we were hoping above the original implementation. But far more interesting is the fact that it uses fewer bytes anyway! A 1% shave off of our best isn't huge, but it's not nothing. A 263,772-byte file would only be just over 100% larger than our smallest PNG, and only 16% larger than the largest.

How could that be - fewer colors in the index, but a smaller encoding? Was the hypothesis at start of this experiment wrong? Is the size not dependent on index fullness? No. The actual use of the index will depend on how often indexes of color values collide. While spreading out across the index should make them collide less often, that's not guaranteed. Exactly which colors collide will caused some images to compress better, and some worse. Two colors whose hash values collide will constantly eject each other from the index in an image where those colors alternate across the scene, but will have little effect if one is never seen again after the other appears.

One of many attempts to find the best position for my Starlink satellite dish

To illustrate, here is a screenshot from my phone. It's of comparable size to the image we've been using, but a vastly different color space.

wx:new().
WI3 = wxImage:new().
wxImage:loadFile(WI3, "phone-screenshot.png").
Screenshot = wxImage:getData(WI3).
c(eqoi). %% back to the original hash implementation
{ok, EncodeScreenshot, _} = eqoi:verify(3, Screenshot).
64 = length([ P || P <- tuple_to_list(element(4, EncodeScreenshot)), is_binary(P) ]).

This time, even with the original hash function, all index slots are used. How is the size?

% stat -f "%z %N" phone-screenshot.*
1843036 phone-screenshot.png
1889626 phone-screenshot.qoi

Not bad! Only 2.5% larger than the PNG. How do the other hash implementations fair?

1889883 phone-screenshot-63.qoi
1889810 phone-screenshot-bsr.qoi
1887397 phone-screenshot-sha.qoi

All of them filled the index (modulo-63 filled 63 slots, instead of 64, of course). And, they all produce nearly the same size of file. SHA wins the size contest here, but this time 2KB savings is much less than 1%.

In this image, the number of unique pixel values is 193,401. Notice that the encoded image size is 7 times larger, despite the number of pixels being 7% smaller. With so many colors, the index will be in constant flux. Any hash function with an even spread is going to perform nearly the same. Colors not in the index will be the norm.

So what's the lesson here? Is QOI's hash function bad? That depends on what you're encoding. It certainly has some holes that other functions plug more effectively.

But the real lesson is that knowing what you're encoding is the first step to picking a good encoding. We never even got close to the smallest PNG size with our QOI encoder. The kicker is that the smallest size I listed here isn't even the best PNG can do. The image as I downloaded it, before opening it in any other program was less than 64KB in size (which you can verify, because it's what is embedded above)! If you're familiar with classic graphics, you've been yelling at your monitor the whole time: with only 114 colors, you can number each of them with only 7 bits. The best way to encode this image is to write out a pallette, listing every color used, and then compress the one-byte-per-pixel (or less!) stream referencing the correct offset in the palette. You cut your input byte stream to 1/3 its size before you even start compressing!

And yet, it's still fun to try to do it other ways. :)

Categories: Development