QOI Update: Version 1.0 and Maps

Published Wednesday, January 5, 2022 by Bryan

While I was writing other things throughout December, the new 1.0 spec for the Quite Okay Image Format was released. I've just updated my Erlang implementation of QOI, so I've been comparing its performance on the test images from my last post about it.

% stat -f "%z %N" super-mario-world-map.qoi*
328628 super-mario-world-map.qoi
314086 super-mario-world-map.qoi.old
% stat -f "%z %N" phone-screenshot.qoi*
1617154 phone-screenshot.qoi
1889626 phone-screenshot.qoi.old

Nice! The phone screenshot (some text overlaid on a nature scene including clouds, sky and trees) is over 200KB smaller. But wait, the Super Mario World map (lots of bright primary colors on a solid black background) is over 15KB larger?! I know I said that this was the wrong file format to choose for such an image, but why did the changes that reduced the size of one image increase the size of another?

I started writing tools to help think about where compression and encoding/decoding efficiency might be improved. Let's use those to look at the differences in encoding between the versions.

{ok, <<_:14/binary, MarioQoiV0/binary>>} =
    file:read_file("super-mario-world-map.qoi.old").
eqoi_tools:collect_stats(MarioQoiV0).
% [{reference,122308,[]},
%  {run_long,3879,[]},
%  {run_short,90215,[]},
%  {substitute,23666,[]},
%  {mod_large,1052,[]},
%  {mod_medium,390,[]},
%  {mod_small,50,[]}]

{ok, <<_:14/binary, MarioQoiV1/binary>>} =
    file:read_file("super-mario-world-map.qoi").
eqoi_tools:collect_stats(MarioQoiV1).
% [{reference,119980,[]},
%  {run,103194,[]},
%  {substitute_no_alpha,25313,[]},
%  {mod_medium,2011,[]},
%  {mod_small,166,[]},
%  {substitute_alpha,0,[]}]

This tools produces a map of the number of uses of each chunk type. The types changed between V0 and V1, but the similarity of names I've used here should lead in the right direction for comparison. Maybe putting them in a table with their reference/spec names will make this easier.

Chunk Type V0.? Count V1.0 Count
reference / QOI_INDEX / QOI_OP_INDEX 122308 119980
run(_short) / QOI_RUN_8 / QOI_OP_RUN 90215 103194
run_long / QOI_RUN_16 3879 n/a
mod_small / QOI_DIFF_8 / QOI_OP_DIFF 50 166
mod_medium / QOI_DIFF_16 / QOI_OP_LUMA 390 2011
mod_large / QOI_DIFF_24 1052 n/a
substitute(_no_alpha) / QOI_COLOR / QOI_OP_RGB 23666 25313
substitute_alpha / QOI_OP_RGBA n/a 0

I've grouped the types across versions mostly by their logical purpose, but also by their byte size. So, the QOI_RUN_8 from version 0.? and QOI_OP_RUN from version 1.0 are in the same row because they are both 1-byte run encodings. I grouped QOI_COLOR with QOI_OP_RGB because even though the former is variable-width, it usually ends up the same width as the latter in this image.

The primary thing you should notice about this table is that the chunk count for every type except INDEX is greater in V1.0. The only place I think this might be a win is in QOI_OP_LUMA. The additional range offered by the green boost to the red and blue offsets may actually be pulling a few of the 3-byte QOI_DIFF_24 encodings down to 2-byte encodings.

The significant increase in RUN(_8) chunks is not unexpected in this image. It starts and ends with sixteen 1104-pixel-wide solid black lines. That's a run of 17,664 pixels of the default starting color. That's only 3 two-byte RUN_16 chunks, but 285 one-byte OP_RUN chunks. This is going to be a poor change for any image with same-color runs longer than 124 pixels (=2*62, the max length that OP_RUN can represent).

The reduction of INDEX chunks is where I am saddest. The new spec chose a new hash function. Hooray, that's what I argued for last time! But the new hash function actually performs worse on this image.

wx:new().
WI = wxImage:new().
wxImage:loadFile(WI, "super-mario-world-map-original.png").
ImageData = wxImage:getData(WI).
Channels = 3.
{ok, Encoder, Decoder} = eqoi:verify(Channels, ImageData).
Index = element(4, Encoder).
34 = maps:size(Index).

Last time, with the bitwise-XOR hash in v0.?, we used 35 of the index slots. This time, we're only using 34. This new hash does more shuffling, why didn't it improve index usage in this image? Well, here's the new hash function:

(R * 3 + G * 5 + B * 7 + A * 11) % 64

At first glance, things seem good - those odd numbers should "confuse" the power-of-2 division, right? Unfortunately, not really. In my last post, I noted that modulo-64 is the same thing as bitwise-AND-63, and that this throws out all information in the two highest bits of each channel. Visually:

V0.? hash, bxoring the low 6bits of the red (R), green (G), blue (B), and alpha (A) channels into a hash (H)

The bad news is that this new hash, despite its multiplications, does exactly the same thing. I'll explain why by sharing a technique I learned in the old days. When I was learning to program games in the 90s, multiplication was a slow enough operation that you wanted to avoid it if you could. Many computers didn't have a "math coprocessor", and even if they did, it was often still slower than other operations. This is why older code (and lots of embedded code still) uses a left shift instead of multiplying by 2: "N * 2" is equal to "N bsl 1" ("bsl" standing for "Bit Shift Left", which is often represented as "<<" in C-like languages). The technique I learned was that, for non-power-of-2 constants in your code, you could speed up multiplication by using the distributive property. Three is two plus one, and so "N * 3" is equal to "N * (2 + 1)" is equal to "(N * 2) + (N * 1)". Put this together with the left-shift, and you get "(N bsl 1) + N". Apply that to every multiplication in the new QOI hash, and visually, you get this:

V1.0 hash, adding the low 6bits of the red (R), green (G), blue (B), and alpha (A) channels into a hash (H)

That's right. The most significant two bits of each channel are still ignored. As before, this still doesn't matter for images containing many gradients, which translate to noise among the lower bits. But for high-contrast, bright-color images, this hash is still throwing out useful information. I applied my "just use SHA-1" patch to the V1.0 format, and collected the same statistics. Adding them to the table:

Chunk Type V0.? Count V1.0 Count V1.0-SHA Count
reference / QOI_INDEX / QOI_OP_INDEX 122308 119980 137377
run(_short) / QOI_RUN_8 / QOI_OP_RUN 90215 103194 103194
run_long / QOI_RUN_16 3879 n/a n/a
mod_small / QOI_DIFF_8 / QOI_OP_DIFF 50 166 53
mod_medium / QOI_DIFF_16 / QOI_OP_LUMA 390 2011 871
mod_large / QOI_DIFF_24 1052 n/a n/a
substitute(_no_alpha) / QOI_COLOR / QOI_OP_RGB 23666 25313 9161
substitute_alpha / QOI_OP_RGBA n/a 0 0

A 14.5% increase in INDEX chunks seems modest, until you see the effect on other chunk types. Less than half as many 2-byte OP_LUMA chunks, and almost two-thirds fewer 4-byte OP_RGB chunks! In fact, the SHA hash produces a file only 279,032 bytes long - just over 15% savings in total size.

But as with last time, I understand not wanting such a simple library to have to pull in a SHA dependency. What if the hash had used division instead of multiplication? Unfortunately, division does not have the same distributive property, and is also generally a more computationally expensive operation. But "N / 2" is equal to "N bsr 2" (Bit Shift Right), so what if we just used an imaginary operation, with the same distributive property as multiplication, but that translated to right bit shifts instead? In code:

pixel_hash(<<R/integer, G/integer, B/integer>>) ->
    ((R + (R bsr 1))                 % R not-really-div 3
     + (G + (G bsr 2))               % G not-really-div 5
     + (B + (B bsr 1) + (B bsr 2))   % B not-really-div 7
     + (255 + (255 bsr 1) + (255 bsr 3))) rem ?INDEX_SIZE.

Or visually:

Alternate 'bsr' hash, shifting right instead of left before adding red (R), green (G), blue (B), and alpha (A) channels into a hash (H)

One final column for the table:

Chunk Type V0.? Count V1.0 Count V1.0-SHA Count V1.0-BSR Count
reference / QOI_INDEX / QOI_OP_INDEX 122308 119980 137377 137338
run(_short) / QOI_RUN_8 / QOI_OP_RUN 90215 103194 103194 103194
run_long / QOI_RUN_16 3879 n/a n/a n/a
mod_small / QOI_DIFF_8 / QOI_OP_DIFF 50 166 53 3
mod_medium / QOI_DIFF_16 / QOI_OP_LUMA 390 2011 871 847
mod_large / QOI_DIFF_24 1052 n/a n/a n/a
substitute(_no_alpha) / QOI_COLOR / QOI_OP_RGB 23666 25313 9161 9274
substitute_alpha / QOI_OP_RGBA n/a 0 0 0

This hash produces a 279,347-byte file - a few hundred bytes larger than the SHA-1 hash. That seems perfectly inline with a few tens of INDEX and OP_LUMA chunks replaced by about 100 OP_RGB chunks. The improvement over the spec's hash is about the same.

So, it seems like QOI V1.0 isn't the best choice for some images. But that's okay, because we have other formats that serve those images better. What's QOI V1.0 good at? What about that screenshot that shrunk a few hundred KB? Here is the same analysis on its encodings:

Chunk Type V0.? Count V1.0 Count V1.0-SHA Count V1.0-BSR Count
reference / QOI_INDEX / QOI_OP_INDEX 82149 113947 107975 108323
run(_short) / QOI_RUN_8 / QOI_OP_RUN 89885 90269 90269 90269
run_long / QOI_RUN_16 260 n/a n/a n/a
mod_small / QOI_DIFF_8 / QOI_OP_DIFF 285437 256704 261904 261820
mod_medium / QOI_DIFF_16 / QOI_OP_LUMA 298488 493784 494549 494315
mod_large / QOI_DIFF_24 127236 n/a n/a n/a
substitute(_no_alpha) / QOI_COLOR / QOI_OP_RGB 113284 42163 42162 42132
substitute_alpha / QOI_OP_RGBA n/a 0 0 0

Now there is some improvement! Much better index use by the gradients in this picture. My alternative hash proposals actually produce up to 5% fewer INDEX chunks than the spec's hash. Large reductions in the numbers of 3-byte diff and 4-byte substitutions. This appears to be the sort of image QOI is targetting. So V1.0 is a good update for them.

While QOI isn't meant for every image (which, to be clear, the author repeatedly screams into the void), V1.0 is a nice update to the spec. The simplification of chunks - the removal of both DIFF_24 and the variable-length QOI_COLOR - allowed me to shave about 70 lines of code off of my implementation (about 10%).

After updating to V1.0, I also applied the knowledge I picked up during Advent of Code, about Erlang's map datatype. The color index is now a map, instead of a 64-element tuple. This appears to shave some time off of encode/decode for my screenshot image. It also allowed me to trim an additional 30 lines of code out of the implementation.

My implementation is on github: beerriot/eqoi. Feel free to send me a pull request if there are other improvements to make. I should go look at the Elixir implementation next, to see if there's anything that could be ported over.

Categories: Development