ASAP Smoothing

Published Saturday, March 11, 2017 by Bryan

Last week, Peter Bailis announced a new tool for smoothing timeseries data for plotting, called ASAP.

Since I just happened to have some fresh data handy, I decided to try it out. You may recall this graph of the difference in pressure measured between a sensor submerged in fermenting beer and a sensor in open air:

Helium-smoothed pressure data (190 points)

That graph was based on data that was pre-smoothed using a windowed average provided by the Helium API. The window is one hour, which produced 190 points. Zooming in just a little bit takes us to a 30-minute window, at 303 points:

Window-averaged pressure data (303 points)
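
For anyone unfamiliar with windowed averaging, here is a minimal Python sketch of the idea. It is not the Helium API's implementation, which I haven't seen. The function name `windowed_average`, the `(timestamp, value)` tuple layout, and the choice to align windows to multiples of the window size are all assumptions made for illustration. It also returns the minimum and maximum of each window, which is what the min-max areas in the other graphs show.

```python
from collections import defaultdict


def windowed_average(samples, window_seconds):
    """Average (timestamp, value) samples into fixed-width time windows.

    Returns one (window_start, mean, minimum, maximum) tuple per
    non-empty window, sorted by time. Hypothetical helper, not Helium's code.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Assumption: windows are aligned to multiples of window_seconds.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [
        (start, sum(values) / len(values), min(values), max(values))
        for start, values in sorted(buckets.items())
    ]


# Three samples, 30-minute (1800 s) windows: the first two share a window.
example = windowed_average([(0, 1.0), (600, 3.0), (1900, 5.0)], 1800)
print(example)  # [(0, 2.0, 1.0, 3.0), (1800, 5.0, 5.0, 5.0)]
```

In this sketch, a window that contains no samples produces no point at all, so the number of output points depends on both the window size and how evenly the readings arrived.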

Unlike the other graphs I plotted in that series, I didn't plot the maximum and minimum on this one. Here is what the raw data looks like:

Raw pressure data (11282 points)

That is 11282 points. There are spikes several times taller than what looks like the "typical" variation. The question, then, is whether the windowed average portrayed this data accurately. Here are the 303 points overlaid on the 11282 (I cut off the peaks to give us a little more detail):

Raw pressure data (grey) with Window-averaged data (blue)

I think it seems reasonable. A little noisy, but visually following the mid-point of the band. Can ASAP do better? Here are the 304 points it gives when asked for 303 from this dataset:

Raw data (grey), window-averaged (blue), ASAP (red)

If you look closely, you can see a few blue edges from the windowed average peeking out from under the ASAP plot, but they're largely the same. ASAP hasn't surfaced any additional features in this data that the plain windowed average hid. I think that's not surprising. The process being measured was a slow, continuous change, not something where there should have been sudden changes in behavior.

So instead, let's look at some data with an anomaly, like the absolute-force graph from the tilt sensor:

Helium-smoothed force data (dark blue) with min-max area (light blue)

That was using window-averaged data as well. Let's plot it against raw data and ASAP like before:

Raw force data (grey), window-averaged (blue), ASAP (red)

That's strange. The blue window-averaged line follows the raw data pretty well, at the 303 points I asked for. The ASAP line looks weirdly off, though. The smooth function returned only 279 points, leaving a gap in the middle. That straight line around the anomaly contains 30 points. The fact that the curve to the left of that line looks like it's shifted in time makes me suspicious, but there seem to be no errors generated. Even if I bump up the resolution to the full 890 pixels of this SVG, the ASAP curve still looks like this, and it produces 31 fewer points than the windowed average.
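
One quick way to confirm where a smoothed series is missing points is to look for the widest intervals between consecutive x-values. The following is only a Python sketch of that check, not the code from my plots. `largest_gaps` and its parameters are hypothetical names, and it assumes the smoothed output can be reduced to a plain list of numeric timestamps.

```python
def largest_gaps(timestamps, count=3):
    """Return the `count` widest intervals between consecutive timestamps.

    Each result is a (gap_width, gap_start, gap_end) tuple, widest first.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier, earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return sorted(gaps, reverse=True)[:count]


# Evenly spaced points with one hole between 120 and 900.
print(largest_gaps([0, 60, 120, 900, 960], count=1))  # [(780, 120, 900)]
```

Running the same check on the windowed-average output and on the ASAP output would show whether the straight segment really is a single wide interval, or a run of points that happen to fall on a line.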

The data and code I'm using for these plots are available in this gist: https://gist.github.com/beerriot/5e343e35e4930947fce77f36f1f5fbe5

Off to ping Peter…

Categories: Development