Project Box: Planning

While I think about how to tell you about the process of fitting the internal components of this box, I’m going to talk about planning.

box-build-planning - 1.jpg

The image above is the whiteboard in my shop, as it was at the end of this project. I’ve lost some of the context about what each scribble meant, but there are three obvious diagrams: the dovetails, the hinges, and the latches and handle. None are to scale. None indicate relationships to each other. All were drawn at the moment they were needed.

It’s tempting to write about how this plan-as-you-go process is because of the nature of wood. The many ways different grain patterns can and cannot be used, and the inability to be sure of what you’ll find inside a slab, means that most projects end up needing to be adapted to fit as they progress.

But this incremental design is how all of my projects go. The basic structure of a program gets sketched and then adapted as I start to code. Presentations are outlined and then rearranged as I find each part needing a different fit in the story. Dinner plans come together on the cutting board. Road trips have a destination and, “Something like this road will probably work.”

I would make far fewer things if I designed the entire solution up-front. There is, of course, plenty of planning that happens before the first cuts are made. However, there is a point in the initial design of every project at which there are too many unknowns. My solution is often to bring the work right up to the point where the project is blocked until those unknowns are decided. This brings clarity to the details surrounding the issue. Sometimes the details become so clear that the solution is obvious, and other times I learn that the question wasn’t even relevant.

There are two keys to this flow working. The first is enough familiarity with the domain to recognize which decisions are likely to doom a project if not addressed early. My box must have internal dimensions large enough for the things I intend to store in it. I must have yeast and two hours of lead time if I want to bake bread for dinner. Put another way, it must be possible to determine what can be left unknown.

The second key to this process is the confidence that I can solve the problems that will arise. I find this one key to my work, even if I’ve over-planned. Years of projects in many domains have taught me that I have to expect that I will make a mistake somewhere in either my plan or my execution. I’ve also learned from this experience that very few of these mistakes spell disaster.

So, a whiteboard hangs in my shop to provide a place for information to accumulate to clarify the unknowns, as needed.

Project Box: Hinges

This box was designed to be a carrying case, so the lid needed to be hinged to the bottom. In keeping with the theme of cheap practice, I grabbed brass-colored, stamped hinges at the local Home Depot. Also in keeping with the theme, I watched Matt Estlea’s videos on preparing these hinges and chiseling mortises for them. The particular style of hinge I bought required a little change to the plan, but I’ll explain that in a bit. The mortising started with lines scribed for the edges of the hinge plate.

scribe lines

The first step in chopping out the hinge mortise was not cutting along these lines. Instead, it was chopping across the grain, 1/8″ to 3/16″ inside these lines.

cross chopping

With the grain sliced, it was easy to pare in from the edge, without any risk of splitting past my markings.

paring 1

A couple of rounds of chopping and paring, and I had reached my desired depth.

depth reached

With that base defined, I could work my way back to the scribed lines carefully.

working back to lines

The hinge plate was a nice snug fit side-to-side, but this is where the style of hinge came into play. The hinge pin stuck out of either end, so I needed to cut relief for it as well.

hinge shoulders

shoulder relief

fully seated

After repeating that process seven more times, drilling, and screwing, my lid was attached.

box-build-hinges - 1

It closed quite closely. There wasn’t even enough room for some thin cork lining I was considering.

lid closed

The hinges protruded so little that the box had no trouble standing on that edge. I may add some feet at some point, just to protect them a bit anyway.

Project Box: Right in Two

Possibly even more exciting than my dovetails turning out well was the fact that, from the first dry fitting, the box was square. Corner to corner, any difference in the diagonals was less than my tape measure would read.

box-build-squareness - 1

I cut a panel groove in each side, and then cut panels just barely undersized, to allow seasonal play. That required some careful planning and router setup, to prevent the groove from showing at the end of a pin or tail. Once everything was ready, I glued the whole box shut.

box-build-paneled - 1

I had always wanted to try this next step. If you’ve been paying close attention to the photos, you’ve probably noticed that one of the dovetail pins was wider than the others. In fact, it was exactly 1/8″ wider, which happens to be the kerf my table saw cuts. To turn this permanently closed box into an opening box, I sawed right through the middle of that wide pin.

box-build-sawn - 1

I ran the short ends through the saw first, then each long side. See the small edge near the corner that tapers along that long side? That’s from the thin top pulling away from the bottom as the saw relieved tension behind it. The sudden edge near the corner is there because the top couldn’t do that while I was sawing the short end. There’s a matching taper on the opposite corner, where the back side did the same. About three passes with a plane brought it right down.

box-build-two-pieces - 1

Project Box: Dovetails

It’s not perfect. Looking at it, preparing to write about it, I see flaws all over. But, I’ve finally cut a dovetail I don’t consider horrible.

box-build-good-dovetail-1.jpg

For years this joint has eluded me. Always too tight off the saw, but filled with giant gaps once together. Results were no better with a router – in fact, I ruined a bit and took a nice chunk out of the jig while trying.

So what changed? Some of the expected: I have more practice, in general, and I treated myself to a few new tools. But, there are two elements that I think played roles at least as important.

Improvement number one came from Matt Estlea’s videos. My skill in following written directions is second to none. I’ve read several step-by-steps, and attempted to perform their processes faithfully. Yet somehow watching Matt do it, listening to him talk through all the things he’s thinking about as he’s thinking about them, just made some of the important steps click as I was doing it myself this time.

As just one specific example, I’ve never particularly liked sharpening. I’ve read how to do it properly. I’ve bought nice stones to do it with. My edges have turned out well when I’ve done it. But it has always felt like a major chore. Watching Matt explain the reason behind the two bevels, and then show that it really is just a few strokes on the finer stones to keep the cutting edge in good shape made it a super easy thing to do, as often as I wanted. Quick, easy sharpening meant I (almost) never wasted my energy forcing a dull chisel to do awful work.

The second major element in my improvement came from experimenting, and closer evaluation of each result. This project is a box, and thus called for four dovetails. For the first, I used my new tools, thought about what Matt had demonstrated, and was close right off the saw. After that, I fixed the joint in my usual way, but ended up sort of “meh” – not my worst, but not enough better to make me excited about doing more.

I analyzed what had gone wrong. The joint didn’t fit to start; the pins just wouldn’t go in. So, I trimmed the pins to make them fit. By the time I got them pushed through, though, there were gaps around their visible ends. There are two simple explanations for this problem: either I pared a taper into the pins (so their tips were smaller than their bases), or there was already a taper on the tails (so the holes for the pins are larger on the outside than the inside).

Either problem indicates an error in sawing. I need to fix that, but in an attempt to get a feel for my new saw, I had decided to do all of my sawing up front. I needed to find a better way to fit whatever I already had. So, what about trimming the place that won’t show instead? Bingo: shaving the inside of the tail instead of the outside of the pin produced a much tighter dovetail.

My final problem was how to move more quickly. Dovetails one and two each took an hour and a half or more. If I had sawn true, things would have been significantly quicker, but test fitting went slowly. It was hard to know where to trim. This is when I pulled another YouTube tip out of my history. Larry Potterfield perfects ill-fitting pieces all the time, by coating one piece in some sort of carbon black, fitting it to the other, and then taking them apart again to see where they touched.

Obviously I didn’t want to coat the entire end of my board in carbon, but just a few scribbles of pencil on the cheeks of the pins would never be seen. Each test fit left smudges on the tails exactly where the pieces touched, darker where they squeezed harder. This cut down the guesswork, both speeding up the process and making sure I wasn’t weakening the joint by removing wood from the wrong places.

Joint number three was the first dovetail I’ve cut that I’m actually not completely unhappy with. Dovetail four was almost as good. A little overconfidence, or a little eagerness to get onto the next part of the project, may have played a role in some small mistakes. I’ll know soon: the dovetails on this project are practice for a much bigger project on the horizon.

Ethereum Signature Verification

Disclaimer: The views in this article are my own, and do not necessarily represent the views of my employer.

As part of the blockchain work I’ve been doing, I’ve been examining designs of the popular existing networks. Ethereum had my attention this week, and I was digging into its transaction authentication mechanisms when I found something confusing. I think it’s easiest to demonstrate with a quick example.

Say I’m running a private network, and I submit a transaction to transfer value 1000 from one account to another. I can do that like this:

> eth.getBalance("0xaf4be85b32868c5b7c121115ad8cd93e0ad4f14e")
2025000000000000
> eth.getBalance("0xfab48eb52368c8b5c7211151dac89de3a04d1fe4")
0
> var tx = {"from": "0xaf4be85b32868c5b7c121115ad8cd93e0ad4f14e", "to": "0xfab48eb52368c8b5c7211151dac89de3a04d1fe4", "value": 1000};
undefined
> eth.sendTransaction(tx);
INFO [07-06|08:56:59] Submitted transaction fullhash=0x79c78bb2da3f61ddb2b55ffc89158b4ed0aa06917aa7f3d3813693e4da6deafb recipient=0xFAb48eb52368C8b5c7211151daC89DE3A04D1Fe4
"0x79c78bb2da3f61ddb2b55ffc89158b4ed0aa06917aa7f3d3813693e4da6deafb"

Once the transaction gets mined, I can see the value has moved:

> eth.getTransactionReceipt("0x79c78bb2da3f61ddb2b55ffc89158b4ed0aa06917aa7f3d3813693e4da6deafb").blockNumber
1
> eth.getBalance("0xaf4be85b32868c5b7c121115ad8cd93e0ad4f14e")
1646999999999000 // gas*gasPrice = 378000000000000
> eth.getBalance("0xfab48eb52368c8b5c7211151dac89de3a04d1fe4")
1000
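The comment on the sender's new balance can be checked by hand. This is a small sketch in Python rather than the geth console. It assumes the transfer used 21000 gas (the standard cost of a plain value transfer, which the console output does not show) and takes the gas price from the 0x0430e23400 field of the raw transaction shown next.

```python
# Balance and value are copied from the console session above.
start_balance = 2025000000000000
value = 1000

# Assumptions: 21000 gas for a plain transfer, and the gas price field
# (0x0430e23400 wei, i.e. 18 gwei) read from the raw transaction.
gas_used = 21000
gas_price = 0x0430E23400

fee = gas_used * gas_price
end_balance = start_balance - value - fee

print("fee:        ", fee)
print("end balance:", end_balance)
```

The fee and end balance agree with the numbers printed by the console.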

And I can see the raw transaction:

> eth.getRawTransaction("0x79c78bb2da3f61ddb2b55ffc89158b4ed0aa06917aa7f3d3813693e4da6deafb")
"0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e880820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53"

A key component of blockchains is that every transaction is signed by its issuer, and meddling with the details of the transaction will be obvious to all parties. Let’s verify that. Let’s try to resubmit that transaction, and have it transfer value 1001, which requires modifying just one bit:

//              original: 0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e880820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53
// >---modified-bit-is-out-here-------------------------------------------------------modified bit---|
> eth.sendRawTransaction("0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e980820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53")
INFO [07-06|09:14:25] Submitted transaction fullhash=0xa06d51ff57bc41d33c812f08bdb65641db7581d97bd1d524fc8d0dd448a2aa9a recipient=0xFAb48eb52368C8b5c7211151daC89DE3A04D1Fe4
"0xa06d51ff57bc41d33c812f08bdb65641db7581d97bd1d524fc8d0dd448a2aa9a"
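To double-check that the two submissions really differ by a single bit, the raw strings can be compared directly. A sketch in Python, not part of the original session; the helper name `differing_bytes` is made up for illustration.

```python
def differing_bytes(a_hex: str, b_hex: str):
    """Return (byte_index, xor_mask) for every byte position that differs."""
    def to_bytes(text: str) -> bytes:
        return bytes.fromhex(text[2:] if text.startswith("0x") else text)

    a, b = to_bytes(a_hex), to_bytes(b_hex)
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return [(i, x ^ y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Both strings are copied from the console session above.
original = "0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e880820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53"
modified = "0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e980820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53"

diffs = differing_bytes(original, modified)
flipped_bits = sum(bin(mask).count("1") for _, mask in diffs)

print("differing bytes:", diffs)
print("flipped bits:   ", flipped_bits)
```

Only the low bit of the last value byte changes (0xe8 becomes 0xe9); the signature bytes at the end are untouched.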

Why did we get a receipt? Surely we should have been told that transaction was invalid. Did it just get logged as a failure?

> eth.getTransactionReceipt("0xa06d51ff57bc41d33c812f08bdb65641db7581d97bd1d524fc8d0dd448a2aa9a").blockNumber
2

No. So did value move?

> eth.getBalance("0xaf4be85b32868c5b7c121115ad8cd93e0ad4f14e")
1646999999999000
> eth.getBalance("0xfab48eb52368c8b5c7211151dac89de3a04d1fe4")
2001

Kind of. Value was deposited in the target account, but wasn’t debited from the source account. Where did it come from?

> eth.getTransactionReceipt("0xa06d51ff57bc41d33c812f08bdb65641db7581d97bd1d524fc8d0dd448a2aa9a").from
"0xf2b40cc46f06f8d28a2b021851f721592b1f78e8"
> eth.getBalance("0xf2b40cc46f06f8d28a2b021851f721592b1f78e8")
1646999999998999 // started with the same balance as 0xaf4b...

That’s not the account debited in the original transaction. So, I guess it’s true that we weren’t able to replay that transaction with modifications. But what about this other account? It didn’t sign this transaction.

This is where I had to learn the details of how transaction signing works in Ethereum. To submit a signed transaction, your client must encode a string containing: nonce, gas price, gas limit, destination address, value, contract data, and chain ID. The exact encoding is irrelevant here, but those are the components of the transaction (see this post and EIP 155 for the full details). A representation (hash) of those values is passed to an elliptic curve signing function, along with your account’s private key. That function produces what is called a “recoverable signature”. This recoverable signature is added to the end of the previous list, and the resulting string is a “signed raw transaction”.

A signed raw transaction can be submitted to any Ethereum node, without that node needing to know the private key of the account submitting the transaction. Notice that the list of fields included doesn’t include the address of the account submitting the transaction. Anyone can recover that address using the signature and the original list of signed fields.
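To make the field list concrete, here is a rough sketch that unpacks the raw transaction from the example. It assumes the string is RLP-encoded in the legacy, EIP-155 layout (nonce, gas price, gas limit, to, value, data, v, r, s), with the chain ID folded into v as chain_id * 2 + 35 or 36. The decoder is a minimal one written for this illustration, handles only well-formed input, and is not a substitute for a real RLP library.

```python
def decode_item(data: bytes):
    """Decode one RLP item; return (item, remaining_bytes)."""
    prefix = data[0]
    if prefix < 0x80:                  # a single byte is its own encoding
        return data[:1], data[1:]
    if prefix < 0xB8:                  # short string: length is in the prefix
        end = 1 + (prefix - 0x80)
        return data[1:end], data[end:]
    if prefix < 0xC0:                  # long string: length bytes follow the prefix
        start = 1 + (prefix - 0xB7)
        end = start + int.from_bytes(data[1:start], "big")
        return data[start:end], data[end:]
    if prefix < 0xF8:                  # short list: payload length is in the prefix
        start = 1
        end = 1 + (prefix - 0xC0)
    else:                              # long list: length bytes follow the prefix
        start = 1 + (prefix - 0xF7)
        end = start + int.from_bytes(data[1:start], "big")
    payload, items = data[start:end], []
    while payload:
        item, payload = decode_item(payload)
        items.append(item)
    return items, data[end:]


# Raw transaction copied from the getRawTransaction output above.
RAW = "0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e880820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53"

# Field names are labels chosen for this sketch; the wire format carries none.
NAMES = ["nonce", "gas_price", "gas_limit", "to", "value", "data", "v", "r", "s"]

fields, leftover = decode_item(bytes.fromhex(RAW[2:]))
tx = dict(zip(NAMES, fields))

for name in NAMES:
    print(f"{name:>9}: {tx[name].hex() or '(empty)'}")

v = int.from_bytes(tx["v"], "big")
chain_id = (v - 35) // 2               # EIP-155: v = chain_id * 2 + 35 or 36
print(" chain id:", chain_id)
```

Nine items come back, the last three being the signature, and none of them is a sender address.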

But there’s the trouble. To get back the address of the originating account, you have to have both the signature and the original field values. If you supply different field values, you don’t get, “This signature doesn’t match,” you get, “This transaction came from a completely different account.”

This is what happened in the example above. If we recover the address using the original values and signature, we get the address we used to sign the original transaction:

$ tools/ecrecover 0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e880820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53
Recovered: af4be85b32868c5b7c121115ad8cd93e0ad4f14e

But if we recover the address using the altered values and the signature, we get the other address:

$ tools/ecrecover 0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e980820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53
Recovered: f2b40cc46f06f8d28a2b021851f721592b1f78e8

Before you run off to tweet about this, let me say: trying to produce a set of field values to make some signature point to a particular account is not within the realm of your powers. I precalculated the account address that would match, and gave it value in my genesis block for the purposes of this demonstration. If we try again with a value two greater, we get a completely different address that has no value:

$ tools/ecrecover 0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203ea80820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53
Recovered: c30d2d79cbb88531abeb585dcf4f4fcfbb4ce373
> eth.getBalance("0xc30d2d79cbb88531abeb585dcf4f4fcfbb4ce373")
0
> eth.sendRawTransaction("0xf86980850430e2340083015f9094fab48eb52368c8b5c7211151dac89de3a04d1fe48203ea80820348a0780d4ea898306e700cea578140cc4502d401c5949e7f4cab6a72c08a1a065aaba00cc22d63e2ef8acaaf1f6ead7a68de12e48a79f17d0a5ebab006fa996076ed53")
Error: insufficient funds for gas * price + value
at web3.js:3143:20
at web3.js:6347:15
at web3.js:5081:36
at :1:1

Trying to match a particular address is a process of mashing numbers hoping to accidentally hit one in 2^160. Even if you just wanted to hit any in-use address, and each person alive on Earth had their own, you’d still be looking at one in 2^127 (= 2^160 / 2^33, where 2^33 ≈ 8 billion).
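The exponents are easy to sanity-check. A quick sketch, assuming 160-bit addresses and a round 8 billion people, each holding exactly one address:

```python
import math

ADDRESS_BITS = 160                  # size of an Ethereum address
population = 8_000_000_000          # rough head count, one address each

# Smallest power of two that covers the population.
population_bits = math.ceil(math.log2(population))

specific_odds = 2**ADDRESS_BITS
any_in_use_odds = 2**ADDRESS_BITS // 2**population_bits

print(f"specific address: one in 2^{ADDRESS_BITS} (about {specific_odds:.2e})")
print(f"any in-use address: one in 2^{ADDRESS_BITS - population_bits} "
      f"(about {any_in_use_odds:.2e})")
```

Spreading the target over every person on Earth only shaves 33 bits off the search.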

So why do I care? Two reasons:

  1. Being corrected for the wrong mistake makes the protocol harder to use. The error above for the “two greater value” mismatch points a debugger toward balances, not toward signatures.
  2. Fixing this seems simple.

Number 2 is the naïve thing to say. If it’s so simple, why hasn’t it been done? It’s more likely that I just don’t understand the domain and/or design decisions made elsewhere. I’m going to trudge on with explaining anyway, and hope it leads to my education.

I think this can be fixed by including the originating address in the details that are signed. If what was signed was instead: nonce, gas price, gas limit, originating address, destination address, value, contract data, and chain ID; I think the problem would disappear entirely. We can try the same single-bit modification as last time:

# >---changes---------------------------------|-|---------added-from-address-----------|--------------------------------------------------------|---new-signature--->
$ tools/ecrecover 0xf87e80850430e2340083015f9094af4be85b32868c5b7c121115ad8cd93e0ad4f14e94fab48eb52368c8b5c7211151dac89de3a04d1fe48203e880820348a0a6ef55701d8b89007f729cbf0ff5abcabcfbf714017337c9ba3fd7d6fa9d22b1a05b2e8022f0a26f5928b7faa0aa94613328d9d6718a56cd4b951882e092467020
Recovered: af4be85b32868c5b7c121115ad8cd93e0ad4f14e
#          |-------------matches----------------^=|------------------------------------^
# >---modified-bit-is-out-here---------------------------------------------------------------------------------------------------------|
$ tools/ecrecover 0xf87e80850430e2340083015f9094af4be85b32868c5b7c121115ad8cd93e0ad4f14e94fab48eb52368c8b5c7211151dac89de3a04d1fe48203e980820348a0a6ef55701d8b89007f729cbf0ff5abcabcfbf714017337c9ba3fd7d6fa9d22b1a05b2e8022f0a26f5928b7faa0aa94613328d9d6718a56cd4b951882e092467020
Recovered: 8e80dee68fa86c07fe7753fad297a45ee0570eb0
#          |---------does not match-------------^=|------------------------------------^

But this time, we can compare the recovered address to the “from” address in the transaction. They don’t match, so we can say, “This signature doesn’t match.”
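The comparison itself is tiny once the from address is part of what gets signed. A sketch, assuming the recovery step (tools/ecrecover above) has already produced an address; `check_sender` and `SignatureMismatch` are names made up for this illustration, not part of any Ethereum client.

```python
class SignatureMismatch(Exception):
    """The recovered signer is not the address named in the transaction."""


def normalize(address: str) -> str:
    """Lower-case an address and drop any leading 0x."""
    address = address.lower()
    return address[2:] if address.startswith("0x") else address


def check_sender(declared_from: str, recovered: str) -> None:
    """Raise unless the signed from field equals the recovered address."""
    if normalize(declared_from) != normalize(recovered):
        raise SignatureMismatch("This signature doesn't match")


# The from field embedded in both proposed transactions above.
DECLARED = "0xaf4be85b32868c5b7c121115ad8cd93e0ad4f14e"

# Untouched transaction: the recovered address equals the from field.
check_sender(DECLARED, "af4be85b32868c5b7c121115ad8cd93e0ad4f14e")

# Value changed by one bit: recovery yields some other address.
try:
    check_sender(DECLARED, "8e80dee68fa86c07fe7753fad297a45ee0570eb0")
except SignatureMismatch as err:
    print("rejected:", err)
```

With this check the node can report a bad signature before it ever looks at balances.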

And hey, the “attack” gets harder too. It’s not as simple as just changing the from address in the transaction. If we do that, the recovered address also changes, to yet something different:

# >---changes-----------------------------------|----recovered-address-from-above------|---------------------value-is-still-modified---|
$ tools/ecrecover 0xf87e80850430e2340083015f90948e80dee68fa86c07fe7753fad297a45ee0570eb094fab48eb52368c8b5c7211151dac89de3a04d1fe48203e980820348a0a6ef55701d8b89007f729cbf0ff5abcabcfbf714017337c9ba3fd7d6fa9d22b1a05b2e8022f0a26f5928b7faa0aa94613328d9d6718a56cd4b951882e092467020
Recovered: 680d265d9d2c9654e02bd6d46b2db025058a6672
#          |---still-does-not-match-------------^=|------------------------------------^

With this scheme, to find a valid transaction, you’re forced to find a match for a specific address. So, as a side effect of improving usability, we also return to a collision probability of one in 2^160.

Is it the case that EIPs 712 and 191 are attempting to address some of this situation, but not directly? It seems like “malleability” of ECDSA signatures, while perhaps slightly different than what is described above, is something that has caused trouble elsewhere.

Finally, thanks to the makers of two tools that helped me debug what was going on: Ethereumjs-tx, which includes a nice “from” recovery function, and Keythereum, which can extract a private key from a geth keystore.

Am I on track, or have I missed something? Let me know.

Blockchain 2018 == NoSQL 2009

Disclaimer: The views in this article are my own, and do not necessarily represent the views of my employer.

Blockchain is to 2018 what NoSQL was to 2009.

  • Large variety of implementations, each addressing a different part of the problem.
  • Vastly different interfaces, despite overlapping terminology.
  • Conferences where all of the major players are in attendance.
  • Discussion of decades-old research finally applicable to modern software.
  • A few key success stories.
  • A few gross misapplications.

And, of course, haters from all corners, ready to tell you that you don’t need a blockchain, just like there were so many ready to tell you that you didn’t need NoSQL.

The haters are right, as they always are. You could have used literally any database to do the sorts of storage, and storage guarantees, that NoSQL gave you. You can sign data and write updates immutably to get the sorts of auditability and trust that a blockchain gives you.

This is why I find it interesting that many NoSQL leaders are such vocal Blockchain haters. They know what it’s like to say, “No, not everyone needs NoSQL, but it can provide value to some,” or, “No, they’re not new pieces, but their combination is greater than the sum of their parts.” Yet they’re fully willing to claim that Blockchain is less than worthless.

Maybe part of the problem is terminology, again. “What is NoSQL?” was an impossible question to answer. Beyond, “A database that doesn’t use SQL,” it meant something different to each person involved. “What is Blockchain?” is the same way. Is it the specific structure of “blocks of data, chained by cryptographic hashes”? Is it decentralized trust? Is it “proof of $something” consensus? Is it cryptocurrency?

Both of these communities are personal to me. I was part of the NoSQL movement while working on Riak at Basho, and on CloudKit at Apple. I’m writing about Blockchain today, because for the last few months, I’ve been part of the Blockchain movement, working on a project at VMware. The NoSQL argument is over, so I won’t bother rehashing it. But, I would like to discuss here why I’ve found the Blockchain domain worth my time.

A large portion of my interest in Blockchain comes down to the specific project I’ve chosen to work on. My team at VMware understands that, just as NoSQL was a term used to describe tools for enabling scalable databases, “Blockchain” is a term used to describe tools for enabling distributed trust. The core tool often discussed, though it is really the combination of several technologies, is byzantine fault tolerant state machine replication.[1]

Yes, BFTSMR is “just” replication with consensus[2] and cryptography[3], and you could assemble a number of existing technologies to make your own. But, just like Dynamo was “just” a distributed hash table with logical clocks and gossip, the exact combination provides some rather different capabilities. Each of the components has seen major developments since the classic texts explored them:

  • Consensus algorithms: with distributed systems becoming ever more common, we’ve had a renaissance of better understanding and implementation.
  • Cryptographic signatures: hardware has many useful primitives built in to make this efficient, and things like threshold signatures have enabled new optimizations.
  • State machine abstraction: it is easier than ever to embed an efficient interpreter in your system, to allow quick extensibility and experimentation.
  • Networking: RDMA, public cloud, edge networking, and other new topologies beg for new communication patterns to harness their potential.
  • Storage: flash is cheap, and any data model you want is readily available.

BFTSMR has been ignored for years (decades!) because it has been impractical for real systems. With the latest developments, even though it hasn’t reached the speed and efficiency of optimized central databases, it is starting to become a reasonable choice for some applications. If the abstractions commonly provided by Blockchain (like transactions recorded in an immutable log) allow people to take advantage of these classic good ideas and advancements, I am excited to help enable that.

There are some very real things to worry about in association with Blockchain. Cryptocurrencies are not investment strategies, and are not going to have any of the promised power-distribution and -equalization effects they peddle while their value and cost fluctuate so wildly. Control of existing networks is misunderstood. Proof of work is criminally wasteful. Current anonymization is so difficult that most attempts are trivially unmasked. But just as we didn’t let misunderstandings about consistency, early bugs with data loss, or questionable performance numbers bury the very real improvements brought by NoSQL, let’s not write off all of Blockchain just yet.

I’m very excited about the real business improvements that will be enabled by the technical innovations my team is making at VMware. If you would like to join us, please reach out either to one of us directly or through our careers portal.

[1] It’s scary how close you can come to an acronym: Byzantine fauLt tOlerant state maCHine replicAtIoN. Maybe that’s why Ethereum is a staCK machine.

[2] Replication should imply consensus, but we saw how that went in NoSQL.

[3] Apologies for using “cryptography” to mean all things that may have cryptographic properties: hashes, signatures, obfuscation, …

ASAP Smoothing

Last week, Peter Bailis announced a new tool for smoothing timeseries data for plotting, called ASAP.

Since I just happened to have some fresh data handy, I decided to try it out. You may recall this graph of the difference in pressure measured between a sensor submerged in fermenting beer, and a sensor in open air:

screen-shot-2017-02-27-at-9-28-05-pm
Helium-smoothed pressure data (190 points)

That graph was based on data that was pre-smoothed using a windowed average provided by the Helium API. The window is one hour, which produced 190 points. Zooming in just a little bit takes us to a 30 minute window, at 303 points:

Screen Shot 2017-03-11 at 12.20.49 PM
Window-averaged pressure data (303 points)
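A windowed average of this kind can be sketched as follows. This assumes fixed, non-overlapping windows keyed on the sample timestamp; the Helium API's actual implementation is not shown in this post and may differ, and the function name and sample values are made up for illustration.

```python
from collections import defaultdict


def windowed_average(samples, window_seconds):
    """Average (timestamp, value) samples over fixed, non-overlapping windows.

    Returns a list of (window_start, mean) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return [(start, sum(values) / len(values))
            for start, values in sorted(buckets.items())]


# Hypothetical readings: (seconds since start, pressure difference).
samples = [(0, 1.0), (600, 3.0), (1800, 5.0), (2400, 7.0), (3600, 9.0)]

hourly = windowed_average(samples, 3600)
half_hourly = windowed_average(samples, 1800)

print(hourly)
print(half_hourly)
```

Halving the window yields more output points from the same raw data, which is the effect of going from the one-hour window to the 30 minute one.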

Unlike the other graphs I plotted in that series, I didn’t plot the maximum and minimum on this one. Here is what the raw data looks like:

Screen Shot 2017-03-11 at 4.49.51 PM
Raw pressure data (11282 points)

That is 11282 points. There are spikes several times taller than what looks like the “typical” variation. The question, then, is did the windowed average portray this data accurately? Here are 303 points overlaid on the 11282 (I cut off the peaks to give us a little more detail):

Screen Shot 2017-03-11 at 12.31.02 PM
Raw pressure data (grey) with Window-averaged data (blue)

I think it seems reasonable. A little noisy, but visually following the mid-point of the band. Can ASAP do better? Here are the 304 points it gives when asked for 303 from this dataset:

Screen Shot 2017-03-11 at 12.37.14 PM
Raw data (grey), window-averaged (blue), ASAP (red)

If you look closely, you can see a few blue edges from the windowed average peeking out from under the ASAP plot, but they’re largely the same. ASAP hasn’t surfaced any additional features in this data that the plain windowed average hid. I think that’s not surprising. The process that was being measured was a slow, continuous change, and not something where there should have been sudden changes in behavior.

So instead, let’s look at some data with an anomaly, like the absolute-force graph from the tilt sensor:

screen-shot-2017-02-25-at-7-08-08-pm
Helium-smoothed force data (dark blue) with min-max area (light blue)

That was using window-averaged data as well. Let’s plot it against raw data and ASAP like before:

Screen Shot 2017-03-11 at 4.28.48 PM
Raw force data (grey), window-averaged (blue), ASAP (red)

That’s strange. The blue window-averaged line follows the raw data pretty well, at the 303 points I asked for. The ASAP line looks weirdly off, though. The smooth function returned only 279 points, and the odd shape is caused by a gap in the middle. That straight line around the anomaly contains 30 points. The fact that the curve to the left of that line looks like it’s shifted in time makes me suspicious, but there seem to be no errors generated. Even if I bump up the resolution to the full 890 pixels of this SVG, the ASAP curve looks like this, and produces 31 fewer points than the windowed average.

The data and code I’m using to plot it are available in this gist: https://gist.github.com/beerriot/5e343e35e4930947fce77f36f1f5fbe5

Off to ping Peter…