Archive for the ‘Development’ Category

ASAP Smoothing

Last week, Peter Bailis announced a new tool for smoothing timeseries data for plotting, called ASAP.

Since I just happened to have some fresh data handy, I decided to try it out. You may recall this graph of the difference in pressure measured between a sensor submerged in fermenting beer, and a sensor in open air:

[Figure: Helium-smoothed pressure data (190 points)]

That graph was based on data that was pre-smoothed using a windowed average provided by the Helium API. The window is one hour, which produced 190 points. Zooming in just a little bit takes us to a 30 minute window, at 303 points:

[Figure: Window-averaged pressure data (303 points)]
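For reference, here is a rough sketch of what that windowed averaging amounts to, assuming the readings come as (timestamp, value) pairs (the Helium API does this server-side, so its actual bucketing may differ):

from collections import defaultdict

def window_average(points, window_seconds=3600):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - (ts % window_seconds)].append(value)
    return sorted((start, sum(values) / len(values))
                  for start, values in buckets.items())

Over this dataset, one-hour buckets produced the 190 points shown first, and 30-minute buckets produced the 303 shown here.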

Unlike the other graphs I plotted in that series, I didn’t plot the maximum and minimum on this one. Here is what the raw data looks like:

[Figure: Raw pressure data (11282 points)]

That is 11282 points. There are spikes several times taller than what looks like the “typical” variation. The question, then, is whether the windowed average portrayed this data accurately. Here are the 303 points overlaid on the 11282 (I cut off the peaks to give us a little more detail):

[Figure: Raw pressure data (grey) with window-averaged data (blue)]

It seems reasonable to me: a little noisy, but visually following the mid-point of the band. Can ASAP do better? Here are the 304 points it gives when asked for 303 from this dataset:

[Figure: Raw data (grey), window-averaged (blue), ASAP (red)]

If you look closely, you can see a few blue edges from the windowed average peeking out from under the ASAP plot, but they’re largely the same. ASAP hasn’t surfaced any additional features in this data that the plain windowed average hid. I think that’s not surprising. The process that was being measured was a slow, continuous change, and not something where there should have been sudden changes in behavior.

So instead, let’s look at some data with an anomaly, like the absolute-force graph from the tilt sensor:

[Figure: Helium-smoothed force data (dark blue) with min-max area (light blue)]

That was using window-averaged data as well. Let’s plot it against raw data and ASAP like before:

[Figure: Raw force data (grey), window-averaged (blue), ASAP (red)]

That’s strange. The blue window-averaged line follows the raw data pretty well, at the 303 points I asked for. The ASAP line looks weirdly off, though. The smooth function returned only 279 points, and the shortfall is caused by a gap in the middle. That straight line around the anomaly contains 30 points. The fact that the curve to the left of that line looks like it’s shifted in time makes me suspicious, but there seem to be no errors generated. Even if I bump up the resolution to the full 890 pixels of this SVG, the ASAP curve looks like this, and produces 31 fewer points than the windowed average.

The data and code I’m using to plot it are available in this gist: https://gist.github.com/beerriot/5e343e35e4930947fce77f36f1f5fbe5

Off to ping Peter…

Beer IoT (Part 8)

Strike up the band, it’s time for the eighth, and final, installment of this fermentation instrumentation series. In part four, I placed several different sensors in several different carboys of beer beginning fermentation. In parts six and seven, I analyzed a week of data from two of the sensors. This post will cover the third sensor, a floating accelerometer.

[Figure]

The ADXL345 provides three readings for each sample: one for each axis in 3D space. I’m using the chip in “2g 10-bit” mode, which means each axis will report a number from -512 (2g negative acceleration) to +512 (2g positive acceleration). In this case, the only acceleration I want to measure is gravity, so I should see only values between -256 (1g negative acceleration; “this axis is pointing straight down”) and 256 (1g positive acceleration; “this axis is pointing straight up”). Using a bit of trigonometry, I should be able to figure out the angle at which the sensor is tilted.
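As a rough sketch of that trigonometry (not the Lua actually running on the sensor): with 256 counts corresponding to 1g, the angle of an axis above horizontal is asin(reading/256).

import math

ONE_G = 256  # ADXL345 in "2g 10-bit" mode: 1g is roughly 256 counts

def axis_angle_degrees(reading, one_g=ONE_G):
    """Angle of an axis above horizontal, given its gravity-only reading."""
    ratio = max(-1.0, min(1.0, reading / one_g))  # clamp out measurement noise
    return math.degrees(math.asin(ratio))

print(axis_angle_degrees(256))  # pointing straight up: 90 degrees
print(axis_angle_degrees(33))   # a small tilt: about 7.4 degrees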

My float is sort of a rounded rectangular prism. I’ve oriented the sensor such that the y-axis is in line with the long axis of the prism, with the positive end pointing toward the end I expect to float. The x-axis is horizontal across the short axis of the prism, and the z-axis is pointed “up”. The expectation is: y will start about zero, or slightly positive, as high buoyancy keeps the float “flat”; x will start about zero as well, because any dip should be along y; and z will start near max, almost straight up. As the beer ferments, reduced buoyancy should cause one end to dip, causing y to increase (because it will point upward more steeply), and z to decrease (because it will move off of straight upward), with x staying the same (because the rotation should be around that axis). So what actually happened?

[Figure: accelerometer x, y, and z readings over the week]

Pictures may be worth a thousand words, but I’m pretty sure this one just says, “Not that.” We have both x and z increasing, and y is doing … I’m not even sure. Let’s see if there is anything to salvage.

Let’s check an assumption first. I’m expecting to see only acceleration from gravity here, so the total acceleration should always be 1g (plus or minus some measurement noise). We can check that with a bit of Pythagoras: the square root of the sum of the squares of the readings should be a constant 256.

[Figure: Helium-smoothed force data (dark blue) with min-max area (light blue)]

Except for the sudden change in the middle, it’s a variance of about 1, which is 0.0039g for this chip. It’s interesting that it’s only 252 max, and I have no idea what that sudden shift is (it seems correlated with a sudden shift on the z axis, but nothing on the other axes), but it does look like we’re measuring approximately a constant 0.97-0.98g force.
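Here is a minimal sketch of that sanity check, assuming samples arrive as raw (x, y, z) counts (just the Pythagorean sum, not the plotting code):

import math

def total_acceleration_g(x, y, z, one_g=256):
    """Magnitude of the acceleration vector, in g."""
    return math.sqrt(x * x + y * y + z * z) / one_g

# Should hover near 1.0 if gravity is all we are measuring; this dataset
# comes in closer to 0.97-0.98.
print(total_acceleration_g(-115, 10, 223))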

The increasing x may be a bit of a red herring. It just means that the tube is “rolling” (turning around its long axis). This is why z is increasing as well: x returning to horizontal around the unchanging y axis means that z is returning to vertical. There is a chance that the float is rolling instead of tipping as buoyancy changes. This might be worth returning to later, but let’s see if we can save y first.

Even though our sanity check showed that we’re reading constant gravity as we expect, and therefore all axes agree, we can use Pythagoras again to compute what the value of the y reading should have been, given x, z, and our expected force:

[Figure: synthesized y reading (blue) vs. actual y reading (red), using 256 for gravity]

The synthesized y reading is in blue, while the actual y reading is in red. This graph used 256 for the expected gravity. Let’s instead use the 253/250 mix we saw before, which will also account for that unexplained shift in z:

[Figure: synthesized y reading (blue) vs. actual y reading (red), using the 253/250 gravity mix]

Many features are similar between these plots, but we appear to have exaggerated a somewhat steady descent in y during the period that x and z were steadily climbing (Feb 19 through 21). I expected y to start around zero and become more negative over time. Starting above zero, and decreasing anyway, just means that the sensor was tilted away from the expected sinking angle to start. Interestingly, y moving from slightly up to closer to horizontal will also have the effect of bringing z up closer to vertical, just like x moving from negative toward zero did.
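For reference, the synthesized reading is just the Pythagorean relation solved for y. A minimal sketch; note that it only recovers the magnitude, so the sign has to come from the expected orientation:

import math

def synthesize_y(x, z, expected_force=256):
    """What y 'should' read, given x, z, and the expected total force."""
    remainder = expected_force ** 2 - x ** 2 - z ** 2
    return math.sqrt(max(0, remainder))  # noise can push the remainder negative

print(synthesize_y(-115, 223))       # using 256 for the expected gravity
print(synthesize_y(-115, 223, 252))  # using the lower observed total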

If the rolling is not the result of buoyancy change, then this change in y alone leaves us with a change from early Feb 19 to mid-afternoon Feb 21 of either 33 to 22 (computed, blue line) or 10 to 5 (observed, red line). asin(33/256) = 7.41º, asin(22/256) = 4.93º; asin(10/256) = 2.24º, asin(5/256) = 1.12º. Using 252 instead of 256 only alters the result by 0.1º. So, a change of at most 2.5º, and at least 1º. A bit of a narrow band, if you ask me.

If the sensor shifted during placement, the x axis might be measuring pitch instead of roll. But even if not, what if the fermentation primarily produced rolling instead of pitching? The x reading swings from -115 to -100. That’s 26.7º to 23º, meaning a change of about 3.7º. That’s more, which is about all that can be said about it.

If we take both roll and pitch together, we can just consider the change in z, but we also have to ignore the sudden shift near the end of the time range we were looking at. That gives 223 to 231, or 60.59º to 64.46º. Still just 4º change.

[Figure]

If I return to the design of the float, its weight is 1.8 oz. So, to float in fresh water, it will have to displace 1.8 oz of water, or 3.24 cubic inches. The float is 4 inches long, by 1 3/16 inches wide, by 7/8 inches tall. A rough estimate places that at 4.15625 cubic inches. The rounded edges are tricky, though: measuring via displacement shows it’s actually about 3.5 cubic inches. So, what I have is a float that is only just barely floating in water. That was the aim, and what was observed, so it’s good to see the math line up.

If the float has to displace 3.24 cu.in. of water at specific gravity 1.000, then at our starting gravity of 1.040 it only has to displace 3.12 cu.in. of unfermented beer, and 3.22 cu.in. of beer at the finishing gravity of 1.0075. So we’re looking at a change of 0.1 cubic inches of displacement.
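Those displacement numbers follow from converting the float’s weight to an equivalent volume of water and dividing by the specific gravity. A quick sketch of the arithmetic, using the same conversion as above (1.8 oz of water ≈ 3.24 cu.in., i.e. about 1.8 cu.in. per ounce):

CU_IN_PER_OZ = 1.8  # cubic inches of water per ounce, as used above

def displacement_cu_in(weight_oz, specific_gravity=1.0):
    """Volume of liquid the float must displace to stay afloat."""
    return weight_oz * CU_IN_PER_OZ / specific_gravity

start = displacement_cu_in(1.8, 1.040)    # ~3.12 cu.in. in unfermented beer
finish = displacement_cu_in(1.8, 1.0075)  # ~3.22 cu.in. at finishing gravity
print(round(finish - start, 2))           # ~0.1 cu.in. of change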

I found the center of mass to be about 3/8 inch closer to the end that is expected to sink. That’s not a huge margin of influence, but since the float will be almost entirely submerged anyway, it’s probably enough. The heavy end also happens to have less volume, due to the curvature, so it should have to sink more to displace the same amount.

Calculus is probably the correct way to solve this problem. 18.01 was a long time ago, though, so I’m hoping I can fudge it. Instead of trying to figure out how far this float should have tipped, let’s figure out whether a 4º pitch could have changed the displacement by 0.1 cu.in.

If we’re already mostly submerged, we’re looking near an edge that is not 1 3/16 inches wide, but instead closer to 0.75 in (due to curvature). If the float were sitting flat (which the y and z axis readings mostly suggest), that 0.75 in by 3.75 in surface would ride about 0.036 in above the water. If we pitched it 4º, we would lose 0.12 cubic inches into the water:

Height lost at 4º over 3.75″: (sin(4/180)*3.75) = 0.08332647479 inches

Volume of 0.083 x 3.75 x 0.75 ” triangular prism: 0.75 * (0.083 * 3.75)/2 = 0.117 cubic inches

So 4º could actually be the correct change for these parameters! This does require that the x and z axes were reading pitch, and not roll, though. A 4º roll with these parameters is only a change of about 0.02 cubic inches.

A final check: how well does the shape of this data match the shape of the BeerBug’s?

[Figure: z-axis readings compared with the BeerBug data]

Comparing the Z axis, it looks like the story is similar to the pressure sensor: the change plateaus at about the same point as the BeerBug, signaling the end of primary fermentation. If the 4º change was measuring the correct thing, then the math would have told me the correct final gravity, but it would have been much easier to have built, beforehand, a calibration table mapping known angles to specific gravities.

That wraps up this experiment. I’ve added this data to the gist containing the other sensor data. What’s next for this beer sensing story … ?

For the BeerBug, it may be worth continuing to use their service. The device works when the service works, as shown in this data. If anything, I think I’d work on snooping its communication, so I could tee it off to my own storage, in case their service goes down again.

For the pressure sensor, it’s mostly about a new housing, and then calibration to that housing. It needs something that is both heavy enough to sink, and also flexible enough to compress. If both of those are taken care of, it seems like calibrating to known specific gravities may actually provide decent data.

For the tilt sensor, it’s also about a new housing. The weight distribution needs to be far more unbalanced, to ensure a larger change in angle. Something narrower, so that more sinking is required to balance displacement, would work too. If those can be taken care of, then calibration may make this as good as other options.

Both the pressure sensor and the tilt sensor would also benefit from getting the Helium Atom and battery onboard. Current results are probably affected by the cable running out of the carboy. For now, this would require fermenting in something with a wider neck, since the development board is too wide to fit in the carboy. That’s easy to do, and would save me from having to design my own printed circuitry.

What was really amazing to me is how easy this sort of thing has become. A week after I got hardware, I put it into service. I2C is a nice standard communication protocol. Lua is a quick language to pick up. The Helium chip, library, and service work very smoothly. Between the dev kit and the sensors, I’m over $100, but less than $200 into this exploration. I can see why people are excited about IoT these days – it’s easy to get started, and fun to participate.

But for now, there are two cases of beer to sample in a couple of weeks, and they’re stacked under earlier brews, so I won’t have any more fermentation to measure for a while. I’m setting up one of my Atoms to monitor the temperature in the conditioning closet. I wonder what I should start measuring with the other.

Beer IoT (Part 6)

Welcome back for part six of the fermentation instrumentation series. In part four, I placed a few different sensors in some actively fermenting beer to gather data. A week has now passed, and I’ve bottled the beer. Time to look at the data. Let’s start with the device we know – the BeerBug.

[Figure]

We’re lucky this time. New owners have just taken over BeerBug operations, and they’re relaunching the product. Unfortunately, that means they’re going through a bumpy transition period. While I was brewing, I could see the latest reading from my beer, but none of the history. But, after a lengthy email exchange, they have pulled through, and I have the data for this batch.

As before, I’ve uploaded the data to Helium’s servers. This is mostly so I can use the same tools for processing the data for all three of the batches in this experiment. So, without further ado, this is how the BeerBug thought the specific gravity of my English Mild changed over the week:

[Figure: BeerBug specific gravity over the week]

This is pretty typical. All the way at the left, we have the gravity that I specified as my starting point, what I read from my glass hydrometer: 1.039. The phenomenon that has been observed for every beer, but is as yet unexplained, comes next: the climb to a higher gravity. This probably has something to do with the initial yeast activity, as they rapidly reproduce throughout the beer, consuming the dissolved oxygen, and beginning to produce carbon dioxide. The gas exchange or cell proliferation may change the buoyancy observed by the BeerBug’s float.

After the initial climb late Saturday, we dive right into the expected steady decline in gravity over the next few days. By night time on Tuesday, the gravity has nearly leveled off. A much slower decline continues as the few yeast cells that haven’t starved continue to find some sugar to eat. By the time I bottled on Sunday, the BeerBug read 1.006. My regular glass hydrometer agreed ±0.001. That’s pretty impressive.

The other thing that seems impressive is that there is far less noise in this data than there was in the BeerBug data from part three of this series. I think the explanation for this begins with the fact that there are fewer points in this dataset. In part three, there was a reading every minute. In this dataset, there is sometimes a reading every minute, but sometimes a reading only every 3, 5, or 10 minutes. This might represent a new strategy in the BeerBug firmware – if the measurement variation was white noise, averaging over longer periods should reduce it. Or, it could be just missing data, which would make the error band (the light blue) close in on the average (the dark blue), because the average *is* the data if you remove enough.
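As a toy illustration of the white-noise point (purely a sketch, nothing to do with the BeerBug’s actual firmware): averaging n independent noisy samples shrinks their spread by roughly the square root of n.

import random
import statistics

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(10000)]

def averaged(samples, n):
    """Average consecutive groups of n samples."""
    return [sum(samples[i:i + n]) / n for i in range(0, len(samples), n)]

print(statistics.stdev(noise))                # ~1.0
print(statistics.stdev(averaged(noise, 10)))  # ~0.32, about 1/sqrt(10)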

The BeerBug also has a temperature sensor in the housing that sits above (outside) of the carboy. Here is its data, in blue, with the temperature data we looked at from one of the Helium boards in part five of this series, in red:

[Figure: BeerBug temperature (blue) and Helium board temperature (red)]

The readings begin only a degree and a half or so off, but the drop into Sunday morning is deeper for the BeerBug. Its readings also stay consistently nearly 3ºC cooler. This was unexpected, given the placements of these sensors. The Helium sensor was closer to an external door, and the BeerBug was just a few inches above active yeast. I’ll chalk it up to simple differences in the characteristics of the sensors, for now.

I’ve uploaded this data to a gist in CSV format, if you would like to examine it yourself. In the next post, we’ll look at the data from the pressure sensor, and see if we can find a shape similar to the BeerBug’s.

Update: part seven is live with pressure data.

FSMs Make Instrumentation Easy

This piece originally appeared on the Honeycomb.io blog as part of a series on instrumentation.

There is a way to structure programs that makes inclusion of instrumentation straightforward and automatic, and it’s one that every hardware and software engineer should be completely familiar with: finite state machines. You have seen them time and again as illustrations of how a system works.

What makes FSM instrumentation straightforward is that the place to expose information is obvious: along the edges, when the state of the system is changing. What makes it automatic is that some generic actor is usually driving a host of specific FSMs. You only need to instrument the actor (“entering state Q with message P”, “leaving state S with result R”), and every FSM it runs will be instrumented for free.
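As a language-agnostic sketch of that idea (hypothetical names, nothing to do with Webmachine’s internals): a driver that logs every transition instruments every state machine it runs.

def run_fsm(states, start, message, log=print):
    """Drive a dict of state -> handler functions, logging each transition."""
    state, result = start, None
    while state is not None:
        log("entering state %s with message %r" % (state, message))
        next_state, result = states[state](message)
        log("leaving state %s with result %r" % (state, result))
        state = next_state
    return result

# A two-state toy machine; only the driver does any logging.
states = {
    "validate": lambda msg: ("respond", "ok") if msg else (None, "bad request"),
    "respond": lambda msg: (None, "200 OK"),
}
run_fsm(states, "validate", {"path": "/"})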

I learned how easy FSMs are to instrument while working on Webmachine, the webserver that is known for implementing the “HTTP Flowchart”.

Each Webmachine resource (a module handling a request) is composed of a set of decision functions. The functions are named for the points in the flowchart where decisions have to be made about which branch to follow. This is just alternate terminology, though: the flowchart and resource describe an FSM, in which the decision points (and terminals) are states.

Driving the execution of a Webmachine resource is a module called webmachine_decision_core. This is where the logic lives for which function to call, and which branch to take based on the result. It triggers each function evaluation by calling a generic webmachine_resource:resource_call function, with the name of the decision.

resource_call(F, ReqData,
              #wm_resource{
                 module=R_Mod,
                 modstate=R_ModState,
                 trace=R_Trace
                }) ->
    %% Log entry into this decision (state) before calling the resource.
    case R_Trace of
        false -> nop;
        _ -> log_call(R_Trace, attempt, R_Mod, F, [ReqData, R_ModState])
    end,
    Result = try
        apply(R_Mod, F, [ReqData, R_ModState])
    catch C:R ->
            Reason = {C, R, trim_trace(erlang:get_stacktrace())},
            {{error, Reason}, ReqData, R_ModState}
    end,
    %% Log the result of the decision (state exit) after the call returns.
    case R_Trace of
        false -> nop;
        _ -> log_call(R_Trace, result, R_Mod, F, Result)
    end,
    Result.

This is where the ease of instrumenting an FSM is obvious. The entirety of the hooks needed to support tracing and visual debugging of every Webmachine resource are those two log_call lines. They record the entrance and exit of each state of the FSM without requiring any code to complicate the implementation of the resource module itself. For example, a simple resource:

-module(blogapp_resource).
-export([
    init/1,
    content_types_provided/2,
    to_html/2
]).

-include_lib("webmachine/include/webmachine.hrl").

init([]) ->
    {{trace, "/tmp"}, undefined}.

content_types_provided(ReqData, State) ->
    {[{"text/html", to_html}], ReqData, State}.

to_html(ReqData, State) ->
    {"<html><body>Hello, new world</body></html>", ReqData, State}.

This resource does no logging of its own (as you can see), but for each request it receives, a file is created in /tmp that can be rendered with the Webmachine visual debugger. For example, the processing for a request that specifies Accept: text/html looks like this (live example):

[Figure: visual debugger trace for the text/html request (happy path)]

It’s easy to see that the request made it all the way to the 200 OK result at grid location N18. Along the way, it passed through many decisions where the default behavior was chosen (grey-outlined diamonds), and a few where the resource’s own implementation was called (purple-outlined diamonds). Clicking on any decision will display more information about what happened there.

In contrast, the processing for a request that specifies Accept: application/json looks like this (live example):

[Figure: visual debugger trace for the application/json request (error path)]

Now it’s easy to see that the request stopped at the 406 Not Acceptable result at grid location C7 instead. For no more code than specifying where to put the log output, we’ve gotten the complete story of how each request was handled. In case you prefer the original text to this visual styling, I’ve also archived the raw trace files.

This sort of regular, simple instrumentation may seem naive, but the regularity and simplicity offer some benefits. For example, all of the instrumentation points have obvious names: they are the same as the states of the FSM. This alone continues to help beginners bootstrap their understanding of Webmachine. When they’re confused about why something happened, they can go straight to the trace or debugger, and either search for the name of the decision they expected to turn out differently, or find the name of the decision that did go differently, and know exactly where to return to in their code. Resource implementors add no code, but get well-labeled tracing for free.

Finite state machines can be found under many other names: flowcharts, chains, pipelines, decision trees, and more. Any staged-processing workflow benefits from a basic “stage X began work W”, “stage X finished work W”, which is completely independent of what the stage is doing, and is equivalent to the stage entering and exiting the “working” state. See Hadoop’s job statistics for an example: generically generated start/stop information that an operator can use to get a basic idea of progress without needing the job implementor to add their own instrumentation. I sometimes even consider the basic request/response logging of multi-service systems as a form of this: sending a request is equivalent to entering a waiting state, etc.

To speak more broadly, the important points to instrument are those when application state is changing. This is how I track down where a process diverged from its expected path, or how long it took to make the change. Finite state machines help by making those points more obvious. Instrumenting state transitions reduces the burden on the implementor, by naturally answering the question of where instrumentation belongs and what it’s called. It also reduces the burden on the user of learning what the implementor decided. Inspection of the system becomes easier because the state transitions are always instrumented, and instrumented in a way that maps directly to the system’s operation.

Thanks to Julia and Charity for organizing the instrumentation series.

Roundtripping the HTTP Flowchart

Webmachine hackers are familiar with a certain flowchart representing the decisions made during the processing of an HTTP request. Webmachine was designed as a practical executable form of that flowchart.

It has long bugged many of the Webmachine hackers that this relationship is one-way, though. Webmachine was made from the graph, but the graph wasn’t made from Webmachine. I decided to change that in my evenings last week, while trying to take my mind off of Riak 1.0 testing.

This is a version of the HTTP flowchart that only a Webmachine hacker could love. It’s ugly and missing some information, but the important part is that it’s generated by parsing webmachine_decision_core.erl.

I’ve shared the code for generating this image in the gen-graph branch of my webmachine fork. Make sure you have Graphviz installed, then check out that branch and run make graph && open docs/wdc_graph.png.

In addition to the PNG, you’ll also find a docs/wdc_graph.dot if you prefer to render to some other format.

If you’d really like to dig in, I suggest firing up an Erlang node and looking at the output of wdc_graph:parse("src/webmachine_decision_core.erl"):

[{v3b13, [ping],                     [v3b13b,503]},
 {v3b13b,[service_available],        [v3b12,503]},
 {v3b12, [known_methods],            [v3b11,501]},
 {v3b11, [uri_too_long],             [414,v3b10]},
 {v3b10, [allowed_methods,'RESPOND'],[v3b9,405]},
 {v3b9,  [malformed_request],        [400,v3b8]},
...

If you’ve looked through webmachine_decision_core at all, I think you’ll recognize what’s presented above: a list of tuples, each one representing the decision named by the first element, with the calls made to a resource module as the second element, and the possible outcomes as the third element. Call wdc_graph:dot/2 to convert those tuples to a DOT file.
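If you wanted to do the same conversion outside of Erlang, a rough sketch of rendering those tuples as DOT might look like this (a hypothetical stand-in, not part of wdc_graph):

def to_dot(decisions):
    """Render (decision, calls, outcomes) tuples as a Graphviz digraph."""
    lines = ["digraph wdc {"]
    for name, calls, outcomes in decisions:
        label = "%s\\n%s" % (name, ", ".join(str(c) for c in calls))
        lines.append('  %s [label="%s"];' % (name, label))
        for outcome in outcomes:
            lines.append("  %s -> %s;" % (name, outcome))
    lines.append("}")
    return "\n".join(lines)

print(to_dot([("v3b13", ["ping"], ["v3b13b", 503])]))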

There are a few holes in the generation. Some response codes are reached by decisions spread across the graph, causing long arrows to cross confusingly. The edges between decisions aren’t labeled with the criteria for following them. Some resource calls are left out (like those made from webmachine_decision_core:respond/1 and the response body producers and encoders). It’s good to have a nice list for future tinkering.

NerdKit Gaming: Part 2

If you were interested in my last bit of alternative code-geekery, you may also be interested to hear that I’ve pushed that NerdKit Gaming code farther. If you browse the github repository now, you’ll find that the game also includes a highscore board, saved in EEPROM so it persists across reboot. It also features a power-saving mode that kicks in if you don’t touch any buttons for about a minute. Key-repeat now also allows the player to hold a button down, instead of pressing it repeatedly, in order to move the cursor multiple spaces.

You may remember that I left off my last blog post noting that there wasn’t much left for the game until I could find a way to slim down the code to fit new things. So what allowed these new features to fit?

Well, I did find ways to slim down the code: I was right about making the game state global. But I also re-learned a lesson that is at the core of hacking: check your base assumptions before fiddling with unknowns. In this case, my base assumption was the Makefile I imported from an earlier NerdKits project. While making the game state global saved a little more than 1k of space, changing the Makefile so that unused debugging utilities, such as uart, printf, and scanf, weren’t linked in saved about 6k.

In that learning, I also found that attempting to out-guess gcc’s “space” optimization is a losing game. Making the game state global had a positive effect on space, but making the button state global had a negative effect. Changing integer types would help in one place, but hurt in others. I’m not intimately familiar with the rules of that optimizer, so it felt like spinning a wheel of chance when choosing which thing to prod next.

You may notice that I ultimately returned the game state to a local variable, passed in and out of each function that needed it. The reason for this was testability. It’s simply easier to test something that doesn’t depend on global state. Once I had a bug that required running a few specific game states through these functions repeatedly, it just made sense to pay the price in program space in order to be able to write unit tests to cover some behaviors.

So now what’s next? This time, it’s not much until I buy a new battery. So much reloading and testing finally drained the original 9V. Once power is restored, I’ll probably dig into some new peripheral … maybe something USB?

NerdKit Gaming

Contrary to the evidence on this blog, not all of the code I write is in Erlang. It’s not even all web-based or dealing with distributed systems. In fact, this week I spent my evenings writing C for an embedded device.

I’ve mentioned NerdKits here before (affiliate link). This week I finally dug into the kit I ordered so long ago, and took it somewhere: gaming.

The result is a clone of a simple tile-swap matching game. I used very little interesting hardware outside the microcontroller and LCD — mostly just a pile of buttons. The purpose of this experiment was to test the capabilities of the little ATmega168 (and my abilities to program it).

I’ve put the code on github, if you’re interested in browsing. If you don’t have a NerdKit of your own to load it up on, I’ve also made a short demo video, and snapped a few up-close screenshots.

What did I learn? Mostly I remembered that writing a bunch of code to operate on a small amount of data can be just as fun as writing a bunch of code to operate on a large amount of data. Lots of interaction with the same few bytes from different angles has a different feel than the same operation repeated time and time again on lots of different data. I also learned that I’ve been spoiled by interactive consoles and fast compile/reload times. When it takes a minute or more to restart (after power cycles and connector un-re-plugging) and I don’t have an effectively infinite buffer to dump logs in, I think a little longer about each experiment.

So what’s next? Well, not much for this game, unless I slim down the code some more. Right now it compiles to 14310 bytes. Shortly before this, it was 38 bytes larger, and refused to load onto the microcontroller properly, since it plus the bootloader exceeds the 16K of flash memory available. My first attack would probably be to simply move the game board to a global variable instead of passing it as a function argument. The savings in stack-pushing should gain a little room.

If I were to make room for new operations, then a feature that saved a bit of state across power cycles would be a fun target. What’s a game without a high-score board?

Baseball + Riak Map/Reduce, Round 2

If you enjoyed my last post on the Basho Blog, about computing baseball stats using Riak’s map/reduce, you may also enjoy my followup post about dealing with (or avoiding) records that have been split across block boundaries.

Baseball + Riak Map/Reduce

The luwak_mr tool I wrote about last weekend kept my imagination chugging this week. The result? I’ve learned a bit about baseball, and written code to compute batting average using luwak_mr (and Riak map/reduce, of course).

Learn how it works on the Basho Blog.

Map/reducing Luwak

I was inspired, this weekend, by off-list discussion of Luwak and by Guy Steele’s talk How to Think about Parallel Programming—Not!. The two seemed naturally attracted, and thus I created the luwak_mr module.

The luwak_mr module exposes a simple function that knows how to walk a Luwak file tree, and send the keys for each of its leaf nodes off to a Riak map/reduce process. This enables one to run a map function against each block in a Luwak file. For example, one might split a large Latin-1 file into “words” (the luwak_mr_words module in the project is an example implementation of the method that Guy Steele presented).

And, yes, this blog has been dormant for a while … I’ve been busy. Lots of woodworking and travel. Making music has also begun to require more time, and yesterday I learned how to ski cross-country. Always busy, the life of a hobbyist.

Update: luwak_mr has also been accepted to the Riak function contrib. So if you’re in the habit of browsing there, fetch the latest.