If you enjoyed my last post on the Basho Blog, about computing baseball stats using Riak’s map/reduce, you may also enjoy my followup post about dealing with (or avoiding) records that have been split across block boundaries.
I was inspired, this weekend, by off-list discussion of Luwak and by Guy Steele’s talk How to Think about Parallel Programming—Not!. The two seemed naturally attracted, and thus I created the luwak_mr module.
The luwak_mr module exposes simple a function that knows how to walk a Luwak file tree, and send the keys for each of its leaf nodes off to a Riak map/reduce process. This enables one to run a map function against each block in a luwak file. For example, one might split a large Latin-1 file into “words” (the luwak_mr_words module in the project is an example implementation of the method that Guy Steele presented).
And, yes, this blog has been dormant for a while … I’ve been busy. Lots of woodworking and travel. Making music has also begun to require more time, and yesterday I learned how to ski cross-country. Always busy, the life of a hobbyist.