Baseball + Riak Map/Reduce

The luwak_mr tool I wrote about last weekend kept my imagination chugging this week. The result? I’ve learned a bit about baseball, and written code to compute batting average using luwak_mr (and Riak map/reduce, of course).

Learn how it works on the Basho Blog.
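
For a rough idea of the shape of such a computation, here is a minimal sketch in plain Python. It is not the code from that post, and it uses neither Riak nor luwak_mr. The record layout, the player names, and the function names (map_event, reduce_counts, batting_averages) are all assumptions made up for illustration. The only baseball fact it relies on is that batting average is hits divided by official at-bats.

```python
from collections import defaultdict

# Hypothetical plate-appearance records: (player, counts_as_at_bat, is_hit).
EVENTS = [
    ("player_a", True, True),
    ("player_a", True, False),
    ("player_a", False, False),  # e.g. a walk: not an official at-bat
    ("player_b", True, False),
    ("player_b", True, True),
    ("player_b", True, False),
    ("player_b", True, False),
]


def map_event(event):
    """Map phase: emit (player, (hits, at_bats)) for each official at-bat."""
    player, is_at_bat, is_hit = event
    return [(player, (int(is_hit), 1))] if is_at_bat else []


def reduce_counts(pairs):
    """Reduce phase: sum hits and at-bats per player."""
    totals = defaultdict(lambda: (0, 0))
    for player, (hits, at_bats) in pairs:
        h, ab = totals[player]
        totals[player] = (h + hits, ab + at_bats)
    return dict(totals)


def batting_averages(events):
    """Run map then reduce, and divide hits by at-bats for each player."""
    pairs = [pair for event in events for pair in map_event(event)]
    return {p: h / ab for p, (h, ab) in reduce_counts(pairs).items()}
```

Because the reduce step only adds pairs of counts, the partial sums can be combined in any order, which is what makes the problem a comfortable fit for map/reduce.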

Map/reducing Luwak

I was inspired, this weekend, by off-list discussion of Luwak and by Guy Steele’s talk How to Think about Parallel Programming—Not!. The two seemed naturally attracted, and thus I created the luwak_mr module.

The luwak_mr module exposes a simple function that knows how to walk a Luwak file tree and send the keys for each of its leaf nodes off to a Riak map/reduce process. This enables one to run a map function against each block in a Luwak file. For example, one might split a large Latin-1 file into “words” (the luwak_mr_words module in the project is an example implementation of the method that Guy Steele presented).
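
To make the word-splitting idea concrete, here is a minimal sketch of that method in plain Python. It is not the luwak_mr_words module, and it does not talk to Riak or Luwak. The function names and the tuple representation are my own assumptions, and it treats only the space character as a separator. Each block is summarized either as a "chunk" (no space seen, so it may be the middle of a word) or as a "segment" (a partial word on the left, the complete words in the middle, a partial word on the right). Adjacent summaries are then combined pairwise.

```python
from functools import reduce


def map_block(block):
    """Summarize one block as ("chunk", text) or ("segment", left, words, right)."""
    if " " not in block:
        return ("chunk", block)
    parts = block.split(" ")
    # The first and last pieces may be halves of words that span block edges.
    return ("segment", parts[0], [p for p in parts[1:-1] if p], parts[-1])


def maybe_word(text):
    """Return a one-word list, or an empty list for an empty string."""
    return [text] if text else []


def combine(a, b):
    """Merge the summaries of two adjacent stretches of text (associative)."""
    if a[0] == "chunk" and b[0] == "chunk":
        return ("chunk", a[1] + b[1])
    if a[0] == "chunk":
        _, left, words, right = b
        return ("segment", a[1] + left, words, right)
    if b[0] == "chunk":
        _, left, words, right = a
        return ("segment", left, words, right + b[1])
    _, l1, w1, r1 = a
    _, l2, w2, r2 = b
    # The right edge of a and the left edge of b join into one word, if any.
    return ("segment", l1, w1 + maybe_word(r1 + l2) + w2, r2)


def finish(summary):
    """Turn the final summary into a flat list of words."""
    if summary[0] == "chunk":
        return maybe_word(summary[1])
    _, left, words, right = summary
    return maybe_word(left) + words + maybe_word(right)


def words_in_blocks(blocks):
    """Map every block, then fold the summaries together from left to right."""
    return finish(reduce(combine, map(map_block, blocks), ("chunk", "")))
```

For example, words_in_blocks(["the qu", "ick br", "own", " fox"]) returns ["the", "quick", "brown", "fox"], even though "quick" and "brown" each straddle a block boundary. Since combine is associative, the summaries could just as well be merged as a tree instead of a left-to-right fold, which is the property that makes the method parallelizable.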

And, yes, this blog has been dormant for a while … I’ve been busy. Lots of woodworking and travel. Making music has also begun to require more time, and yesterday I learned how to ski cross-country. Always busy, the life of a hobbyist.

Update: luwak_mr has also been accepted to the Riak function contrib. So if you’re in the habit of browsing there, fetch the latest.

I’m Sorry (Maybe)

Why is it that embarrassing code has a way of sticking around? The specific variety of embarrassment doesn’t seem to matter (it could be hard to read, willfully inefficient, or just quirkily broken); all varieties live on equally well. Is it just that all code has a way of sticking around, and that we notice the embarrassing code more? Or is it that the embarrassing code is more likely to be written in those tough little corners that no one wanted to touch anyway, and still don’t want to touch now? I don’t know, but I do know every one of us has a few bits that we’d love to do over, if we could ever get the time to Do It Right.

I’m reminded of one of my most embarrassing bits every time I’m put on hold. The music comes on, I hear about three words, and then static. A couple of chords, more static. On and on.

The story of my embarrassment begins over ten years ago. In the summer of 1999, I was interning at Lucent Technologies. It was my third summer there, and I was finally hacking on a product, not academic research (or IT upgrades, as my first summer had entailed).

The product was called Softswitch – an amazing new product in the early days of commercial IP telephony. The stack was some mix of C and Java, and there was a box humming somewhere with a connection to some corner of the phone system (at the very least the in-house ISDN). Interacting with telephones, over the internet, with software running on any old random box – wow![1]

My main task was helping to flesh out the add-on module system. “Flesh out” may be the wrong term. The goal of my work was more to experiment with the extension API they had created (known as the Programmable Feature Server), and to produce a demonstration of its capabilities, as well as to provide feedback about what was missing, rough, broken, etc. In the Web 2.0 world, I’d probably have been labeled “beta tester”.

As with most betas, documentation was scarce. The rumor was that the lack of documentation was less intimidating for those who knew SS7 inside and out, but there was no way I was going to swallow that heap and also produce something useful in three months.[2]

By late summer, I had implemented a fairly involved demo, boringly named ReminderCall. Dial in from any phone, navigate your way through Push-N-for-X menus, then eventually enter a time and record a message. At the time you chose, ReminderCall would dial your phone and play your message back to you. There was also a “web” frontend (either a Java servlet rendering HTML or a servlet talking to an applet; can’t remember which) for doing the same, as well as canceling or rescheduling pending reminders, if I recall correctly.

ReminderCall was a success. They liked it so much, they used it to demo Softswitch’s extensibility to MCI.

But it’s not the success that I intended to talk about here. The embarrassing code happened along the way to ReminderCall.

As a way to learn how to deal with audio streams, I first implemented another application with a somewhat smaller scope. Much like beginning to learn any display-based system by printing, “Hello World,” I began to learn this audio-based system by playing, “hello.” A few more hours of tinkering after that, and my application could also read key presses.

Polish things up a bit, and the first app I had ready was MusicOnHold. Being 17 at the time, all geek and zero taste, my demonstration music was none other than Sabotage by the Beastie Boys (light defense: it also happened to be one of the few songs I could find for download at the time, avoiding the lack of audio hardware in my workstation).

The nice thing about Sabotage is that it sounds like noise normally. Piping it over 8-bit (or less?) mono mainly just seems to change the timbre of the noise. It wasn’t until the boss asked me to find something more suitable for business-audience demonstration that it became apparent that the noise was part of the application. Glenn Miller’s In the Mood sounded better on every warped vinyl it ever graced. Dee-da-da-dee-kxhxhxhxhxhxhxhx-dee-dee-khxhxhxhxh.

There was worry, and hand wringing. Email went back and forth between us and the core Softswitch developers. Was it just Java unable to keep up (this was the 1.1 or 1.2 days, and I was still a n00b, after all)? Was it the interface to the switch? The network between the boxes? It’s true that the human voice requires less bandwidth to encode than something wide-frequency, high-dynamic-range like Big Band music, but I nevertheless tried re-encoding that song every which way. Things improved a bit, but still the static remained.

In the end, it was deemed more useful for me to press on and experiment with other features of the system, rather than muck about with this encoding trouble.

But there lies the perfect storm: an app no one really wanted to write, with a problem no one really wanted to touch, no one with the time to fix it anyway, and a flaw just embarrassing enough for me to remember it years later.

And now, every time I’m stuck on hold with static-filled music, I wonder whether someone just went ahead and packaged that MusicOnHold demo app with the Softswitch, and thereby forced my old, embarrassing code public. If that’s the case, then, I’m sorry, so kxhxhxhxhxhxhxhx.

[1] I used the department’s mail server for a time, to the chagrin of not only the admin, but another user trying to use that server to host their Netscape Navigator process.

[2] My mentor was also probing the reaches of the API, implementing the required wiretapping features, as I recall. She also gets credit for being the first person to introduce me to Emacs and OOP (by way of Java), not to mention a host of other enlightenment. Many thanks if you’re reading this by some chance!

London Erlang User Group: Riak Introduction

You announce that you’re visiting a foreign city, and suddenly your schedule is full of things to do there. 😉

The latest addition to my schedule is the London Erlang User Group. I’ll be giving an introduction to Riak, and also discussing the advantages and disadvantages we’ve experienced by choosing to develop in Erlang/OTP. If you’re interested, go register and then attend the meeting on April 21.

Now if I can just get some time for pints worked into that schedule as well, I’ll be set. I’m very anxious for some bitter/porter/etc. in the land of CAMRA. 🙂

NoSQL EU: Key-value Stores and Riak

I’m very excited to announce that I’ll be speaking at no:sql(eu). I’ll be covering Key-value stores and Riak. The talk should be a good overview of this [very] broad domain of datastores, as well as a closer look at a few unique features of some specific implementations.

I’ll also be teaching a Riak workshop on the last day of the conference. I plan to cover the design, implementation, and deployment of a simple wiki-like application. It should be a good introduction to simple Riak usage (just storing and fetching data), while also exposing some advanced features (like link-walking, map-reduce, and conflict-resolution).

Looking forward to meeting people there!

Vector Clocks on The Basho Blog

As penance for last night’s blathering about shiny new gadgets, I’ve written a post with much more interesting content for The Basho Blog. If you’re picking up Riak and curious about what these “vector clocks” are, and how to use them, have a look.

Padding Quietly Down the Hall

I haven’t posted here in a long time. I’ve wanted to. I have several posts partly written, just waiting on getting the last bits of example nailed down. But, as you can see, I haven’t finished the polishing I feel is necessary before posting them.

So, in an effort to jump-start my return to regular blogging, I’m going to do what everyone else is doing: I’m going to yammer about the iPad for a few paragraphs.

I am not, however, going to make some drooling prediction about it changing the world. I am also not going to make some frothing statement about how clueless Apple was to leave out my dream feature. I am not even going to pontificate about whether or not I’ll be buying one (or who else I think should or should not buy one).

Instead, I’d like to point out two things about the iPad that, I feel, have been underconsidered. Those two things are price and file-sharing.


$499. Five hundred dollars less than what seemed to be the most popular pre-announcement prediction. This is amazing because it hits many sweet spots.

Five hundred dollars is basically Geek Toy money. No, it’s not impulse-buy, “I tossed it in my cart to get free shipping at Amazon,” money. But, for your typical, gadget-loving geek, ~$2^9 is, “Yeah, I was thinking about sampling the market anyway.” That means it’s going to have myriad creative eyes and brains contemplating all sorts of mixed-up, new, different uses from day one. By day thirty, I guarantee you will see a demo of something surprising.

Whoa. Almost drooled a bit there. Calming down now.

Five hundred dollars, or more specifically sub-1k, is also the price that pundits have been demanding from Apple. There’s always been the Mac mini, but that’s not portable, and its sub-$1000 price really depends on you already owning a keyboard, mouse, and monitor (or doing your own bargain hunting). The iPad now opens the doors to people who want a real Apple computer for half the price of a MacBook (ignoring resale and discount programs).


Yes, a real computer. How can I say this with a straight face? It’s all because of a new feature in the SDK: file-sharing.

The iPhone platform, until now, has not been designed for content creation. Consume all you want, but only produce short emails and the occasional snapshot. This means that it wasn’t a problem that there was no good way to transfer the content you produced onto and off of the device. The small bits produced went out in email. The things consumed weren’t edited. (A few apps, like the excellent GoodReader, built in HTTP servers, just to get around this trouble.)

The iPad, though, has space and even dedicated hardware to provide UI for content creation. Indeed, Apple has ported the entire iWork suite to the device! In order for this to be of any use to anyone, though, there also needed to be a way to transfer that content elsewhere. Luckily, there is, in the form of a folder that each application can expose, which shows up as a directory on a drive when the iPad is plugged into a computer, just like any other USB drive. There is even a facility for asking other applications on the device to open files from the shared directory. No more bouncing things through remote network machines, just to get data moved between apps, or between the iPad and a desktop.

So that’s it. It’s cheap, and it can do more stuff. In a few months, I expect to see something really interesting.


It’s my birthday, and I’ve decided to give you all a present.

I needed a break from Riak & BeerRiot last week, and the thing everyone was talking about was iPhone webapps.

I read up and spent a while considering different apps I could build. Then it finally hit me: converting Dashboard widgets to iPhone webapps should be trivial!

Unfortunately, it’s not completely trivial if your widget uses the widget preferences, buttons, and/or back-face configuration stuff. But, it’s not impossible to emulate all that.

Lucky for me, I had a widget lying around that I’d written a few years ago. After a few hours of trial-and-error, I’m finally happy with it, and I’ve decided to release it to the world.

I give you the Spinner iPhone webapp. If you ever find yourself in a moment of indecision, tell the Spinner what your choices are and give it a flick.

If you’re without iPhone or iPod Touch, but you have a Mac running a recent OS X, I’ve also released the Dashboard widget, for you to relinquish the same decisive responsibility on the desktop.


This also gave me an excuse to upgrade the Webmachine installation on BeerRiot. Virtual hosts for the win!

Riak Screencast with Ben Ahlan

If you’ve been thinking of trying Riak, but hadn’t yet gotten around to downloading and experimenting, there’s one more resource available this morning that might tip you over the edge. Ben Ahlan and I recorded a screencast demonstrating basic setup and usage of Riak. If you can stand watching two guys mumble over a console for 40 minutes, you may find a tip or two to make your experimentation go smoothly. 😉

I also hear that Martin Scholl gave an awesome talk at NoSQL Berlin. I can’t wait to see the video.

Riak Presented at NYC NoSQL – slides, text & video

I had the pleasure of attending the NYC NoSQL Fall ’09 Meetup/Mini-Conference last Monday. Great talks, all around. I thought it was a good mix of use-case analysis and technology introduction.

In addition to enjoying everyone else’s presentations, I also presented Riak. It was a quick 12-minute talk, followed by 2.5 minutes of questions, but the response I got was great. People really dug in and had interesting observations and questions to discuss afterward.

If you weren’t able to make the event, Brendan has posted video of my talk. I have also posted an HTML slides-and-text version of my talk, if you prefer reading over watching and listening.