NoSQL EU: Key-value Stores and Riak

I’m very excited to announce that I’ll be speaking at no:sql(eu). I’ll be covering Key-value stores and Riak. The talk should be a good overview of this [very] broad domain of datastores, as well as a closer look at a few unique features of some specific implementations.

I’ll also be teaching a Riak workshop on the last day of the conference. I plan to cover the design, implementation, and deployment of a simple wiki-like application. It should be a good introduction to simple Riak usage (just storing and fetching data), while also exposing some advanced features (like link-walking, map-reduce, and conflict-resolution).

Looking forward to meeting people there!

Vector Clocks on The Basho Blog

As penance for last night’s blathering about shiny new gadgets, I’ve written a post with much more interesting content for The Basho Blog. If you’re picking up Riak and curious about what these “vector clocks” are, and how to use them, have a look.

Padding Quietly Down the Hall

I haven’t posted here in a long time. I’ve wanted to. I have several posts partly written, just waiting on getting the last bits of example nailed down. But, as you can see, I haven’t finished the polishing I feel is necessary before posting them.

So, in an effort to jump-start my return to regular blogging, I’m going to do what everyone else is doing: I’m going to yammer about the iPad for a few paragraphs.

I am not, however, going to make some drooling prediction about it changing the world. I am also not going to make some frothing statement about how clueless Apple was to leave out my dream feature. I am not even going to pontificate about whether or not I’ll be buying one (or who else I think should or should not buy one).

Instead, I’d like to point out two things about the iPad that, I feel, have been underconsidered. Those two things are price and file-sharing.

Price

$499. Five hundred dollars less than what seemed to be the most popular pre-announcement prediction. This is amazing because it hits many sweet spots.

Five hundred dollars is basically Geek Toy money. No, it’s not impulse-buy, “I tossed it in my cart to get free shipping at Amazon,” money. But, for your typical, gadget-loving geek, ~$2^9 is, “Yeah, I was thinking about sampling the market anyway.” That means it’s going to have myriad creative eyes and brains contemplating all sorts of mixed-up, new, different uses from day one. By day thirty, I guarantee you will see a demo of something surprising.

Woah. Almost drooled a bit there. Calming down now.

Five hundred dollars, or more specifically sub-1k, is also the price that pundits have been demanding from Apple. There’s always been the Mac mini, but that’s not portable, and its sub-$1000 price really depends on you already owning a keyboard, mouse, and monitor (or doing your own bargain hunting). The iPad now opens the doors to people who want a real Apple computer for half the price of a MacBook (ignoring resale and discount programs).

File-sharing

Yes, a real computer. How can I say this with a straight face? It’s all because of a new feature in the SDK: file-sharing.

The iPhone platform, until now, has not been designed for content creation. Consume all you want, but only produce short emails and the occasional snapshot. This means that it wasn’t a problem that there was no good way to transfer the content you produced onto and off of the device. The small bits produced went out in email. The things consumed, weren’t edited. (A few apps, like the excellent GoodReader, built in HTTP servers, just to get around this trouble.)

The iPad, though, has space and even dedicated hardware to provide UI for content creation. Indeed, Apple has ported the entire iWork suite to the device! In order for this to be of any use to anyone, though, there also needed to be a way to transfer that content elsewhere. Luckily, there is, in the form of a folder that each application can expose, which shows up as a directory on a drive when the iPad is plugged into a computer, just like any other USB drive. There is even a facility for asking other applications on the device to open files from the shared directory. No more bouncing things through remote network machines, just to get data moved between apps, or between the iPad and a desktop.

So that’s it. It’s cheap, and it can do more stuff. In a few months, I expect to see something really interesting.

Spinner!

It’s my birthday, and I’ve decided to give you all a present.

I needed a break from Riak & BeerRiot last week, and the thing everyone was talking about was iPhone webapps.

I read up and spent a while considering different apps I could build. Then it finally hit me: converting Dashboard widgets to iPhone webapps should be trivial!

Unfortunately, it’s not completely trivial if your widget uses the widget preferences, buttons, and/or back-face configuration stuff. But, it’s not impossible to emulate all that.

Lucky for me, I had a widget lying around that I’d written a few years ago. After a few hours of trial-and-error, I’m finally happy with it, and I’ve decided to release it to the world.

I give you the Spinner iPhone webapp. If you ever find yourself in a moment of indecision, tell the Spinner what your choices are and give it a flick.

If you’re without iPhone or iPod Touch, but you have a Mac running a recent OS X, I’ve also released the Dashboard widget, for you to relinquish the same decisive responsibility on the desktop.

Enjoy!

This also gave me an excuse to upgrade the Webmachine installation on BeerRiot. Virtual hosts for the win!

Riak Screencast with Ben Ahlan

If you’ve been thinking of trying Riak, but hadn’t yet gotten around to downloading and experimenting, there’s one more resource available this morning that might tip you over the edge. Ben Ahlan and I recorded a screencast demonstrating basic setup and usage of Riak. If you can stand watching two guys mumble over a console for 40 minutes, you may find a tip or two to make your experimentation go smoothly. 😉

I also hear that Martin Scholl gave an awesome talk at NoSQL Berlin. I can’t wait to see the video.

Riak Presented at NYC NoSQL – slides, text & video

I had the pleasure of attending the NYC NoSQL Fall ’09 Meetup/Mini-Conference last Monday. Great talks, all around. I thought it was a good mix of use-case analysis and technology introduction.

In addition to enjoying everyone else’s presentations, I also presented Riak. It was a quick 12-minute talk, followed by 2.5 minutes of questions, but the response I got was great. People really dug in and had interesting observations and questions to discuss afterward.

If you weren’t able to make the event, Brendan has posted video of my talk. I have also posted an HTML slides-and-text version of my talk, if you prefer reading over watching and listening.

Riak Demo: Stickynotes

Warning: This blog post is woefully out of date:

  1. Instead of building Riak from source, you should download a pre-compiled release from downloads.basho.com (though the source is still available, if you want).
  2. Starting Riak is now done with the bin/riak start command, not start-fresh.sh Stopping is also easily done with bin/riak stop
  3. Jiak is no longer the preferred HTTP interface. There is now one that accepts any content type, not just JSON.
  4. Without Jiak, back-linking of notes will need to be done with post-commit hooks.

There are probably other things I’ve overlooked. Really, you should just head over to the Riak wiki to get the intro.


Basho released Riak to the world just over a week ago. Some think that the docs are good enough that further explanation is not necessary. But, if you’re looking for a little more introduction, read on.

The short version of this post: Riak is as simple as downloading it and hitting the HTTP interface:

1$ hg clone http://bitbucket.org/basho/riak/
2$ cd riak
3$ make
4$ ./start-fresh.sh config/riak.erlenv

5$ curl -X PUT -H "Content-type: application/json" \
5>   http://localhost:8098/jiak/foo \
5>   -d "{\"schema\":{\"allowed_fields\":[\"bar\"],\"required_fields\":[\"bar\"],\"read_mask\":[\"bar\"],\"write_mask\":[\"bar\"]}}"

6$ curl -X PUT -H "Content-type: application/json" \
6>   http://localhost:8098/jiak/foo/baz \
6>   -d "{\"bucket\":\"foo\",\"key\":\"baz\",\"object\":{\"bar\":\"Hello World\"},\"links\":[]}"

7$ curl http://localhost:8098/jiak/foo
{"schema":{"allowed_fields":["bar"],"required_fields":["bar"],"read_mask":["bar"],"write_mask":["bar"]},"keys":["baz"]}

8$ curl http://localhost:8098/jiak/foo/baz
{"object":{"bar":"Hello World"},"vclock":"MzIwsDSwMDQyMjAyNjXQLcpMzHYwNDLXMwBCQ3SusYGZoZGxvqG+mbGJobmxkbmFsQEA","lastmod":"Wed, 12 Aug 2009 20:23:50 GMT","vtag":"6DxaqiRCDBevf03tzPZpzl","bucket":"foo","key":"baz","links":[]}

Commands 1-4 download and start Riak. Command 5 creates the bucket foo, and tells Jiak, Riak’s HTTP interface, that objects in bucket foo must have a bar field that is readable and writable. Command 6 creates the object baz in bucket foo. Command 7 gets the listing of bucket foo. Command 8 retrieves object baz in bucket foo.

Aside: Some have found it difficult to stop Riak, so I’ll throw the tip out here: killall heart. You may also have to kill the erl command afterward, but if you don’t kill heart first, Riak will just come right back.

That’s all it takes to use Riak. You can stop reading now and happily speak REST to it for the rest of your application’s lifetime.

If you’re still hanging around, though, maybe you’d be interested in a few more features that Jiak has to offer, like link-walking and field validation. To demonstrate those, I’ll spend the rest of this post describing the creation of an application on Riak.

Let’s say that I wanted to create a note-taking system, something like the fine Erlang/Mochiweb demo put together by the guys at Beebole.

There will be two kinds of objects in my system: notes and groups. The properties of a note will be the text of the note, the color, the position on the screen, and the stacking order. The properties of a group will be the name of the group and the list of notes in the group.

I’ll start by whipping up two modules to manage those buckets for me. These modules tell jiak_resource how to validate the structure of objects in these buckets (jiak_resource is the Webmachine resource that runs Riak’s HTTP interface). They’ll follow the basic structure of jiak_example.erl, which ships with Riak.

-module(groups).
-export([init/2, auth_ok/3, bucket_listable/0, allowed_fields/0,
         required_fields/0, read_mask/0, write_mask/0,
         expires_in_seconds/3, check_write/4, effect_write/4,
         after_write/4, merge_siblings/1]).

init(_Key, Context) -> {ok, Context}.

auth_ok(_Key, ReqData, Context) -> {true, ReqData, Context}.

bucket_listable() -> true.

allowed_fields()  -> [<<"name">>].
required_fields() -> allowed_fields().
read_mask()       -> allowed_fields().
write_mask()      -> allowed_fields().

expires_in_seconds(_Key, ReqData, Context) ->
    {600, ReqData, Context}.

check_write({_PutType, _Key}, JiakObject, ReqData, Context) ->
    {ObjDiffs,_} = Context:diff(),
    case lists:foldl(fun check_diff/2, [], ObjDiffs) of
        [] ->
            {{ok, JiakObject}, ReqData, Context};
        Errors ->
            {{error, list_to_binary(string:join(Errors, ", "))},
             ReqData, Context}
    end.

check_diff({<<"name">>, _, Value}, ErrorAcc) ->
    if is_binary(Value) -> ErrorAcc;
       true             -> ["name field must be a string"|ErrorAcc]
    end.

effect_write(_Key, JiakObject, ReqData, Context) ->
    {{ok, JiakObject}, ReqData, Context}.

after_write(_Key, _JiakObject, ReqData, Context) ->
    {ok, ReqData, Context}.

merge_siblings(Siblings) ->
    jiak:standard_sibling_merge(Siblings).

The groups module ensures that every group has a name field (through implementation of allowed_fields/0, required_fields/0, read_fields/0, and write_fields/0), and that the value of that field is a string (using check_write/4 and check_diff/2).

-module(notes).
-export([init/2, auth_ok/3, bucket_listable/0, allowed_fields/0,
         required_fields/0, read_mask/0, write_mask/0,
         expires_in_seconds/3, check_write/4, effect_write/4,
         after_write/4, merge_siblings/1]).

init(_Key, Context) -> {ok, Context}.

auth_ok(_Key, ReqData, Context) -> {true, ReqData, Context}.

bucket_listable() -> true.

allowed_fields() ->
    [<<"text">>, <<"x">>, <<"y">>, <<"z">>, <<"color">>].

required_fields() -> allowed_fields().
read_mask()       -> allowed_fields().
write_mask()      -> allowed_fields().

expires_in_seconds(_Key, ReqData, Context) ->
    {600, ReqData, Context}.

check_write({_PutType, Key}, JiakObject, ReqData, Context) ->
    {ObjDiffs,_} = Context:diff(),
    case lists:foldl(fun check_diff/2, [], ObjDiffs) of
        [] ->
            {{ok, JiakObject}, ReqData, Context:set_prop(key, Key)};
        Errors ->
            {{error, list_to_binary(string:join(Errors, ", "))},
             ReqData, Context}
    end.

-define(COLORS, [<<"yellow">>, <<"pink">>, <<"green">>, <<"blue">>]).

check_diff({<<"text">>, _, Value}, ErrorAcc) ->
    if is_binary(Value) -> ErrorAcc;
       true             -> ["text field must be a string"|ErrorAcc]
    end;
check_diff({Coord, _, Value}, ErrorAcc)
  when Coord==<<"x">>;Coord==<<"y">>;Coord==<<"z">> ->
    if is_integer(Value) -> ErrorAcc;
       true ->
            [io_lib:format("~s field must be an integer", [Coord])
             |ErrorAcc]
    end;
check_diff({<<"color">>, _, Value}, ErrorAcc) ->
    case lists:member(Value, ?COLORS) of
        true -> ErrorAcc;
        false ->
            [io_lib:format("color field must be one of (~s)",
                           [string:join([binary_to_list(C)||C<-?COLORS],
                                        ",")])
             |ErrorAcc]
    end.

effect_write(_Key, JiakObject, ReqData, Context) ->
    {{ok, JiakObject}, ReqData, Context}.

after_write(_Key, JiakObject, ReqData, Context) ->
    spawn(fun() ->
                  [[_, GroupKey, _]] = jiak_object:links(JiakObject, groups),
                  {ok, C} = jiak:local_client(),
                  {ok, G} = C:get(groups, GroupKey, 2),
                  Key = Context:get_prop(key),
                  C:put(jiak_object:add_link(G, notes, Key, <<"note">>), 2)
          end),
    {ok, ReqData, Context}.

merge_siblings(Siblings) ->
    jiak:standard_sibling_merge(Siblings).

The notes module ensures that all notes have a text field that is a string; x, y, and z fields that are integers; and a color field that is one of a specific list of strings (using the same functions as the groups module used).

One more interesting thing happens in the notes module. If you look at after_write/4, you’ll see that it fetches the groups object that the note links to, and adds the note to that group’s links. Jiak calls after_write/4 after the note has been stored in Riak, so what I’ve written here is effectively an automatic back-link monitor. We’ll return to the concept of links in a moment.

If I put the notes and groups modules in place, I can fire up Riak and immediately being sending HTTP requests to its Jiak interface to create and modify groups and notes. For example:

$ curl -X PUT -H "Content-type: application/json" \
>   http://127.0.0.1:8098/jiak/groups/todos \
>   -d "{\"bucket\":\"groups\",\"key\":\"todos\",\"object\":{\"name\":\"todo\"}\"links\":[]}"

$ curl -X PUT -H "Content-type: application/json" \
>   http://127.0.0.1:8098/jiak/notes/blog \
>   -d "{\"bucket\":\"notes\",\"key\":\"blog\",\"object\":{\"text\":\"finish blog post\",\"x\":0,\"y\":0,\"z\":0,\"color\":\"green\"},\"links\":[[\"groups\",\"todos\",\"open\"]]"

These two lines would create a group named todo with a note labeled finish blog post.

Now, about those links. See the ["groups","todos","open"] item in the links field of that notes object? That’s a link to the groups object named todos, and I’ve tagged it open. The affect_write/4 function in the notes module will add a link to the groups object of the form ["notes","blog","note"].

What does this get me? More than just record-keeping: I can now use the Jiak utility jaywalker to get all of the notes in the todo group with a single query:

$ curl http://127.0.0.1:8098/jiak/groups/todos/notes,_,_

You’ll recognize the first part, through /jiak/groups/todos/. The segment after that is a link query. This ones says "all notes objects, with any tag." The links are structured as {bucket},{tag},{accumulate} segments, with underscore meaning "any."

The example query will return an object with a results field that is a list of lists of results. That is, if I were to store the data returned from that query in a variable called data, my list of notes would be at data.results[0].

I’m not limited to one hop, either. If there were also person objects in my system, linked from notes objects as authors I might get all of the authors of the notes in a group with:

$ curl http://127.0.0.1:8098/jiak/groups/todos/notes,_,_/person,author,_

More information on link-walking can be found in the jaywalker_resource documentation.

The curl commands and the HTTP requests they represent are pretty simple, but I’m going to be hitting these resources from Javascript running in a browser. So, I’ll wrap it all up in a nice utility class:

function JiakClient(BaseUrl, Opts) {
    this.baseurl = BaseUrl;
    if (!(this.baseurl.slice(-1) == '/'))
        this.baseurl += '/';

    this.opts = Opts||{};
}

JiakClient.prototype.store = function(Object, Callback, NoReturnBody) {
    var req = {
        contentType: "application/json",
        dataType: "json"
    };

    if (this.opts.alwaysPost || !Object.key)
        req.type = 'POST';
    else
        req.type = 'PUT';
    
    req.url = this.baseurl+Object.bucket+'/';
    if (Object.key) req.url += Object.key;
    
    if (!(this.opts.noReturnBody || NoReturnBody))
        req.url += '?returnbody=true';

    if (typeof Callback == 'function')
        req.success = Callback;

    req.data = JSON.stringify(Object);

    return $.ajax(req);
}

JiakClient.prototype.fetch = function(Bucket, Key, Callback) {
    return $.ajax({
        url:      this.baseurl+Bucket+'/'+Key,
        dataType: "json",
        success:  Callback
    });
}

JiakClient.prototype.remove = function(Bucket, Key, Callback) {
    return $.ajax({
        type:    'DELETE',
        url:     this.baseurl+Bucket+'/'+Key,
        success: Callback
    });
}

JiakClient.prototype.walk = function(Start, Spec, Callback) {
    var req = {
        dataType: "json",
        success: Callback
    };

    if ('bucket' in Start)
        req.url = this.baseurl+Start.bucket+'/'+Start.key+'/';
    else
        req.url = this.baseurl+Start[0]+'/'+Start[1]+'/';

    for (i in Spec) {
        req.url += (Spec[i].bucket||'_')+','+
            (Spec[i].tag||'_')+','+
            ((Spec[i].acc || i == Spec.length-1) ? '1' : '_')+'/';
    }

    return $.ajax(req);
}

Now I can create the same effect as the curl commands with this code:

var J = new JiakClient("/jiak/");
J.store({
  bucket:"groups",
  key:"todos"
  object:{"name":"todo"}
  links:[]
},
function(group) {
  J.store({
    bucket:"notes",
    key:"blog",
    object:{
    },
    links:[["groups",group.key,"open"]]
  });
});

With the request for "all notes in the todo group" looking like:

J.walk({bucket:"groups", key:"todos"},
       [{bucket:"notes"}],
       function(data) {
         var notes = data.results[0];
         //do stuff...
       });

Okay, now I have my backend and frontend – I’d better fill in the middle. Any intermediary webserver could work, but because Riak comes with Webmachine included, I’ll just setup a quick Webmachine app.

$ cd $RIAK_HOME
$ deps/webmachine/scripts/new_webmachine.erl stickynotes ..
$ cp config/riak.erlenv ../stickynotes/riak-config.erlenv

I copied the Riak config (riak.erlenv) because there are two customizations I want to add to it:

{add_paths, ["../stickynotes/ebin"]}.
{jiak_buckets, [notes, groups]}.

The former puts the stickynotes ebin in Riak’s code path, so Jiak can reach the notes and groups modules I just wrote. The latter does nothing but force the atoms 'notes' and 'groups' into the Riak node so Jiak can use list_to_existing_atom/1.

To my new Webmachine app, I’ll add four things:

  1. the notes and groups modules I just wrote
  2. the static files from Beebole’s example applcation (HTML, CSS, JS) … modified a bit to load and use the jiak.js utility from above
  3. a simple static resource server
  4. a simple proxy resource to pass requests through to jiak (the couchdb_proxy from my wmexamples repo will do fine)

Once I have everything aligned (including dispatch setup correctly), I just start Riak and my Webmachine node:

$ cd $RIAK_HOME
$ ./start-fresh.sh ../stickynotes/riak-config.erlenv
$ cd ../stickynotes
$ ./start.sh

…then point my browser at http://localhost:8000/, and I see the app’s UI with an empty group ready to store some notes. I recommend opening Firebug to watch the requests fly by.

I realize that I’ve glossed over a bit of the Webmachine application stuff, but that’s because it’s mostly rehash of older posts. The better way to cover all of that material is for me to tell you to open up the demo/stickynotes directory in the Riak repo you just cloned, and read the code written there yourself. 🙂

Webmachine Bloggy Goodness

A whole bunch of people (no half-bunches in these parts) are picking up Webmachine and writing about their experiences. If you’re looking for a few more people to follow:

Andy Gross (a coworker) is writing about file uploads. Actual webmachine details to follow in his next post – this first one is about the client side of the equation. Update: that second post is up, and includes an example of the new “streamed body” feature in webmachine 1.3.

Seth Falcon is covering the design and implementation of a URL-shortening service. Again, webmachine details to follow, but there’s some good talk about using Mnesia for storage in the current post.

Marc Worrell is laying down the outline of his new CMS, Zophrenic. This looks like a very ambitious project, and I’m anxious to see its release.

Update: More blogs from the comments

Daniel Kwiecinski just covered redirecting (3xx responses), and also recently showed off his own static-file-serving resource. He also tips his hat to wmtrace, so he gets brownie points. 😉

Great reads, but don’t take my word for it.</burton>

Webmachine POST Example

Many people have asked for an example Webmachine resource that responds to POST. If you follow my twitter feed, you may have caught this gem.

I figured that example could use a little fleshing out, so I’ve added a resource to my wmexamples repo.

formjson_resourc.erl makes an attempt at demonstrating the simplest way to handle a POST, while also demonstrating the difference between content-producing functions (to_json/2 in this example, and others named in content_types_provided/2), which put content in the response body simply by returning it, and other functions, which have to put content in the response body by returning a modified ReqData.

For another example of handling POST, read demo_fs_resource.erl that comes with Webmachine. It implements post_is_create/2, create_path/2, content_types_accepted/2, and accept_content/2 to handle POST requests. (Incidentally, demo_fs_resource is a good example of many Webmachine resource functions.)

Updated to include content_types_accepted/2 in the list of functions handling POST requests – thanks for catching it, Lou!

Webmachine Slideshow Intro

Justin Sheehy has just posted his video slideshow introduction to webmachine. The video contains the slides and most of the talk that Justin gave at the Bay Area Erlang Factory last month. If you’ve been thinking about checking out Webmachine, this half hour is well worth your time.

Webmachine has also gained several new features recently, including delayed body receive and a more independent request dispatcher. Partial receive and send is even available in tip.

Unfortunately, the trace utility required changes that broke compatiblity with old trace files. Support for those old traces could probably be hacked in without too much trouble if you need it (the cause was mainly just a change to a record structure). I assumed that trace files are somewhat ephemeral, though – really only needed for debugging, and then thrown away.

Finally, if you’re looking for example resources and dispatch tables, I recommend checking out my wmexamples repo on bitbucket. It has a smattering of working resources that can hopefully give you a feel for how we normally write them.