## Archive for the ‘Riak’ Category

### Baseball + Riak Map/Reduce: the Movie

If you have been following my posts about using Riak’s map/reduce system to compute baseball statistics via the luwak_mr library, or if they’re still sitting in your ‘read later’ pile, you may be interested in a presentation I gave yesterday on the same topic.

Video of that presentation is available on Vimeo. It covers most of the content in the blog posts, while also providing a little extra background about why luwak_mr was necessary for the work.

### Baseball + Riak Map/Reduce, Round 2

If you enjoyed my last post on the Basho Blog, about computing baseball stats using Riak’s map/reduce, you may also enjoy my followup post about dealing with (or avoiding) records that have been split across block boundaries.

### Baseball + Riak Map/Reduce

The luwak_mr tool I wrote about last weekend kept my imagination chugging this week. The result? I’ve learned a bit about baseball, and written code to compute batting average using luwak_mr (and Riak map/reduce, of course).

Learn how it works on the Basho Blog.

### Map/reducing Luwak

I was inspired, this weekend, by off-list discussion of Luwak and by Guy Steele’s talk How to Think about Parallel Programming—Not!. The two seemed naturally attracted, and thus I created the luwak_mr module.

The luwak_mr module exposes a simple function that knows how to walk a Luwak file tree and send the keys for each of its leaf nodes off to a Riak map/reduce process. This enables one to run a map function against each block in a Luwak file. For example, one might split a large Latin-1 file into “words” (the luwak_mr_words module in the project is an example implementation of the method that Guy Steele presented).
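To see why words need special handling when a file is processed block by block, here is a minimal JavaScript sketch of the chunk/segment idea from Steele's talk. It is not the luwak_mr_words code (that module is Erlang). The names mapBlock, combine, finish, and wordsFromBlocks are made up for illustration, and a single space is assumed to be the only separator.

```javascript
// "Map" step: summarize one block. A block with no space is a chunk (a piece
// of some word); otherwise it is a segment with a partial word on each edge
// and the complete words in between.
function mapBlock(text) {
  var parts = text.split(' ');
  if (parts.length === 1) return { chunk: parts[0] };
  return {
    left: parts[0],
    words: parts.slice(1, -1).filter(function (w) { return w !== ''; }),
    right: parts[parts.length - 1]
  };
}

function maybeWord(s) { return s === '' ? [] : [s]; }

// "Reduce" step: join two adjacent summaries. The right edge of the first and
// the left edge of the second may be two halves of one split word.
function combine(a, b) {
  if ('chunk' in a && 'chunk' in b) return { chunk: a.chunk + b.chunk };
  if ('chunk' in a) return { left: a.chunk + b.left, words: b.words, right: b.right };
  if ('chunk' in b) return { left: a.left, words: a.words, right: a.right + b.chunk };
  return {
    left: a.left,
    words: a.words.concat(maybeWord(a.right + b.left), b.words),
    right: b.right
  };
}

// At the very end, the outer edges are whole words (or empty).
function finish(r) {
  if ('chunk' in r) return maybeWord(r.chunk);
  return maybeWord(r.left).concat(r.words, maybeWord(r.right));
}

function wordsFromBlocks(blocks) {
  return finish(blocks.map(mapBlock).reduce(combine, { chunk: '' }));
}

console.log(wordsFromBlocks(['This is a sam', 'ple of te', 'xt spl', 'it up']));
```

Because combine is associative, the per-block summaries can be merged in any grouping, which is what makes the approach a fit for map/reduce over independent blocks.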

And, yes, this blog has been dormant for a while … I’ve been busy. Lots of woodworking and travel. Making music has also begun to require more time, and yesterday I learned how to ski cross-country. Always busy, the life of a hobbyist.

Update: luwak_mr has also been accepted to the Riak function contrib. So if you’re in the habit of browsing there, fetch the latest.

### London Erlang User Group: Riak Introduction

You announce that you’re visiting a foreign city, and suddenly your schedule is full of things to do there.

The latest addition to my schedule is the London Erlang User Group. I’ll be giving an introduction to Riak, and also discussing the advantages and disadvantages we’ve experienced by choosing to develop in Erlang/OTP. If you’re interested, go register and then attend the meeting on April 21.

Now if I can just get some time for pints worked into that schedule as well, I’ll be set. I’m very anxious for some bitter/porter/etc. in the land of CAMRA.

### NoSQL EU: Key-value Stores and Riak

I’m very excited to announce that I’ll be speaking at no:sql(eu). I’ll be covering Key-value stores and Riak. The talk should be a good overview of this [very] broad domain of datastores, as well as a closer look at a few unique features of some specific implementations.

I’ll also be teaching a Riak workshop on the last day of the conference. I plan to cover the design, implementation, and deployment of a simple wiki-like application. It should be a good introduction to simple Riak usage (just storing and fetching data), while also exposing some advanced features (like link-walking, map-reduce, and conflict-resolution).

Looking forward to meeting people there!

### Vector Clocks on The Basho Blog

As penance for last night’s blathering about shiny new gadgets, I’ve written a post with much more interesting content for The Basho Blog. If you’re picking up Riak and curious about what these “vector clocks” are, and how to use them, have a look.

### Riak Screencast with Ben Ahlan

If you’ve been thinking of trying Riak, but hadn’t yet gotten around to downloading and experimenting, there’s one more resource available this morning that might tip you over the edge. Ben Ahlan and I recorded a screencast demonstrating basic setup and usage of Riak. If you can stand watching two guys mumble over a console for 40 minutes, you may find a tip or two to make your experimentation go smoothly.

I also hear that Martin Scholl gave an awesome talk at NoSQL Berlin. I can’t wait to see the video.

### Riak Presented at NYC NoSQL – slides, text & video

I had the pleasure of attending the NYC NoSQL Fall ’09 Meetup/Mini-Conference last Monday. Great talks, all around. I thought it was a good mix of use-case analysis and technology introduction.

In addition to enjoying everyone else’s presentations, I also presented Riak. It was a quick 12-minute talk, followed by 2.5 minutes of questions, but the response I got was great. People really dug in and had interesting observations and questions to discuss afterward.

If you weren’t able to make the event, Brendan has posted video of my talk. I have also posted an HTML slides-and-text version of my talk, if you prefer reading over watching and listening.

### Riak Demo: Stickynotes

Warning: This blog post is woefully out of date:

2. Starting Riak is now done with the bin/riak start command, not start-fresh.sh. Stopping is also easily done with bin/riak stop.
3. Jiak is no longer the preferred HTTP interface. There is now one that accepts any content type, not just JSON.
4. Without Jiak, back-linking of notes will need to be done with post-commit hooks.

There are probably other things I’ve overlooked. Really, you should just head over to the Riak wiki to get the intro.

Basho released Riak to the world just over a week ago. Some think that the docs are good enough that further explanation is not necessary. But, if you’re looking for a little more introduction, read on.

The short version of this post: Riak is as simple as downloading it and hitting the HTTP interface:

```
1$ hg clone http://bitbucket.org/basho/riak/
2$ cd riak
3$ make
4$ ./start-fresh.sh config/riak.erlenv

5$ curl -X PUT -H "Content-type: application/json" \
5>   http://localhost:8098/jiak/foo \

6$ curl -X PUT -H "Content-type: application/json" \
6>   http://localhost:8098/jiak/foo/baz \

7$ curl http://localhost:8098/jiak/foo

8$ curl http://localhost:8098/jiak/foo/baz
{"object":{"bar":"Hello World"},"vclock":"MzIwsDSwMDQyMjAyNjXQLcpMzHYwNDLXMwBCQ3SusYGZoZGxvqG+mbGJobmxkbmFsQEA","lastmod":"Wed, 12 Aug 2009 20:23:50 GMT","vtag":"6DxaqiRCDBevf03tzPZpzl","bucket":"foo","key":"baz","links":[]}
```

Commands 1-4 download and start Riak. Command 5 creates the bucket foo, and tells Jiak, Riak’s HTTP interface, that objects in bucket foo must have a bar field that is readable and writable. Command 6 creates the object baz in bucket foo. Command 7 gets the listing of bucket foo. Command 8 retrieves object baz in bucket foo.

Aside: Some have found it difficult to stop Riak, so I’ll throw the tip out here: killall heart. You may also have to kill the erl command afterward, but if you don’t kill heart first, Riak will just come right back.

That’s all it takes to use Riak. You can stop reading now and happily speak REST to it for the rest of your application’s lifetime.

If you’re still hanging around, though, maybe you’d be interested in a few more features that Jiak has to offer, like link-walking and field validation. To demonstrate those, I’ll spend the rest of this post describing the creation of an application on Riak.

Let’s say that I wanted to create a note-taking system, something like the fine Erlang/Mochiweb demo put together by the guys at Beebole.

There will be two kinds of objects in my system: notes and groups. The properties of a note will be the text of the note, the color, the position on the screen, and the stacking order. The properties of a group will be the name of the group and the list of notes in the group.

I’ll start by whipping up two modules to manage those buckets for me. These modules tell jiak_resource how to validate the structure of objects in these buckets (jiak_resource is the Webmachine resource that runs Riak’s HTTP interface). They’ll follow the basic structure of jiak_example.erl, which ships with Riak.

```
-module(groups).
-export([init/2, auth_ok/3, bucket_listable/0, allowed_fields/0,
         required_fields/0, expires_in_seconds/3, check_write/4,
         effect_write/4, after_write/4, merge_siblings/1]).

init(_Key, Context) -> {ok, Context}.

auth_ok(_Key, ReqData, Context) -> {true, ReqData, Context}.

bucket_listable() -> true.

%% A group has exactly one field, and it is required.
allowed_fields()  -> [<<"name">>].
required_fields() -> allowed_fields().

expires_in_seconds(_Key, ReqData, Context) ->
    {600, ReqData, Context}.

%% Collect an error message for every changed field that fails validation.
check_write({_PutType, _Key}, JiakObject, ReqData, Context) ->
    {ObjDiffs,_} = Context:diff(),
    case lists:foldl(fun check_diff/2, [], ObjDiffs) of
        [] ->
            {{ok, JiakObject}, ReqData, Context};
        Errors ->
            {{error, list_to_binary(string:join(Errors, ", "))},
             ReqData, Context}
    end.

check_diff({<<"name">>, _, Value}, ErrorAcc) ->
    if is_binary(Value) -> ErrorAcc;
       true             -> ["name field must be a string"|ErrorAcc]
    end.

effect_write(_Key, JiakObject, ReqData, Context) ->
    {{ok, JiakObject}, ReqData, Context}.

after_write(_Key, _JiakObject, ReqData, Context) ->
    {ok, ReqData, Context}.

merge_siblings(Siblings) ->
    jiak:standard_sibling_merge(Siblings).
```

The groups module ensures that every group has a name field (through implementation of allowed_fields/0, required_fields/0, read_fields/0, and write_fields/0), and that the value of that field is a string (using check_write/4 and check_diff/2).

```
-module(notes).
-export([init/2, auth_ok/3, bucket_listable/0, allowed_fields/0,
         required_fields/0, expires_in_seconds/3, check_write/4,
         effect_write/4, after_write/4, merge_siblings/1]).

init(_Key, Context) -> {ok, Context}.

auth_ok(_Key, ReqData, Context) -> {true, ReqData, Context}.

bucket_listable() -> true.

allowed_fields() ->
    [<<"text">>, <<"x">>, <<"y">>, <<"z">>, <<"color">>].

required_fields() -> allowed_fields().

expires_in_seconds(_Key, ReqData, Context) ->
    {600, ReqData, Context}.

%% Validate each changed field; remember the key for after_write/4.
check_write({_PutType, Key}, JiakObject, ReqData, Context) ->
    {ObjDiffs,_} = Context:diff(),
    case lists:foldl(fun check_diff/2, [], ObjDiffs) of
        [] ->
            {{ok, JiakObject}, ReqData, Context:set_prop(key, Key)};
        Errors ->
            {{error, list_to_binary(string:join(Errors, ", "))},
             ReqData, Context}
    end.

-define(COLORS, [<<"yellow">>, <<"pink">>, <<"green">>, <<"blue">>]).

check_diff({<<"text">>, _, Value}, ErrorAcc) ->
    if is_binary(Value) -> ErrorAcc;
       true             -> ["text field must be a string"|ErrorAcc]
    end;
check_diff({Coord, _, Value}, ErrorAcc)
  when Coord==<<"x">>;Coord==<<"y">>;Coord==<<"z">> ->
    if is_integer(Value) -> ErrorAcc;
       true ->
            [io_lib:format("~s field must be an integer", [Coord])
             |ErrorAcc]
    end;
check_diff({<<"color">>, _, Value}, ErrorAcc) ->
    case lists:member(Value, ?COLORS) of
        true -> ErrorAcc;
        false ->
            [io_lib:format("color field must be one of (~s)",
                           [string:join([binary_to_list(C)||C<-?COLORS],
                                        ",")])
             |ErrorAcc]
    end.

effect_write(_Key, JiakObject, ReqData, Context) ->
    {{ok, JiakObject}, ReqData, Context}.

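%% Runs after the note is stored: add a back-link from the note's group.
after_write(_Key, JiakObject, ReqData, Context) ->
    spawn(fun() ->
                  [[_, GroupKey, _]] = jiak_object:links(JiakObject, groups),
                  {ok, C} = jiak:local_client(),
                  {ok, G} = C:get(groups, GroupKey, 2),
                  Key = Context:get_prop(key),
                  %% Assumed completion of the truncated listing: link the
                  %% note from the group and store the group again.
                  C:put(jiak_object:add_link(G, notes, Key, <<"note">>), 2)
          end),
    {ok, ReqData, Context}.

merge_siblings(Siblings) ->
    jiak:standard_sibling_merge(Siblings).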
```

The notes module ensures that all notes have a text field that is a string; x, y, and z fields that are integers; and a color field that is one of a specific list of strings (using the same functions as the groups module used).
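For anyone who would rather catch bad notes before they leave the browser, the same rules can be mirrored on the client. This is only a sketch: validateNote and NOTE_COLORS are hypothetical names, not part of Jiak, and the server-side checks in the notes module remain the ones that count.

```javascript
// Hypothetical client-side mirror of the checks in the notes module.
var NOTE_COLORS = ['yellow', 'pink', 'green', 'blue'];

// Returns a list of error strings; an empty list means the note looks valid.
function validateNote(fields) {
  var errors = [];
  if (typeof fields.text !== 'string')
    errors.push('text field must be a string');
  ['x', 'y', 'z'].forEach(function (coord) {
    if (!Number.isInteger(fields[coord]))
      errors.push(coord + ' field must be an integer');
  });
  if (NOTE_COLORS.indexOf(fields.color) === -1)
    errors.push('color field must be one of (' + NOTE_COLORS.join(',') + ')');
  return errors;
}

console.log(validateNote({ text: 'finish blog post', x: 10, y: 20, z: 1, color: 'yellow' }));
console.log(validateNote({ text: 5, x: 1.5, y: 0, z: 0, color: 'red' }));
```

One difference from the Erlang version: JavaScript cannot tell 1.0 from 1, so a value that Erlang's is_integer/1 would reject as a float may pass here.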

One more interesting thing happens in the notes module. If you look at after_write/4, you’ll see that it fetches the groups object that the note links to, and adds the note to that group’s links. Jiak calls after_write/4 after the note has been stored in Riak, so what I’ve written here is effectively an automatic back-link monitor. We’ll return to the concept of links in a moment.

If I put the notes and groups modules in place, I can fire up Riak and immediately begin sending HTTP requests to its Jiak interface to create and modify groups and notes. For example:

```
$ curl -X PUT -H "Content-type: application/json" \
>   http://127.0.0.1:8098/jiak/groups/todos \

$ curl -X PUT -H "Content-type: application/json" \
>   http://127.0.0.1:8098/jiak/notes/blog \
```

These two lines would create a group named todo with a note labeled finish blog post.

Now, about those links. See the ["groups","todos","open"] item in the links field of that notes object? That’s a link to the groups object named todos, and I’ve tagged it open. The after_write/4 function in the notes module will add a link to the groups object of the form ["notes","blog","note"].
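As plain data, a link is just a [bucket, key, tag] triple in an object's links list. The sketch below shows the two objects after the back-link has been added. The x, y, z, and color values are invented for illustration, and withLink is a hypothetical helper, not something Jiak provides.

```javascript
// Illustrative objects only; the coordinate and color values are made up.
var note = {
  bucket: 'notes', key: 'blog',
  object: { text: 'finish blog post', x: 0, y: 0, z: 0, color: 'yellow' },
  links: [['groups', 'todos', 'open']]   // note -> group, tagged "open"
};
var group = { bucket: 'groups', key: 'todos', object: { name: 'todo' }, links: [] };

// Hypothetical helper: return a copy of obj with the link present exactly once.
function withLink(obj, bucket, key, tag) {
  var exists = obj.links.some(function (l) {
    return l[0] === bucket && l[1] === key && l[2] === tag;
  });
  var copy = {};
  for (var k in obj) copy[k] = obj[k];
  copy.links = exists ? obj.links.slice() : obj.links.concat([[bucket, key, tag]]);
  return copy;
}

// group -> note, tagged "note": the back-link described above.
var linkedGroup = withLink(group, note.bucket, note.key, 'note');
console.log(JSON.stringify(linkedGroup.links));
```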

What does this get me? More than just record-keeping: I can now use the Jiak utility jaywalker to get all of the notes in the todo group with a single query:

```
$ curl http://127.0.0.1:8098/jiak/groups/todos/notes,_,_
```

You’ll recognize the first part, through /jiak/groups/todos/. The segment after that is a link query. This one says "all notes objects, with any tag." The links are structured as {bucket},{tag},{accumulate} segments, with underscore meaning "any."

The example query will return an object with a results field that is a list of lists of results. That is, if I were to store the data returned from that query in a variable called data, my list of notes would be at data.results[0].
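A small sketch of picking that response apart, assuming each entry in the inner list has the same shape as the Jiak object shown earlier (object, bucket, key, links). The sample data and the noteTexts name are made up for illustration.

```javascript
// Assumed response shape: "results" holds one inner list of Jiak objects.
// The entries below are invented sample data.
var data = {
  results: [[
    { bucket: 'notes', key: 'blog', object: { text: 'finish blog post' },
      links: [['groups', 'todos', 'open']] },
    { bucket: 'notes', key: 'milk', object: { text: 'buy milk' },
      links: [['groups', 'todos', 'open']] }
  ]]
};

// Hypothetical helper: pull the text of every note out of a one-hop walk.
function noteTexts(walkResponse) {
  var notes = walkResponse.results[0] || [];
  return notes.map(function (n) { return n.object.text; });
}

console.log(noteTexts(data));
```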

I’m not limited to one hop, either. If there were also person objects in my system, linked from notes objects as authors, I might get all of the authors of the notes in a group with:

```
$ curl http://127.0.0.1:8098/jiak/groups/todos/notes,_,_/person,author,_
```

The curl commands and the HTTP requests they represent are pretty simple, but I’m going to be hitting these resources from JavaScript running in a browser. So, I’ll wrap it all up in a nice utility class:

```
// Minimal jQuery-based client for Jiak; BaseUrl is the path to the Jiak root.
function JiakClient(BaseUrl, Opts) {
    this.baseurl = BaseUrl;
    if (!(this.baseurl.slice(-1) == '/'))
        this.baseurl += '/';

    this.opts = Opts||{};
}

// Store an object: POST when it has no key yet, PUT otherwise.
JiakClient.prototype.store = function(Object, Callback, NoReturnBody) {
    var req = {
        contentType: "application/json",
        dataType: "json"
    };

    if (this.opts.alwaysPost || !Object.key)
        req.type = 'POST';
    else
        req.type = 'PUT';

    req.url = this.baseurl+Object.bucket+'/';
    if (Object.key) req.url += Object.key;

    if (!(this.opts.noReturnBody || NoReturnBody))
        req.url += '?returnbody=true';

    if (typeof Callback == 'function')
        req.success = Callback;

    req.data = JSON.stringify(Object);

    return $.ajax(req);
};

JiakClient.prototype.fetch = function(Bucket, Key, Callback) {
    return $.ajax({
        url:      this.baseurl+Bucket+'/'+Key,
        dataType: "json",
        success:  Callback
    });
};

JiakClient.prototype.remove = function(Bucket, Key, Callback) {
    return $.ajax({
        type:    'DELETE',
        url:     this.baseurl+Bucket+'/'+Key,
        success: Callback
    });
};

// Walk links from Start (an object or a [bucket, key] pair), one
// {bucket, tag, acc} spec per hop; the last hop is always accumulated.
JiakClient.prototype.walk = function(Start, Spec, Callback) {
    var req = {
        dataType: "json",
        success: Callback
    };

    if ('bucket' in Start)
        req.url = this.baseurl+Start.bucket+'/'+Start.key+'/';
    else
        req.url = this.baseurl+Start[0]+'/'+Start[1]+'/';

    for (var i = 0; i < Spec.length; i++) {
        req.url += (Spec[i].bucket||'_')+','+
            (Spec[i].tag||'_')+','+
            ((Spec[i].acc || i == Spec.length-1) ? '1' : '_')+'/';
    }

    return $.ajax(req);
};
```

Now I can create the same effect as the curl commands with this code:

```
var J = new JiakClient("/jiak/");
J.store({
    bucket:"groups",
    key:"todos",
    object:{"name":"todo"}
},
function(group) {
    J.store({
        bucket:"notes",
        key:"blog",
        object:{
        },
    });
});
```

With the request for "all notes in the todo group" looking like:

```
J.walk({bucket:"groups", key:"todos"},
       [{bucket:"notes"}],
       function(data) {
           var notes = data.results[0];
           //do stuff...
       });
```

Okay, now I have my backend and frontend – I’d better fill in the middle. Any intermediary webserver could work, but because Riak comes with Webmachine included, I’ll just set up a quick Webmachine app.

```
$ cd $RIAK_HOME
$ deps/webmachine/scripts/new_webmachine.erl stickynotes ..
$ cp config/riak.erlenv ../stickynotes/riak-config.erlenv
```

I copied the Riak config (riak.erlenv) because there are two customizations I want to add to it:

```
{add_paths, ["../stickynotes/ebin"]}.
{jiak_buckets, [notes, groups]}.
```

The former puts the stickynotes ebin in Riak’s code path, so Jiak can reach the notes and groups modules I just wrote. The latter does nothing but force the atoms 'notes' and 'groups' into the Riak node so Jiak can use list_to_existing_atom/1.

To my new Webmachine app, I’ll add four things:

1. the notes and groups modules I just wrote
2. the static files from Beebole’s example application (HTML, CSS, JS) … modified a bit to load and use the jiak.js utility from above
3. a simple static resource server
4. a simple proxy resource to pass requests through to jiak (the couchdb_proxy from my wmexamples repo will do fine)

Once I have everything aligned (including dispatch set up correctly), I just start Riak and my Webmachine node:

```
$ cd $RIAK_HOME
$ ./start-fresh.sh ../stickynotes/riak-config.erlenv
$ cd ../stickynotes
$ ./start.sh
```

…then point my browser at http://localhost:8000/, and I see the app’s UI with an empty group ready to store some notes. I recommend opening Firebug to watch the requests fly by.

I realize that I’ve glossed over a bit of the Webmachine application stuff, but that’s because it’s mostly a rehash of older posts. The better way to cover all of that material is for me to tell you to open up the demo/stickynotes directory in the Riak repo you just cloned, and read the code there yourself.