FSMs Make Instrumentation Easy

This piece originally appeared on the Honeycomb.io blog as part of a series on instrumentation.

There is a way to structure programs that makes inclusion of instrumentation straightforward and automatic, and it’s one that every hardware and software engineer should be completely familiar with: finite state machines. You have seen them time and again as illustrations of how a system works:

Turnstile state machine

What makes FSM instrumentation straightforward is that the place to expose information is obvious: along the edges, when the state of the system is changing. What makes it automatic is that some generic actor is usually driving a host of specific FSMs. You only need to instrument the actor (“entering state Q with message P”, “leaving state S with result R”), and every FSM it runs will be instrumented for free.
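To make that concrete, here is a minimal sketch of a generic actor driving a turnstile FSM. It is not Webmachine code: runFsm, the turnstile table, and the log format are all names I made up for illustration.

```javascript
// Hypothetical generic driver: the only place that knows about logging.
function runFsm(fsm, inputs, log) {
    let state = fsm.initial;
    for (const input of inputs) {
        // Instrumentation point 1: entering a state with a message.
        log("entering state " + state + " with message " + input);
        const next = fsm.transitions[state][input];
        if (next === undefined) {
            throw new Error("no transition from " + state + " on " + input);
        }
        // Instrumentation point 2: leaving the state with a result.
        log("leaving state " + state + " with result " + next);
        state = next;
    }
    return state;
}

// The turnstile FSM itself contains no logging code at all.
const turnstile = {
    initial: "locked",
    transitions: {
        locked:   { coin: "unlocked", push: "locked" },
        unlocked: { coin: "unlocked", push: "locked" }
    }
};

const trace = [];
const finalState = runFsm(turnstile, ["coin", "push"], (line) => trace.push(line));
console.log(finalState);   // prints "locked"
trace.forEach((line) => console.log(line));
```

Any other transition table handed to runFsm would get the same two log lines per step without adding a line of its own.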

I learned how easy FSMs are to instrument while working on Webmachine, the webserver that is known for implementing the “HTTP Flowchart”.

HTTP Diagram

Each Webmachine resource (a module handling a request) is composed of a set of decision functions. The functions are named for the points in the flowchart where decisions have to be made about which branch to follow. This is just alternate terminology, though: the flowchart and resource describe an FSM, in which the decision points (and terminals) are states.

Driving the execution of a Webmachine resource is a module called webmachine_decision_core. This is where the logic lives for which function to call, and which branch to take based on the result. It triggers each function evaluation by calling a generic webmachine_resource:resource_call function, with the name of the decision.

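resource_call(F, ReqData,
              %% Record pattern reconstructed; the field names are assumed
              %% from the variables used below.
              #wm_resource{
                 module=R_Mod,
                 modstate=R_ModState,
                 trace=R_Trace
                }) ->
    %% Log entry into the state (decision F) when tracing is enabled.
    case R_Trace of
        false -> nop;
        _ -> log_call(R_Trace, attempt, R_Mod, F, [ReqData, R_ModState])
    end,
    Result = try
        apply(R_Mod, F, [ReqData, R_ModState])
    catch C:R ->
            Reason = {C, R, trim_trace(erlang:get_stacktrace())},
            {{error, Reason}, ReqData, R_ModState}
    end,
    %% Log exit from the state, along with the decision's result.
    case R_Trace of
        false -> nop;
        _ -> log_call(R_Trace, result, R_Mod, F, Result)
    end,
    Result.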

This is where the ease of instrumenting an FSM is obvious. The entirety of the hooks needed to support tracing and visual debugging of every Webmachine resource are those two log_call lines. They record the entrance and exit of each state of the FSM without requiring any code to complicate the implementation of the resource module itself. For example, a simple resource:



init([]) ->
    {{trace, "/tmp"}, undefined}.

content_types_provided(ReqData, State) ->
    {[{"text/html", to_html}], ReqData, State}.

to_html(ReqData, State) ->
    {"<html><body>Hello, new world</body></html>", ReqData, State}.

This resource does no logging of its own (as you can see), but for each request it receives, a file is created in /tmp that can be rendered with the Webmachine visual debugger. For example, the processing for a request that specifies Accept: text/html looks like this (live example):


It’s easy to see that the request made it all the way to the 200 OK result at grid location N18. Along the way, it passed through many decisions where the default behavior was chosen (grey-outlined diamonds), and a few where the resource’s own implementation was called (purple-outlined diamonds). Clicking on any decision will display more information about what happened there.

In contrast, the processing for a request that specifies Accept: application/json looks like this (live example):


Now it’s easy to see that the request stopped at the 406 Not Acceptable result at grid location C7 instead. For no more code than specifying where to put the log output, we’ve gotten the complete story of how each request was handled. In case you prefer the original text to this visual styling, I’ve also archived the raw trace files.

This sort of regular, simple instrumentation may seem naive, but the regularity and simplicity offer some benefits. For example, all of the instrumentation points have obvious names: they are the same as the states of the FSM. This alone continues to help beginners bootstrap their understanding of Webmachine. When they’re confused about why something happened, they can go straight to the trace or debugger, and either search for the name of the decision they expected to turn differently, or find the name of the decision that did go differently, and know exactly where to return to in their code. Resource implementors add no code, but get well-labeled tracing for free.

Finite state machines can be found under many other names: flowcharts, chains, pipelines, decision trees, and more. Any staged-processing workflow benefits from a basic “stage X began work W”, “stage X finished work W”, which is completely independent of what the stage is doing, and is equivalent to the stage entering and exiting the “working” state. See Hadoop’s job statistics for an example: generically generated start/stop information that an operator can use to get a basic idea of progress without needing the job implementor to add their own instrumentation. I sometimes even consider the basic request/response logging of multi-service systems as a form of this: sending a request is equivalent to entering a waiting state, etc.
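A rough sketch of that “stage X began work W” / “stage X finished work W” pattern, assuming a pipeline is just an ordered list of named functions. The names runPipeline, pipelineStages, and the event shape are hypothetical, not taken from Hadoop or any other system mentioned here.

```javascript
// Hypothetical pipeline runner: emits generic began/finished events
// around every stage, independent of what the stage does.
function runPipeline(stages, work, emit) {
    let value = work;
    for (const stage of stages) {
        emit({ stage: stage.name, event: "began", work: value });
        value = stage.run(value);   // the stage knows nothing about emit
        emit({ stage: stage.name, event: "finished", work: value });
    }
    return value;
}

// Two toy stages with no instrumentation of their own.
const pipelineStages = [
    { name: "parse", run: (text) => text.split(",") },
    { name: "count", run: (items) => items.length }
];

const pipelineEvents = [];
const pipelineResult = runPipeline(pipelineStages, "a,b,c",
                                   (e) => pipelineEvents.push(e));
console.log(pipelineResult);   // prints 3
console.log(pipelineEvents.map((e) => e.stage + ":" + e.event).join(" "));
```

An operator watching the emitted events can tell which stage a piece of work is in without the stage author having written any logging.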

To speak more broadly, the important points to instrument are those when application state is changing. This is how I track down where a process diverged from its expected path, or how long it took to make the change. Finite state machines help by making those points more obvious. Instrumenting state transitions reduces the burden on the implementor, by naturally answering the question of where instrumentation belongs and what it’s called. It also reduces the burden on the user of learning what the implementor decided. Inspection of the system becomes easier because the state transitions are always instrumented, and instrumented in a way that maps directly to the system’s operation.
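As a sketch of how recorded transitions answer both questions, the two helpers below find the first point where a run left its expected path and compute how long each state lasted. The helper names, the record shape, and the timestamps are invented for illustration; only the decision names come from the Webmachine flowchart.

```javascript
// Hypothetical helper: compare the expected path of states to the one
// actually recorded, and report the first position where they differ.
function firstDivergence(expected, actual) {
    const n = Math.max(expected.length, actual.length);
    for (let i = 0; i < n; i++) {
        if (expected[i] !== actual[i]) {
            return { index: i, expected: expected[i], actual: actual[i] };
        }
    }
    return null; // the run followed the expected path
}

// Hypothetical helper: time spent in each state, given entries of the
// form {state, at}, where `at` is a millisecond timestamp.
function stateDurations(transitions) {
    const out = [];
    for (let i = 0; i + 1 < transitions.length; i++) {
        out.push({ state: transitions[i].state,
                   ms: transitions[i + 1].at - transitions[i].at });
    }
    return out;
}

// Made-up sample data: a request that ended in 501 at known_methods.
const recorded = [
    { state: "v3b12", at: 0 },
    { state: "501",   at: 3 }
];
const divergence = firstDivergence(["v3b12", "v3b11", "v3b10"],
                                   recorded.map((t) => t.state));
console.log(divergence);
console.log(stateDurations(recorded));
```

Because the instrumentation points share their names with the states, the reported divergence points straight at the decision to look at.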

Thanks to Julia and Charity for organizing the instrumentation series.

Roundtripping the HTTP Flowchart

It has long bugged many of the Webmachine hackers that this relationship with Alan Dean’s HTTP flowchart is one-way. Webmachine was made from that graph, but that graph wasn’t made from Webmachine. I decided to change that in my evenings last week.

Webmachine hackers are familiar with a certain flowchart representing the decisions made during the processing of an HTTP request. Webmachine was designed as a practical executable form of that flowchart.

It has long bugged many of the Webmachine hackers that this relationship is one-way, though. Webmachine was made from the graph, but the graph wasn’t made from Webmachine. I decided to change that in my evenings last week, while trying to take my mind off of Riak 1.0 testing.

This is a version of the HTTP flowchart that only a Webmachine hacker could love. It’s ugly and missing some information, but the important part is that it’s generated by parsing webmachine_decision_core.erl.

I’ve shared the code for generating this image in the gen-graph branch of my webmachine fork. Make sure you have Graphviz installed, then check out that branch and run make graph && open docs/wdc_graph.png.

In addition to the PNG, you’ll also find a docs/wdc_graph.dot if you prefer to render to some other format.

If you’d really like to dig in, I suggest firing up an Erlang node and looking at the output of wdc_graph:parse("src/webmachine_decision_core.erl"):

[{v3b13, [ping],                     [v3b13b,503]},
 {v3b13b,[service_available],        [v3b12,503]},
 {v3b12, [known_methods],            [v3b11,501]},
 {v3b11, [uri_too_long],             [414,v3b10]},
 {v3b10, [allowed_methods,'RESPOND'],[v3b9,405]},
 {v3b9,  [malformed_request],        [400,v3b8]},

If you’ve looked through webmachine_decision_core at all, I think you’ll recognize what’s presented above: a list of tuples, each one representing the decision named by the first element, with the calls made to a resource module as the second element, and the possible outcomes as the third element. Call wdc_graph:dot/2 to convert those tuples to a DOT file.
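To illustrate the shape of that conversion, here is a small sketch that turns the same kind of tuples into DOT text. It is not the wdc_graph:dot/2 implementation; toDot and the node labeling are my own assumptions, and the tuples are written as plain arrays.

```javascript
// The first three decisions from the parse output above, as arrays of
// [decision name, resource calls, possible outcomes].
const decisions = [
    ["v3b13",  ["ping"],              ["v3b13b", 503]],
    ["v3b13b", ["service_available"], ["v3b12", 503]],
    ["v3b12",  ["known_methods"],     ["v3b11", 501]]
];

// Hypothetical converter: one node per decision, one edge per outcome.
function toDot(tuples) {
    const lines = ["digraph wdc {"];
    for (const [name, calls, outcomes] of tuples) {
        // Label each decision node with the resource calls it makes.
        // "\\n" emits a literal backslash-n, which DOT renders as a line break.
        lines.push('  "' + name + '" [label="' + name + "\\n" +
                   calls.join(", ") + '"];');
        for (const outcome of outcomes) {
            lines.push('  "' + name + '" -> "' + outcome + '";');
        }
    }
    lines.push("}");
    return lines.join("\n");
}

const dot = toDot(decisions);
console.log(dot);
```

The output can be fed to Graphviz like any other DOT file. Note that the edges here are unlabeled, which is the same hole described in the generated graph.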

There are a few holes in the generation. Some response codes are reached by decisions spread across the graph, causing long arrows to cross confusingly. The edges between decisions aren’t labeled with the criteria for following them. Some resource calls are left out (like those made from webmachine_decision_core:respond/1 and the response body producers and encoders). It’s good to have a nice list for future tinkering.

Riak Presented at NYC NoSQL – slides, text & video

I presented Riak at the NYC NoSQL Mini-Conference on October 5, 2009. Slides, text, and video of my talk are now available.

I had the pleasure of attending the NYC NoSQL Fall ’09 Meetup/Mini-Conference last Monday. Great talks, all around. I thought it was a good mix of use-case analysis and technology introduction.

In addition to enjoying everyone else’s presentations, I also presented Riak. It was a quick 12-minute talk, followed by 2.5 minutes of questions, but the response I got was great. People really dug in and had interesting observations and questions to discuss afterward.

If you weren’t able to make the event, Brendan has posted video of my talk. I have also posted an HTML slides-and-text version of my talk, if you prefer reading over watching and listening.

Riak Demo: Stickynotes

Basho released Riak to the world last Friday. Riak is as simple as downloading it and hitting the HTTP interface. You can stop reading now and happily speak REST to it for the rest of your application’s lifetime. If you’re interested in more advanced features like field validation and link-walking, the rest of this post demonstrates those features by discussing the development of the demo application that comes with Riak.

Warning: This blog post is woefully out of date:

  1. Instead of building Riak from source, you should download a pre-compiled release from downloads.basho.com (though the source is still available, if you want).
  2. Starting Riak is now done with the bin/riak start command, not start-fresh.sh. Stopping is also easily done with bin/riak stop.
  3. Jiak is no longer the preferred HTTP interface. There is now one that accepts any content type, not just JSON.
  4. Without Jiak, back-linking of notes will need to be done with post-commit hooks.

There are probably other things I’ve overlooked. Really, you should just head over to the Riak wiki to get the intro.

Basho released Riak to the world just over a week ago. Some think that the docs are good enough that further explanation is not necessary. But, if you’re looking for a little more introduction, read on.

The short version of this post: Riak is as simple as downloading it and hitting the HTTP interface:

1$ hg clone http://bitbucket.org/basho/riak/
2$ cd riak
3$ make
4$ ./start-fresh.sh config/riak.erlenv

5$ curl -X PUT -H "Content-type: application/json" \
5>   http://localhost:8098/jiak/foo \
5>   -d "{\"schema\":{\"allowed_fields\":[\"bar\"],\"required_fields\":[\"bar\"],\"read_mask\":[\"bar\"],\"write_mask\":[\"bar\"]}}"

6$ curl -X PUT -H "Content-type: application/json" \
6>   http://localhost:8098/jiak/foo/baz \
6>   -d "{\"bucket\":\"foo\",\"key\":\"baz\",\"object\":{\"bar\":\"Hello World\"},\"links\":[]}"

7$ curl http://localhost:8098/jiak/foo

8$ curl http://localhost:8098/jiak/foo/baz
{"object":{"bar":"Hello World"},"vclock":"MzIwsDSwMDQyMjAyNjXQLcpMzHYwNDLXMwBCQ3SusYGZoZGxvqG+mbGJobmxkbmFsQEA","lastmod":"Wed, 12 Aug 2009 20:23:50 GMT","vtag":"6DxaqiRCDBevf03tzPZpzl","bucket":"foo","key":"baz","links":[]}

Commands 1-4 download and start Riak. Command 5 creates the bucket foo, and tells Jiak, Riak’s HTTP interface, that objects in bucket foo must have a bar field that is readable and writable. Command 6 creates the object baz in bucket foo. Command 7 gets the listing of bucket foo. Command 8 retrieves object baz in bucket foo.

Aside: Some have found it difficult to stop Riak, so I’ll throw the tip out here: killall heart. You may also have to kill the erl command afterward, but if you don’t kill heart first, Riak will just come right back.

That’s all it takes to use Riak. You can stop reading now and happily speak REST to it for the rest of your application’s lifetime.

If you’re still hanging around, though, maybe you’d be interested in a few more features that Jiak has to offer, like link-walking and field validation. To demonstrate those, I’ll spend the rest of this post describing the creation of an application on Riak.

Let’s say that I wanted to create a note-taking system, something like the fine Erlang/Mochiweb demo put together by the guys at Beebole.

There will be two kinds of objects in my system: notes and groups. The properties of a note will be the text of the note, the color, the position on the screen, and the stacking order. The properties of a group will be the name of the group and the list of notes in the group.

I’ll start by whipping up two modules to manage those buckets for me. These modules tell jiak_resource how to validate the structure of objects in these buckets (jiak_resource is the Webmachine resource that runs Riak’s HTTP interface). They’ll follow the basic structure of jiak_example.erl, which ships with Riak.

-export([init/2, auth_ok/3, bucket_listable/0, allowed_fields/0,
         required_fields/0, read_mask/0, write_mask/0,
         expires_in_seconds/3, check_write/4, effect_write/4,
         after_write/4, merge_siblings/1]).

init(_Key, Context) -> {ok, Context}.

auth_ok(_Key, ReqData, Context) -> {true, ReqData, Context}.

bucket_listable() -> true.

allowed_fields()  -> [<<"name">>].
required_fields() -> allowed_fields().
read_mask()       -> allowed_fields().
write_mask()      -> allowed_fields().

expires_in_seconds(_Key, ReqData, Context) ->
    {600, ReqData, Context}.

check_write({_PutType, _Key}, JiakObject, ReqData, Context) ->
    {ObjDiffs,_} = Context:diff(),
    case lists:foldl(fun check_diff/2, [], ObjDiffs) of
        [] ->
            {{ok, JiakObject}, ReqData, Context};
        Errors ->
            {{error, list_to_binary(string:join(Errors, ", "))},
             ReqData, Context}
    end.

check_diff({<<"name">>, _, Value}, ErrorAcc) ->
    if is_binary(Value) -> ErrorAcc;
       true             -> ["name field must be a string"|ErrorAcc]
    end.

effect_write(_Key, JiakObject, ReqData, Context) ->
    {{ok, JiakObject}, ReqData, Context}.

after_write(_Key, _JiakObject, ReqData, Context) ->
    {ok, ReqData, Context}.

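merge_siblings(Siblings) ->
    %% Body missing from this listing; assumed to be Jiak's default merge.
    jiak:standard_sibling_merge(Siblings).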

The groups module ensures that every group has a name field (through implementation of allowed_fields/0, required_fields/0, read_mask/0, and write_mask/0), and that the value of that field is a string (using check_write/4 and check_diff/2).

-export([init/2, auth_ok/3, bucket_listable/0, allowed_fields/0,
         required_fields/0, read_mask/0, write_mask/0,
         expires_in_seconds/3, check_write/4, effect_write/4,
         after_write/4, merge_siblings/1]).

init(_Key, Context) -> {ok, Context}.

auth_ok(_Key, ReqData, Context) -> {true, ReqData, Context}.

bucket_listable() -> true.

allowed_fields() ->
    [<<"text">>, <<"x">>, <<"y">>, <<"z">>, <<"color">>].

required_fields() -> allowed_fields().
read_mask()       -> allowed_fields().
write_mask()      -> allowed_fields().

expires_in_seconds(_Key, ReqData, Context) ->
    {600, ReqData, Context}.

check_write({_PutType, Key}, JiakObject, ReqData, Context) ->
    {ObjDiffs,_} = Context:diff(),
    case lists:foldl(fun check_diff/2, [], ObjDiffs) of
        [] ->
            {{ok, JiakObject}, ReqData, Context:set_prop(key, Key)};
        Errors ->
            {{error, list_to_binary(string:join(Errors, ", "))},
             ReqData, Context}
    end.

-define(COLORS, [<<"yellow">>, <<"pink">>, <<"green">>, <<"blue">>]).

check_diff({<<"text">>, _, Value}, ErrorAcc) ->
    if is_binary(Value) -> ErrorAcc;
       true             -> ["text field must be a string"|ErrorAcc]
    end;
check_diff({Coord, _, Value}, ErrorAcc)
  when Coord==<<"x">>;Coord==<<"y">>;Coord==<<"z">> ->
    if is_integer(Value) -> ErrorAcc;
       true ->
            [io_lib:format("~s field must be an integer", [Coord])
             |ErrorAcc]
    end;
check_diff({<<"color">>, _, Value}, ErrorAcc) ->
    case lists:member(Value, ?COLORS) of
        true -> ErrorAcc;
        false ->
            %% Format argument reconstructed: list the allowed colors.
            [io_lib:format("color field must be one of (~s)",
                           [string:join([binary_to_list(C) || C <- ?COLORS],
                                        ", ")])
             |ErrorAcc]
    end.

effect_write(_Key, JiakObject, ReqData, Context) ->
    {{ok, JiakObject}, ReqData, Context}.

after_write(_Key, JiakObject, ReqData, Context) ->
    spawn(fun() ->
                  [[_, GroupKey, _]] = jiak_object:links(JiakObject, groups),
                  {ok, C} = jiak:local_client(),
                  {ok, G} = C:get(groups, GroupKey, 2),
                  Key = Context:get_prop(key),
                  C:put(jiak_object:add_link(G, notes, Key, <<"note">>), 2)
          end),
    {ok, ReqData, Context}.

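merge_siblings(Siblings) ->
    %% Body missing from this listing; assumed to be Jiak's default merge.
    jiak:standard_sibling_merge(Siblings).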

The notes module ensures that all notes have a text field that is a string; x, y, and z fields that are integers; and a color field that is one of a specific list of strings (using the same functions as the groups module used).
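The same rules are easy to mirror on the client, so a bad note can be caught before it is ever sent. The sketch below is not part of the demo: checkNote is a hypothetical helper, and unlike check_diff/2 it checks every field of a note rather than only the fields that changed.

```javascript
// Allowed colors, copied from the ?COLORS macro in the notes module.
const NOTE_COLORS = ["yellow", "pink", "green", "blue"];

// Hypothetical client-side check; returns a list of error strings,
// empty when the note's fields satisfy all of the rules.
function checkNote(fields) {
    const errors = [];
    if (typeof fields.text !== "string") {
        errors.push("text field must be a string");
    }
    for (const coord of ["x", "y", "z"]) {
        if (!Number.isInteger(fields[coord])) {
            errors.push(coord + " field must be an integer");
        }
    }
    if (!NOTE_COLORS.includes(fields.color)) {
        errors.push("color field must be one of (" +
                    NOTE_COLORS.join(", ") + ")");
    }
    return errors;
}

const goodNote = { text: "finish blog post", x: 0, y: 0, z: 0, color: "green" };
const badNote  = { text: 1, x: 0, y: "a", z: 0, color: "red" };
console.log(checkNote(goodNote).length);   // prints 0
console.log(checkNote(badNote));
```

The server-side module remains the authority. A check like this only saves a round trip.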

One more interesting thing happens in the notes module. If you look at after_write/4, you’ll see that it fetches the groups object that the note links to, and adds the note to that group’s links. Jiak calls after_write/4 after the note has been stored in Riak, so what I’ve written here is effectively an automatic back-link monitor. We’ll return to the concept of links in a moment.

If I put the notes and groups modules in place, I can fire up Riak and immediately begin sending HTTP requests to its Jiak interface to create and modify groups and notes. For example:

$ curl -X PUT -H "Content-type: application/json" \
> \
>   -d "{\"bucket\":\"groups\",\"key\":\"todos\",\"object\":{\"name\":\"todo\"},\"links\":[]}"

$ curl -X PUT -H "Content-type: application/json" \
> \
>   -d "{\"bucket\":\"notes\",\"key\":\"blog\",\"object\":{\"text\":\"finish blog post\",\"x\":0,\"y\":0,\"z\":0,\"color\":\"green\"},\"links\":[[\"groups\",\"todos\",\"open\"]]}"

These two lines would create a group named todo with a note labeled finish blog post.

Now, about those links. See the ["groups","todos","open"] item in the links field of that notes object? That’s a link to the groups object named todos, and I’ve tagged it open. The after_write/4 function in the notes module will add a link to the groups object of the form ["notes","blog","note"].

What does this get me? More than just record-keeping: I can now use the Jiak utility jaywalker to get all of the notes in the todo group with a single query:

$ curl,_,_

You’ll recognize the first part, through /jiak/groups/todos/. The segment after that is a link query. This one says "all notes objects, with any tag." The links are structured as {bucket},{tag},{accumulate} segments, with underscore meaning "any."

The example query will return an object with a results field that is a list of lists of results. That is, if I were to store the data returned from that query in a variable called data, my list of notes would be at data.results[0].

I’m not limited to one hop, either. If there were also person objects in my system, linked from notes objects as authors, I might get all of the authors of the notes in a group with:

$ curl,_,_/person,author,_

More information on link-walking can be found in the jaywalker_resource documentation.
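The segment format is regular enough to build mechanically. As a sketch, assuming the /jiak/ prefix used elsewhere in this post, a hypothetical linkPath helper (not part of Jiak) could assemble the path for any number of hops:

```javascript
// Hypothetical helper: build a link-walk path from a starting object
// and a list of hops. Each hop becomes a {bucket},{tag},{accumulate}
// segment, with "_" standing in for any part left unspecified.
function linkPath(base, start, hops) {
    const segments = hops.map(function (hop) {
        return [hop.bucket || "_", hop.tag || "_", hop.acc || "_"].join(",");
    });
    return base + start.bucket + "/" + start.key + "/" + segments.join("/");
}

// One hop: all notes linked from the todos group, with any tag.
const oneHop = linkPath("/jiak/", { bucket: "groups", key: "todos" },
                        [{ bucket: "notes" }]);
console.log(oneHop);   // prints /jiak/groups/todos/notes,_,_

// Two hops: the authors of those notes.
const twoHops = linkPath("/jiak/", { bucket: "groups", key: "todos" },
                         [{ bucket: "notes" },
                          { bucket: "person", tag: "author" }]);
console.log(twoHops);  // prints /jiak/groups/todos/notes,_,_/person,author,_
```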

The curl commands and the HTTP requests they represent are pretty simple, but I’m going to be hitting these resources from Javascript running in a browser. So, I’ll wrap it all up in a nice utility class:

function JiakClient(BaseUrl, Opts) {
    this.baseurl = BaseUrl;
    if (!(this.baseurl.slice(-1) == '/'))
        this.baseurl += '/';

    this.opts = Opts||{};
}

JiakClient.prototype.store = function(Object, Callback, NoReturnBody) {
    var req = {
        contentType: "application/json",
        dataType: "json"
    };

    // POST lets the server choose a key; PUT stores at a known key
    if (this.opts.alwaysPost || !Object.key)
        req.type = 'POST';
    else
        req.type = 'PUT';
    req.url = this.baseurl+Object.bucket+'/';
    if (Object.key) req.url += Object.key;
    if (!(this.opts.noReturnBody || NoReturnBody))
        req.url += '?returnbody=true';

    if (typeof Callback == 'function')
        req.success = Callback;

    req.data = JSON.stringify(Object);

    return $.ajax(req);
};

JiakClient.prototype.fetch = function(Bucket, Key, Callback) {
    return $.ajax({
        url:      this.baseurl+Bucket+'/'+Key,
        dataType: "json",
        success:  Callback
    });
};

JiakClient.prototype.remove = function(Bucket, Key, Callback) {
    return $.ajax({
        type:    'DELETE',
        url:     this.baseurl+Bucket+'/'+Key,
        success: Callback
    });
};

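JiakClient.prototype.walk = function(Start, Spec, Callback) {
    var req = {
        dataType: "json",
        success: Callback
    };

    // Start may be an object with bucket/key, or a [bucket, key] pair
    if ('bucket' in Start)
        req.url = this.baseurl+Start.bucket+'/'+Start.key+'/';
    else
        req.url = this.baseurl+Start[0]+'/'+Start[1]+'/';

    // one {bucket},{tag},{accumulate} segment per step of the walk;
    // the tag term is reconstructed to match that format
    for (var i = 0; i < Spec.length; i++) {
        req.url += (Spec[i].bucket||'_')+','+
            (Spec[i].tag||'_')+','+
            ((Spec[i].acc || i == Spec.length-1) ? '1' : '_')+'/';
    }

    return $.ajax(req);
};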

Now I can create the same effect as the curl commands with this code:

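var J = new JiakClient("/jiak/");
var note = {bucket:"notes", key:"blog", object:{text:"finish blog post", x:0, y:0, z:0, color:"green"}, links:[["groups","todos","open"]]};
J.store({bucket:"groups", key:"todos", object:{name:"todo"}, links:[]},
        function(group) { J.store(note); });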

With the request for "all notes in the todo group" looking like:

J.walk({bucket:"groups", key:"todos"},
       [{bucket:"notes"}],
       function(data) {
         var notes = data.results[0];
         //do stuff...
       });

Okay, now I have my backend and frontend – I’d better fill in the middle. Any intermediary webserver could work, but because Riak comes with Webmachine included, I’ll just set up a quick Webmachine app.

$ deps/webmachine/scripts/new_webmachine.erl stickynotes ..
$ cp config/riak.erlenv ../stickynotes/riak-config.erlenv

I copied the Riak config (riak.erlenv) because there are two customizations I want to add to it:

{add_paths, ["../stickynotes/ebin"]}.
{jiak_buckets, [notes, groups]}.

The former puts the stickynotes ebin in Riak’s code path, so Jiak can reach the notes and groups modules I just wrote. The latter does nothing but force the atoms 'notes' and 'groups' into the Riak node so Jiak can use list_to_existing_atom/1.

To my new Webmachine app, I’ll add four things:

  1. the notes and groups modules I just wrote
  2. the static files from Beebole’s example application (HTML, CSS, JS) … modified a bit to load and use the jiak.js utility from above
  3. a simple static resource server
  4. a simple proxy resource to pass requests through to jiak (the couchdb_proxy from my wmexamples repo will do fine)

Once I have everything aligned (including dispatch set up correctly), I just start Riak and my Webmachine node:

$ ./start-fresh.sh ../stickynotes/riak-config.erlenv
$ cd ../stickynotes
$ ./start.sh

…then point my browser at http://localhost:8000/, and I see the app’s UI with an empty group ready to store some notes. I recommend opening Firebug to watch the requests fly by.

I realize that I’ve glossed over a bit of the Webmachine application stuff, but that’s because it’s mostly rehash of older posts. The better way to cover all of that material is for me to tell you to open up the demo/stickynotes directory in the Riak repo you just cloned, and read the code written there yourself. 🙂

Webmachine Bloggy Goodness

Webmachine (and Erlang use in general) is picking up. If you’re looking for a few more people to follow in relation to webmachine, I have some suggestions.

A whole bunch of people (no half-bunches in these parts) are picking up Webmachine and writing about their experiences. If you’re looking for a few more people to follow:

Andy Gross (a coworker) is writing about file uploads. Actual webmachine details to follow in his next post – this first one is about the client side of the equation. Update: that second post is up, and includes an example of the new “streamed body” feature in webmachine 1.3.

Seth Falcon is covering the design and implementation of a URL-shortening service. Again, webmachine details to follow, but there’s some good talk about using Mnesia for storage in the current post.

Marc Worrell is laying down the outline of his new CMS, Zophrenic. This looks like a very ambitious project, and I’m anxious to see its release.

Update: More blogs from the comments

Daniel Kwiecinski just covered redirecting (3xx responses), and also recently showed off his own static-file-serving resource. He also tips his hat to wmtrace, so he gets brownie points. 😉

Great reads, but don’t take my word for it.</burton>

Webmachine POST Example

Many people have asked for an example Webmachine resource that responds to POST. If you follow my twitter feed, you may have caught this gem.

I figured that example could use a little fleshing out, so I’ve added a resource to my wmexamples repo.

formjson_resource.erl makes an attempt at demonstrating the simplest way to handle a POST, while also demonstrating the difference between content-producing functions (to_json/2 in this example, and others named in content_types_provided/2), which put content in the response body simply by returning it, and other functions, which have to put content in the response body by returning a modified ReqData.

For another example of handling POST, read demo_fs_resource.erl that comes with Webmachine. It implements post_is_create/2, create_path/2, content_types_accepted/2, and accept_content/2 to handle POST requests. (Incidentally, demo_fs_resource is a good example of many Webmachine resource functions.)

Updated to include content_types_accepted/2 in the list of functions handling POST requests – thanks for catching it, Lou!

Webmachine Slideshow Intro

Justin Sheehy has just posted his video slideshow introduction to webmachine. The video contains the slides and most of the talk that Justin gave at the Bay Area Erlang Factory last month. If you’ve been thinking about checking out Webmachine, this half hour is well worth your time.

Webmachine has also gained several new features recently, including delayed body receive and a more independent request dispatcher. Partial receive and send is even available in tip.

Unfortunately, the trace utility required changes that broke compatibility with old trace files. Support for those old traces could probably be hacked in without too much trouble if you need it (the cause was mainly just a change to a record structure). I assumed that trace files are somewhat ephemeral, though – really only needed for debugging, and then thrown away.

Finally, if you’re looking for example resources and dispatch tables, I recommend checking out my wmexamples repo on bitbucket. It has a smattering of working resources that can hopefully give you a feel for how we normally write them.