Doing it Live

We have a phrase around the office: “Do it live!” It comes from the incredible freakout of Bill O’Reilly. We use it to mean something along the lines of, “This is a startup. The plan might change at any time. Changes go to production when we need them to, and we roll with bugs as best we can.” Far from encouraging careless, fickle choices, it’s a reminder that the camera is on, we’re live, and we are actively developing a product that is under close scrutiny.

Luckily, we have the power of Erlang behind us. The dynamic nature of the language and runtime is a fantastic fit for an environment in which things may change at a moment’s notice.

Erlang’s dynamic nature also came in useful for me on BeerRiot last night. I’ve blogged about hot code loading before, but last night I dipped into the world of OTP applications and Mnesia.

I realized late yesterday afternoon that I had left the login code in a state where usernames were case-sensitive. People could have signed up as “Bryan” and “BRYAN”, even though I already owned the login “bryan”. Basically, I was lazy; the username lookup code was roughly:

%% Name is the test username as read out of the http request
mnesia:transaction(
    fun() ->
        mnesia:match_object(#person{name=Name, _='_'})
    end).

What I needed to do was downcase both the test name and the stored name, and compare those results. I could have just tossed in a call to string:to_lower and reloaded the login module, except that I'm trying to support UTF-8 everywhere. To downcase a UTF-8 string, I needed another library (because I'm not going to bother implementing my own).

Google pointed me in the direction of Starling. Despite the strange build process[1], Starling provides an Erlang interface to the ICU libraries, enabling Unicode string manipulation. A quick build and test, and we have

LowerName = ustring:downcase(ustring:new(Name))

Toss an application:start(starling) in the BeerRiot startup code, and everything's set to go ... but why would I want to restart the webserver? Restarting is lame - we're doing it live!

Instead of restarting, we'll connect to the webserver through an erl shell (see my earlier hot code loading post about doing this) and modify the running system. We just need two simple commands to get this done.

1> code:add_paths(["/path/to/starling/ebin"]).
ok
2> application:start(starling).

Command 1 tells Erlang to add a path to its library loading search. Command 2 starts the starling application. Starling is now up and running, and we can ustring:downcase/1 as much as we want.
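If you'd rather not type those two commands by hand, they can be wrapped in a small helper. This is only a sketch: the module name live_start_sketch and the function ensure_running/2 are made up for illustration, and it uses code:add_pathz/1 (which, like code:add_paths/1, appends to the end of the search path) so that a bad directory is reported instead of silently ignored.

```erlang
-module(live_start_sketch).
-export([ensure_running/2]).

%% Hypothetical helper, not part of Starling or BeerRiot.
%% Appends EbinDir to the code search path, then starts App.
%% "Already started" is treated as success, so running it twice is harmless.
ensure_running(EbinDir, App) ->
    case code:add_pathz(EbinDir) of
        true ->
            case application:start(App) of
                ok -> ok;
                {error, {already_started, App}} -> ok;
                {error, Reason} -> {error, Reason}
            end;
        {error, bad_directory} ->
            {error, {bad_directory, EbinDir}}
    end.
```

From the connected shell, that would be called as live_start_sketch:ensure_running("/path/to/starling/ebin", starling).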

But, I really don't want to downcase every stored username every time. It's also kind of nice for people's usernames to display as they typed them, but not require the same capitalization in their login form. So, I'll need to store the downcased version somewhere, in addition to keeping the original. I could put it in a new table, mapping back to the persons table, but it's person data - let's keep it with the person.

I need to add a field to my person record. But if I do that, all of my code looking for a person record of the current format will break. I need to update all of my person records in storage as soon as I load the code with the modified person record definition.

Mnesia gives us just the tool for this: mnesia:transform_table/3. All we have to do is provide a function that knows how to translate an old person record into a new one. Something like this will do:

%% old def: -record(person, {id, name}).
%% new def: -record(person, {id, name, login}).
add_login() ->
    mnesia:transform_table(
        person,
        fun({person, Id, Name}) ->
            {person, Id, Name, ustring:downcase(ustring:new(Name))}
        end,
        record_info(fields, person)).

Stick that code in the person module, where the person record is defined. Now, connect back to the webserver and simply:

3> l(person).
{module, person}
4> person:add_login().

There's a short period of time in there, between the ends of commands 3 and 4, where any code that looks up a person record will break. But it's short, and the entire rest of the site will continue functioning flawlessly.
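One way to shrink that window is to take the typing out of it: reload the module and kick off the migration from a single function call. The sketch below is an assumption about how that could look, not what BeerRiot actually runs. The module live_reload_sketch and its functions are hypothetical names. It does the same purge-then-load that l/1 does in the shell, and only runs the migration if every module loaded cleanly.

```erlang
-module(live_reload_sketch).
-export([reload/1, reload_then/2]).

%% Purge any old version of each module, then load the beam file currently
%% on disk. Note that code:purge/1 kills processes still running the old
%% version. Returns [{Module, LoadResult}] in the order given.
reload(Modules) ->
    [begin
         code:purge(M),
         {M, code:load_file(M)}
     end || M <- Modules].

%% Reload Modules and, only if every load succeeded, call Migrate (a fun/0).
%% Returns {ok, MigrateResult} or {error, FailedLoads}.
reload_then(Modules, Migrate) when is_function(Migrate, 0) ->
    Results = reload(Modules),
    Failed = [{M, Res} || {M, Res} <- Results, Res =/= {module, M}],
    case Failed of
        [] -> {ok, Migrate()};
        _ -> {error, Failed}
    end.
```

For the change in this post, the call would be live_reload_sketch:reload_then([person], fun person:add_login/0).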

And that's the amazing power of Erlang. A very brief, very limited hiccup, and new functionality is deployed. Assuming the appropriate code was put in place to start everything up on restart, the system will come up in exactly the state you want it if the server should ever reboot.
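For anyone who wants to try the record migration without a running site, here is a self-contained sketch that builds a RAM-only person table in the old two-field shape, transforms it, and then looks a user up through the new login field. Several things here are assumptions for illustration: the module name, the sample data, the lookup itself (the post never shows the updated login code), and the use of string:lowercase/1 (OTP 20 or later) as a stand-in for ustring:downcase(ustring:new(Name)) so that it runs without Starling installed.

```erlang
-module(person_migration_sketch).
-export([demo/0]).

%% New record shape, as in the post.
-record(person, {id, name, login}).

%% Stand-in for Starling's ustring:downcase/1 (assumption, see lead-in).
downcase(Name) -> string:lowercase(Name).

demo() ->
    ok = mnesia:start(),
    %% Create the table in the OLD shape: {person, Id, Name}.
    {atomic, ok} = mnesia:create_table(person, [{attributes, [id, name]}]),
    {atomic, ok} = mnesia:transaction(
        fun() ->
            %% Old-shape rows are written as plain tuples, because this
            %% module only knows the new record definition.
            ok = mnesia:write({person, 1, "Bryan"}),
            ok = mnesia:write({person, 2, "HARISH"})
        end),
    %% Rewrite every stored row into the new shape, filling in login.
    {atomic, ok} = mnesia:transform_table(
        person,
        fun({person, Id, Name}) ->
            #person{id = Id, name = Name, login = downcase(Name)}
        end,
        record_info(fields, person)),
    %% Case-insensitive lookup: downcase the submitted name and match on
    %% login instead of name.
    {atomic, Found} = mnesia:transaction(
        fun() ->
            mnesia:match_object(#person{login = downcase("BRYAN"), _ = '_'})
        end),
    Found.
```

The stored name keeps its original capitalization for display, while login is what the lookup matches against.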

Now back to tinkering... :)

[1]I oughta 'make' you 'rake' my lawn, which you're on, by the way, sonny.


5 comments so far

  1. Harish Mallipeddi on

    This is assuming you don’t have any #person records stored anywhere in memory – for instance in the State of some gen_server? If yes then you’ve to write code to migrate via gen_server:code_change, add “appup” file, add “rel” files, etc and very soon it becomes complicated (I guess it’s better than those other languages which don’t support hot code loading out of the box). But still very cool example of hot code loading in production!

  2. Bryan on

    A very good point, Harish! My usual way of dealing with this? Let the process fail. If the computation was necessary, it has been registered with a supervisor, and will be restarted, ready to grab the fresh data.

    Also, I generally try to write long-running processes such that they don’t hang onto data representations from other modules any longer than necessary. For example, a process may keep a list of ids and names from a #person, but won’t hold onto the whole record. So, that process won’t be affected by a change in the person record shape. This doesn’t account for those short periods of time when a process *is* holding onto an external data format, but in those cases, I just revert back to “let it fail.”

    I haven’t done an “appup” or “rel” yet, but you’re right: even if gen_server:code_change does get hairy, it’s still better than not being able to do it at all.

    Thanks for the note.

  3. Nick Gerakines on

    That’s great. Very cool write up. I’m quite the fan of rolling your own reload/0 function that iterates through your app’s modules with a quick code:purge/1 and code:load_file/1.

  4. Bryan on

    Thanks, Nick – glad you liked it. I agree with implementing reload/0. Among other conveniences, it shortens the downtime. No typing delay, no typo penalty, no chance of botching it bad enough that you have to spend time fixing something else!

  5. Geoff on

    You can also use the gen_(server|fsm) code change stuff via the ‘sys’ module. sys:suspend/1 the gen_servers running the module to be upgraded, load the new code, sys:change_code/4 and then sys:resume/1 them all. It’s handy if you don’t want to go to the hassle of a full release upgrade.
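The suspend, load, change_code, resume sequence from the last comment can be sketched roughly as follows. This is an illustration under assumptions, not code from any of the commenters: the module name sys_upgrade_sketch and the function upgrade/3 are hypothetical, Procs is assumed to be a list of pids or registered names of gen_servers whose callback module is Module, and [] is passed as the Extra argument to code_change.

```erlang
-module(sys_upgrade_sketch).
-export([upgrade/3]).

%% Suspend every process, load the new version of Module once, ask each
%% process to convert its state via its code_change callback, and resume.
%% Processes are resumed even if loading or conversion crashes.
%% Returns [{Proc, ok | {error, Reason}}].
upgrade(Module, Procs, OldVsn) ->
    lists:foreach(fun(P) -> ok = sys:suspend(P) end, Procs),
    try
        code:purge(Module),
        {module, Module} = code:load_file(Module),
        [{P, sys:change_code(P, Module, OldVsn, [])} || P <- Procs]
    after
        lists:foreach(fun(P) -> sys:resume(P) end, Procs)
    end.
```

Any process whose result is not ok kept its old state, so it is worth checking the returned list before assuming the upgrade took.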

