Denormalization, Processes

Published Tuesday, January 1, 2008 by Bryan

If you read the news, you'll know that tuneups are happening behind the scenes of BeerRiot. If you came to this blog after reading that story, you're wondering what, exactly, they are.

If I'm not feeling particularly communication-challenged, I'll be able to explain them to you. ;)

The first tuneup is one every webmaster has heard of: denormalization. I had been using a view to select data from three tables with one call. The performance drag of that query was serious enough, though, that I've decided to complicate things a bit and copy the extra bits of data I need from the other tables into the main one for the query.
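To make the tradeoff concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (beer, brewery, style) are invented for illustration and are not BeerRiot's actual schema, which this post doesn't describe. It shows a three-table view next to the same data copied into the main table.

```python
import sqlite3

# Hypothetical schema: these names are assumptions, not BeerRiot's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brewery (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE style   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE beer    (id INTEGER PRIMARY KEY, name TEXT,
                          brewery_id INTEGER, style_id INTEGER);

    INSERT INTO brewery VALUES (1, 'Example Brewing');
    INSERT INTO style   VALUES (1, 'Porter');
    INSERT INTO beer    VALUES (1, 'Sample Porter', 1, 1);

    -- Normalized: the view joins three tables every time it is read.
    CREATE VIEW beer_view AS
        SELECT beer.id, beer.name,
               brewery.name AS brewery_name,
               style.name   AS style_name
        FROM beer
        JOIN brewery ON brewery.id = beer.brewery_id
        JOIN style   ON style.id   = beer.style_id;

    -- Denormalized: copy the extra bits into the main table once.
    ALTER TABLE beer ADD COLUMN brewery_name TEXT;
    ALTER TABLE beer ADD COLUMN style_name TEXT;
    UPDATE beer SET
        brewery_name = (SELECT name FROM brewery WHERE brewery.id = beer.brewery_id),
        style_name   = (SELECT name FROM style   WHERE style.id   = beer.style_id);
""")

# Same columns either way; the second query reads a single table with no joins.
query = "SELECT name, brewery_name, style_name FROM {} WHERE id = 1"
view_row = conn.execute(query.format("beer_view")).fetchone()
flat_row = conn.execute(query.format("beer")).fetchone()
```

The catch is that the copied columns have to be updated whenever the source rows change, which is the "complicate things a bit" part.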

The speed gain is great, and, somewhat strangely, the denormalization actually cleaned up a bunch of my code. ErlyDB lacks a "one-to-one" relation, so it was impossible for me to say "each record in this view is really just a record in this other table with some extra data." That made for a bit of hackery swinging from one type to another. Without that extra table, I think the code reads more clearly.

(Disclaimer: I'm far from being a relational database master, so it's likely that there is a much better way to express everything I'm doing. But, I'm happy to be making what seems to be forward progress.)

The other main change is more Erlang-centric. Until now, I had been tracking sessions using a customization of the Yaws recommended session server. This is basically a central process that stores opaque data associated with an id string. Whenever your app gets a request, it pulls the cookie value out and checks with this central process to find out if there is any opaque data associated with this key. It works (quite well, in fact), but it seems like a bit of a bottleneck.

So, I've decided that there's a more Erlangy way to do things. What BeerRiot is doing now is starting up a new process for each session, and saving that process id in a client cookie. Then, whenever a request comes in, if it has a cookie with a PID, we can try to contact that session's handling process directly. No central service required.
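The shape of the idea can be sketched outside Erlang, too. The following is a rough Python approximation, not BeerRiot's code: each session is a thread with its own mailbox, standing in for an Erlang process. All names here (SessionProcess, start_session, handle_request) are hypothetical. Python has nothing like an Erlang PID that can be messaged directly, so the sketch needs a lookup dict that plays the part of the runtime's process table.

```python
import queue
import threading
import uuid


class SessionProcess(threading.Thread):
    """One worker per session. It owns its state and is reached only by messages."""

    def __init__(self):
        super().__init__(daemon=True)
        self.session_id = uuid.uuid4().hex  # stand-in for an Erlang PID
        self.mailbox = queue.Queue()

    def run(self):
        state = {}  # private to this session; no other thread touches it
        while True:
            op, key, value, reply = self.mailbox.get()
            if op == "stop":
                reply.put("stopped")
                return
            if op == "put":
                state[key] = value
                reply.put("ok")
            elif op == "get":
                reply.put(state.get(key))
            else:
                reply.put("unknown op")

    def call(self, op, key=None, value=None, timeout=1.0):
        """Send a message and wait for the session's answer."""
        reply = queue.Queue(maxsize=1)
        self.mailbox.put((op, key, value, reply))
        return reply.get(timeout=timeout)


# Assumption: this dict only stands in for the process table that the
# Erlang runtime provides for free. It stores no session data itself.
_sessions = {}


def start_session():
    """Start a session worker and return the id to store in the client cookie."""
    proc = SessionProcess()
    proc.start()
    _sessions[proc.session_id] = proc
    return proc.session_id


def handle_request(cookie_value, op, key=None, value=None):
    """Route a request straight to the session named in the cookie."""
    proc = _sessions.get(cookie_value)
    if proc is None or not proc.is_alive():
        return None  # unknown or dead session; a real app would start a new one
    return proc.call(op, key, value)
```

A request handler would call start_session() when no usable cookie arrives, and handle_request(cookie, "get", "user") otherwise. The session data lives inside each worker, so no single process has to answer for every session.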

It turns out that there are loads of benefits to having this session process hanging around, beyond relieving the central service bottleneck. It can cache data, smartly (i.e. listen for updates, etc.). It's a natural place to run background processes (like propagating live changes to durable storage). I see other potential uses, but since I haven't tested them yet, I'll hold my tongue to avoid getting too many hopes up. ;)

For Facebook developers: This process-session system wasn't possible until just a few weeks ago, when Facebook started supporting cookies on the canvas page. Unfortunately, they only support them for canvas requests, and not for their "mock ajax." For mock ajax, I've decided to just encode the cookie values in post parameters. It works (and it's no more inconsistent than the rest of the Facebook Developer experience).
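The workaround amounts to round-tripping cookie values through the request body. Here is a minimal sketch of that in Python, assuming a made-up "cookie_" prefix to keep the smuggled values apart from ordinary form fields. The prefix and both function names are hypothetical, and nothing here is part of any Facebook API.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical prefix marking post parameters that carry cookie values.
COOKIE_PARAM_PREFIX = "cookie_"


def add_cookies_to_post(params, cookies):
    """Build a form-encoded body with the cookie values folded in as extra fields."""
    merged = dict(params)
    for name, value in cookies.items():
        merged[COOKIE_PARAM_PREFIX + name] = value
    return urlencode(merged)


def cookies_from_post(body):
    """Recover the cookie values from a form-encoded body on the server side."""
    parsed = parse_qs(body)
    return {
        name[len(COOKIE_PARAM_PREFIX):]: values[0]
        for name, values in parsed.items()
        if name.startswith(COOKIE_PARAM_PREFIX)
    }
```

The server can then treat the recovered values exactly as if they had arrived in a Cookie header.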

Update 2.Jan 18:52 EDT: If you spent any part of today poking at BeerRiot to see how the speed-ups turned out, you were probably rather dissatisfied. I just figured out that I didn't fully roll out the update. :P It's there now, and I think you'll be much more impressed.