Reading Code: Use Your Verbs

Published Wednesday, February 2, 2011 by Bryan

I've been reflecting on code quality lately. Partly that's because I've been reading far more code than I've been writing. Partly it's because the most recent code I was writing was intended primarily for reading, and only incidentally for execution, in the most literal way: it was instructional, not application-supporting.

And so it is that I've recently reaffirmed my conviction that code's quality is primarily a function of its readability. Readability is of primary importance because code must be able to be understood in order to be used, and the way to understand it best is to read it.

However, I think I can be more specific about one component of readability that holds sway over the rest: naming. Partially the quality of each name, but also the ratio of named to unnamed things. But most important of all, the ratio of named to unnamed verbs.

I first realized this several years ago, while hacking in the middle of a complex, distributed, Java-based system. At one point, I had spent days diving through spaghetti, and finally found the core of the system … and it was beautiful. Not just the best Java code I'd ever seen, but possibly the cleanest code, period. Comparing it to the ugly code I had dug through, I found that its cleanliness derived from the fact that each interesting operation (or “verb”) was segregated into its own named function. Some of those functions called others of those functions, but it was always just one operation described in each.

Later, coincidentally on the same project, I had reason to spend several weeks not in Java, a language I knew very well, but in Perl, Python, and Bash, languages with which I was less familiar. I wrote and modified code very carefully in those languages, making sure that I could test each step as I went along. As that bit of hacking finished, I returned to Java, and found that my style had changed. I was now writing Java in a very careful, easily-testable manner. When I stepped back, I realized that the easily-understood form of my new Java code shared something with that beautiful core I had found earlier: each function described exactly one operation.

I'll demonstrate what I mean with a concrete example. The code below is very similar to code I was hacking recently. The labels have been changed to protect the innocent, even though I think the innocent is me.

The set_properties function expects a token and a collection of properties (keys with matching values) to store for the token. New properties should overwrite old properties of matching keys, but old values for keys that are not specified should remain unchanged. For example, if the token “foo” had the properties [{a,1},{b,1}], and I called set_properties with the new properties [{a,2},{c,2}], then after set_properties finishes, the token “foo” should have the properties [{a,2},{b,1},{c,2}] (the new values for a and c plus the old value for b).

set_properties(Token, NewProperties) ->
    OldProperties = get_properties(Token),
    NewKeys = [ K || {K, _} <- NewProperties ],
    FilteredProperties = [ P || P={K, _} <- OldProperties,
                                not lists:member(K, NewKeys) ],
    set_properties_internal(Token, FilteredProperties ++ NewProperties).
Fig. 1: The Beginning

The code in Figure 1 is where I started. This code is correct: it conforms to the spec given, passes all tests (indeed, has been in production, working, for over a year). But, it is also bad code. The hint why is the NewKeys variable. It has little to do with setting new properties; it's merely an artifact of cleaning up old properties. It's an indication that the two list comprehensions that reference it are really an unnamed verb separate from set_properties.

set_properties(Token, NewProperties) ->
   OldProperties = get_properties(Token),
   MergedProperties = merge_properties(NewProperties, OldProperties).
   set_properties_internal(Token, MergedProperties).

merge_properties(Keep, Toss) ->
   KeepKeys = [ K || {K, _} <- Keep ],
   FilteredToss = [ P || P={K, _} <- Toss,
                         not lists:member(K, KeepKeys) ],
   FilteredToss ++ Keep.
Fig. 2: Naming the Verb

I propose that the code in Figure 2 is an improvement upon the code in Figure 1. The set_properties function now says just exactly what it's going to do: lookup the old properties, merge them with the new properties, and store the result. The details about how the merge is performed, the unnamed verb in Figure 1, have been relocated to a new function, named merge_properties. The intermediate list of keys is still produced, but it's now obvious that it's just part of the merging process.

set_properties(Token, NewProperties) ->
   OldProperties = get_properties(Token),
   MergedProperties = merge_properties(NewProperties, OldProperties),
   set_properties_internal(Token, MergedProperties).

merge_properties(Keep, Toss) ->
   lists:ukeymerge(1, lists:ukeysort(1, Keep), lists:ukeysort(1, Toss)).
Fig. 3: Using an Existing Name

Figure 3 is a demonstration of part of the reason that MIT changed the 6.001 curriculum. There was no need to write those list comprehensions. Someone had already written the equivalent and named it. It is far clearer to use that named operation than to reimplement. The confusion about why NewKeys was created has been removed, and so has the need to decrypt the other list comprehension.

set_properties(Token, NewProperties) ->
   OldProperties = get_properties(Token),
   MergedProperties = lists:ukeymerge(1,
                         lists:ukeysort(1, NewProperties),
                         lists:ukeysort(1, OldProperties)),
   set_properties_internal(Token, MergedProperties).
Fig. 4: Breaking Context

It's a valid question to ask why I didn't recommend jumping straight from Figure 1 to Figure 4, instead of ending up at Figure 3. It's true that Figure 4 is a large improvement on Figure 1, but the answer is that even though lists:ukeymerge/3 is a named verb, it's a verb with less context than merge_properties in my module. The context is richer than this snippet suggests, because there is at least one other function in this module that needs to perform the same operation. Also, to reference 6.001 again, “Abstraction barrier!” Why does set_properties need to know the data structure I'm using?

set_properties(Token, NewProperties) ->
   set_properties_internal(
      Token, merge_properties(NewProperties, get_properties(Token))).

merge_properties(Keep, Toss) ->
   lists:ukeymerge(1, lists:ukeysort(1, Keep), lists:ukeysort(1, Toss)).
Fig. 5: Anonymous Nouns

Another valid question is why I didn't continue on to Figure 5 after Figure 3. In truth, I did consider it. My eye sees less clutter, but having discussed this exact choice with many coworkers, I've learned that others don't. It also goes against the grain of what this post has been advocating: while I've worked to name my verbs, Figure 5 anonymized my nouns. There's a practical reason to keep names for nouns around: printf debugging. Unless I have a very nice macro handy that I can wrap around one of the function calls in-place in Figure 5, I'm forced to copy one of those calls to some other place, and possibly even give it a name, before I can wrap my print statement around it. In Figure 3, the names are already there; all I have to do is use them.

What else could be improved in Figure 3? Plenty: “merge” is a bit generic and over-used; “properties” is long, noisy, and redundant in-context. Is my omission of names for the sorted lists in merge_properties/2 hypocritical? Probably. Readability is a subjective, human measure. In multiple projects and languages, I've identified verb-naming as important in my judgement of a code's readability. Maybe writing that fact down will help me remember to think about it in new code I write.

Categories: Miscellaneous