Intent, not implementation

When designing interfaces / APIs, it is easy to design the interface around the solution space. This makes such interfaces difficult to use, difficult to test, and difficult to maintain. Instead, our interfaces should allow users to easily express their intent.

(I had originally written this article in 2017, but never got around to publish the draft. This post is kind of a counterpoint to my earlier article Extract Your Dependencies.)

A callback too far

I was refactoring a large “god object” class for testability. There were huge gains to be had by extracting all related methods into a class of their own. But many methods turned out to be nearly the same, differing only at a couple of well-defined points.

So in my naivety, I made each of those points into a callback that was passed into the general method – compare the strategy design pattern. This allowed ultimate flexibility for users, but required them to provide these callbacks without real context, and to put low-level implementation details into these callbacks that ought to be hidden. In other words, the callbacks were too closely coupled with the control flow of my function. A user could not possibly understand why a certain callback might be necessary. And more annoyingly, it was nearly impossible to test that the callbacks did the right thing.

# the function I'm working on
def send_message(
    payload_type,
    payload_dto,
    on_before_message_register = None,
    on_before_side_effects = None):

  message = Message(
    RECIPIENT,
    SENDER,
    payload_type,
    payload_dto.as_str())

  if on_before_side_effects:
    on_before_side_effects()

  with cache as lock:
    if on_before_message_register:
      on_before_message_register(cache)
    cache.add(message)

  return send(message)

# client code – obfuscated by low-level concerns
def handle_foo_data(foo):
  dto = FooDTO(foo)
  send_message(
    payload_type="FOO",
    payload_dto=dto,
    on_before_side_effects=lambda: logger.info("sending FOO message"))

def handle_bar_data(bar);
  dto = BarDTO(bar)
  send_message(
    payload_type="BAR",
    payload_dto=dto
    on_before_side_effects=lambda: logger.info("sending BAR message: {}", dto),
    on_before_message_register=lambda cache: cache.clear())

Code becomes simpler when focusing on intent

A week later I revisited that code, and thought about what the callers of my extracted method were trying to do. Basically, they were injecting logging at a specific point, or clearing a cache just before a message was sent.

The logging problem is easy to solve: every function should manage its own logging. Connecting multiple layers through callbacks defeats the purpose of logging: getting a sense of what the hell is happening,

Resetting the cache either happened or it didn't. That doesn't need a callback. It needs an if/else, and a boolean parameter to replace my callback.

# the function I'm working on – much simpler now
def send_message(
    payload_type,
    payload_dto,
    clear_cache = False):

  message = Message(
    RECIPIENT,
    SENDER,
    payload_type,
    payload_dto.as_str())

  logger.info("Sending {} message: {}", payload_type, message)

  with cache as lock:
    if clear_cache:
      cache.clear()
    cache.add(message)

  return send(message)

# the client code – much clearer what's going on
def handle_foo_data(foo):
  dto = FooDTO(foo)
  send_message("FOO", dto)

def handle_bar_data(bar);
  dto = BarDTO(bar)
  send_message("BAR", dto, clear_cache=True)

Complexity tradeoffs

At first I was wary to add a conditional into the function where one code path would only be needed by 2 of 11 invocations. Isn't adding cyclomatic complexity bad?

I had walked into the Replace Conditionals With Polymorphism refactoring trap. This technique is useful when the same condition is tested again and again, but for my single conditional it was nothing but obfuscation.

The Replace Conditionals With Polymorphism refactoring trades complicated local control flow with high cyclomatic complexity for a combination of simple local control flow with non-local control flow. The whole system complexity doesn't change.

Frequently, applying this technique makes it far easier to reason about code, but this is offset by the cost of keeping track of control flow in multiple classes. This can be so noticeable that I sometimes find the refactored code more difficult to reason about and more difficult to test, which defeats the purpose.

So I added the conditional, and the boolean parameter defaulting to False. The implementation became simpler. All function calls became substantially simpler. My tests became simpler. Because the function calls now expressed an intent rather than implementation details.

In particular, my tests went from “invoke the provided callback, compare state before and after” to “I expected the caller to tell me to clear the cache”. Obviously, just checking a single parameter is way easier. Adding one or more conditionals to a function does not significantly inflate its test suite.

The number of test cases scales roughly with the cyclomatic complexity of the function being unit-tested.

Adding a conditional usually doubles the cyclomatic complexity of a function. This is not much of an issue if it increases from 1→2, or 2→4. It is however very noticeable for higher complexity functions, where additional branching should be avoided – prefer extracting some of that into separate functions.

If a conditional branch terminates control flow (return, throw, …), then the cyclomatic complexity doesn't double but is merely incremented by one. This makes guard conditions at the beginning of a function very cheap from a complexity standpoint.

Other API patterns that get in the way of solving problems

I then realizes there was a more general design principle at play here. So many of our interfaces require the interface consumer to do something in a particular order or a particular way, just because our implementation happens to require it like that. This usually means the interface does not meet the real needs it was created for.

The classic offenders are:

  • init() methods that must be called before an object may be used. It is usually better to do the work in the constructor, or use a factory function instead. There are similar problems when other methods have to be invoked in a specific order, which is too error-prone to be called “good design”.

  • Initializing objects via setters. This makes it hard to be sure that the object is in fact in a valid state. A setter-based initialization is only suitable for behaviour-less, semantic-free records that just group some values together.

  • Lots of boilerplate. Simple things should be easy, not require you to instantiate a DefaultConfigurationFactory first. It is possible to define interfaces that can be used in a simple manner, while still having an escape hatch for more complicated use cases.

    In particular, implementing the main API as a simple convenience layer over a more flexible interface is a great way to do this. Optional parameters or overloaded methods are another great way to supply additional information without weighing down normal invocations with extra parameters.

I think I'll now have to add tightly coupled callbacks to this list as well.