Emerging Objects

Building a simple object system out of closures

Object-oriented programming and functional programming can each be expressed in terms of the other. While encoding closures as objects is a well-known technique (see the command pattern, or functors in C++), using closures to implement objects is a bit more unusual.

In this post, I will explore creating a simple object system in JavaScript, using only the functional parts.

Closures

A closure is a function that has access to the variables in the surrounding scope. For example, methods in Java have access to member variables:

class Foo {
  final int variable;

  public Foo(int x) {
    this.variable = x + 2;
  }

  int closure() {
    return variable;
  }
}

...

Foo gimme12 = new Foo(10);
Foo gimme42 = new Foo(40);

System.out.println(gimme12.closure());  // 12
System.out.println(gimme42.closure());  // 42

So the closure method is a closure. In JavaScript, we can nest functions. There, an inner function has access to all the variables of the outer function:

function Foo(x) {
  var variable = x + 2;
  function closure() { return variable }
  return closure;
}

var gimme12 = Foo(10);
var gimme42 = Foo(40);

console.log(gimme12()); // 12
console.log(gimme42()); // 42

In that code example, we defined an inner function and returned that inner function. When we call the inner function, it has access to the variable it was defined under. The gimme12 closure was defined when variable = 12, and calling Foo(40) didn't change that: Each time Foo is executed, we get a new variable, so each closure was defined in the context of a different variable.
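
To make the "new variable per call" point more tangible, here is a small sketch with a counter. The name makeCounter is made up for this illustration and is not used in the rest of the post. Unlike the other examples, it mutates the closed-over variable, which makes it visible that the two closures do not share state:

```javascript
// Hypothetical helper: every call to makeCounter creates a fresh `count`.
function makeCounter(start) {
  var count = start;
  return function () {
    count = count + 1;   // only this closure's own `count` is changed
    return count;
  };
}

var a = makeCounter(0);
var b = makeCounter(100);

console.log(a()); // 1
console.log(a()); // 2
console.log(b()); // 101
console.log(a()); // 3
```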

Closures as data structures

So we've seen that we can put data inside a function, and get it out again by using a closure over that data. This allows us to write simple (and later less simple) data structures.

Let's implement a Pair(x, y) constructor, which returns some opaque value. The function head(pair) will return the first element, the function tail(pair) will return the second element.

The simplest solution is to put x and y into an array, and have head and tail access the appropriate item in that array:

function Pair(x, y) {
  return [x, y];
}

function head(pair) { return pair[0] }
function tail(pair) { return pair[1] }

var first_pair = Pair(7, 11);
var second_pair = Pair("foo", "bar");

console.log(head(first_pair));  // 7
console.log(tail(second_pair)); // "bar"
console.log(tail(first_pair));  // 11

But we can do this without using an array! The trickery is that the value returned by Pair is a closure over the values of the pair. All that head and tail do is call that closure with an agreed-upon key to get a value back:

function Pair(x, y) {
  // return a closure
  return function (i) {
    if (i === 0)
      return x;
    else
      return y;
  };
}

function head(pair) { return pair(0) }
function tail(pair) { return pair(1) }

var first_pair = Pair(7, 11);
var second_pair = Pair("foo", "bar");

console.log(head(first_pair));  // 7
console.log(tail(second_pair)); // "bar"
console.log(tail(first_pair));  // 11

Of course, once we have pairs, we can build virtually any data structure from that (see Lisp for inspiration…).
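
As a rough sketch of that idea, here is a Lisp-style linked list built from nothing but closure pairs. Using null as the end-of-list marker and the helper name sum are my own assumptions for this illustration:

```javascript
// Pair, head and tail in the closure style shown above.
function Pair(x, y) {
  return function (i) {
    if (i === 0)
      return x;
    else
      return y;
  };
}

function head(pair) { return pair(0) }
function tail(pair) { return pair(1) }

// Assumption: null marks the end of the list.
var list = Pair(1, Pair(2, Pair(3, null)));

// Hypothetical helper: recursively add up all elements.
function sum(list) {
  if (list === null) return 0;
  return head(list) + sum(tail(list));
}

console.log(head(tail(list))); // 2
console.log(sum(list));        // 6
```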

In the above code, we used two external functions head and tail to interface with the closure. However, we could have also made that closure part of the public interface. Let's say that when passed the string "head", it will return the first element, and the second element when passed the string "tail":

function Pair(x, y) {
  return function (key) {
    switch (key) {
      case "head":
        return x;
      case "tail":
        return y;
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

var first_pair = Pair(7, 11);
var second_pair = Pair("foo", "bar");

console.log(first_pair("head"));  // 7
console.log(second_pair("tail")); // "bar"
console.log(first_pair("tail"));  // 11

So, we can ask the closure for some piece of data. It will either respond to that request, or throw an error if it can't handle it. We've kinda already invented methods here.
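
The error case is easy to try out. The key "middle" below is just an arbitrary example of a request the pair doesn't understand:

```javascript
function Pair(x, y) {
  return function (key) {
    switch (key) {
      case "head":
        return x;
      case "tail":
        return y;
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

var pair = Pair(7, 11);

try {
  pair("middle");        // not a key the closure knows about
} catch (e) {
  console.log(e);        // Unknown key "middle"
}
```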

Methods

Until now, we have used a closure to map keys to closed over variables. We can also use this closure to return other closures. These closures will also remember their environment and therefore have access to all variables in their scope.

Now instead of a Pair, let's look at a 3D Vector as an example. First, we write a simple constructor as with the Pair example that returns a closure which lets us access the fields:

function Vector(x, y, z) {
  return function(key) {
    switch (key) {
      case "x": return x;
      case "y": return y;
      case "z": return z;
      case "as-string":
        return "(" + x + ", " + y + ", " + z + ")";
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

var v = Vector(1, 2, 3);
console.log("The Vector v=" + v("as-string") +
            " has a y-component of y=" + v("y"));

As an added bonus, this object has an as-string property that is calculated on the fly.

Now we would like to add some vectors. In vanilla JavaScript, we might add a method add so that v1.add(v2) sums the two vectors. To access the method in our encoding of objects, we would do v1("+"), which returns a closure that we can then apply to v2: v1("+")(v2). The implementation isn't exactly spectacular:

function Vector(x, y, z) {
  return function(key) {
    switch (key) {
      case "x": return x;
      case "y": return y;
      case "z": return z;
      case "as-string":
        return "(" + x + ", " + y + ", " + z + ")";
      case "+": return function (v) {
        return Vector(x + v("x"),
                      y + v("y"),
                      z + v("z"));
      };
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

var v1 = Vector(1, 2, 3);
var v2 = Vector(0, -2, 2);
var result = v1("+")(v2);   // Vector(1, 0, 5)
console.log(v1("as-string") + " + " + v2("as-string") +
            " = " + result("as-string"));

So, adding methods isn't exactly rocket science. The syntax is awkward, but it isn't actually that different from normal notation.
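
If the double call bothers you, a tiny helper can hide it. The function send below is purely hypothetical and not part of the object system itself. It only works for keys that return a closure (such as "+"), not for plain fields like "x":

```javascript
function Vector(x, y, z) {
  return function (key) {
    switch (key) {
      case "x": return x;
      case "y": return y;
      case "z": return z;
      case "as-string":
        return "(" + x + ", " + y + ", " + z + ")";
      case "+": return function (v) {
        return Vector(x + v("x"),
                      y + v("y"),
                      z + v("z"));
      };
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

// Hypothetical helper: look up the method `key` on `obj`
// and apply it to the remaining arguments.
function send(obj, key) {
  var args = Array.prototype.slice.call(arguments, 2);
  return obj(key).apply(null, args);
}

var v1 = Vector(1, 2, 3);
var v2 = Vector(0, -2, 2);
var total = send(v1, "+", v2);

console.log(total("as-string"));                  // (1, 0, 5)
console.log(send(total, "+", v1)("as-string"));   // (2, 2, 8)
```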

Subclassing

One feature commonly associated with object-oriented programming is the ability to subclass an existing class in order to add more methods. Let's say that I want to subclass that Vector so that it provides a scalar product (dot product) method ·.

Note that the closure returned by the constructor tries to match a number of known method names. If no match is found, an error is thrown. Our subclass will modify that: If no method is found in the subclass, we continue looking for methods in the base class.

function VectorWithMultiplication(x, y, z) {
  var base = Vector(x, y, z);
  return function (key) {
    switch (key) {
      case "·": return function (v) {
        return  base("x") * v("x") +
                base("y") * v("y") +
                base("z") * v("z");
      };
      default: return base(key);
    }
  };
}


var v1 = VectorWithMultiplication(1, 2, 3);
var v2 = VectorWithMultiplication(0, -2, 2);
var result = v1("·")(v2);   // = 0 + -4 + 6 = 2
console.log(v1("as-string") + " · " + v2("as-string") +
            " = " + result);

So subclassing isn't spectacular: If our object doesn't find a matching field or method, we let someone else continue.

Another way to factor this code would be to express the subclassing as a decorator that intercepts all method requests. This decorator would take a fully instantiated object as parameter, and dynamically add another method:

function VectorMultiplicationDecorator(base) {
  return function (key) {
    switch (key) {
      case "·": return function (v) {
        return  base("x") * v("x") +
                base("y") * v("y") +
                base("z") * v("z");
      };
      default: return base(key);
    }
  }
}

The resulting object has an identical API, save for instance construction.
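
Here is a short usage sketch of the decorator, with an abbreviated Vector that only has the fields and as-string. The variable names plain, decorated and other are mine:

```javascript
function Vector(x, y, z) {
  return function (key) {
    switch (key) {
      case "x": return x;
      case "y": return y;
      case "z": return z;
      case "as-string":
        return "(" + x + ", " + y + ", " + z + ")";
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

function VectorMultiplicationDecorator(base) {
  return function (key) {
    switch (key) {
      case "·": return function (v) {
        return  base("x") * v("x") +
                base("y") * v("y") +
                base("z") * v("z");
      };
      default: return base(key);
    }
  };
}

// Construct the base object first, then wrap it.
var plain = Vector(1, 2, 3);
var decorated = VectorMultiplicationDecorator(plain);
var other = Vector(0, -2, 2);   // the argument only needs x, y and z

console.log(decorated("·")(other));   // 2
console.log(decorated("as-string"));  // (1, 2, 3), answered by the base
```

Note that plain itself is unchanged: asking it for "·" still throws.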

What's this all about?

Some methods rely on other methods to perform their work. So we need a way to call methods on our current object. Previously, we returned an anonymous closure to represent our object. But if we simply give it a name (such as self), then we can reference it.

The previous example did not run into this problem because all invoked methods were in the base class. If we want to do method dispatch properly, the above example would be written like this:

function VectorMultiplicationDecorator(base) {
  return self;
  function self (key) {
    switch (key) {
      case "·": return function (v) {
        return  self("x") * v("x") +
                self("y") * v("y") +
                self("z") * v("z");
      };
      default: return base(key);
    }
  }
}

This now allows us to call methods on the same class. But there still is a problem if we want method resolution to go through all applicable subclasses as well. For example, the strategy pattern/template method pattern lets us override steps in an algorithm. Let's say our algorithm is about making pancakes.

function PancakeStrategy() {
  return self;
  function self(key) {
    switch (key) {
      case "make-pancake":
        self("prepare-dough");
        self("bake-pancake");
        self("spread");
        self("eat-pancake");
        return;
      case "prepare-dough":
        console.log("Mixing wheat flour, milk, and eggs");
        return;
      case "bake-pancake":
        console.log("The pancake is sizzling in the pan");
        return;
      case "spread":
        console.log("Applying chocolate spread");
        return;
      case "eat-pancake":
        console.log("Yummy!");
        return;
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

PancakeStrategy()("make-pancake");
// "Mixing wheat flour, milk, and eggs"
// "The pancake is sizzling in the pan"
// "Applying chocolate spread"
// "Yummy!"

Now today I don't want sweet pancakes, but savoury pancakes with bacon. I might get the idea to simply override the prepare-dough and spread methods. Let's try it:

function BaconPancakeStrategy() {
  var base = PancakeStrategy();
  return self;
  function self(key) {
    switch (key) {
      case "prepare-dough":
        base("prepare-dough");
        console.log("Sprinkling in some bacon bits");
        return;
      case "spread":
        console.log("Spreading sour cream");
        return;
      default:
        return base(key);
    }
  };
}

BaconPancakeStrategy()("make-pancake")

Unfortunately, we get the same pancakes as before. We forgot to let make-pancake look in the subclass if there is one. To do this, we need to pass self to the base when doing method resolution.

The simplest way to do this is to insert the self as an additional parameter. This requires classes to explicitly support proper subclassing. A class can always choose to only dispatch to methods that it or its base classes define.

In the below code, I'll give each self function an optional second parameter that can override the function to use for self-dispatch.

function PancakeStrategy() {
  return _self;
  function _self(key, self) {
    // if no parameter is provided, substitute ourself
    self = self || _self;
    switch (key) {
      ... // as before
    }
  };
}

function BaconPancakeStrategy() {
  var base = PancakeStrategy();
  return _self;
  function _self(key, self) {
    self = self || _self;
    switch (key) {
      ... // as before
      default:
        return base(key, self);
    }
  };
}

BaconPancakeStrategy()("make-pancake")
// "Mixing wheat flour, milk, and eggs"
// "Sprinkling in some bacon bits"
// "The pancake is sizzling in the pan"
// "Spreading sour cream"
// "Yummy!"

Yay, this works!
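
For reference, this is what the two functions look like when the elided cases are filled in from the earlier listings. It is my own expansion of the "as before" parts, so treat it as a sketch rather than the canonical version:

```javascript
function PancakeStrategy() {
  return _self;
  function _self(key, self) {
    // if no parameter is provided, substitute ourself
    self = self || _self;
    switch (key) {
      case "make-pancake":
        // dispatch through `self` so that subclasses can override steps
        self("prepare-dough");
        self("bake-pancake");
        self("spread");
        self("eat-pancake");
        return;
      case "prepare-dough":
        console.log("Mixing wheat flour, milk, and eggs");
        return;
      case "bake-pancake":
        console.log("The pancake is sizzling in the pan");
        return;
      case "spread":
        console.log("Applying chocolate spread");
        return;
      case "eat-pancake":
        console.log("Yummy!");
        return;
      default:
        throw "Unknown key \"" + key + "\"";
    }
  }
}

function BaconPancakeStrategy() {
  var base = PancakeStrategy();
  return _self;
  function _self(key, self) {
    self = self || _self;
    switch (key) {
      case "prepare-dough":
        base("prepare-dough");   // explicit call to the base implementation
        console.log("Sprinkling in some bacon bits");
        return;
      case "spread":
        console.log("Spreading sour cream");
        return;
      default:
        // hand our own `self` to the base for further dispatch
        return base(key, self);
    }
  }
}

BaconPancakeStrategy()("make-pancake");
```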

So, now we've got a crude way to implement a simple object system supporting

  • subclassing
  • methods
  • read-only properties (read-write properties are more complex in this encoding)
  • open recursion/this
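
To give a feeling for the read-write case, here is one way it might be sketched if we allow ourselves to reassign a closed-over variable. The names MutablePoint and "set-x" are made up for this illustration, and the sketch deliberately ignores how writes would interact with subclassing:

```javascript
// Hypothetical example: a point whose x field can be replaced.
function MutablePoint(x, y) {
  return function self(key) {
    switch (key) {
      case "x": return x;
      case "y": return y;
      case "set-x": return function (value) {
        x = value;       // reassign the closed-over parameter
        return self;     // return the object to allow chaining
      };
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

var p = MutablePoint(1, 2);
p("set-x")(10);
console.log(p("x")); // 10
console.log(p("y")); // 2
```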

Our object encoding differs from most common object systems in a couple of important ways. In some languages (including JavaScript), objects are merely glorified structs or hash tables. This makes it difficult to have truly private data. Our objects-as-dispatch-function mechanism allows objects to freely respond to messages as they see fit. For example, autoloading of methods is possible with this mechanism (see e.g. the above VectorMultiplicationDecorator example). This puts our encoding more in line with the Smalltalk tradition of OOP, which today lives on in Objective-C.
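
As a small sketch of an object deciding for itself how to respond, the hypothetical Traced decorator below records every message it receives before forwarding it, without knowing anything about the keys involved. Both the name Traced and the seen array are assumptions made for this example:

```javascript
function Pair(x, y) {
  return function (key) {
    switch (key) {
      case "head":
        return x;
      case "tail":
        return y;
      default:
        throw "Unknown key \"" + key + "\"";
    }
  };
}

// Hypothetical decorator: records each message, then forwards it to base.
function Traced(base, seen) {
  return function (key) {
    seen.push(key);
    return base(key);
  };
}

var seen = [];
var traced = Traced(Pair(7, 11), seen);

console.log(traced("head"));   // 7
console.log(traced("tail"));   // 11
console.log(seen.join(", "));  // head, tail
```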

Smalltalk has a very interesting feature that we don't have: a metaobject protocol (MOP). A MOP encapsulates object construction and method dispatch in a meta-circular manner. Whereas we always had to spell out all necessary implementation details (such as throwing an exception when no method was found), a MOP handles all of that for us.

In a future post, I'll discuss how to MOP up this object system. Subscribe to the Atom feed to get notified as soon as it gets published.