MOPping it up

Building a simple Metaobject-Protocol

Many languages make a distinction between objects and classes. But shouldn't everything – including classes – be an object in a pure object-oriented language? Some languages like Smalltalk, Common Lisp, and Ruby manage just that and offer a complete and flexible Metaobject-Protocol. Others such as Java only offer a read-only object protocol commonly called “introspection” or “reflection”.

In this post, I'll explore how to create a simple Metaobject-Protocol where classes are objects – all without creating infinite loops. We'll be using a previous post's JavaScript object encoding as a basis for this MOP.

Different languages reify different concepts – that is, make these things available as first-class objects that can be stuffed into variables and passed around. For example, Java 8 reified methods so that a method can take another method as a parameter. C reifies memory pointers. Lisp reifies the syntax. And some languages reify classes.

Python is one language where we can pass classes around and add or remove methods.

class Foo(object):
  def bar(self):
    return 'bar'

# add a 'baz' method
def baz(self):
  return 'baz'

Foo.baz = baz

# call the added method
instance = Foo()
instance.baz()  #=> 'baz'

In JavaScript, we have something similar: Prototypes are a bit like classes, and prototypes are just arbitrary objects. We can add a new method like this:

function Foo() {
}
Foo.prototype.bar = function bar() {
  return 'bar';
}

// add a 'baz' method
Foo.prototype.baz = function baz() {
  return 'baz';
}

var instance = new Foo();
instance.baz(); //=> 'baz'

In our closure-based object encoding from the previous post, we hardcoded all possible methods into a switch. Here's how we'd write a 3D Vector:

function Vector(x, y, z) {
  return function self_(key, self) {
    self = self || self_;
    switch (key) {
      case 'x':
        return x;
      case 'y':
        return y;
      case 'z':
        return z;
      case 'as-string':
        return '(' + self('x') + ', ' + self('y') + ', ' + self('z') + ')';
      case '+': return function add(v) {
        return Vector(self('x') + v('x'),
                      self('y') + v('y'),
                      self('z') + v('z'));
      };
      default:
        throw 'Unknown key "' + key + '"';
    }
  };
}

Adding a method to the Vector is impossible without creating a subclass. And it isn't even a class – each Vector instance directly contains all methods.
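To make that concrete: in this encoding, a “subclass” is a new closure that answers some keys itself and delegates all other keys to a parent instance. It passes itself along as self, so calls like self('x') inside inherited code are routed through the subclass first. A minimal sketch, using a trimmed-down Vector and a hypothetical MeasuredVector (not from the previous post) that adds a 'length' key:

```javascript
// Trimmed-down version of the closure-encoded Vector above.
function Vector(x, y, z) {
  return function self_(key, self) {
    self = self || self_;
    switch (key) {
      case 'x': return x;
      case 'y': return y;
      case 'z': return z;
      case 'as-string':
        return '(' + self('x') + ', ' + self('y') + ', ' + self('z') + ')';
      default:
        throw 'Unknown key "' + key + '"';
    }
  };
}

// Hypothetical "subclass": handles 'length' itself and delegates every other
// key to a parent instance, passing itself as `self`.
function MeasuredVector(x, y, z) {
  const parent = Vector(x, y, z);
  return function self_(key, self) {
    self = self || self_;
    switch (key) {
      case 'length':
        return Math.sqrt(self('x') ** 2 + self('y') ** 2 + self('z') ** 2);
      default:
        return parent(key, self);
    }
  };
}

const v = MeasuredVector(2, 3, 6);
console.log(v('length'));    // 7
console.log(v('as-string')); // (2, 3, 6)
```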

From objects to classes

We'd like to separate “the thing that gets passed around” from “the thing that stores our methods”. A JavaScript object will do just fine for storing the methods, but note that while the above switch had a couple of statements for each case, an object must wrap these inside a function. This function will receive the self parameter.

If we factor out the methods, we'll get something like this:

var methods = {
  'as-string': function as_string(self) {
    return '(' + self('x') + ', ' + self('y') + ', ' + self('z') + ')';
  },
  '+': function add(self) {
    return function add(v) {
      return Vector(self('x') + v('x'),
                    self('y') + v('y'),
                    self('z') + v('z'));
    };
  },
};

function Vector(x, y, z) {
  return function self_(key, self) {
    self = self || self_;
    switch (key) {
      case 'x':
        return x;
      case 'y':
        return y;
      case 'z':
        return z;
      default:
        var method = methods[key];
        if (method) {
          return method(self);
        }
        else {
          throw 'Unknown key "' + key + '"';
        }
    }
  };
}

Ok, that didn't quite work – we still have all the properties x, y, z lying around. We'd like to somehow bundle them as well. We cannot put them into the methods dictionary since the methods will be shared between all instances of that class. We could put them into self, but that's a bit hacky and would make all data publicly accessible (violating encapsulation).

The solution I'd take is to add another parameter to the dispatch function that receives an object with all the data for our instance:

var methods = {
  'x': function x(self, data) {
    return data.x;
  },
  'y': function y(self, data) {
    return data.y;
  },
  'z': function z(self, data) {
    return data.z;
  },
  // note that `as-string` and `add` could directly access the data.x property,
  // but going through methods reduces coupling
  'as-string': function as_string(self) {
    return '(' + self('x') + ', ' + self('y') + ', ' + self('z') + ')';
  },
  '+': function add(self) {
    return function add(v) {
      return Vector(self('x') + v('x'),
                    self('y') + v('y'),
                    self('z') + v('z'));
    };
  },
};

function Vector(x, y, z) {
  var data_ = {
    x: x,
    y: y,
    z: z,
  };
  return function self_(key, self, data) {
    self = self || self_;
    data = data || data_;
    var method = methods[key];
    if (method) {
      return method(self, data);
    }
    throw 'Unknown key "' + key + '"';
  };
}
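Because the methods table is now a single object shared by every Vector, adding an entry to it is enough to add a method, even to instances that already exist. A minimal sketch, trimmed to the accessors and as-string; the 'dot' method is a hypothetical addition for illustration, not part of the original example:

```javascript
const methods = {
  'x': (self, data) => data.x,
  'y': (self, data) => data.y,
  'z': (self, data) => data.z,
  'as-string': (self) => '(' + self('x') + ', ' + self('y') + ', ' + self('z') + ')',
};

function Vector(x, y, z) {
  const data_ = { x: x, y: y, z: z };
  return function self_(key, self, data) {
    self = self || self_;
    data = data || data_;
    const method = methods[key];
    if (method) return method(self, data);
    throw 'Unknown key "' + key + '"';
  };
}

const a = Vector(1, 2, 3);
const b = Vector(4, 5, 6);

// Added after `a` and `b` were created; both see it because the table is shared.
methods['dot'] = (self) => (other) =>
  self('x') * other('x') + self('y') * other('y') + self('z') * other('z');

console.log(a('dot')(b)); // 32
```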

So, now we've started to draw a line between instances and their classes.

From classes to objects

In the above code, the Vector function isn't really a class – it's a constructor. I'd like to have an object that represents a class, and has a new method to construct objects. A first draft might look like this:

var class_data = {
  methods: methods,  // the methods object from earlier
};

var class_methods = {
  // create a new instance of the Vector class
  'new': function new_(clazz, clazz_data) {
    return function new_(x, y, z) {
      var data_ = {x: x, y: y, z: z};
      return function self_(key, self, data) {
        self = self || self_;
        data = data || data_;
        var method = clazz_data.methods[key];
        if (method) {
          return method(self, data);
        }
        throw 'Unknown key "' + key + '"';
      };
    };
  },
};

var Vector = function self_(key, self, data) {
  self = self || self_;
  data = data || class_data;
  var method = class_methods[key];
  if (method) {
    return method(self, data);
  }
  throw 'Unknown key "' + key + '"';
}

var v1 = Vector('new')(1, 2, 3);

Yeah, OK, all we've done is move the problem one level up: what is the class of the Vector class? The Vector class is an instance of the Class class, obviously.

Going meta

If we want to avoid an infinite sequence of objects which have a class which is an object which has a class which is an object … we have to introduce an arbitrary end, or a fix point.

Introducing an arbitrary end means that there is one root class which is not an instance of any class. This is simple, but inelegant since this limits the expressiveness of our object system. Essentially, that's what Java is doing: objects are instances of classes, but classes aren't instances of anything.

The alternative, introducing a fix point, is far more difficult to pull off since we're introducing an infinite loop: some root class is an instance of itself.

There are various ways to do this. In Perl, every class is an instance of itself, in the sense that all methods called on an instance can also be called on their class. Or more precisely: class methods and instance methods are in the same namespace for each class.

This model is fairly simple to implement: The dispatch function self_ reads the same for instances and classes, except that the default value for the data parameter will be different. The disadvantage is that class methods are mixed with instance methods, and that our type hierarchy has multiple roots (except that everything implicitly inherits from the UNIVERSAL class, which is kind of an object root).

In Perl, each class is an instance of itself.
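The Perl-like model can be sketched in the closure encoding used here, assuming a hypothetical make_class helper. The class and its instances are produced by the same dispatcher factory and share one method table, and only the default data differs. This only illustrates the idea and does not describe how Perl itself implements it:

```javascript
// Hypothetical helper: classes and instances share one dispatcher factory
// and one method table; only the default data differs.
function make_class(class_data, methods) {
  const dispatcher = (data_) =>
    function self_(key, self, data) {
      self = self || self_;
      data = data || data_;
      const method = methods[key];
      if (method) return method(self, data);
      throw 'Unknown key "' + key + '"';
    };
  // 'new' lives in the same namespace as the instance methods.
  methods['new'] = (self, data) => (fields) => dispatcher(fields);
  return dispatcher(class_data);
}

const Point = make_class({ name: 'Point' }, {
  'name': (self, data) => data.name,
  'x': (self, data) => data.x,
});

const p = Point('new')({ x: 1 });
console.log(p('x'));                  // 1
console.log(Point('name'));           // Point
// The downside: class and instance methods are mixed.
console.log(p('new')({ x: 2 })('x')); // 2 (an instance can construct objects)
console.log(Point('x'));              // undefined (the class answers instance methods)
```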

We won't do that, except for one root Class. All classes (including the Class class) are instances of Class. All objects and classes are instances of some class, and all classes inherit from the Object class, except Object itself, which has no base class.

The Object model we're aiming for will have these properties:

  • Object class is instance of Class class
  • Class class inherits from Object class
  • Class class is instance of Class class
  • Object instance has method CLASS
  • Class instance has method new to create an instance
  • Class instance has method extend to create a subclass
  • Class instance has method add-method to add an instance method

Class extends Object, but Object is-instance-of Class.

Method dispatch, and bailing out of it

The most difficult thing [citation needed] to get right in such a MOP is doing method dispatch. In an early draft, I had a resolve-method method in each class which would search through the method table of each class, then the base class, until it reached the Object class which has no base. But it never actually got that far: to call resolve-method on a class, we first had to call resolve-method on the class's class, which quickly blew the stack. Dang.

I resolved this by getting rid of resolve-method. All dispatch logic lives in the self_ dispatch function (which means it can't be swapped out by this MOP, for now). This dispatch function is a closure over the object's internal data, and also over its class. It can directly access its class's internal data, so that if the requested method is present in the immediate class, no further method calls are needed that would get us into the previous circular-dependency mess.

But if we don't find the method, we go look in the base class. This can now safely involve method calls on the class objects, since eventually we'll call some dispatch-related methods on the Class class, which will immediately contain these methods.

This is the really important point: At one level of our instance-class-metaclass hierarchy, method lookup must be possible without involving any further method dispatch.

Finally, if no method was found, we throw an error. My implementation does one more method call for the sake of better error messages, and tries to fetch the class's name. This can and will blow up if there is no name method at the highest level. So what we're going to do here is cross our fingers, hope for the best, and make sure this method is really there.

The function new_dispatcher below takes as parameters the class's dispatch function, the class's internal data, and the instance's data, and returns a new dispatch closure that works as discussed above. It requires three properties that have to be implemented by every class:

  • Each class must have a base property which either contains the base class, or a false value in the case of Object.
  • Each class must have a methods property that returns the method table.
  • Each class must have a name property, which returns either a class identifier for the purpose of better error messages, or a falsey value.
let new_dispatcher = function new_dispatcher(clazz, clazz_data, data_) {
  return function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;

    // try to look up method in current class
    let method = clazz_data.methods[selector];
    if (method)
      return method(self, data);

    // if that fails, search base classes
    let base = clazz;
    while (base = base('base')) {
      method = base('methods')[selector];
      if (method)
        return method(self, data);
    }

    // Method not found
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };
};

And it worked just fine. However, we cannot use this dispatch function for the Class class. Note that the new_dispatcher helper requires a clazz parameter which is the dispatch function of this class. But Class is its own class, so the dispatch function – which this helper is about to create – requires its own return value as parameter.

That obviously won't fly, but we can work around that by deferring access to the return value: We declare a variable self that will later hold the real dispatch function. We then define a function clazz that has the same interface as a dispatch function, but only proxies to self. This proxy therefore behaves equivalently to the real dispatch function, and no methods will be called on the proxy dispatcher before the real dispatcher is available. Then we can call new_dispatcher with the proxy dispatcher as class parameter, and assign the result to self. The circle is complete.

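let new_meta_dispatcher = function new_meta_dispatcher(clazz_data) {
  let self;
  // Proxy with the interface of a dispatch function. Its parameters must not be
  // called `self`, or they would shadow the variable holding the real dispatcher.
  let clazz = function clazz(selector, self_arg, data_arg) {
    return self(selector, self_arg, data_arg);
  };
  let data = clazz_data;

  self = new_dispatcher(clazz, clazz_data, data);
  return self;
};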

That was the difficult part; all that's left to do is implement Class and Object around these dispatch functions.

Implementing Class

First we'll implement the class methods. These methods will be available to all classes, including the Class class itself.

  • name returns the name from the class's internal data. Remember that this is needed to avoid infinite loops on method-not-found error messages.
  • base returns the base class from the class's internal data.
  • new creates a new instance of this class. As a simplification, we only handle a single constructor parameter which is an object. This object will be directly used as the internal data. We only make sure that this object will know what its class is.
  • methods returns the method table from the class's internal data.
  • add-method takes a method name and a function, and stores the function in this class's method table.
let Class_methods = {
  'name': function name(self, data) {
    return data.name;
  },
  'base': function base(self, data) {
    return data.base;
  },
  'new': function new_(clazz, clazz_data) {
    return function new_(args) {
      args.CLASS = clazz;
      return new_dispatcher(clazz, clazz_data, args);
    }
  },
  'methods': function methods(self, data) {
    return data.methods;
  },
  'add-method': function add_method(self, data) {
    return function add_method(name, body) {
      data.methods[name] = body;
      return self;
    }
  },
};

Then, we write down the Class private data. This will not be available to instances of Class. While the variables Class and Object must already be declared at this point, they weren't assigned their dispatch functions just yet – we'll fix that in a moment.

let Class, Object; // declared here, assigned below (this shadows JavaScript's built-in Object)
let Class_private_data = {
  name: 'Class',
  base: Object, // fix later
  CLASS: Class, // fix later
  methods: Class_methods,
};

Now it's time to build the fix point, the Class that is an instance of itself:

Class = new_meta_dispatcher(Class_private_data);

Implementing Object, and finishing bootstrapping

We have now arrived at a point where we can use rudimentary functions of our Metaobject-Protocol. For example, we can create a Class('new') and call it Object. Note that every class must have its own method table, which we'll pass as parameter.

Object = Class('new')({ name: 'Object', base: null, methods: {} });

We then call Object('add-method')('CLASS', ...) so that a CLASS method is available to every object derived from this MOP, including the Class class itself.

Object('add-method')('CLASS', function (self, data) {
  return data.CLASS;
});

To finally make the MOP work, we have to fix the last two properties: when Class_private_data was created, the Class and Object variables hadn't been assigned yet. Let's correct that now.

Class_private_data.base = Object;
Class_private_data.CLASS = Class;

And that was it. This MOP is now fully operational.
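For reference, here is the whole bootstrap as one runnable piece, condensed into arrow functions and followed by checks against the properties listed earlier. In the metaclass proxy, the variable holding the real dispatcher is called real, so the proxy's own self parameter cannot shadow it. Declaring Object with let shadows JavaScript's built-in Object inside this snippet:

```javascript
// Dispatch closure for classes and instances (new_dispatcher from above).
const new_dispatcher = (clazz, clazz_data, data_) =>
  function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;
    // Immediate class: plain property access, no method calls.
    let method = clazz_data.methods[selector];
    if (method) return method(self, data);
    // Base classes: method calls on class objects are safe from here on.
    for (let base = clazz('base'); base; base = base('base')) {
      method = base('methods')[selector];
      if (method) return method(self, data);
    }
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };

// Fix point: the dispatcher acts as its own class through a forwarding proxy.
const new_meta_dispatcher = (clazz_data) => {
  let real;
  const proxy = (selector, self, data) => real(selector, self, data);
  real = new_dispatcher(proxy, clazz_data, clazz_data);
  return real;
};

let Class, Object; // shadows the built-in Object inside this snippet
const Class_methods = {
  'name': (self, data) => data.name,
  'base': (self, data) => data.base,
  'methods': (self, data) => data.methods,
  'new': (clazz, clazz_data) => (args) => {
    args.CLASS = clazz;
    return new_dispatcher(clazz, clazz_data, args);
  },
  'add-method': (self, data) => (name, body) => {
    data.methods[name] = body;
    return self;
  },
};
const Class_private_data = { name: 'Class', base: null, CLASS: null, methods: Class_methods };

Class = new_meta_dispatcher(Class_private_data);
Object = Class('new')({ name: 'Object', base: null, methods: {} });
Object('add-method')('CLASS', (self, data) => data.CLASS);
Class_private_data.base = Object;
Class_private_data.CLASS = Class;

// The properties listed earlier:
console.log(Object('CLASS') === Class); // true: Object is an instance of Class
console.log(Class('base') === Object);  // true: Class inherits from Object
console.log(Class('CLASS') === Class);  // true: Class is an instance of Class
console.log(Object('base'));            // null: Object has no base class
const thing = Object('new')({});
console.log(thing('CLASS') === Object); // true: instances have a CLASS method
```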

We'll only add one teensy little method that makes subclassing easier:

Class('add-method')('extend', function extend(base, base_data) {
  return function extend(args) {
    return base('CLASS')('new')({ name: args.name, base: base, methods: {} });
  };
});

Now I guess you're like “OK, that was kinda cool, but what does a MOP buy me? I can already define methods…” Yes, but a MOP allows us to add new functionality to the object system itself.

Use the MOP, Luke – retrofitting properties

The bane of Java programming is the endless stream of getter and setter methods that you let your IDE autogenerate for you.1 This is a shame, because languages like C# have evolved beyond that and simply offer properties directly as part of the language. If Java had a MOP, we could retrofit support for properties.2 But alas, it doesn't, and we can't. In contrast, our JavaScript object encoding does, and we can, and we will.

We'll simply add a class method that takes a name and two booleans as arguments. If the first boolean is true, we add a getter; if the second is true, we also add a setter. As a naming convention, the getter will use the property name, while the setter will append a space and an equals sign (so the setter for x is named 'x =').

Showing the implementation is probably simpler than explaining it:

Class('add-method')('add-attribute', function (clazz) {
  return function (name, getter, setter) {

    if (getter) {
      clazz('add-method')(name, function (self, data) {
        return data[name];
      });
    }

    if (setter) {
      clazz('add-method')(name + ' =', function (self, data) {
        return function (value) {
          data[name] = value;
          return self;
        };
      });
    }

    return clazz;
  };
});

Note that add-method, add-attribute, and the setter all return the invocant (a.k.a. self) to facilitate method chaining.
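A short usage sketch of add-attribute with a setter and chaining. It repeats a condensed version of the bootstrap plus extend so that it runs on its own (again shadowing the built-in Object); Point is a hypothetical class introduced only for this example:

```javascript
// --- Bootstrap, condensed from the article ---
const new_dispatcher = (clazz, clazz_data, data_) =>
  function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;
    let method = clazz_data.methods[selector];
    if (method) return method(self, data);
    for (let base = clazz('base'); base; base = base('base')) {
      method = base('methods')[selector];
      if (method) return method(self, data);
    }
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };
const new_meta_dispatcher = (clazz_data) => {
  let real;
  const proxy = (selector, self, data) => real(selector, self, data);
  real = new_dispatcher(proxy, clazz_data, clazz_data);
  return real;
};
let Class, Object;
const Class_methods = {
  'name': (self, data) => data.name,
  'base': (self, data) => data.base,
  'methods': (self, data) => data.methods,
  'new': (clazz, clazz_data) => (args) => {
    args.CLASS = clazz;
    return new_dispatcher(clazz, clazz_data, args);
  },
  'add-method': (self, data) => (name, body) => {
    data.methods[name] = body;
    return self;
  },
};
const Class_private_data = { name: 'Class', base: null, CLASS: null, methods: Class_methods };
Class = new_meta_dispatcher(Class_private_data);
Object = Class('new')({ name: 'Object', base: null, methods: {} });
Object('add-method')('CLASS', (self, data) => data.CLASS);
Class_private_data.base = Object;
Class_private_data.CLASS = Class;
Class('add-method')('extend', (base) => (args) =>
  base('CLASS')('new')({ name: args.name, base: base, methods: {} }));
Class('add-method')('add-attribute', (clazz) => (name, getter, setter) => {
  if (getter) clazz('add-method')(name, (self, data) => data[name]);
  if (setter)
    clazz('add-method')(name + ' =', (self, data) => (value) => {
      data[name] = value;
      return self;
    });
  return clazz;
});
// --- end of bootstrap ---

const Point = Object('extend')({ name: 'Point' })
  ('add-attribute')('x', true, true)   // getter 'x' and setter 'x ='
  ('add-attribute')('y', true, false); // getter only

const p = Point('new')({ x: 1, y: 2 });
console.log(p('x =')(10)('y')); // 2 (the setter returns the invocant)
console.log(p('x'));            // 10
try {
  p('y =');
} catch (e) {
  console.log(e); // Unknown selector "y =" for class Point
}
```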

Revisiting the Vector example with our MOP

A MOP is abstraction. And better abstraction leads to terser, more meaningful code. The Vector example at the beginning of this article uses 22 SLOC with my indentation style. The MOP allows us to reduce that by almost 40%.

var Vector =
  Object("extend")({ name: 'Vector' })
  ('add-attribute')('x', true)
  ('add-attribute')('y', true)
  ('add-attribute')('z', true)
  ('add-method')('+', function (self) {
    return function (other) {
      return Vector('new')({
        x: self('x') + other('x'),
        y: self('y') + other('y'),
        z: self('z') + other('z'),
      });
    };
  });

Used as: Vector('new')({x: 1, y: 2, z: -3 }). Yeah, I must admit these parens start looking more and more like Lisp.
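A runnable version of the Vector example with a small usage check. It repeats the condensed bootstrap, extend, and add-attribute so the snippet stands alone (shadowing the built-in Object):

```javascript
// --- Bootstrap, condensed from the article ---
const new_dispatcher = (clazz, clazz_data, data_) =>
  function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;
    let method = clazz_data.methods[selector];
    if (method) return method(self, data);
    for (let base = clazz('base'); base; base = base('base')) {
      method = base('methods')[selector];
      if (method) return method(self, data);
    }
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };
const new_meta_dispatcher = (clazz_data) => {
  let real;
  const proxy = (selector, self, data) => real(selector, self, data);
  real = new_dispatcher(proxy, clazz_data, clazz_data);
  return real;
};
let Class, Object;
const Class_methods = {
  'name': (self, data) => data.name,
  'base': (self, data) => data.base,
  'methods': (self, data) => data.methods,
  'new': (clazz, clazz_data) => (args) => {
    args.CLASS = clazz;
    return new_dispatcher(clazz, clazz_data, args);
  },
  'add-method': (self, data) => (name, body) => {
    data.methods[name] = body;
    return self;
  },
};
const Class_private_data = { name: 'Class', base: null, CLASS: null, methods: Class_methods };
Class = new_meta_dispatcher(Class_private_data);
Object = Class('new')({ name: 'Object', base: null, methods: {} });
Object('add-method')('CLASS', (self, data) => data.CLASS);
Class_private_data.base = Object;
Class_private_data.CLASS = Class;
Class('add-method')('extend', (base) => (args) =>
  base('CLASS')('new')({ name: args.name, base: base, methods: {} }));
Class('add-method')('add-attribute', (clazz) => (name, getter, setter) => {
  if (getter) clazz('add-method')(name, (self, data) => data[name]);
  if (setter)
    clazz('add-method')(name + ' =', (self, data) => (value) => {
      data[name] = value;
      return self;
    });
  return clazz;
});
// --- end of bootstrap ---

const Vector = Object('extend')({ name: 'Vector' })
  ('add-attribute')('x', true)
  ('add-attribute')('y', true)
  ('add-attribute')('z', true)
  ('add-method')('+', (self) => (other) =>
    Vector('new')({
      x: self('x') + other('x'),
      y: self('y') + other('y'),
      z: self('z') + other('z'),
    }));

const v = Vector('new')({ x: 1, y: 2, z: -3 });
const w = v('+')(Vector('new')({ x: 10, y: 20, z: 30 }));
console.log(w('x'), w('y'), w('z')); // 11 22 27
```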

Limitations

The MOP constructed here is extremely simplistic, but it's the real deal. Metaobject-Protocols in real languages tend to offer many more features, some of which could be bolted onto this MOP.

Constructor protocol

For example, many MOPs offer a more advanced instance construction protocol. Note that this MOP doesn't even allow you to create custom constructors, and directly uses the constructor parameter for internal data, without any error checking! This MOP's construction protocol could be extended by swapping out Class('new').
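A sketch of what swapping out Class('new') could look like: a hypothetical replacement that rejects non-objects and copies its argument, so the caller's object no longer doubles as the instance's internal data. Because add-method on Class writes into the shared Class method table, the replacement applies to every class, Class itself included. The condensed bootstrap is repeated so the snippet runs on its own (shadowing the built-in Object):

```javascript
// --- Bootstrap, condensed from the article ---
const new_dispatcher = (clazz, clazz_data, data_) =>
  function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;
    let method = clazz_data.methods[selector];
    if (method) return method(self, data);
    for (let base = clazz('base'); base; base = base('base')) {
      method = base('methods')[selector];
      if (method) return method(self, data);
    }
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };
const new_meta_dispatcher = (clazz_data) => {
  let real;
  const proxy = (selector, self, data) => real(selector, self, data);
  real = new_dispatcher(proxy, clazz_data, clazz_data);
  return real;
};
let Class, Object;
const Class_methods = {
  'name': (self, data) => data.name,
  'base': (self, data) => data.base,
  'methods': (self, data) => data.methods,
  'new': (clazz, clazz_data) => (args) => {
    args.CLASS = clazz;
    return new_dispatcher(clazz, clazz_data, args);
  },
  'add-method': (self, data) => (name, body) => {
    data.methods[name] = body;
    return self;
  },
};
const Class_private_data = { name: 'Class', base: null, CLASS: null, methods: Class_methods };
Class = new_meta_dispatcher(Class_private_data);
Object = Class('new')({ name: 'Object', base: null, methods: {} });
Object('add-method')('CLASS', (self, data) => data.CLASS);
Class_private_data.base = Object;
Class_private_data.CLASS = Class;
Class('add-method')('extend', (base) => (args) =>
  base('CLASS')('new')({ name: args.name, base: base, methods: {} }));
// --- end of bootstrap ---

// Hypothetical replacement for 'new': validate, then copy instead of aliasing.
Class('add-method')('new', (clazz, clazz_data) => (args) => {
  if (typeof args !== 'object' || args === null)
    throw 'new: expected an object, got ' + args;
  const data = { ...args, CLASS: clazz }; // shallow copy; caller's object untouched
  return new_dispatcher(clazz, clazz_data, data);
});

const Point = Object('extend')({ name: 'Point' })
  ('add-method')('x', (self, data) => data.x);

const args = { x: 1 };
const p = Point('new')(args);
console.log(p('x'), 'CLASS' in args); // 1 false
```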

Method protocol

The larger part of a CLOS-style MOP usually describes methods and method resolution. Limited languages like Java only allow you to override a method, prevent a method from being overridden, or require a method to be overridden in a subclass. In contrast, CLOS (and, in its style, Perl's Moose object system) offers many method combinators, such as overriding, wrapping, extending, before-invocation hooks, post-invocation hooks, and so on. This essentially allows us to do a kind of aspect-oriented programming. Assuming we had a resolve-method helper, a wrap-method combinator could be easily implemented as:

// Wraps an existing method, rather than merely overriding it
// Note: will not notice when any base class swapped out this method in the meanwhile
//
// Parameters:
// name: string
// wrapper: (self: Instance, data: Map, inner: () -> OrigType) -> OrigType
Class('add-method')('wrap-method', function (self) {
  return function (name, wrapper) {
    // orig: (self, data) -> OrigType
    let orig = self('resolve-method')(name);
    if (!orig)
      throw 'Unknown selector "' + name + '"';

    // override the method
    self('add-method')(name, function (self, data) {
      let inner = function inner() {
        return orig(self, data);
      };
      return wrapper(self, data, inner);
    });
    return self;
  };
});

Example usage:

let Frobnicator =
  Object('extend')({ name: 'Frobnicator' })
  ('add-method')('frobnicate', function (self) {
    console.log('frob frob frob');
  });
let FrobLogger =
  Frobnicator('extend')({ name: 'FrobLogger' })
  ('wrap-method')('frobnicate', function (self, data, inner) {
    console.log('Initializing frobnication…');
    inner();
    console.log('Stopping frobnication…');
  });
FrobLogger('new')({})('frobnicate');
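As written, neither snippet runs, because this MOP has no resolve-method. Now that dispatch works, though, such a helper can be an ordinary class method: it only walks the same base chain that new_dispatcher walks. A sketch with a hypothetical resolve-method, the wrap-method combinator from above, and the Frobnicator example, logging into an array (the say helper is also hypothetical) so the output can be checked. The condensed bootstrap is repeated so it runs on its own (shadowing the built-in Object):

```javascript
// --- Bootstrap, condensed from the article ---
const new_dispatcher = (clazz, clazz_data, data_) =>
  function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;
    let method = clazz_data.methods[selector];
    if (method) return method(self, data);
    for (let base = clazz('base'); base; base = base('base')) {
      method = base('methods')[selector];
      if (method) return method(self, data);
    }
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };
const new_meta_dispatcher = (clazz_data) => {
  let real;
  const proxy = (selector, self, data) => real(selector, self, data);
  real = new_dispatcher(proxy, clazz_data, clazz_data);
  return real;
};
let Class, Object;
const Class_methods = {
  'name': (self, data) => data.name,
  'base': (self, data) => data.base,
  'methods': (self, data) => data.methods,
  'new': (clazz, clazz_data) => (args) => {
    args.CLASS = clazz;
    return new_dispatcher(clazz, clazz_data, args);
  },
  'add-method': (self, data) => (name, body) => {
    data.methods[name] = body;
    return self;
  },
};
const Class_private_data = { name: 'Class', base: null, CLASS: null, methods: Class_methods };
Class = new_meta_dispatcher(Class_private_data);
Object = Class('new')({ name: 'Object', base: null, methods: {} });
Object('add-method')('CLASS', (self, data) => data.CLASS);
Class_private_data.base = Object;
Class_private_data.CLASS = Class;
Class('add-method')('extend', (base) => (args) =>
  base('CLASS')('new')({ name: args.name, base: base, methods: {} }));
// --- end of bootstrap ---

const messages = [];
const say = (msg) => { messages.push(msg); console.log(msg); };

// Hypothetical helper: search this class, then its bases, for a method body.
Class('add-method')('resolve-method', (clazz) => (name) => {
  for (let c = clazz; c; c = c('base')) {
    const method = c('methods')[name];
    if (method) return method;
  }
  return undefined;
});

Class('add-method')('wrap-method', (clazz) => (name, wrapper) => {
  const orig = clazz('resolve-method')(name); // resolved once, at wrap time
  if (!orig) throw 'Unknown selector "' + name + '"';
  return clazz('add-method')(name, (self, data) =>
    wrapper(self, data, () => orig(self, data)));
});

const Frobnicator = Object('extend')({ name: 'Frobnicator' })
  ('add-method')('frobnicate', (self) => { say('frob frob frob'); });
const FrobLogger = Frobnicator('extend')({ name: 'FrobLogger' })
  ('wrap-method')('frobnicate', (self, data, inner) => {
    say('Initializing frobnication...');
    inner();
    say('Stopping frobnication...');
  });
FrobLogger('new')({})('frobnicate');
```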

Property protocol

The model of properties (aka. attributes, fields) shown above is extremely crude. A fully-fledged MOP would have to resolve name clashes, handle property overriding, default values, constructor parameter names, initialization order, custom accessor names, possibly method delegation. How properties are handled has a lot to do with the MOP's initialization protocol.

Multiple inheritance, traits

This MOP assumes single inheritance, if only because I didn't want to implement the C3 algorithm for this article. Traits (or roles, mixins, partial classes, abstract classes, interfaces) could be bolted onto this MOP (that's kind of the point of a MOP), but the MOP would have been architected slightly differently if it were to natively support these features.
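For the curious, C3 itself is short. A standalone sketch over a hypothetical parents map from class names to their ordered direct bases, not wired into this MOP: a class's linearization is the class itself followed by a merge of its bases' linearizations and the list of its bases, where the merge repeatedly takes the first head that does not appear in the tail of any remaining list:

```javascript
// C3 linearization; `parents` maps a class name to its ordered direct bases.
function c3(name, parents) {
  const bases = parents[name] || [];
  // Linearizations of each direct base, plus the list of direct bases itself.
  const seqs = bases.map((base) => c3(base, parents)).concat([bases.slice()]);
  const result = [name];
  for (;;) {
    const remaining = seqs.filter((seq) => seq.length > 0);
    if (remaining.length === 0) return result;
    // A valid next class heads some list and is not in the tail of any list.
    const next = remaining
      .map((seq) => seq[0])
      .find((head) => remaining.every((seq) => seq.indexOf(head) <= 0));
    if (next === undefined) throw 'Inconsistent hierarchy for ' + name;
    result.push(next);
    for (const seq of remaining) if (seq[0] === next) seq.shift();
  }
}

// The classic diamond: C inherits from A and B, which both inherit from Object.
const parents = {
  Object: [],
  A: ['Object'],
  B: ['Object'],
  C: ['A', 'B'],
};
console.log(c3('C', parents).join(' ')); // C A B Object
```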

Closed classes

This MOP leaves classes open to modification by everyone. This makes it terribly simple to violate encapsulation; given any instance target, anyone can do this:

target('CLASS')('add-method')('spy', function (self, data) {
  return data;
});
var exfiltrated_data = target('spy');

Also, this has adverse performance consequences, since it's impossible to optimize something that may change at any moment (at least without resorting to speculative optimization and deoptimization).

It's generally considered to be an acceptable tradeoff to close a class for modification once the first instance is about to be created.
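A sketch of that policy on top of this MOP, assuming hypothetical replacements for new and add-method that record a closed flag in each class's internal data. The condensed bootstrap is repeated so it runs on its own (shadowing the built-in Object):

```javascript
// --- Bootstrap, condensed from the article ---
const new_dispatcher = (clazz, clazz_data, data_) =>
  function self_(selector, self, data) {
    self = self || self_;
    data = data || data_;
    let method = clazz_data.methods[selector];
    if (method) return method(self, data);
    for (let base = clazz('base'); base; base = base('base')) {
      method = base('methods')[selector];
      if (method) return method(self, data);
    }
    throw 'Unknown selector "' + selector + '" for class ' + (clazz('name') || '<anon>');
  };
const new_meta_dispatcher = (clazz_data) => {
  let real;
  const proxy = (selector, self, data) => real(selector, self, data);
  real = new_dispatcher(proxy, clazz_data, clazz_data);
  return real;
};
let Class, Object;
const Class_methods = {
  'name': (self, data) => data.name,
  'base': (self, data) => data.base,
  'methods': (self, data) => data.methods,
  'new': (clazz, clazz_data) => (args) => {
    args.CLASS = clazz;
    return new_dispatcher(clazz, clazz_data, args);
  },
  'add-method': (self, data) => (name, body) => {
    data.methods[name] = body;
    return self;
  },
};
const Class_private_data = { name: 'Class', base: null, CLASS: null, methods: Class_methods };
Class = new_meta_dispatcher(Class_private_data);
Object = Class('new')({ name: 'Object', base: null, methods: {} });
Object('add-method')('CLASS', (self, data) => data.CLASS);
Class_private_data.base = Object;
Class_private_data.CLASS = Class;
Class('add-method')('extend', (base) => (args) =>
  base('CLASS')('new')({ name: args.name, base: base, methods: {} }));
// --- end of bootstrap ---

// Hypothetical: creating the first instance closes the class...
Class('add-method')('new', (clazz, clazz_data) => (args) => {
  clazz_data.closed = true;
  args.CLASS = clazz;
  return new_dispatcher(clazz, clazz_data, args);
});
// ...and a closed class refuses new methods.
Class('add-method')('add-method', (self, data) => (name, body) => {
  if (data.closed) throw 'Class ' + data.name + ' is closed';
  data.methods[name] = body;
  return self;
});

const Point = Object('extend')({ name: 'Point' }) // runs Class('new'): closes Class
  ('add-method')('x', (self, data) => data.x);    // Point itself is still open
const p = Point('new')({ x: 1 });                  // closes Point

let error;
try {
  Point('add-method')('spy', (self, data) => data);
} catch (e) {
  error = e;
}
console.log(error); // Class Point is closed
```

Because extend goes through Class('new'), creating the first subclass also closes Class itself, so every metaclass extension has to be installed before that point.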

Afterword

I wrote this post because I find language fundamentals interesting and was fascinated by the beauty of MOP design. It turned out to be the most difficult code I've written in some time, and it took me three days to get it just right. But once it's there, it's actually very little and fairly simple code.

Of course, this would have been easier if I'd read The Art of the Metaobject Protocol, which I sadly do not have a copy of. If you want to do me a favor, feel free to change that :-)

Instead, for inspiration I trawled through the source of the p5-mop project, which was once a promising attempt to bring a performant MOP into the Perl core.

In the upcoming articles, I'll probably talk about the Marpa parser: what it is, why you'd want to use it, and how to get started with it. Subscribe to the Atom feed to get notified when a new article gets published.


1 Or maybe you actually enjoy writing Java without an IDE, you poor twisted masochist.

2 Java has Project Lombok, which will hack into your IDE and let you use annotations in a manner that looks like a language extension, but really just means the IDE still performs code generation, though with the benefit that you never get to see this code.