Interface Dispatch

(15 min read)

Virtual method calls are simple: you just look up the method slot in a vtable and call the function pointer. Easy! Well, not quite: interfaces present a kind of multiple inheritance, and things quickly become complicated.

This post discusses interface method calls in C++ (GCC), Java (OpenJDK/HotSpot), C# (CLR), Go, and Rust.

It is an expanded version of my answer on Software Engineering Stack Exchange on Implementation of pure abstract classes and interfaces.

Disclaimer: Undefined Behavior and optimizations.

This article discusses PL concepts. The presented approaches are typical of language implementations, but not mandated by the language specification. In particular, clever optimizations might be able to remove some of the machinery that will be discussed here (in particular in the case of C#).

Vtables.

For a refresher on virtual methods and why they are important please read my article Dynamic vs. Static Dispatch.

In short, one of the important features of OOP is that a virtual method call doesn’t depend on the static type of a variable, but on the dynamic type of the object.

This requires that the information about the method implementations is stored within the object. This is typically done with a VTable (virtual method table). This is essentially a struct of function pointers.

If we desugar the above C++ code to C, the method call would actually look like this:

(Note that the object pointer is passed as an implicit this argument.)

Equivalently to a struct of function pointers we can think of an array of function pointers slots, where we have to know the slot index at compile time:

The object then points to a vtable that is filled with the correct method implementations for its dynamic type.

Typically the vtable is directly at the start of the object, so if we ignore the type system and UB (as the compiler can do) we get something like:

As a pointer diagram:

┯ object pointer
│
┏━━━┯━━━━━┓
┃   │ ... ┃ object contents
┗━┯━┷━━━━━┛
  │
  ┏━━━┯━━━┯━━━┓
  ┃   │   │   ┃ vtable
  ┗━━━┷━┯━┷━━━┛
        │
    DynamicType::method() implementation

It is important that all subclasses like the DynamicType still conform to the object and vtable layout of its base classes like StaticType. This is important for abstraction: the call site of a method can be compiled before the dynamic type even exists!

Multiple base classes? Multiple VTables.

With multiple inheritance, we have multiple base classes to which we have to stay compatible. And every base class layout expects that the vtable is right at the start of the object. Of course they can’t all be at the start! Instead, we put the extra vtable pointers in the middle. As a consequence, upcasting to a different base type means that we have to adjust the object pointer.

Let’s consider this C++ class hierarchy:

By running g++ -fdump-class-hierarchy we get this cryptic output:

Vtable for FirstBase
FirstBase::_ZTV9FirstBase: 3 entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI9FirstBase)
16    (int (*)(...))FirstBase::first_method

Class FirstBase
   size=16 align=8
   base size=12 base align=8
FirstBase (0x0x7fb98760c960) 0
    vptr=((& FirstBase::_ZTV9FirstBase) + 16)

Vtable for SecondBase
SecondBase::_ZTV10SecondBase: 3 entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI10SecondBase)
16    (int (*)(...))SecondBase::second_method

Class SecondBase
   size=16 align=8
   base size=12 base align=8
SecondBase (0x0x7fb98760c9c0) 0
    vptr=((& SecondBase::_ZTV10SecondBase) + 16)

Vtable for Derived
Derived::_ZTV7Derived: 6 entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI7Derived)
16    (int (*)(...))Derived::first_method
24    (int (*)(...))-16
32    (int (*)(...))(& _ZTI7Derived)
40    (int (*)(...))SecondBase::second_method

Class Derived
   size=32 align=8
   base size=28 base align=8
Derived (0x0x7fb9874c82a0) 0
    vptr=((& Derived::_ZTV7Derived) + 16)
  FirstBase (0x0x7fb98760ca20) 0
      primary-for Derived (0x0x7fb9874c82a0)
  SecondBase (0x0x7fb98760ca80) 16
      vptr=((& Derived::_ZTV7Derived) + 40)

This shows the class layouts and vtable layouts with all the offsets. Note that the vtables don’t just contain method slots, but two additional entries:

  • One entry points back to the vtable itself. This allows the dynamic type of an object to be determined at runtime (RTTI).
  • Another entry describes the offset from the upcasted this pointer to the original this pointer. It is necessary to fix up the object pointer when calling a method through the SecondBase if that method was implemented in the Derived class.

As a pointer diagram using a SecondBase static type:

              ┯ SecondBase pointer
              │
      ┏━━━┯━━━┯━━━┯━━━┯━━━┓
      ┃ V │ a │ V │ b │ c ┃ Derived object contents
      ┗━┯━┷━━━┷━┯━┷━━━┷━━━┛
        │       └───┐
        │           │
┏━━━┯━━━┯━━━┯━━━┯━━━┯━━━┓
┃ O │ T │ S │ O │ T │ S ┃ vtable
┗━━━┷━━━┷━┯━┷━━━┷━━━┷━┯━┛
          │           │
          │           Derived::second_method()
          │
          Derived::first_method()

V: vtable pointer
O: object offset
T: runtime type information (RTTI)
S: method slot

How does this solution score?

  • The vtable is part of the object, so a simple pointer to the object is sufficient to pass objects around.
  • However, this doesn’t scale well with many bases/interfaces: Each additional base class requires us to add another vtable to the object.
  • Upcasting is not free, and may change the object pointer. This can require some fixup later.
  • Virtual method calls require three layers of pointer indirection and no branches.
  • All of this involves a small, constant, but unavoidable overhead.

Java’s itables.

When Java was designed they looked at C++ and went “nah, this is too complicated” and made a single-inheritance language. But complexity can’t entirely disappear, it can only be shuffled around. Supporting interface inheritance is still a kind of multiple inheritance. So how was this solved?

Each Java object has an object header which includes a pointer to its class. This class serves as a vtable and is used during normal virtual method calls. When we upcast an object reference to an interface type no changes are made. So the JVM cannot use vtable dispatch for interface calls.

Virtual method calls only need the method slot index of the vtable.

Interface calls need the interface ID and the method slot index for the interface’s vtable, which is called an itable.

So how do we find the correct itable? In HotSpot, the runtime literally searches through an array of itables until one is found that matches the interface ID (source: interpreter, x86 assembler). As pseudocode:

So a call site would be compiled like:

A pointer diagram of this structure is a bit unwieldy, and might look like:

      ┯ object reference
      │
      ┏━━━┯━━━━━━┓
      ┃ V │ ...  ┃ object contents
      ┗━┯━┷━━━━━━┛
        │
┏━━━┯━━━┯━━━┯━━━┯━━━┓
┃ meta  │ S │ S │ S ┃ vtable / class structure
┗━━━┷━┯━┷━━━┷━━━┷━┯━┛
      │           └───────┐
      ┏━━━┯━━━┓           │
      ┃ V │ V ┃ itables   │
      ┗━┯━┷━━━┛           │
    ┌───┘                 │
    │                     │
┏━━━┯━━━┯━━━┯━━━┓         │
┃   │ S │ S │ S ┃ itable  │
┗━┯━┷━━━┷━┯━┷━━━┛         │
  │       ├───────────────┘
  │       │
  │      method() implementation
  │
  │ ┯ interface ID at callsite
  ├─┘
  │
  ┏━━━┯━━━┓
  ┃ meta  ┃ interface structure
  ┗━━━┷━━━┛ (defines itable layout)

Inline caching.

Of course running all that dispatch machinery takes much longer than a direct vtable lookup. In practice this doesn’t matter because the call site can remember the lookup result. Most of the time, all objects at a call site will have the same dynamic type. And because each object has a single pointer to the class we can make very cheap comparisons for the dynamic type.

In the simplest case, we assume that this call will have the same target as the previous call at that location. We can then write an optimized lucky path, and fall back to the expensive resolver when our assumption is wrong:

Here I’ve shown this with static variables, but usually a JIT compiler can hard-code the cached type and cached method pointers into the machine code. However, recompiling the callsite may be expensive so there will usually be a counter for cache misses before the call site is patched.

The Wikipedia article on Inline caching discusses this and related techniques in more detail.

How does this itable-based dispatch with inline caching score? We have to distinguish between worst case behaviour (on cache misses and during VM warmup), and typical behaviour.

  • Typically, we need one branch and two pointer indirections. Assuming good branch prediction, this is better than vtable dispatch.
  • Worst case, we have an unbounded number of pointer indirections and branches. More precisely, they scale linearly with the number of implemented interfaces of the dynamic type. At a minimum, there will be five levels of pointer indirection. Because the dispatch effort is unbounded, interface calls outside of carefully controlled class hierarchies might be unsuitable in real-time systems.

C#’s slotmaps.

C# generally follows the same approach as Java, but tries to perform more aggressive optimizations. For example, interface dispatch in C# should be understood primarily through inline caching (which the CLR calls Virtual Stub Dispatch). Finding the method in the vtable is only a strategy of last resort, and typically only happens during VM warmup.

C# has one very important difference to Java: how generics are implemented. In Java, type parameters are erased at run time. An ArrayList<Integer> is the same as an ArrayList<?> at run time. Not so C#: for each set of type parameters, the class structure is copied and sometimes specialized for those parameters. A List and List<int> are distinct types. The relationship between a specialized class and generic class is similar to inheritance. The result is that C# has a lot more class structures in memory.

The C# developers noted that Java-style itables and itable arrays would add up to a lot of storage, and looked for ways to compress them. In Java, each class that overrides an interface method needs to have it’s own itable copy. The CLR avoids this through more complicated vtable structures.

Instead of using ordinary itables that are vtables for interfaces, the CLR uses the concept of slot maps. These don’t hold a method pointer but a vtable slot index. The advantage is that the slot map doesn’t have to be rewritten when a method is overridden (or specialized for generics) – it’s sufficient to update the vtable. This also makes it more feasible to replace a method pointer with an optimized version, as only the vtable has to be updated.

The pseudocode for slot map based dispatch would look like this:

I won’t even attempt the corresponding pointer diagram :)

And as usual, the reality is way more complex than this description. If you are interested in more details, please read:

Fat pointers.

There has to be an interface vtable somewhere. C++ sticks it in the object, Java into the class. But there’s a third option: we can track the vtable with the object pointer:

┏━━━┯━━━┓
┃   │ V ┃ fat object pointer
┗━┯━┷━┯━┛
  │   └────┐
  ┏━━━━━┓  │
  ┃ ... ┃  ┏━━━┯━━━┯━━━┓
  ┗━━━━━┛  ┃   │   │   ┃
  object   ┗━━━┷━━━┷━━━┛
             vtable

The vtable in the fat pointer always contains the appropriate vtable layout for the current static type. This has a number of interesting consequences. For example, virtual calls now require one less pointer dereference. Interface calls are now efficient enough that we do not need inlining to get bearable performance (might still be helpful, though).

A game-changing advantage is that interfaces can also be implemented externally from the object. The interfaces don’t have to be known up front when a class is implemented. As the object layout is unaffected, we can even implement interfaces for POD types and primitive types such as unboxed integers! (Casting to an interface would effectively box the primitive type.) And in principle, it would be possible to provide multiple differing implementations for each type/interface combination, with one implementation being selected at the site of an upcast.

So it’s no surprise that this strategy is used e.g. for Go interfaces, for Rust dyn Traits, and to some degree in Haskell. The ability to implement interfaces externally corresponds well to Haskell’s typeclasses.

If this is so brilliant, why isn’t everyone using it? There are three general problems: fat pointers aren’t normal pointers, upcasting can be expensive, and this can complicate a compiler.

Fat pointer size overhead.

A fat pointer needs a vtable pointer in addition to the data pointer, and is therefore twice as big. To move an object reference around we now need more than one register. To store an object reference we need two machine words, so e.g. an array that contains interface references is now twice as big. But in that array many elements are likely to have the same concrete type, therefore wasting a lot of space for little gain. Under this assumption, approaches as in C# or Java can be a lot more memory-efficient and therefore also more cache-efficient.

Fat pointer conversion overhead

When a concrete type is upcasted to an interface type, this is now no longer a free operation (in Java this is a no-op, in C++ this is at most a pointer offset). Instead we need to find the correct vtable. So part of the cost of dynamic dispatch has been moved from the call site to the cast site. This is not usually a big problem because the concrete type is statically known at the cast site so the correct vtable can be found at compile time.

Things are a bit more interesting when casting between interfaces (e.g. in Go) or when combining multiple interfaces/traits/typeclasses. There is no elegant solution.

Rust currently disallows trait combinations in trait objects. As a workaround, you’d have to introduce adapter types that wrap the object, or add a new trait that describes your use.

Go calls back into the runtime library for interface casts, which looks up the itable in a global hash table, and creates a new itable if none was found in the hash table. To construct a new itable the methods are looked up by name in the concrete type’s global vtable. This is quite flexible but implies a horrendous runtime cost, even on the “happy path” where the itable already exists. As soon as interfaces are involved, Go is more like Python than C or C++.

Compiler implementation considerations.

Finally, fat pointers can complicate the design of a compiler.

First, there’s the ABI concern that interface types (objects) and normal pointers have different sizes. This is usually not a problem, but e.g. means that this strategy is not suitable for C++.

Due to the pointer size concerns, we might want to use normal vtable dispatch as in C++/Java for class inheritance and fat pointers for interface inheritance. Now the compiler needs to manage two completely independent types of objects. However, this is already the case in Java as interface calls use a different VM opcode. (In fact the Java specification explicitly anticipates implementations using fat pointers.)

Another concern is how reflection and run time type information can be managed, if the object no longer contains a vtable that can be used to identify its type. It’s correct that this no longer allows us to take an arbitrary pointer and find out which type of object it references. But that also shouldn’t be necessary: either we have a simple pointer to a statically known type, or we have a fat pointer. The vtable of the fat pointer can contain additional information, such as a reference to the concrete type. This can be used for downcasting back from a reference type to a concrete type.

Conclusion.

We have looked at various ways to resolve interface calls, which is a kind of multiple inheritance: storing multiple vtables per object as in C++, storing itables in the class structure as in Java or C#, or storing itables as part of a fat pointer as in Go or Rust.

None of these solutions are inherently better than the others. They all add some runtime overhead (space or time), and they all have consequences for the language semantics that can be expressed. In some cases the overhead can be amortized through caching.

A whole class of languages has been ignored here: those that do method lookup by name, e.g. Python. That relaxes the constraints on vtables because named slots don’t have to stay in a baseclass-compatible order. However name-based dispatch plays in a different league performance-wise and is of no interest in the context of vtable-based dispatch.