Allocator Testing

2024-04-05 5 min by Lukas Atkinson Testing (6) C (2)

This post contains a bunch of ideas on testing custom allocators in C, assuming a single-threaded scenario.

Writing simple tests that allocate and free memory is a great idea. Simple tests cases are incredibly useful for getting started with writing an allocator, and for stepping through with a debugger. But reality is more complex, more complex than we can imagine. So writing test cases ourselves is very limited.

This is why techniques like fuzz testing and property based testing are so valuable: we only define how the computer creates test cases and what counts as a test failure, but then let the computer pick randomized inputs. This can be quite effective at discovering edge cases.

There are two aspects here:

generating "interesting" allocation patterns
detecting problems that count as test failures

Generating interesting allocation patterns

For a single-threaded allocator, we might start defining a randomized test by tracking an array of current allocations by a process.

void* allocations[MAXALLOCS] = {0};

We can now start with a simple test that randomly picks an allocation slot. Then:

If it is empty, we allocate.
If it is non-empty, we free it.

for (size_t round = 0; round < MAXROUNDS; round++) {
    size_t slot = randint(rng, 0, MAXALLOCS);
    if (allocations[slot]) {
        free(allocations[slot]);
        allocations[slot] = NULL;
    } else {
        allocations[slot] = malloc(SOMESIZE);
    }
}

for (size_t slot = 0; slot < MAXALLOCS; slot++)
    if (allocations[slot]) free(allocations[slot]);

I am assuming the existence of a suitable randint() function here. In any case, it is important to record the seed used by the RNG so that the sequence of allocations can be reproduced later for debugging. Randomized tests should still be deterministic!

This rough driver for a test gives us complicated allocation patterns and can be extended in many ways. For example, we should test the entire suite of allocator functions. Realloc in particular has many non-obvious modes.

Randomly select malloc()/calloc()/realloc() to allocate new memory.
Randomly select free() or realloc() to deal with existing allocations.

Instead of allocating elements of a static size, we vary that size.

It can be helpful to keep all allocations the same size for the duration of the test, but to run the test under different sizes.

Alternatively, we might randomly select a different size for each individual allocation.

We might pick those sizes from a static list, e.g. 0 bytes, 1 bytes, 16 bytes, 32 bytes, 1kB, 1MB, 50MB. Or we might sample a size from a random distribution. For allocators, it makes sense to have a very wide range of sizes. This can be achieved easily by sampling in log-space. Instead of sampling uniformly between 1B to 1KB which will feature many large allocations, we might sample uniformly between 0 and 10 and then left-shift, which will be biased more towards smaller sizes:

size_t size = 1 << randint(rng, 0, 10);

Detecting problems

We can test basic properties like "calloc() always returns zeroed memory". Or that all allocations (of non-zero size) should have unique addresses. Or more generally, that the allocated memory regions should not overlap. Tip: such checks become a lot easier when sorting allocations by their address, since the randomized approach does not rely on stable slot indices.

The most basic check is "it doesn't segfault". Because correct C is difficult to write, that is already a pretty useful property to test for, especially when dealing with complicated data structures like freelists.

But we may want to increase the likelihood of our tests uncovering a problem. To do that, our test driver should use the memory it has allocated, writing something to it. This allows us to check afterwards that the memory hasn't been accidentally touched by the allocator.

Example approach:

Extend the allocations array to also track the size of each allocation.
At each step where we encounter an existing allocation, verify that its entire memory region contains the expected pattern. If there is a mismatch, we have encountered a problem.
Whenever we allocate or reallocate memory, we fill the entire allocated memory with a pattern.

This raises the question what pattern to use.

All-zeros isn't a very good pattern because zeroes are widely used for legitimate purposes.
A repeating non-zero pattern like 0x55 (= 0b010_0101) or 0xAA is a good start.
The allocation's size or address are other useful candidates.
A very strong approach would be to seed an RNG with the allocation's address + size, and then fill the allocation with bytes from the RNG. This helps to generate patterns that are very unlikely to arise accidentally. Note that every hash function can be turned into an PRNG and vice versa.

Adding debug features to the allocator

So far, we've discussed how a program can check that the allocator is doing sensible things.

But we can also add similar safety checks to the allocator, verifying that the application doesn't exhibit use-after-frees, double-frees, or out-of-bound writes.

For example we can use that "fill memory with a checkable bit pattern" idea and flip it around to mark freed memory. So in debug mode, the free() function would overwrite the received memory with a suitable pattern. Later, when malloc() returns a memory chunk, it would verify that the chunk still contains that "not in use" marker pattern.

Allocators that use boundary tags or headers (like the ptmalloc/dlmalloc family to which glibc belongs) are especially vulnerable to out of bound writes like buffer overflows. This can be hardened:

In debug mode, add a marker at the start and/or end of the header. If the application accidentally overwrites this marker, the allocator can detect this easily. The above discussion of bit patterns applies here as well. Compare also the idea of "stack cookies" or "canary values".
Calculate a checksum for the header and store it somewhere. Maybe in the header itself, maybe in a separate table.

Steal from the best

There are many very robust modern memory allocators out there. Their tests and techniques may be worth adapting. Particularly relevant are

jemalloc
tcmalloc, which includes a multi-threaded variant of the above "randomly allocate/deallocate" idea.
mimalloc, which includes a stress test that implements many of the ideas discussed here (interesting allocation size distributions, checking allocation contents with a cookie). The test also does multithreaded checks, like randomly moving allocations between threads so that one thread allocates and a different one frees.
I don't recommend looking at glibc because (a) the coding style yields psychic damage and (b) the allocator isn't particularly modern compared to the linked alternatives, but its malloc implementation does feature some tests with similar ideas (randomized allocate/free actions).

next post: Is Python Code Sensitive to CPU Caching?
previous post: Cursed Syscalls to Set IO Priority in Python