Projects

There are three kinds of personal open source projects:

  • projects that address a real problem,
  • projects that were made to be put on a portfolio, and
  • projects that serve to facilitate learning.

Most of my projects land in the “real problem” or “learning” category, but I dislike publishing them unless they are presentable.

Following through with proper execution is hard. Most effort in a software project isn’t the programming, but the research, testing, documentation and packaging that’s necessary to move from a throwaway hobby project to a shippable product others can use.

Overview:

  • Async::Trampoline (Perl, C++, 2017)
    Write trampolined functions with pseudo-async/await syntax.
  • Util::Underscore (Perl, 2014)
    Common helper functions without having to import them.
  • Dist::Zilla [@Author::AMON] plugin bundle (Perl, 2017)
    My dist.ini preferences for dzil.
  • Multijob (Python, Shell, C++, Go, 2017)
    Generates a list of jobs to be run in various configurations, and deserializes those configurations from the command line.

For a full list of my public projects, take a look at my GitHub.

Async::Trampoline (Perl module)

This module makes it easier to write trampolined functions: each function returns a thunk or continuation, and the trampoline keeps invoking these until a final result is produced. This allows us to implement funky control flow, like Python-style lazy generators or tail recursion without stack growth. I use this module for a few hand-written recursive-descent parsers and interpreters that deal with recursive structures.
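The core idea fits in a few lines. The following is a minimal C++ sketch of a trampoline, not the module's actual API: the names Step, result, defer, and run are hypothetical. Instead of calling each other directly, the mutually recursive is_even and is_odd return a thunk describing the next call, and the dispatch loop in run keeps invoking thunks in constant stack space until a final result appears.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// One step of a trampolined computation: either a final result,
// or a thunk that performs the next step when invoked.
struct Step {
    bool done = false;
    long long value = 0;          // meaningful only when done
    std::function<Step()> next;   // meaningful only when !done
};

Step result(long long v) { return Step{true, v, nullptr}; }
Step defer(std::function<Step()> thunk) { return Step{false, 0, std::move(thunk)}; }

// The dispatch loop: keep invoking thunks until a result is produced.
// The C++ stack does not grow, however many steps the computation takes.
long long run(Step s) {
    while (!s.done) {
        assert(s.next && "a pending step must carry a thunk");
        s = s.next();
    }
    return s.value;
}

// Mutually recursive functions that return the next call as a thunk
// instead of making it, so deep recursion cannot overflow the stack.
Step is_odd(long long n);

Step is_even(long long n) {
    if (n == 0) return result(1);
    return defer([n] { return is_odd(n - 1); });
}

Step is_odd(long long n) {
    if (n == 0) return result(0);
    return defer([n] { return is_even(n - 1); });
}
```

A naive recursive is_even(1000000) would need a million nested stack frames; here every step returns to run before the next one starts. The price is that each step pays for an indirect call and possibly a heap allocation for the thunk, which is why the speed of the dispatch loop matters so much.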

This module started as a small function in a hand-written parser, but it turned out that the code would spend more time in the trampoline dispatch loop than doing useful work (I profiled this carefully).

So I rewrote it in XS, Perl’s C language binding. That helped a bit, but using the Perl data structures through C is not more efficient than using them through Perl. Also, writing an ad-hoc C program is fairly error-prone.

Now that I knew what I had to program, I decided to rewrite this code as a separate module and avoid the Perl API in the trampoline dispatch loop. Initially I tried writing it in C, but writing correct C code is very difficult, especially when it comes to tracking ownership. I also found that I needed to re-implement many fundamental data structures. Writing these data structures is often not difficult, but testing them thoroughly is. I finally stopped once I found myself researching fast hash functions for a custom hash table implementation. (For the record, I settled on FNV as a likely candidate.)
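FNV comes in several variants, and which one the hash table would have used is not stated here. As an illustration, this is a short sketch of the common 32-bit FNV-1a variant: it XORs each input byte into the hash and then multiplies by the FNV prime. The static_asserts check two well-known test vectors at compile time.

```cpp
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a: for each byte, XOR it into the hash, then multiply by
// the FNV prime. Unsigned arithmetic wraps modulo 2^32, as the algorithm expects.
constexpr std::uint32_t fnv1a32(std::string_view data) {
    std::uint32_t hash = 2166136261u;   // FNV-32 offset basis (0x811c9dc5)
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 16777619u;              // FNV-32 prime (0x01000193)
    }
    return hash;
}

// Known test vectors, checked at compile time.
static_assert(fnv1a32("") == 0x811c9dc5u, "empty input yields the offset basis");
static_assert(fnv1a32("a") == 0xe40c292cu, "FNV-1a of \"a\"");
```

The appeal for a hand-rolled hash table is that the whole function is two operations per byte with no tables or setup.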

So I decided to rewrite the code again, this time in C++. The better type system and especially RAII allowed me to write much more robust code than in C. The standard library is a big help. Refcounting in the internal object graph became a non-issue since RAII handles it automatically. I still had a few bugs and segfaults, but fixing them taught me a lot about GDB and Valgrind.
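The refcounting point can be illustrated with std::shared_ptr, which adjusts reference counts automatically as owners are copied and destroyed. This is a hypothetical sketch: Node and its live counter are illustrative and not taken from the module. One caveat applies: shared ownership only reclaims acyclic graphs, so a back-reference would have to be a std::weak_ptr to avoid a leak.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node in an internal object graph, e.g. a pending computation
// that holds on to the computations it depends on.
struct Node {
    static inline int live = 0;                 // live instances (C++17 inline variable)
    std::vector<std::shared_ptr<Node>> deps;    // owning edges; refcounts are automatic

    Node() { ++live; }
    ~Node() { --live; }
    Node(const Node&) = delete;                 // keep the live counter honest
    Node& operator=(const Node&) = delete;
};

// No manual increment/decrement anywhere: dropping the last owner of a node
// frees it, and its destructor releases its own dependencies in turn.
void demo() {
    {
        auto leaf = std::make_shared<Node>();
        auto root = std::make_shared<Node>();
        root->deps.push_back(leaf);   // leaf now has two owners
        leaf.reset();                 // still alive through root
        assert(Node::live == 2);
    }                                 // root goes out of scope, and leaf with it
    assert(Node::live == 0);
}
```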

To my surprise, a good part of the CPAN Testers machines have C++11 compilers. Most C++11 features are broadly available since GCC 4.7. However, the Perl toolchain is not very C++-friendly. For example, your XS code must catch all C++ exceptions manually. And XS in general is a bit of a minefield: the docs are very sparse, you have to carefully refcount all Perl objects, and when you peek at the toolchain source code, it looks like the kind of write-once Perl you're always warned about.
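The exception rule exists because the Perl interpreter is written in C, and a C++ exception cannot safely propagate through its C stack frames. A common pattern is to catch everything at the boundary and turn it into an error value the C side can handle. The sketch below is hypothetical and uses no Perl API (parse_positive and parse_positive_c are illustrative names). In real XS glue one would then report the message with croak, but only after all C++ objects in that frame have been destroyed, since croak unwinds via longjmp and skips destructors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical C++ function that reports errors by throwing.
int parse_positive(const std::string& text) {
    int n = std::stoi(text);  // throws std::invalid_argument or std::out_of_range
    if (n <= 0) throw std::domain_error("expected a positive number");
    return n;
}

// C-compatible boundary: no exception may escape, because the caller is C code.
// Returns 1 and stores the value on success; on failure returns 0 and writes
// a NUL-terminated message into errbuf.
extern "C" int parse_positive_c(const char* text, int* out, char* errbuf, std::size_t errlen) {
    assert(text && out && errbuf && errlen > 0);
    try {
        *out = parse_positive(text);
        return 1;
    } catch (const std::exception& e) {
        std::snprintf(errbuf, errlen, "%s", e.what());
    } catch (...) {
        std::snprintf(errbuf, errlen, "%s", "unknown C++ exception");
    }
    return 0;
}
```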

After all this effort, the trampoline performs adequately. Performance is not great, since calling Perl subroutines is fairly costly, but it's probably not going to get much faster than this. Along the way I learned a ton about Perl internals, C, C++, and software packaging.

Util::Underscore (Perl module)

This module is a collection of various helper functions. It solves two problems: (1) I like fully qualified subroutine names to avoid namespace pollution. (2) I don't like having to remember which util module contains a given function. Util::Underscore therefore aliases them into the _ namespace so that they can be invoked like _::any { $_ > 0 } @nums. There are also a couple of new additions like _::croakf $pattern, @args as a shorthand for _::croak sprintf $pattern, @args, and various safe type checks like _::is_array_ref or _::can $object, $method.

I enjoy this module for personal projects and for one-liners, because it makes a large toolbox of functions immediately accessible. But for published projects, adding this large dependency is probably not the best idea :-)

The code for this module is not spectacular since it mostly consists of aliases. However, I expended significant effort to carefully document and test all functions. Where I merely aliased existing functions, the tests only cover basic cases, but they should be enough to catch accidental breakage when an upstream module changes. The documentation is rather formal, and tends to surpass the original docs in quality. My experience writing these docs led to the blog post Good API documentation.

Dist::Zilla::PluginBundle::Author::AMON (Perl module)

After having published Util::Underscore and Async::Trampoline, I got tired of re-assembling a sane dist.ini each time. This plugin bundle provides an opinionated starting point that lets me publish a new Perl module more quickly. For example, it renders the main documentation as a Markdown readme for GitHub, and adds a number of extra checks and helpers. (Verifying that I wrote a changelog? Yes please!)

Somewhat disturbingly, this plugin ships with no tests beyond the fact that it is able to publish itself. I therefore don't recommend that other people use it directly.

Multijob (Python module)

As part of evolutionary algorithm (EA) research, it is often necessary to observe the behaviour of an EA over a range of parameters in order to find a suitable configuration or in order to compare different EAs. Each configuration can be evaluated independently, so this is a massively parallel problem.

The corresponding academic paper presents a workflow to manage the distributed tasks with GNU Parallel. This workflow is also explained in the Multijob tutorial.

The Multijob software assists with this workflow by generating the necessary job definitions. It also contains libraries for Python, C++, and Go to decode the job definitions, which allows the EAs to be implemented in any of these languages.
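To make the workflow more concrete, here is a hypothetical C++ sketch of its two halves: expanding a parameter grid into one job per combination (the Cartesian product), and decoding a job from name=value command-line arguments. The argument format and all names are assumptions for illustration; they are not Multijob's actual job-definition format or API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Config = std::map<std::string, std::string>;
using Grid = std::vector<std::pair<std::string, std::vector<std::string>>>;

// Expand a parameter grid into every combination of values (the Cartesian
// product). Each resulting Config describes one independent job.
std::vector<Config> expand_grid(const Grid& grid) {
    std::vector<Config> jobs{Config{}};   // start from a single empty job
    for (const auto& [name, values] : grid) {
        std::vector<Config> next;
        for (const Config& job : jobs) {
            for (const std::string& value : values) {
                Config extended = job;
                extended[name] = value;
                next.push_back(std::move(extended));
            }
        }
        jobs = std::move(next);
    }
    return jobs;
}

// Serialize a job as hypothetical "name=value" command-line arguments.
std::vector<std::string> to_args(const Config& job) {
    std::vector<std::string> args;
    for (const auto& [name, value] : job) {
        assert(name.find('=') == std::string::npos && "names must not contain '='");
        args.push_back(name + "=" + value);
    }
    return args;
}

// Decode the arguments back into a Config inside the worker program.
// Splitting at the first '=' lets values themselves contain '='.
Config from_args(const std::vector<std::string>& args) {
    Config job;
    for (const std::string& arg : args) {
        auto eq = arg.find('=');
        if (eq == std::string::npos) throw std::invalid_argument("expected name=value: " + arg);
        job[arg.substr(0, eq)] = arg.substr(eq + 1);
    }
    return job;
}
```

A driver could write one line of arguments per job for a tool like GNU Parallel to dispatch, and each worker would decode its own arguments. Using std::map keeps parameter order deterministic, so the generated job list is reproducible.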

During this project, I familiarized myself with a Python toolchain (packaging, dependency management, testing, and documentation).

Unpublished projects

Various other unpublished or unreleased projects include:

  • The Perl module MarpaX::Grammar::Preprocessor, a syntax extension for Marpa::R2 SLIF grammars. Status: mostly ready for a release, but a blocking feature still needs more design work, and the code needs some refactoring and another round of quality work. It should already be usable as alpha software.

  • The Perl module MarpaX::DSL::InlineActions, a predecessor of MarpaX::Grammar::Preprocessor. It was a far more ambitious project to offer a completely separate interface to Marpa::R2, but I lost interest while I was trying to port it to the low-level Marpa API. It should be somewhat usable for prototypes, but not for production-grade parsers.

  • A text templating language, roughly imitating Liquid. It was designed with cross-language compatibility in mind and has a reference implementation in Perl, but is currently on my backlog. MarpaX::Grammar::Preprocessor was a spin-off from this project. While working on it, I realized I needed a deeper understanding of language implementation techniques. I will probably start a rewrite within the next couple of years.

  • A lightweight markup language, similar to Markdown but with a richer semantic model and with extensibility in mind. Markdown (as standardized by CommonMark) lacks many crucial features which I need for nontrivial articles on this website or as a LaTeX substitute in my studies. I currently use Pandoc, which adds many of the missing features like tables and mathematics, but does not address the underlying issue of extensibility. I tried prototyping the semantic model in Perl, C++, and Haskell, but have made the most progress with Python.

  • Some toy languages to learn about interpreter implementation. I used Perl and Golang as the host languages.

If you want to take a look at these projects or need some help with them, please contact me.