Perl Docstrings: Put your POD into Heredocs

Python's docstrings are great, but have no real equivalent in Perl. However, we can make POD sections easily accessible to Perl code by putting the POD into heredocs:

my $some_function_docs = <<'=cut';

=head2 some_function

Write documentation here.

=cut

This article explains why that works, and how this might be used in your Perl modules.

I started thinking about Perl/POD integration a few weeks ago because I wanted to display better error messages. How cool would it be if an error didn't just tell you what was wrong, but also came with a piece of documentation that tells you how to fix it? And how cool would it be to get a perldoc perldiag-style searchable overview of all error messages in your module?

Initially, I had a module that contained all diagnostic messages. Each function formatted one error message and had a POD section that explained it in more detail:

=head2 ParseError: Redefinition of "<symbol>"

The symbol was already defined in the current scope.

A symbol is defined by:

=over

...

=back

Consider renaming one of the definitions.

=cut

sub redefinition_of_symbol {
  my ($location, $prev_location, $symbol) = @_;
  return join "\n" =>
    qq(ParseError: Redefinition of "$symbol"),
    $location,
    qq(previous definition here:),
    $prev_location;
}

This is not ideal, because the name of the error message is now repeated in multiple places. If I update the error message itself, the corresponding POD documentation can go out of sync. Also, the layout of the actual error message is fairly hard to read from the code that builds it. So I started looking for a better way.

Python has a great feature called docstrings. A docstring is a special string containing documentation that is associated with a class or function. The docstring is then accessible via reflection, i.e. some_function.__doc__. This is used by the REPL help() system, the perldoc-style pydoc command-line help tool, and the Sphinx HTML documentation generator.

Perl has POD documentation. POD is completely different from docstrings, because POD is not really integrated into the Perl language. Instead, POD and Perl ignore each other:

  • a Perl file can be executed as Perl code which ignores POD sections.
  • a Perl file can be rendered as a POD document which ignores anything outside of POD sections.

So essentially, every Perl/POD file is a polyglot program.

POD sections start with any POD directive and end with the =cut directive. Directives match the regex /^=\w+/m, i.e. start with an equals sign in the first column of a line.
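To see this mutual ignorance from the POD side, here is a minimal sketch that feeds a mixed Perl/POD source string to Pod::Text, the plain-text formatter that ships with Perl. The sample source and the variable names are made up for illustration; only the Pod::Text calls are real API.

```perl
use strict;
use warnings;
use Pod::Text;

# A made-up source file that mixes Perl code with one POD section.
my $source = <<'END_SOURCE';
print "hello from Perl\n";

=head2 greet

Prints a greeting.

=cut

print "more Perl\n";
END_SOURCE

# Render only the POD parts of the source into a string.
my $parser = Pod::Text->new;
$parser->output_string(\my $rendered);
$parser->parse_string_document($source);

print $rendered;
```

The rendered text should contain the heading and the paragraph, but none of the print statements, because the POD parser skips everything outside of POD sections.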

I mentioned that Perl ignores any POD sections, but that is not entirely true: Perl only recognizes POD directives where it expects code, and does not look for them inside string literals. We can therefore have a string or heredoc that contains POD:

my $pod = <<'POD';

=pod

This will be rendered as POD

=cut

POD

this_is_normal_perl_code();

Perl's heredocs have the peculiarity that the end marker may be any (single-line) string, and need not be a legal Perl identifier, as long as it is quoted. The only requirement is that the end marker appears on a line of its own, starting in the first column. For example, this is a valid though very misleading heredoc:

my $pod = <<'exec "foo" if $bar;';
some content
exec "foo" if $bar;

this_is_insane_perl_code();

While that particular example has no applications outside of obfuscation contests, we can use the POD end marker =cut as the heredoc end marker. Both must be at the start of a line, on a line of their own:

my $pod = <<'=cut';

=pod

This will be rendered as POD

=cut

this_is_clever_perl_code();
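One detail worth checking is what the variable actually holds. The following small sketch repeats the heredoc from above and inspects the string: the heredoc body ends just before the end marker, so the =cut line is seen by POD tools that read the file, but is not part of the Perl string.

```perl
use strict;
use warnings;

my $pod = <<'=cut';

=pod

This will be rendered as POD

=cut

# The string holds the "=pod" directive and the paragraph,
# but not the "=cut" line that terminated the heredoc.
print $pod =~ /^=pod$/m ? "has =pod\n" : "no =pod\n";
print $pod =~ /^=cut$/m ? "has =cut\n" : "no =cut\n";
```

This prints "has =pod" followed by "no =cut". A POD parser that receives the string still works, since a POD document does not need a trailing =cut.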

I have done this to use the POD source as a template for my error messages:

my $redefinition_of_symbol = render_pod(<<'=cut');

=head2 ParseError: Redefinition of "<symbol>"

<location>

previous definition here:

<prev_location>

The symbol was already defined in the current scope.

A symbol is defined by:

=over

...

=back

Consider renaming one of the definitions.

=cut

sub redefinition_of_symbol {
  my ($location, $prev_location, $symbol) = @_;

  my %vars = (
    symbol => $symbol,
    location => $location,
    prev_location => $prev_location,
  );

  return $redefinition_of_symbol =~ s/<(\w+)>/$vars{$1}/gr;
}

Here, render_pod() uses Pod::Text to turn the POD markup into plain text:

use Pod::Text;

sub render_pod {
  my ($source) = @_;

  my $processor = Pod::Text->new;
  $processor->output_string(\my $text);
  $processor->parse_string_document($source);

  return $text;
}
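One weakness of the plain s/<(\w+)>/$vars{$1}/gr substitution is that a misspelled placeholder silently becomes an empty string (plus an "uninitialized value" warning under use warnings). As a possible refinement, here is a minimal sketch of a stricter substitution. The helper name fill_template() is hypothetical and not part of the original code; like the original, it needs Perl 5.14 or later for the /r flag.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: replace <name> placeholders with values from %vars,
# and die on placeholders that have no value instead of inserting undef.
sub fill_template {
  my ($template, %vars) = @_;
  return $template =~ s{<(\w+)>}{
    exists $vars{$1} ? $vars{$1} : croak qq(unknown placeholder "<$1>")
  }ger;
}

my $message = fill_template(
  qq(Redefinition of "<symbol>" at <location>),
  symbol   => 'foovar',
  location => 'test:20',
);

print "$message\n";  # prints: Redefinition of "foovar" at test:20
```

Whether a hard failure is appropriate inside error-reporting code is a design choice; leaving unknown placeholders untouched would be an equally reasonable fallback.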

This is already great for error messages, but now the rendered POD documentation will contain placeholders for the exception details. To exclude them from normal POD rendering, we can put them into a data paragraph. A POD data paragraph is a region that is usually excluded from rendering. It is either enclosed within =begin ... =end directives or is a single paragraph started by a =for directive. Each data paragraph type has a name. If the name starts with a colon, the contents are processed as normal POD whenever the data paragraph is rendered.

We can therefore define an :errormsg data paragraph type that is usually hidden from the POD document, but not when we process the POD for our error messages.

It is not entirely clear to me why this works (the Pod::Simple source certainly isn't an example of readable Modern Perl), but it seems we can subclass Pod::Text to implement our POD dialect:

package Pod::ErrorMessage;

use parent 'Pod::Text';

sub new {
  my $class = shift;
  my $self = $class->SUPER::new(@_);
  $self->accept_targets_as_text('errormsg');  # render "=for :errormsg"
  return $self;
}
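To check the effect of accept_targets_as_text(), the following sketch renders the same made-up POD snippet twice: once with a default Pod::Text object, and once with an object that accepts the errormsg target. It calls the method on the object directly instead of subclassing, which I assume is equivalent for this purpose. The helper render_with() and the sample text are hypothetical.

```perl
use strict;
use warnings;
use Pod::Text;

# Made-up POD with one region that is only meant for error messages.
my $source = <<'END_POD';
=head2 Example

=begin :errormsg

at <location>

=end :errormsg

Explanation for the reader.

END_POD

# Hypothetical helper: render $source with an already configured formatter.
sub render_with {
  my ($parser, $source) = @_;
  $parser->output_string(\my $text);
  $parser->parse_string_document($source);
  return $text;
}

# Default formatter: "errormsg" is not an accepted target.
my $docs = render_with(Pod::Text->new, $source);

# Formatter that treats "errormsg" regions as ordinary POD text.
my $parser = Pod::Text->new;
$parser->accept_targets_as_text('errormsg');
my $message = render_with($parser, $source);

print "docs mention <location>: ",    ($docs    =~ /<location>/ ? "yes" : "no"), "\n";
print "message mentions <location>: ", ($message =~ /<location>/ ? "yes" : "no"), "\n";
```

If the region handling works as described, only the second rendering should mention the placeholder, while both contain the explanation paragraph.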

Additionally, we can apply further processing like turning the POD headline into an ordinary paragraph, and removing extra indentation:

sub render_pod {
  my ($source) = @_;

  # turn =head2 into ordinary paragraph
  $source =~ s/^=head2\s+//mg;

  # remove extra leading space
  $source =~ s/\A\s+//;

  # parse the POD. Prepend "=pod" to force all paragraphs to be used.
  my $processor = Pod::ErrorMessage->new;
  $processor->output_string(\my $text);
  $processor->parse_string_document("=pod\n\n" . $source);

  # remove indentation, as taken from the first paragraph
  if ($text =~ /\A(\h+)/) {
    my $indent = $1;
    $text =~ s/^\Q$indent\E//mg;
  }

  # join first paragraph into single line
  pos($text) = 0;
  1 while   # repeat while regex matches
    $text =~ s/
      \G                # start where prev iteration left off
      (?:(?!\R).)*+ \K  # skip over line contents
      \R \h*+           # remove line break and leading space
      (?!\R)            # if the line isn't empty
    //xsg;

  return $text;
}

# To the extent possible under law, Lukas Atkinson has waived all copyright and
# related or neighboring rights to the render_pod() subroutine.
# This work is published from: Germany.
# See <https://creativecommons.org/publicdomain/zero/1.0/> for precise terms.
# See <https://lukasatkinson.de/2017/perl-docstrings-put-your-pod-into-heredocs>
# for further information about this work.
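The indentation step is easy to try in isolation. This sketch applies the same two lines to a made-up string that mimics formatter output with a four-space left margin: the margin of the first line is removed from every line, while deeper indentation keeps its relative depth.

```perl
use strict;
use warnings;

# Made-up sample that mimics rendered text with a 4-space margin.
my $text = "    first paragraph\n\n    second paragraph\n        nested item\n";

# Take the indentation of the first line and strip it from all lines.
if ($text =~ /\A(\h+)/) {
  my $indent = $1;
  $text =~ s/^\Q$indent\E//mg;
}

print $text;
```

The result is "first paragraph", an empty line, "second paragraph", and then "nested item" indented by the remaining four spaces.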

Now, we can define and render our error message like this:

my $redefinition_of_symbol = render_pod(<<'=cut');

=head2 ParseError: Redefinition of "<symbol>"

=begin :errormsg

<location>

previous definition here:

<prev_location>

=end :errormsg

The symbol was already defined in the current scope.

A symbol is defined by:

=over

...

=back

Consider renaming one of the definitions.

=cut

sub redefinition_of_symbol {
  my ($location, $prev_location, $symbol) = @_;

  my %vars = (
    symbol => $symbol,
    location => $location,
    prev_location => $prev_location,
  );

  return $redefinition_of_symbol =~ s/<(\w+)>/$vars{$1}/gr;
}

If rendered as an error message, we will see something like:

ParseError: Redefinition of "foovar"

test:20: let foovar = 42

previous definition here:

test:4: let foovar = 3

The symbol was already defined in the current scope.

A symbol is defined by:

    ...

Consider renaming one of the definitions.

If rendered as the POD documentation, the :errormsg section is ignored and we get:

ParseError: Redefinition of “<symbol>”

The symbol was already defined in the current scope.

A symbol is defined by:

...

Consider renaming one of the definitions.

Is this an elegant solution? I'm not entirely convinced. On the one hand, it's great that the error explanation and error template now share the same source. On the other hand, this also illustrates how weird and inconvenient POD documentation can be.

In any case, I hope this inspired you to create great error messages yourself and possibly use similar POD-mangling techniques when using Perl.