Why Document

Documentation provides real value. Without docs, it would often be easier to implement the required functionality myself rather than trying to understand a library from its source code. That is, a library without documentation is a leaky abstraction, since I have to peek under the covers if I’m not sure what it’s supposed to do. Which happens fairly often.

Self-documenting code is no substitute for real documentation, since they target different audiences. Documentation is read by people who want to use your software. The source code is read by other programmers on the project. Those other programmers will still enjoy documentation when it is available, since it allows them to more quickly build a complete understanding of the whole software system without actually having to understand every single line of code.

Writing good documentation takes a lot of time (roughly as much as coding and testing together), so we want to write as little of it as possible. This article contains advice that will help you write documentation more efficiently by prioritizing important information. You can always go back and add more details later.

Kinds of Documentation

Documentation is not complete when you have Doxygen-style reference documentation for all your methods. Useful documentation contains material on both a high overview level and a more detailed reference level. It accommodates both people new to the software and experienced users who only read the docs to extract a very specific piece of information without having to resort to reading the source code.

In my experience, stellar docs contain these parts:

  • # Introductions and Overviews explain what this software is capable of and why it is awesome. This is less a kind of documentation than it is marketing material, so it should be an inviting entry point for potential users.

  • # A Concepts page lists and explains unique concepts, ideas, and architectural decisions of the software in broad strokes. E.g. a static blog generator might introduce a “page” and “post” concept here. However, the rest of the docs should still be readable without intimate knowledge of these concepts. Treat the concepts page as a glossary and link to the corresponding entry when you first use a special term in any section.

  • # Tutorials and How-Tos walk a user through tasks or scenarios, with the aim of showcasing correct and easily adaptable code. Do write production-ready examples with full error handling that follow all applicable best practices, but don't get lost in tangential details. For those, link to the reference documentation.

  • # Examples can serve as a good entry point or overview documentation for readers like me. See # Tips for Examples for advice on writing good examples.

  • # Reference Documentation will probably make up the bulk of your docs, and is discussed in more detail below.

Reference Documentation

References are not usually read cover to cover, but are used to locate small pieces of specific information. Consequently they have to be structured around discoverability and completeness.

  • Make the reference searchable, ideally with a custom code-aware engine that recognizes keywords and function names. At the very least, provide an index or a customized third-party search engine. Note that creating indices is much easier with special-purpose documentation systems or wikis than with general-purpose text processors.

  • A tabular format for reference entries makes it easy to locate the required info at a glance. Interested in the return value of a function? That should be mentioned in the “returns” section, not buried in the free-form description. See # Function Reference Sections for a couple of recommended sections.

  • Each entry should be self-contained, and not require other entries to have been read in order to be comprehensible. If some information is necessary to understand multiple entries, repeat that info in each entry. If some other section contains relevant info, link to it and provide a summary.
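As a minimal sketch of the index idea, assuming a Python codebase whose public functions carry docstrings: the hypothetical helper build_index below maps each public function name in a module to the first line of its docstring. Real documentation systems do far more; this only illustrates what a bare-bones index entry could contain. The standard json module is used purely as sample input.

```python
import inspect
import json  # used only as a sample module to index


def build_index(module):
    """Map each public function name in `module` to its one-line summary."""
    index = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or ""
        # By convention, the first docstring line is the summary.
        index[name] = doc.splitlines()[0] if doc else "(undocumented)"
    return index


if __name__ == "__main__":
    for name, summary in sorted(build_index(json).items()):
        print(f"{name}: {summary}")
```

An index like this is only as good as the summaries it collects, which is one more reason to keep the first line of each entry short and self-contained.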

Function Reference Sections

For reference entries of functions and methods, I use the following sections. These sections are suggestions, so leave out sections that do not apply in your case.

  • # Specifying the Type Signature is obvious in statically typed languages, but even more important in dynamic ones. In dynamic languages, I do not explicitly mention the type of each argument in the signature, but write down a symbolic invocation in a self-documenting manner. E.g. (key, value) = api.nextEntry(cursor). In my experience, it is useful to name the return value. Since this symbolic invocation is not example code but only illustrates the possible modes of invocation, it can help to use an EBNF-like notation as is customary in man pages, e.g. rm [<option>...] <file>....

  • # Summarize the behavior in a short command sentence that can be used in place of the function invocation. I prefer this to be a simple command rather than a descriptive sentence, i.e. “get the email subject” rather than “gets the email subject”. Ideally, this description fits comfortably into a single line. If it is longer or if the short command contains and/or/but, this could be an indication that you should refactor into two separate functions.

  • # List all Parameters with their name and type. If an argument is optional, document the default value. If there are restrictions on the range of accepted values or interdependencies between arguments, mention them here. E.g. explicitly state when a string must be non-empty, or when an integer must be non-negative. Describe the purpose of this parameter in the context of the function.

  • # For the Return Value, describe the type and range of the result and its meaning. If you return a complex structure such as a tuple, explain each member of that structure.

  • # List all Exceptions that could be thrown, and the reasons why they might occur. Bonus points for explaining how the error could be avoided, or handled and recovered from. If a function provides a strong guarantee that it never throws, document that as well.

  • # If necessary, add an In-Depth Description of the function's behaviour. How the function behaves has mostly been explained in the other sections, so focus on side effects and finer details of the function. Take extra care to specify all edge cases.

Depending on the kind of function, you might want to add a couple of other sections as well:

  • # Provide a Tutorial for more complicated or more frequently used functions. This doesn't have to cover all edge cases; leave that to the detailed description.

  • # Examples are a great way to learn for many people. Please read # Tips for Examples.

  • # Security Concerns deserve their own section if any are present. It would be irresponsible to hide this crucial information in the description. A reader should be able to stop once they have found their required information, so make security concerns stand out.

  • # The full Contract of a method should be stated explicitly when you expect it to be implemented or overridden in another class. This is particularly important when documenting interface members.

  • # Use equivalent Pseudocode to illustrate the semantics of simple functions. The pseudocode should be exactly equivalent in all edge cases, but some details such as input validation can be elided.

  • # Specify the expected Interface Stability when some functionality is either experimental, deprecated, or very unlikely to change. This way, users can easily decide whether they want to depend on that feature. By default, I would expect your API to have a clear version number that uses SemVer.
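One way to make interface stability visible both in the docs and at run time, sketched here under the assumption of a Python library: state the status in the docstring and emit a DeprecationWarning through the standard warnings module. The functions get_subject and fetch_subject, the dict-based email, and the version numbers are all hypothetical.

```python
import warnings


def get_subject(email):
    """Get the email subject.

    Stability: stable.
    """
    return email.get("subject", "")


def fetch_subject(email):
    """Get the email subject.

    Stability: deprecated since 2.1 (hypothetical version number),
    use get_subject() instead. Scheduled for removal in 3.0.
    """
    warnings.warn(
        "fetch_subject() is deprecated, use get_subject() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return get_subject(email)
```

The docstring tells readers what to expect before they depend on the function; the warning reaches users who never read the docs.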

Here is an example of function reference documentation using a trivial addition function. The purpose of the example is explaining how the various sections could be used; please do not actually overdo your docs like this. First we’ll look at a free-form description in the style of manpages, then we’ll structure the documentation using the above sections.

add

sum = add(a, b)

Adds a and b. The parameters a and b must be None or convertible to int, or this function throws a ValueError.

The good thing about this style is that it can be very concise for short functions. The bad part is that it tends to be imprecise (what happens when a is None?), and that you’ll have to read the whole description to find answers to frequent questions (when does it throw?). Using clear sections uses much more space, but you can find your answers at a glance:

add

sum = add(a, b)

add two integers.

a: convertible to int or None
b: convertible to int or None
the addends. If they are None, then they default to 0. Other values will be converted to int via the builtin int().

Returns sum: int – the sum of a and b.

Throws ValueError when the parameters cannot be converted to integers.

Pseudocode

def add(a, b):
  a_converted = int(a) if a is not None else 0
  b_converted = int(b) if b is not None else 0
  return a_converted + b_converted

Examples

add(3, 5)  #=> 8
add(2.7, 5.4)  #=> 7, because int() truncates towards zero
add(None, 13)  #=> 13, because None is used as zero
add("foo", 7)  #=> throws ValueError

Note that the function is so well-specified that it doesn’t need a description section.
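In a Python codebase, the same sections could live directly in the docstring. The following is only a sketch of one possible layout; the section labels follow this article's suggestions and are not a convention required by any particular tool.

```python
def add(a, b):
    """Add two integers.

    Parameters:
        a, b: convertible to int, or None. The addends.
            None is treated as 0; other values are converted via int().

    Returns:
        sum (int): the sum of a and b.

    Throws:
        ValueError: when a parameter cannot be converted to an integer,
            e.g. the string "foo".
    """
    a_converted = int(a) if a is not None else 0
    b_converted = int(b) if b is not None else 0
    return a_converted + b_converted
```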

Class Reference Sections

Class references are not fundamentally different from function references. A few unique sections might be these:

  • # Summarize the Single Responsibility of the class. E.g. “RemoteWorker: run RemoteJobs over the network” or “OutputItem: model URL hierarchy of the website”.

  • # Describe Instantiation of objects. This is essentially # Function Reference documentation for the constructor. Some classes have special rules around instantiation, which should be described here. Examples are Singleton objects or classes using the Builder Pattern.

  • # Explain the Class Invariant if your object is mutable: What properties are guaranteed to hold? Do I have to check something before I can invoke a method? E.g. some objects have a normal state and an empty state. In such a case, explain which methods change the state. Specifying an invariant is very important if the class can be extended, so that subclasses don’t break the base class.

  • # Provide Member Reference Documentation. Every publicly visible type, symbol, method, and variable should have a reference entry.

In addition, you can write tutorials, free-form descriptions for details, security sections, or a couple of examples, as suggested for function documentation.
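To make the class-level sections concrete, here is a sketch of a class docstring that states the single responsibility, the instantiation rules, and the invariant. The class JobQueue and its methods are hypothetical and exist only for illustration.

```python
class JobQueue:
    """JobQueue: hold pending jobs in first-in, first-out order.

    Instantiation:
        JobQueue() creates an empty queue. There are no special rules.

    Invariant:
        The queue is either in the empty state or the non-empty state.
        push() always leaves it non-empty; pop() may leave it empty.
        Check is_empty() before calling pop(), which raises IndexError
        on an empty queue.
    """

    def __init__(self):
        self._jobs = []

    def is_empty(self):
        """Report whether the queue holds no jobs."""
        return not self._jobs

    def push(self, job):
        """Append job to the end of the queue."""
        self._jobs.append(job)

    def pop(self):
        """Remove and return the oldest job.

        Throws IndexError when the queue is empty.
        """
        if self.is_empty():
            raise IndexError("pop from empty JobQueue")
        return self._jobs.pop(0)
```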

Tips for Examples

Examples will be imitated, and some people learn best by pattern matching example code against their requirements. Your examples should take this into account.

  • Make sure your example actually works. Ideally, it would be part of your test suite.

  • Promote best practices. Your example will be copied, so it should not contain any security vulnerabilities or dirty tricks. Spell out all the error handling you would do in production code, though the kind of error handling may be trivial (e.g. throw an application error).

  • Examples can leave out boilerplate. A bit of common scaffolding can be assumed to be present, provided that it was discussed in the # introductory documentation or was explicitly mentioned in the example description.

  • The topic of your example can be trivial or silly. Real-world examples can be powerful, but might confuse a reader with domain-specific details. Many projects use a kind of running gag for examples, e.g. Python documentation frequently references the “Spam” sketch by Monty Python.

  • Examples should be reasonably short. If they grow too long, try a simpler example. If that doesn’t help, your API might be too complex and you might want to simplify it.
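One way to keep examples working, assuming a Python project, is to write them as doctests so that they are executed rather than merely displayed. This sketch uses the standard doctest module; the function slugify is a hypothetical example, not part of any library.

```python
import doctest


def slugify(title):
    """Turn a post title into a URL slug.

    >>> slugify("Spam, Eggs and Spam")
    'spam-eggs-and-spam'
    >>> slugify("  ")
    Traceback (most recent call last):
        ...
    ValueError: title must contain at least one letter or digit
    """
    # Replace every non-alphanumeric character with a space, then split.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    if not words:
        raise ValueError("title must contain at least one letter or digit")
    return "-".join(words)


if __name__ == "__main__":
    # Runs every docstring example in this module and reports mismatches.
    doctest.testmod()
```

Note that the example also spells out the error case, so a reader who copies it sees how invalid input is handled.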

General Advice

# Make your docs discoverable. What web pages serve as the entry points to your project? Those shouldn't just be a download button, but should put the introductory documentation first and foremost.

# Inside your documentation, link between different kinds of documentation that cover the same subject. When a tutorial uses a function, link to that function's reference documentation. A class is discussed more closely in some how-to blog post? Link to that article from the reference documentation.

# Writing good docs involves technical writing, which is an entirely different skillset from programming. A bit of advice:

  • # Get to the point. Rather than jabbering on with endless streams of repetitive adjectives, delete those pesky filler words. Unlike creative writing, technical writing does not require you to capture the imagination of your audience with colourful descriptions.

  • # Use simple language. Prefer short sentences. Many readers will not be native speakers, so make the docs as accessible as possible.

  • # Set expectations early. Because the time of your readers is valuable, state up front what topic each document covers, and what kind of documentation it provides. Link to other kinds and related topics where available.

  • # Put important stuff first, defer details until later. In journalism, this technique is known as the Inverted Pyramid. A reader should be able to stop reading at any point without having missed something too important. Mention security considerations in their own section or info box so that they are hard to miss. Start paragraphs with simple sentences that serve as a kind of heading. Use actual headings and bullet points to structure your text. Setting keywords in bold font can make a document easier to scan.

  • # Try to use a light-hearted, slightly informal tone that's enjoyable to read. Be careful not to go too far: an overly conversational tone and too-frequent jokes get in the way of communicating relevant information, and annoy readers.