The Python standard library is weird

I'm digging through old notes from back when I started using Python seriously, probably around 2017. While there's a lot to like about the language, its standard library has problems.

I am not a fan of the “batteries included” approach. The standard library is where modules go to die, and some parts do smell decidedly rotten. To be fair, there's some really cool and/or necessary stuff like functools, os, sys etc., but then there's also the rest.

There are lots of small problems, like many packages not being PEP-8 compliant or exposing unintuitive, error-prone interfaces. But that's merely annoying, not truly bad.

The real problems of Python's standard library are the nearly unusable documentation, and the proliferation of subtle bugs and restrictions that can render a module effectively unusable.

Unhelpful reference documentation

The documentation is comprehensive, but not very thorough. The docs are generally written in the style of an overview or a tutorial, not in the style of a reference. It is extremely rare to see a function that explicitly documents all its parameters. Such documentation makes it easy to get started, but difficult to get done.

The result is fragile code, because I have to depend on undocumented implementation details rather than on a documented contract. I tend to write code by trial and error, and use reflection with help() or dir() to find out what methods are supported and what properties are available.
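To make that workflow concrete, here is a minimal sketch of the kind of poking around I mean. The helper names public_names and describe are made up for this example, and textwrap.shorten is just an arbitrary stdlib function to inspect.

```python
import inspect
import textwrap


def public_names(obj):
    """Return the names that dir() reports, minus private and dunder ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


def describe(func):
    """Return the call signature as discovered at runtime, as a string."""
    return f"{func.__name__}{inspect.signature(func)}"


# What does the module offer? (Roughly what dir() shows in a REPL.)
names = public_names(textwrap)

# What parameters does one of its functions take?
signature = describe(textwrap.shorten)

print(names)
print(signature)
```

In the CPython versions I'm aware of, the signature of shorten ends in **kwargs, so even reflection doesn't say which keyword arguments are actually accepted. At that point the remaining option is reading the source.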

Added 2023: Type-based autocomplete with an LSP server such as pyright does help a lot. The servers use fairly exhaustive type annotations for most of the standard library. However, the docs themselves tend to lack this detailed type information, making it necessary to take a wild guess.

Other languages tend to fare better:

  • Believe it or not, but PHP's documentation is comparable or even slightly better as a reference (but has enough other issues, such as the lack of versioning).

  • Perl and Linux manpages use a similar style but tend to be more detailed.

  • C++ has a formal standard, but it isn't widely available. Instead, websites like https://en.cppreference.com/ provide extremely detailed documentation on built-in functions and language constructs.

  • But the real stars are the Java and .NET ecosystems. Their reference documentation is excellent and should be an example to everyone. If they have an issue, it is that tutorial-style documentation is usually separate from the reference and is unversioned. It is also difficult to get an overview – Python does a much better job of guiding you through the documentation.

  • Added 2023: and of course the Rust API docs are very good.

Of course it's also possible to do significantly worse, e.g. Haskell's reference docs are seldom more than “this is the type signature”.

Quality of modules

Now on to the quality of the standard library. I understand that open-source projects are dependent on volunteer work. If someone went through the trouble of implementing some library, why not use that? Well, the bad news is that Python really feels like that: something was included not because it went through a design, review, and standardization process, but because it was conveniently available. I suppose this has gotten better with the PEP process, but there's still a lot of baggage around that can't be reworked for backwards compatibility reasons.

After debugging some issues involving multithreading, I'm surprised that anything works.

Selection of modules

(added 2023)

The "batteries included" approach has led to the inclusion of lots of modules that are of questionable value for typical Python developers. But now that they're there, they can't really be removed.

For example, consider test frameworks/utilities in the standard library:

  • doctest is fantastic, though it shows its age. For example, it cannot be used for doc-testing async code.
  • unittest is a conventional xUnit style test framework. It is OK, but obsolete because …
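For readers who haven't seen doctest: the examples in a docstring double as tests. A minimal sketch follows, where slugify is a made-up function for illustration. I'm driving the finder and runner by hand so that the snippet doesn't depend on how the file is executed; in a real project you'd more likely call doctest.testmod() or run python -m doctest on the file.

```python
import doctest


def slugify(title):
    """Turn a title into a lowercase, dash-separated slug.

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Batteries   Included ")
    'batteries-included'
    """
    return "-".join(title.lower().split())


# Collect the examples embedded in the docstring above and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(slugify, name="slugify", globs={"slugify": slugify}):
    runner.run(test)

# summarize() returns a named tuple with .failed and .attempted counts.
results = runner.summarize(verbose=False)
print(results)
```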

The community has migrated to pytest now. But that's not part of the standard library. New devs might mistake the standard library test frameworks for recommendations, when they are actually just historical baggage.

A similar community vs stdlib split exists with regard to HTTP clients, where the stdlib contains the somewhat cumbersome urllib package, whereas the community mostly uses urllib3 or requests.
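To illustrate what "somewhat cumbersome" means, here is a sketch of preparing a JSON POST request with nothing but urllib. The function build_json_post and the URL are hypothetical, and nothing is actually sent over the network.

```python
import json
import urllib.parse
import urllib.request


def build_json_post(url, payload, params=None):
    """Assemble a JSON POST request by hand (hypothetical helper)."""
    if params:
        # Query strings have to be encoded and appended manually.
        url = f"{url}?{urllib.parse.urlencode(params)}"
    # The body must be serialized and encoded to bytes by the caller.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_post(
    "https://example.invalid/api/items",  # placeholder URL, not a real API
    {"name": "widget"},
    params={"page": 1},
)

# Sending it would look roughly like this (not executed here):
#     with urllib.request.urlopen(request, timeout=10) as response:
#         data = json.loads(response.read().decode("utf-8"))
```

With requests, the same thing is more or less a single call that takes the query parameters and the JSON payload as keyword arguments, which is a large part of why people reach for it.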

The Python standard library includes a GUI framework. I'm torn on this – it is legitimately useful. But it is also extremely ugly, and absolutely irrelevant for server use cases, where a graphics stack should not have to exist.

There are various OS-specific utilities. Mostly, there are abstractions for syscalls in the os module. This is good.

But there are also some rather specialized things like grp for the Unix group database, which is a very thin wrapper around the corresponding POSIX APIs in C. I'm not quite sure whether that belongs in the standard library, and if so, why it's separate from the functions in the os module.

There are four (4) different XML parsers in the standard library. Yet none of them should be used for untrusted input. Instead, the third-party lxml module (based on the libxml2 C library) is probably a good starting point.

Let's talk about cryptography. I think this is something that should either be deliberately excluded from the stdlib (like Rust does), or that should provide a good interface with modern algorithm choices. Since Python wants to include things like HTTP(S) clients, it must also ship with various cryptographic routines.

There are three cryptography-relevant modules: secrets, hashlib, and hmac.

The secrets module generates strong random numbers (think urandom()). This module is modern and good.
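A short sketch of what the module offers. The variable names and the use cases in the comments are only illustrative.

```python
import secrets

# 32 random bytes, hex-encoded: e.g. for a session token.
token = secrets.token_hex(32)

# URL-safe variant, e.g. for a password-reset link.
reset_token = secrets.token_urlsafe(32)

# Uniform random integer in range(6), drawn from the OS random source.
die_index = secrets.randbelow(6)

# Constant-time comparison, to avoid leaking information through timing.
is_same = secrets.compare_digest(token, token)
```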

hmac is a wrapper around hashlib.
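As a sketch of how the two modules fit together: the hash function comes from hashlib, while hmac adds the keying and a constant-time comparison. The key, the message, and the verify helper below are made up for illustration.

```python
import hashlib
import hmac

key = b"hypothetical-shared-secret"  # placeholder; use a random key in practice
message = b"amount=100&to=alice"

# Compute an authentication tag over the message with HMAC-SHA256.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key, message, received_tag):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_tag)
```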

The hashlib module does contain relevant algorithms, though its docs are fairly confusing:

  • ancient stuff like MD5
  • SHA 1/2/3 algorithms
  • SHAKE extensible hashes
  • BLAKE2
  • possibly others depending on OpenSSL
  • key derivation with PBKDF2 and Scrypt, but only with low-level interfaces that cannot be used by amateurs
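To show what "low-level" means for the key derivation functions, here is a sketch of password hashing with hashlib.pbkdf2_hmac. The helper names, the salt length, and the iteration count are my assumptions, not recommendations from the Python docs. The point is that the caller has to pick and store all of these parameters.

```python
import hashlib
import hmac
import secrets

# Assumption: choose this according to current guidance and your hardware.
ITERATIONS = 600_000


def hash_password(password, *, iterations=ITERATIONS):
    """Derive a key from a password; returns everything needed to verify it."""
    salt = secrets.token_bytes(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Salt and iteration count must be stored next to the digest.
    return salt, iterations, digest


def check_password(password, salt, iterations, expected):
    """Re-derive the key with the stored parameters and compare safely."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected)
```

Nothing here stops you from reusing a salt, picking a tiny iteration count, or comparing digests with ==. That is the gap a high-level interface would close.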

Notably missing:

  • other cryptography primitives, like AES encryption
  • other modern hashes like BLAKE3, Poly1305
  • state of the art KDFs like Argon2
  • creating and validating signatures
  • high-level interfaces that can be used by amateurs

There is an ssl module that has to perform some of the above things in order to support HTTPS connections, but quite reasonably ssl only exposes high-level protocol wrappers and not low-level OpenSSL routines.

Instead of the standard library, Python devs will probably use the third-party cryptography module. It contains a curated list of relevant algorithms, provided by OpenSSL. The library is very well designed. However, the module controversially includes Rust components, which limits its cross-platform support.