Tox is a neat tool for helping test Python projects. It automatically creates “virtual environments” that include the necessary dependencies, and can then run user-defined tools for testing. Multiple environments can be created in a declarative manner to test different combinations of Python versions + dependencies. Tox will also build packages in an isolated environment.

Tox has been an absolutely fantastic tool over its 15 years of existence. But in that time, tooling in the Python ecosystem has moved on, and doing things the Tox way will probably slow you down.

You can get 90% of the value of Tox by wrapping Poetry or uv, and will end up with simpler, faster, and more flexible QA tooling. My preferred way to do that is to define tasks with Just, which enables something quite close to the npm run style development experience.

Tox creates one environment per task

Tox (and, to a certain degree, its competitor Nox) conflates two somewhat unrelated responsibilities:

  • managing virtual environments
  • running tasks

For example, the Tox user guide shows an example that creates 5 different environments (slightly edited for conciseness):

requires = ["tox>=4"]
env_list = ["lint", "type", "3.13", "3.12", "3.11"]

[env_run_base]
description = "run unit tests"
deps = ["pytest>=8"]
commands = [["pytest", { replace = "posargs", default = ["tests"], extend = true }]]

[env.lint]
description = "run linters"
skip_install = true
deps = ["black"]
commands = [["black", { replace = "posargs", default = ["."], extend = true} ]]

[env.type]
description = "run type checks"
deps = ["mypy"]
commands = [["mypy", { replace = "posargs", default = ["src", "tests"], extend = true} ]]

It is then possible to run all environments via tox, or select environments like tox -e type.

This, again, is very good, very thorough.

But there are some problems:

  • Creating a task like lint requires defining a new environment, which will be a new virtualenv with its own copy of dependencies. There is no sharing between environments, beyond the usual pip caching. For projects with a large dependency graph (e.g. if you're working on machine learning), this can accumulate a lot of unnecessary overhead.

  • Dependency locking requires manual effort. For example, the configuration above will install whatever the newest black release is, yet that code formatter has a breaking release once per year. Tox doesn't integrate with Python dependency lockfiles, other than the constraints.txt files supported by pip. Tox doesn't integrate with pyproject.toml files, beyond the ability to inline the tox.ini/tox.toml into the pyproject file.

  • The Tox environments can diverge from your main venv. If you're using any modern IDE or editor, you probably want autocomplete and auto-formatting via that editor. So you want a venv with the correct versions of dependencies and tools for local use. But since the contents of your main venv are defined completely independently from the Tox venvs, there can be some divergence.

    Arguably, this is a strong point of Tox, because Tox creates isolated venvs that protect you from unintended changes. But it also makes intended changes more difficult, because you now may have to update dependencies in multiple places (unless using tricks like creating a [test] extra in your package metadata, and then telling Tox to install .[test]. But that's a bad idea for projects that are destined for PyPI, as this pollutes the metadata for downstream projects with irrelevant dependencies/constraints.)

    Update: In a comment on Mastodon, Timothée Mazzucotelli pointed out that Tox recently (Oct 2024) got support for installing venv contents from standard pyproject.toml PEP 735 dependency groups (see Tox docs for dependency_groups). That is a Tox-native way to solve the duplicate-effort problem, without the issues related to “extras”.

Enter Poetry and uv

All of this Tox-style venv management seems quite odd after experiencing Poetry or uv. These two tools are Python project managers, but also include venv and dependency management.

Poetry is the older one of these tools and predates some of the recent standardization efforts in the Python community, whereas uv is younger, faster, more standards-aligned, but also less mature overall. Poetry also has its own build system (instead of reusing setuptools or hatch), whereas uv also leans into a Python version manager role like pyenv and strives to be a complete pip/pipx replacement. Both are excellent choices for managing a Python project.

The point here is that both Poetry and uv already create venvs, and getting them to work seamlessly with Tox is a lot of effort. Tox can use uv pip as a drop-in replacement for pip, but that sidesteps uv's most powerful venv management features. There is the tox-poetry-installer plugin to get Tox to install dependencies via Poetry, but its Tox 4 support is still in beta, and in my experience it's agonizingly slow.

Tox is not a good task runner

If we already use modern Python tooling like Poetry and uv to manage our venvs, including our development dependencies, then Tox is reduced to a plain task runner.

But Tox is not a particularly good task runner.

Defining new tasks is fairly cumbersome, as you must spell out the relevant dependencies. It is also annoying to get Tox to stop creating its own venvs.

At that point, anything is likely to be easier than using Tox, including a Makefile or a plain shell script, or just running the development tools directly:

  • you can invoke individual tools like poetry run pytest or uv run pytest
  • you can activate the venv in the local shell like poetry shell or . .venv/bin/activate (for uv), and then run the installed tools like pytest directly

Just use just

The Just task runner has been my favourite replacement. It looks like a classic Makefile, but is just a task runner:

  • shell-oriented
  • supports dependencies between recipes
  • but is not a build system like make - doesn't check whether files are up to date

So Just is a neat, more modern, less error-prone Makefile alternative.

At first approximation, we could write a Poetry-based justfile for the above Tox configuration like this:

qa: install lint type test

test *args: install
    poetry run pytest {{args}}

lint *args: install
    poetry run black {{args}}

type *args: install
    poetry run mypy {{args}}

install:
    poetry install --sync

This is shorter and clearer than the Tox configuration, and tends to be much faster. (I have literally seen 30% faster CI builds from this change.)

Of course, this will only test with a single Python version, not with the 3.13/3.12/3.11 range in the Tox configuration. But that's usually an acceptable constraint for local development. You can still test under a range of Python versions in CI, e.g. a matrix job in GitHub Actions.

The above Poetry-based justfile has a couple of minor limitations. In reality, I'd write it like this:

set shell := ["poetry", "run", "bash", "-euxo", "pipefail", "-c"]
set positional-arguments

qa *args: install lint type (test args)

test *args: install
    pytest "$@"

lint *args: install
    black "$@"

type *args: install
    mypy "$@"

install:
    #!/bin/sh
    poetry install --sync

This can be invoked as just to run the default qa target, or just test -k 'feature or integration' to run pytest with a test selector expression.

Key differences:

  • Define a custom shell that runs each line of the recipe inside the Poetry venv. That way, I no longer have to prefix each command with poetry run, and can also use shell control flow like || as expected.
  • Enable the positional-arguments feature, so that I can pass the arguments as shell parameters, which prevents issues around word-splitting. In particular, this allows me to correctly forward arguments that contain whitespace.
  • In the install recipe, I set a custom shebang to override the shell setting. This is not strictly necessary for poetry install, but it seemed like a good way to demonstrate this feature.

We can do the same for uv. The switch from Poetry to uv only requires two changes in the above justfile:

  • set shell := ['uv', 'run', 'bash', '-euxo', 'pipefail', '-c']
  • remove the install recipe (and drop install from the other recipes' dependency lists), because uv run automatically syncs the venv
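
Applied to the justfile above, those two changes would give roughly the following. This is a sketch derived from the Poetry version, not a tested drop-in: it assumes pytest, black, and mypy are declared as development dependencies of the project, so that uv run finds them in the project venv.

```just
# Run every linewise recipe line inside the uv-managed project venv.
set shell := ["uv", "run", "bash", "-euxo", "pipefail", "-c"]
set positional-arguments

# Default target: run all checks, forwarding extra arguments to pytest.
qa *args: lint type (test args)

test *args:
    pytest "$@"

lint *args:
    black "$@"

type *args:
    mypy "$@"
```

There is no install recipe left, because uv run brings the venv up to date before each command.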

Because uv can manage multiple Python versions, it would also be possible to create targets to test under a particular Python version. Roughly, this would require a snippet like:

py311 *args:
    #!/bin/sh
    uv run --isolated --python=3.11 pytest "$@"

  • --python=3.11 explicitly requests a particular Python version. uv will automatically download it if necessary.
  • --isolated avoids affecting the main venv. While this is a Tox-style separate venv, uv's caching makes this super fast, and dependencies are still managed via the usual project metadata.
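
The same idea can be stretched to approximate the 3.13/3.12/3.11 range from the Tox configuration in a single recipe. The following is a sketch only: the recipe name test-matrix is made up for illustration, and it relies on positional-arguments being enabled so that "$@" carries the recipe arguments into the shebang script.

```just
set positional-arguments

# Hypothetical recipe: run pytest under each Python version in turn.
test-matrix *args:
    #!/bin/sh
    # Stop at the first Python version whose test run fails.
    set -eu
    for version in 3.11 3.12 3.13; do
        uv run --isolated --python="$version" pytest "$@"
    done
```

It would be invoked like just test-matrix -k 'feature or integration'. The versions run sequentially, so for anything beyond a quick local check a CI matrix job is probably still the better fit.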

Conclusion

If you're already using Poetry or uv, then Tox is not a good fit for your workflows. Unless you really need to generate a matrix of test environments, a dedicated task runner like Just will be faster and more convenient.

I've been using variations of this strategy in all my projects for half a year now and never looked back. For example, here's the justfile I'm using in my yaxmldiff project.