Tox is a neat tool for helping test Python projects. It automatically creates “virtual environments” that include the necessary dependencies, and can then run user-defined tools for testing. Multiple environments can be created in a declarative manner to test different combinations of Python versions + dependencies. Tox will also build packages in an isolated environment.
This has been an absolutely fantastic tool in its 15 years of existence. But in that time, tooling in the Python ecosystem has moved on, and doing it the Tox way will probably slow you down.
You can get 90% of the value of Tox by wrapping Poetry or uv,
and will end up with simpler, faster, and more flexible QA tooling.
My preferred way to do that is to define tasks with Just,
which enables something quite close to the `npm run` style development experience.
Tox creates one environment per task
Tox (and, to a certain degree, its competitor Nox) conflates two somewhat unrelated responsibilities:
- managing virtual environments
- running tasks
For example, the Tox user guide shows an example that creates 5 different environments (slightly edited for conciseness):
requires = ["tox>=4"]
env_list = ["lint", "type", "3.13", "3.12", "3.11"]
[env_run_base]
description = "run unit tests"
deps = ["pytest>=8"]
commands = [["pytest", { replace = "posargs", default = ["tests"], extend = true }]]
[env.lint]
description = "run linters"
skip_install = true
deps = ["black"]
commands = [["black", { replace = "posargs", default = ["."], extend = true} ]]
[env.type]
description = "run type checks"
deps = ["mypy"]
commands = [["mypy", { replace = "posargs", default = ["src", "tests"], extend = true} ]]
It is then possible to run all environments via `tox`, or select environments like `tox -e type`.
This, again, is very good, very thorough.
But there are some problems:
- Creating a task like `lint` requires defining a new environment, which will be a new virtualenv with its own copy of dependencies. There is no sharing between environments, beyond the usual `pip` caching. For projects with a large dependency graph (e.g. if you're working on machine learning), this can accumulate a lot of unnecessary overhead.
- Dependency locking requires manual effort. For example, the above example will install whatever the newest `black` release is, yet that code formatter has a breaking release once per year. Tox doesn't integrate with Python dependency lockfiles, other than the `constraint.txt` files supported by `pip`. Tox doesn't integrate with `pyproject.toml` files, beyond the ability to inline the `tox.ini`/`tox.toml` into the pyproject file.
- The Tox environments can diverge from your main venv. If you're using any modern IDE or editor, you probably want autocomplete and auto-formatting via that editor. So you want a venv with the correct versions of dependencies and tools for local use. But since the contents of your main venv are defined completely independently from the Tox venvs, there can be some divergence.
  Arguably, this is a strong point of Tox, because Tox creates isolated venvs that protect you from unintended changes. But it also makes intended changes more difficult, because you now may have to update dependencies in multiple places (unless using tricks like creating a `[test]` extra in your package metadata, and then telling Tox to install `.[test]`. But that's a bad idea for projects that are destined for PyPI, as this pollutes the metadata for downstream projects with irrelevant dependencies/constraints.)
  Update: In a comment on Mastodon, Timothée Mazzucotelli pointed out that Tox recently (Oct 2024) got support for installing venv contents from standard `pyproject.toml` PEP 735 dependency groups (see Tox docs for `dependency_groups`). That is a Tox-native way to solve the duplicate-effort problem, without the issues related to “extras”.
Enter Poetry and uv
All of this Tox-style venv management seems quite odd after experiencing Poetry or uv. These two tools are Python project managers, but also include venv and dependency management.
Poetry is the older one of these tools and predates some of the recent standardization efforts in the Python community,
whereas uv is younger, faster, more standards-aligned, but also less mature overall.
Poetry also has its own build system (instead of reusing setuptools or hatch), whereas uv also leans into a Python version manager role like `pyenv` and strives to be a complete `pip`/`pipx` replacement.
Both are excellent choices for managing a Python project.
The point here is that both Poetry and uv already create venvs, and getting them to work seamlessly with Tox is a lot of effort.
Tox can use `uv pip` as a drop-in replacement for `pip`, but that sidesteps uv's most powerful venv management features.
There is the tox-poetry-installer plugin to get Tox to install dependencies via Poetry, but its Tox 4 support is still in beta, and in my experience it's agonizingly slow.
Tox is not a good task runner
If we already use modern Python tooling like Poetry and uv to manage our venvs, including our development dependencies, then Tox is reduced to a plain task runner.
But Tox is not a particularly good task runner.
Defining new tasks is fairly cumbersome, as you must spell out the relevant dependencies. It is also annoying to get Tox to stop creating its own venvs.
At that point, anything is likely to be easier than using Tox, including a Makefile or a plain shell script, or just running the development tools directly:
- can invoke individual tools like `poetry run pytest` or `uv run pytest`
- can activate the venv in the local shell like `poetry shell` or `. .venv/bin/activate` (for uv), and then run the installed tools like `pytest` directly
Just use just
The Just task runner has been my favourite replacement. It looks like a classic Makefile, but is just a task runner:
- shell-oriented
- supports dependencies between recipes
- but is not a build system like `make`
- doesn't check whether files are up to date
So Just is a neat, more modern, less error-prone Makefile alternative.
At first approximation, we could write a Poetry-based `justfile` for the above Tox configuration like this:
qa: install lint type test
test *args: install
    poetry run pytest {{args}}
lint *args: install
    poetry run black {{args}}
type *args: install
    poetry run mypy {{args}}
install:
    poetry install --sync
This is shorter and clearer than the Tox configuration, and tends to be much faster. (I have literally seen 30% faster CI builds from this change.)
Of course, this will only test with a single Python version, not with the 3.13/3.12/3.11 range in the Tox configuration.
But that's usually an acceptable constraint for local development.
You can still test under a range of Python versions in CI,
e.g. a `matrix` job in GitHub Actions.
The above Poetry-based justfile has a couple of minor limitations. In reality, I'd write it like this:
set shell := ["poetry", "run", "bash", "-euxo", "pipefail", "-c"]
set positional-arguments
qa *args: install lint type (test args)
test *args: install
    pytest "$@"
lint *args: install
    black "$@"
type *args: install
    mypy "$@"
install:
    #!/bin/sh
    poetry install --sync
This can be invoked as `just` to run the default `qa` target,
or `just test -k 'feature or integration'` to run `pytest` with a test selector expression.
Key differences:
- Define a custom `shell` that runs each line of the recipe inside the Poetry venv. That way, I no longer have to prefix each command with `poetry run`, and can also use shell control flow like `||` as expected.
- Enable the `positional-arguments` feature, so that I can pass the arguments as shell parameters, which prevents issues around word-splitting. In particular, this allows me to correctly forward arguments that contain whitespace.
- In the `install` recipe, I set a custom shebang to override the `shell` setting. This is not strictly necessary for `poetry install`, but it seemed like a good way to demonstrate this feature.
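To make the word-splitting point concrete, here is a minimal sketch comparing the two ways of forwarding arguments. The recipe names `split` and `keep` are hypothetical and exist only for illustration; the sketch assumes Just's default `sh` shell.

```just
set positional-arguments

# Interpolation: the arguments are pasted into the command text,
# so the shell splits them again on whitespace.
split *args:
    printf '<%s>\n' {{args}}

# Positional arguments: each argument arrives as its own shell parameter.
keep *args:
    printf '<%s>\n' "$@"
```

Running `just split 'a b'` should print `<a>` and `<b>` on separate lines, whereas `just keep 'a b'` should print the single line `<a b>`.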
We can do the same for uv. The switch from Poetry to uv only requires two changes in the above justfile:
- change the shell setting to set shell := ['uv', 'run', 'bash', '-euxo', 'pipefail', '-c']
- remove the `install` recipe (and its use as a dependency of the other recipes), because `uv run` automatically syncs the venv
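Put together, a uv-based justfile might look roughly like the following. This is a sketch that applies the two changes to the Poetry version, and it assumes that `pytest`, `black`, and `mypy` are declared as development dependencies of the project so that `uv run` can find them.

```just
# Run every recipe line inside the project's uv-managed environment.
set shell := ["uv", "run", "bash", "-euxo", "pipefail", "-c"]
set positional-arguments

# Default target: run all checks, forwarding any arguments to pytest.
qa *args: lint type (test args)

test *args:
    pytest "$@"

lint *args:
    black "$@"

type *args:
    mypy "$@"
```

As before, `just test -k 'feature or integration'` forwards the selector expression to `pytest` unchanged.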
Because uv can manage multiple Python versions, it would also be possible to create targets to test under a particular Python version. Roughly, this would require a snippet like:
py311 *args:
    #!/bin/sh
    uv run --isolated --python=3.11 pytest "$@"
- `--python=3.11` explicitly requests a particular Python version. uv will automatically download it if necessary.
- `--isolated` avoids affecting the main venv. While this is a Tox-style separate venv, uv's caching makes this super fast, and dependencies are still managed via the usual project metadata.
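The same idea could be generalized so that the Python version becomes a recipe parameter. The following is only a sketch: the recipe names `test-py` and `test-all` are hypothetical, and it relies on `set positional-arguments` also passing arguments to shebang recipes.

```just
set positional-arguments

# Hypothetical recipe: run pytest under an arbitrary Python version.
test-py version *args:
    #!/bin/sh
    set -eu
    # The first positional argument is the version; the rest go to pytest.
    python_version="$1"
    shift
    uv run --isolated --python="$python_version" pytest "$@"

# Hypothetical aggregate: mirror the 3.13/3.12/3.11 range of the Tox configuration.
test-all: (test-py "3.13") (test-py "3.12") (test-py "3.11")
```

For example, `just test-py 3.12 -k feature` would run the selected tests under Python 3.12, and `just test-all` would run the full suite under each version in turn.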
Conclusion
If you're already using Poetry or uv, then Tox is not a good fit for your workflows. Unless you really need to generate a matrix of test environments, a dedicated task runner like Just will be faster and more convenient.
I've been using variations of this strategy in all my projects for half a year now
and never looked back.
For example, here's the justfile I'm using in my yaxmldiff project.