While cleaning up some of my dotfiles, I found what may be the most cursed Python code I have ever written: raw syscalls that required parsing Linux header files.
Goal
My goal here was simple: backups are background tasks. Background tasks should drop the process priority as far as possible so that they do not interfere when I'm actually using the system.
Dropping CPU priority is quite simple in Python: just call os.nice(19) from the standard library.
In contrast, I/O priority is far, far more complicated – but quite important. Backups tend to be I/O-heavy, so I definitely do not want backups to starve other programs of disk access, even on SSDs.
But there is no Python function
This should be easy, I thought: just call os.ionice(...) and be done with it.
But no such function exists in Python's standard library.
But there is no glibc function
No problem, this is Python, I can just call any libc function directly using the ctypes module – a bit hacky, but it should work.
Except that there is no corresponding function in glibc either.
As the ioprio_set(2) manpage says:
Note: glibc provides no wrappers for these system calls, necessitating the use of syscall(2).
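To see the problem first-hand, one can ask ctypes whether a symbol is exported at all. The following is a minimal sketch: the helper name libc_has_symbol is made up for illustration, and it assumes that ctypes.CDLL(None) exposes the libc symbols already loaded into the Python process, which is the usual case on Linux.

```python
import ctypes

# Assumption: passing None opens the symbols already loaded into this
# process, which normally includes libc on Linux.
libc = ctypes.CDLL(None)


def libc_has_symbol(name: str) -> bool:
    """Return True if the loaded C library exports a symbol with this name."""
    # ctypes raises AttributeError when the lookup fails, so hasattr() suffices.
    return hasattr(libc, name)


print(libc_has_symbol("syscall"))     # the generic entry point
print(libc_has_symbol("ioprio_set"))  # the wrapper we were hoping for
```

On a typical glibc system I would expect the first lookup to succeed and the second to fail, matching the manpage.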
Making a syscall in C
In C, a syscall works as follows:
- Import the necessary headers:

  #include <sys/syscall.h>
  #include <unistd.h>

  These include the syscall(long number, ...) function prototype, as well as the SYS_* and __NR_* symbolic constants that map syscall names to their numbers. (Linux ensures ABI stability so it won't change syscall numbers, but the ABI is architecture-dependent so portable code should not hardcode the numbers.)
- Invoke the syscall by giving it the syscall number and any extra arguments required by that specific syscall. Here, that might look like:

  int which = ...;
  int who = ...;
  int ioprio = ...;
  /* syscall() returns -1 on failure and sets errno */
  if (syscall(__NR_ioprio_set, which, who, ioprio) == -1) {
      perror("ioprio_set");
      abort();
  }
Making a syscall in Python
We can translate that easily enough to Python, right?
- Obtain a reference to the syscall() function via ctypes. For example:

  import ctypes
  libc = ctypes.CDLL("libc.so.6")
  # can now access libc.syscall()
- Call it with the necessary arguments:

  sys_ioprio_set = ...
  which = ...
  who = ...
  ioprio = ...
  result = libc.syscall(sys_ioprio_set, which, who, ioprio)
  assert result == 0, "ioprio_set() failed"
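A bare assert only says that the call failed, not why. As a hedged sketch, ctypes can capture errno when the library is loaded with use_errno=True, so a failure can be turned into a regular OSError. The helper checked_syscall is a made-up name, and wrapping every argument in c_long is an assumption that only holds for syscalls taking integer arguments.

```python
import ctypes
import os

# use_errno=True makes ctypes keep a private copy of errno for each call.
libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long  # syscall() returns long, not int


def checked_syscall(number: int, *args: int) -> int:
    """Invoke a raw syscall with integer arguments; raise OSError on failure."""
    c_args = [ctypes.c_long(arg) for arg in args]
    result = libc.syscall(ctypes.c_long(number), *c_args)
    if result == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


# An invalid syscall number should fail cleanly instead of tripping an assert.
try:
    checked_syscall(-1)
except OSError as exc:
    print("syscall failed:", exc)
```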
But this requires that we figure out the correct syscall number first.
glibc offers no mapping that could be queried; the information exists only in the Linux header files.
(Technically, the header files are generated from the syscall_64.tbl file, but that would only help us if we had the kernel source code for our running Linux kernel.)
When we look up syscalls in Python on Stack Overflow, the suggestion just seems to be to hardcode the syscall number. For example:
__NR_getdents = 78 # YMMV
...
syscall(39) # 39 = getpid, but you get the gist
This is not very promising.
Finding the correct header files
Let's do this the right way by looking up the correct number in the Linux header files, as if we were compiling a C program.
In C, we would #include <sys/syscall.h>. On my system, that maps to the path /usr/include/x86_64-linux-gnu/sys/syscall.h.
Which in turn includes another file:
/* This file should list the numbers of the system calls the system knows.
But instead of duplicating this we use the information available
from the kernel sources. */
#include <asm/unistd.h>
Which in turn includes other files based on the current architecture:
# ifdef __i386__
# include <asm/unistd_32.h>
# elif defined(__ILP32__)
# include <asm/unistd_x32.h>
# else
# include <asm/unistd_64.h>
# endif
Finally, in the 64-bit file, we find the syscall number definition:
#define __NR_ioprio_set 251
#define __NR_ioprio_get 252
Parsing the header files
Honestly, the smart and correct thing at this point would be to obtain the required syscall number by running the headers through the C preprocessor, e.g. with something like this script:
echo 'result_ioprio_set = __NR_ioprio_set;' \
| cpp -P -include sys/syscall.h \
| grep 'result_ioprio_set ='
Output: result_ioprio_set = 251;
The cpp program is the C preprocessor, in my case implemented by GCC.
Normally, the output contains directives that track the source location, making the output difficult to parse.
The -P option is used to suppress that unnecessary information.
The -include option lets us list additional headers, which is perfect here.
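The same preprocessor trick can be driven from Python. This is only a sketch under the assumption that a cpp binary and the libc headers are installed; the function names syscall_number_via_cpp and parse_cpp_result are made up for illustration, and only plain decimal expansions are handled.

```python
import re
import subprocess


def parse_cpp_result(output: str, marker: str) -> int:
    """Extract the integer from a preprocessed line like 'marker = 251;'."""
    pattern = rf"^{re.escape(marker)}\s*=\s*(\d+)\s*;"
    match = re.search(pattern, output, re.MULTILINE)
    if match is None:
        # An unknown macro is left unexpanded, so no digits appear.
        raise LookupError(f"no numeric value found for {marker}")
    return int(match.group(1))


def syscall_number_via_cpp(name: str) -> int:
    """Ask the C preprocessor for the value of __NR_<name>."""
    marker = f"result_{name}"
    source = f"{marker} = __NR_{name};\n"
    proc = subprocess.run(
        ["cpp", "-P", "-include", "sys/syscall.h"],
        input=source,          # cpp reads stdin when no input file is given
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cpp_result(proc.stdout, marker)
```

Calling syscall_number_via_cpp("ioprio_set") should then give the same number as the shell pipeline, at the cost of spawning a process.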
But we don't have to shell out to the C preprocessor if we assume that we're only running on AMD64 Linux and that the header file is formatted in a consistent manner.
Here's the Python code that I used at the time:
def get_syscalls() -> dict[str, int]:
    syscall_numbers = dict()
    # TODO use more robust method to discover correct include file
    with open("/usr/include/x86_64-linux-gnu/asm/unistd_64.h") as f:
        for line in f:
            if not line.startswith("#define __NR_"):
                continue
            (_, name, value) = line.split()
            name = name[len("__NR_") :]  # strip prefix
            syscall_numbers[name] = int(value)
    return syscall_numbers
Note that name[len("__NR_") :] can be written more robustly as name.removeprefix("__NR_") on Python 3.9 or later.
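The three-way line.split() unpacking breaks as soon as a define carries a trailing comment or a non-numeric value. A slightly more forgiving variant, sketched here with a regular expression, works on the header text directly so it can be tried without touching the filesystem. The name parse_syscall_defines and the sample header text are made up for illustration.

```python
import re

# Matches lines such as "#define __NR_ioprio_set 251", tolerating extra
# whitespace and anything after the number (e.g. a trailing comment).
_DEFINE_RE = re.compile(r"^#\s*define\s+__NR_(\w+)\s+(\d+)\b")


def parse_syscall_defines(header_text: str) -> dict[str, int]:
    """Map syscall names to numbers for every plain numeric __NR_* define."""
    numbers = {}
    for line in header_text.splitlines():
        match = _DEFINE_RE.match(line)
        if match:
            numbers[match.group(1)] = int(match.group(2))
    return numbers


# Illustrative sample, not a verbatim copy of any real header.
sample = """\
#ifndef _ASM_UNISTD_64_H
#define _ASM_UNISTD_64_H
#define __NR_ioprio_set 251
#define __NR_ioprio_get 252 /* hypothetical trailing comment */
#endif
"""
print(parse_syscall_defines(sample))
# prints: {'ioprio_set': 251, 'ioprio_get': 252}
```

Defines whose value is an expression rather than a number are skipped silently, which seemed the safer default for a sketch.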
(It turns out that glibc uses a combination of these two approaches: it first expands the contents of the asm/unistd.h Linux header using the C preprocessor, then uses a strikingly similar Python script to parse the list of syscalls.)
With this, we can invoke the correct Linux syscall – but have yet to figure out which arguments to pass to that syscall.
Understanding ioprio_set() arguments
Returning to the manpage for ioprio_set(which, who, ioprio), we can see that it talks about various macros and constants.
If we want to set the I/O priority for a process, then:
The which and who arguments identify the thread(s) on which the system calls operate. The which argument determines how who is interpreted, and has one of the following values:
IOPRIO_WHO_PROCESS
    who is a process ID or thread ID identifying a single process or thread. If who is 0, then operate on the calling thread.
And:
The ioprio argument given to ioprio_set() is a bit mask that specifies both the scheduling class and the priority to be assigned to the target process(es). The following macros are used for assembling and dissecting ioprio values:
IOPRIO_PRIO_VALUE(class, data)
    Given a scheduling class and priority (data), this macro combines the two values to produce an ioprio value, which is returned as the result of the macro.
This sounds good, except that these macros do not exist.
The end of the manpage sheepishly admits:
Bugs
glibc does not yet provide a suitable header file defining the function prototypes and macros described on this page. Suitable definitions can be found in linux/ioprio.h.
That header contains the necessary information; now we only have to translate it to Python:
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* ... */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_MASK 0x07
#define IOPRIO_PRIO_MASK ((1UL << IOPRIO_CLASS_SHIFT) - 1)
/* ... */
#define IOPRIO_PRIO_VALUE(class, data) \
    ((((class) & IOPRIO_CLASS_MASK) << IOPRIO_CLASS_SHIFT) | \
     ((data) & IOPRIO_PRIO_MASK))
/*
 * ... IDLE is the idle scheduling class, it is only
 * served when no one else is using the disk.
 */
enum {
    IOPRIO_CLASS_NONE,
    IOPRIO_CLASS_RT,
    IOPRIO_CLASS_BE,
    IOPRIO_CLASS_IDLE,
};
/* ... */
enum {
    IOPRIO_WHO_PROCESS = 1,
    IOPRIO_WHO_PGRP,
    IOPRIO_WHO_USER,
};
That means:
- which = IOPRIO_WHO_PROCESS = 1 to select the "process" mode for the who argument
- who = 0 to modify I/O priority of the current process
- ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0) = 3 << 13 to select "idle" priority, which is the lowest priority level.
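The header excerpt translates to Python almost mechanically. The sketch below mirrors the shift, masks and enum order shown above; ioprio_prio_value follows the IOPRIO_PRIO_VALUE macro, while ioprio_split is a made-up helper that simply reverses the packing using the same shift and masks.

```python
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_MASK = 0x07
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1

# Enum values follow the declaration order in the header excerpt.
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = range(4)
IOPRIO_WHO_PROCESS, IOPRIO_WHO_PGRP, IOPRIO_WHO_USER = range(1, 4)


def ioprio_prio_value(ioclass: int, data: int) -> int:
    """Pack a scheduling class and priority data into one ioprio value."""
    return ((ioclass & IOPRIO_CLASS_MASK) << IOPRIO_CLASS_SHIFT) | (
        data & IOPRIO_PRIO_MASK
    )


def ioprio_split(ioprio: int) -> tuple[int, int]:
    """Unpack an ioprio value into (class, data)."""
    ioclass = (ioprio >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK
    return ioclass, ioprio & IOPRIO_PRIO_MASK


idle = ioprio_prio_value(IOPRIO_CLASS_IDLE, 0)
print(idle, idle == 3 << 13)
# prints: 24576 True
```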
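Incidentally, the which and who arguments work analogously to those of the setpriority() syscall, which controls CPU priority ("nice" value), but that relationship is not documented anywhere.
Note that the numeric constants differ: setpriority() uses PRIO_PROCESS = 0, whereas IOPRIO_WHO_PROCESS = 1.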
Assembling the Python code
In the end, my Python function to drop CPU + IO priority looks like this:
import ctypes

def drop_priority():
    """Drop Linux scheduling priority (CPU + IO) as far as possible."""
    syscall = get_syscalls()
    libc = ctypes.CDLL("libc.so.6")
    prio_process = 0  # PRIO_PROCESS for setpriority(), see sys/resource.h
    ioprio_who_process = 1  # IOPRIO_WHO_PROCESS, see linux/ioprio.h
    who = 0  # current process
    # CPU scheduling priority:
    # high value -> very low priority
    prio = 19
    # IO scheduling priority: ioclass=IDLE, see linux/ioprio.h
    ioprio = 3 << 13
    result = libc.setpriority(prio_process, who, prio)
    assert result == 0, "setpriority() failed"
    result = libc.syscall(syscall["ioprio_set"], ioprio_who_process, who, ioprio)
    assert result == 0, "ioprio_set() failed"
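The CPU half does not strictly need ctypes: the standard library exposes os.setpriority() and os.getpriority() on Unix, together with the os.PRIO_PROCESS constant. A small sketch, with set_cpu_niceness as a made-up helper name:

```python
import os


def set_cpu_niceness(niceness: int) -> int:
    """Set the nice value of the calling process and return what the kernel reports."""
    # With PRIO_PROCESS, who=0 selects the calling process.
    os.setpriority(os.PRIO_PROCESS, 0, niceness)
    return os.getpriority(os.PRIO_PROCESS, 0)


# Re-applying the current value is harmless and needs no privileges.
current = os.getpriority(os.PRIO_PROCESS, 0)
print(set_cpu_niceness(current) == current)
```

For the actual backup use case one would pass 19 instead of the current value; keep in mind that an unprivileged process usually cannot lower its nice value again afterwards.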
There's a package for that
Some time later, it occurred to me that I could not have been the first person to attempt this - but it can be difficult to find a package if you don't already know what exactly you're looking for.
In this case, the solution was the psutil Python package (psutil on GitHub, psutil on PyPI, ionice() documentation).
With it, the code simplifies quite a bit:
import psutil

def drop_priority() -> None:
    """Drop Linux scheduling priority (CPU + IO) as far as possible."""
    p = psutil.Process()
    # CPU scheduling priority:
    # high value -> very low priority
    nice = p.nice(19)
    # IO scheduling priority:
    # Set ioclass=IDLE even though that might starve the process.
    ionice = p.ionice(ioclass=psutil.IOPRIO_CLASS_IDLE)
But how does it work?
The answer is arguably a bit boring: the psutil Python package is implemented as a C extension, so it has access to the headers with the syscall numbers.
But it does re-implement the IOPRIO macros, avoiding the need to have the Linux headers installed locally.
Excerpt from the code in psutil/arch/linux/proc.c:
/*
* Copyright (c) 2009, Giampaolo Rodola'. All rights reserved.
* Use of this source code is governed by a BSD-style license that can be
* found in the LICENSE file.
*/
/* ... */
#include <sys/syscall.h>
/* ... */
static inline int
ioprio_set(int which, int who, int ioprio) {
    return syscall(__NR_ioprio_set, which, who, ioprio);
}
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_MASK ((1UL << IOPRIO_CLASS_SHIFT) - 1)
/* ... */
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | data)
While it is fun that we could solve this ourselves with Python-only code, it's definitely more convenient if someone else has gone through the effort of packaging this up in a convenient Python API.