Cursed Syscalls to Set IO Priority in Python

While cleaning up some of my dotfiles, I found what may be the most cursed Python code I have ever written: raw syscalls that required parsing Linux header files.

Goal

My goal here was simple: backups are background tasks. Background tasks should drop the process priority as far as possible so that they do not interfere when I'm actually using the system.

Dropping CPU priority is quite simple in Python: just call os.nice(19) from the standard library.
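As a minimal sketch of that call (the helper name lower_cpu_priority is my own, not part of the standard library): os.nice() adds its argument to the current niceness and returns the new value. On Linux the result is clamped at 19, so passing 19 lands on the lowest CPU priority regardless of the starting point, assuming the process did not start with a negative niceness.

```python
import os


def lower_cpu_priority() -> int:
    """Raise this process's niceness as far as allowed; return the new value."""
    # os.nice() takes an increment, not an absolute value.
    # Linux clamps the resulting niceness at 19, the lowest CPU priority.
    return os.nice(19)


if __name__ == "__main__":
    print("niceness is now", lower_cpu_priority())
```

An unprivileged process can raise its niceness but usually cannot lower it again, so this is a one-way operation for the lifetime of the process.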

In contrast, I/O priority is far, far more complicated – but quite important. Backups tend to be more I/O-heavy, so I definitely do not want backups to starve other programs of disk access, even on SSDs.

But there is no Python function

This should be easy, I thought: just call os.ionice(...) and be done with it.

But no such function exists in Python's standard library.

But there is no glibc function

No problem, this is Python, I can just call any libc function directly using the ctypes module – a bit hacky, but it should work.

Except that there is no corresponding function in glibc either. As the ioprio_set(2) manpage says:

Note: glibc provides no wrappers for these system calls, necessitating the use of syscall(2).

Making a syscall in C

In C, a syscall works as follows:

  1. Import the necessary headers:

    #include <sys/syscall.h>
    #include <unistd.h>
    

    These include the syscall(long number, ...) function prototype, as well as the SYS_* and __NR_* symbolic constants that map syscall names to their numbers. (Linux ensures ABI stability so it won't change syscall numbers, but the ABI is architecture-dependent so portable code should not hardcode the numbers.)

  2. Invoke the syscall, by giving it the syscall number and any extra arguments as required by that specific syscall. Here, that might look like:

    int which = ...;
    int who = ...;
    int ioprio = ...;
    if (syscall(__NR_ioprio_set, which, who, ioprio) == -1) {
       perror("ioprio_set");
       abort();
    }
    

Making a syscall in Python

We can translate that easily enough to Python, right?

  1. Obtain a reference to the syscall() function via ctypes. For example:

    import ctypes
    
    libc = ctypes.CDLL("libc.so.6")
    # can now access libc.syscall()
    
  2. Call it with the necessary arguments:

    sys_ioprio_set = ...
    which = ...
    who = ...
    ioprio = ...
    result = libc.syscall(sys_ioprio_set, which, who, ioprio)
    assert result == 0, "ioprio_set() failed"
    

But this requires that we figure out the correct syscall number first. glibc offers no mapping that could be queried at runtime; the information is only in the Linux header files. (Technically, the header files are generated from the syscall_64.tbl file, but that would only help us if we had the kernel source code for our running Linux kernel.)

When we look up syscalls in Python on Stack Overflow, the suggestion just seems to be to hardcode the syscall number. For example:

__NR_getdents = 78  # YMMV
...
syscall(39)  # 39 = getpid, but you get the gist

This is not very promising.
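To make the fragility concrete, here is a small sketch of the hardcoding approach. It assumes a 64-bit Linux process and only knows the getpid number for two architectures (39 on x86-64, as in the snippet above, and 172 on aarch64); the table and the function name are illustrative, not part of any library. On any other platform it gives up rather than guess.

```python
import ctypes
import os
import platform
import sys
from typing import Optional

# getpid numbers for two 64-bit Linux ABIs. Every other architecture
# would need its own entry, which is exactly the problem with hardcoding.
GETPID_NUMBERS = {"x86_64": 39, "aarch64": 172}


def getpid_via_raw_syscall() -> Optional[int]:
    """Call getpid through syscall(2); return None on an unlisted platform."""
    is_64bit = sys.maxsize > 2**32
    number = GETPID_NUMBERS.get(platform.machine())
    if sys.platform != "linux" or not is_64bit or number is None:
        return None

    # use_errno=True lets us read errno after a failed call.
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.syscall.restype = ctypes.c_long
    result = libc.syscall(ctypes.c_long(number))
    if result == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


if __name__ == "__main__":
    print(getpid_via_raw_syscall(), os.getpid())
```

The same number passed on a different architecture would silently invoke some unrelated syscall, which is why the lookup table has to be right for every platform the code runs on.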

Finding the correct header files

Let's do this the right way by looking up the correct number in the Linux header files, as if we were compiling a C program.

In C, we would #include <sys/syscall.h>. On my system, that maps to the path /usr/include/x86_64-linux-gnu/sys/syscall.h.

Which in turn includes another file:

/* This file should list the numbers of the system calls the system knows.
   But instead of duplicating this we use the information available
   from the kernel sources.  */
#include <asm/unistd.h>

Which in turn includes other files based on the current architecture:

# ifdef __i386__
#  include <asm/unistd_32.h>
# elif defined(__ILP32__)
#  include <asm/unistd_x32.h>
# else
#  include <asm/unistd_64.h>
# endif

Finally, in the 64-bit file, we find the syscall number definition:

#define __NR_ioprio_set 251
#define __NR_ioprio_get 252

Parsing the header files

Honestly, the smart and correct thing at this point would be to obtain the required syscall number by running the header through the C preprocessor, e.g. with something like this script:

echo 'result_ioprio_set = __NR_ioprio_set;' \
  | cpp -P -include sys/syscall.h \
  | grep 'result_ioprio_set ='

Output: result_ioprio_set = 251;

The cpp program is the C preprocessor, in my case implemented by GCC. Normally, the output contains directives that track the source location, making the output difficult to parse. The -P option is used to suppress that unnecessary information. The -include option lets us list additional headers, which is perfect here.
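The same pipeline can be driven from Python with subprocess. The following is a rough sketch, assuming a cpp binary on the PATH and installed kernel headers; the function name syscall_number_via_cpp and the marker string are made up for this example. It returns None instead of raising when the preprocessor or the macro is unavailable.

```python
import re
import shutil
import subprocess
from typing import Optional


def syscall_number_via_cpp(name: str) -> Optional[int]:
    """Ask the C preprocessor to expand __NR_<name>; None if that fails."""
    if re.fullmatch(r"[a-z0-9_]+", name) is None:
        raise ValueError(f"not a plausible syscall name: {name!r}")

    cpp = shutil.which("cpp")
    if cpp is None:
        return None  # no C preprocessor installed

    marker = "result_value"
    proc = subprocess.run(
        [cpp, "-P", "-include", "sys/syscall.h", "-"],  # "-" reads from stdin
        input=f"{marker} = __NR_{name};\n",
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None  # e.g. the headers are missing

    # An unknown macro is left unexpanded, so no digits follow the marker.
    match = re.search(rf"^{marker} = (\d+);", proc.stdout, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(syscall_number_via_cpp("ioprio_set"))
```

This keeps the architecture logic inside the toolchain, at the cost of requiring a compiler and headers at runtime.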

But we don't have to shell out to the C preprocessor if we assume that we're only running on AMD64 Linux and that the header file is formatted in a consistent manner.

Here's the Python code that I used at the time:

def get_syscalls() -> dict[str, int]:
    syscall_numbers = dict()

    # TODO use more robust method to discover correct include file
    with open("/usr/include/x86_64-linux-gnu/asm/unistd_64.h") as f:
        for line in f:
            if not line.startswith("#define __NR_"):
                continue
            (_, name, value) = line.split()
            name = name[len("__NR_") :]  # strip prefix
            syscall_numbers[name] = int(value)

    return syscall_numbers

Note that name[len("__NR_") :] can be written more robustly as name.removeprefix("__NR_") on Python 3.9 or later.
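The parser above assumes that every matching line has exactly three whitespace-separated fields and that the third is a plain decimal integer. A slightly more tolerant variant might use a regular expression and skip anything it does not understand. This is only a sketch: the function name is my own, and the sample text is invented for illustration (it is not copied from a real header), on the assumption that some headers carry trailing comments or define a number as an expression.

```python
import re

# "#define __NR_<name> <decimal>" with an optional trailing /* comment */.
_DEFINE_RE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*(?:/\*.*)?$")


def parse_syscall_defines(header_text: str) -> dict[str, int]:
    """Collect plain numeric __NR_* definitions, ignoring everything else."""
    numbers: dict[str, int] = {}
    for line in header_text.splitlines():
        match = _DEFINE_RE.match(line.strip())
        if match:
            numbers[match.group(1)] = int(match.group(2))
    return numbers


# Made-up sample input, including two lines the simple parser would choke on.
SAMPLE_HEADER = """\
#ifndef _ASM_UNISTD_64_H
#define _ASM_UNISTD_64_H
#define __NR_read 0
#define __NR_ioprio_set 251
#define __NR_ioprio_get 252  /* trailing comment */
#define __NR_weird (__SOME_BIT + 5)
#endif
"""

if __name__ == "__main__":
    print(parse_syscall_defines(SAMPLE_HEADER))
```

Definitions whose value is an expression are silently dropped here, so a lookup for such a syscall fails with a KeyError instead of returning a wrong number.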

(Turns out that glibc uses a combination of these two approaches: it first expands the contents of the asm/unistd.h Linux header using the C preprocessor, then uses a strikingly similar Python script to parse the list of syscalls.)

With this, we can invoke the correct Linux syscall – but have yet to figure out which arguments to pass to that syscall.

Understanding ioprio_set() arguments

Returning to the manpage for ioprio_set(which, who, ioprio), we can see that it talks about various macros and constants.

If we want to set the I/O priority for a process, then:

The which and who arguments identify the thread(s) on which the system calls operate. The which argument determines how who is interpreted, and has one of the following values:

  • IOPRIO_WHO_PROCESS

    who is a process ID or thread ID identifying a single process or thread. If who is 0, then operate on the calling thread.

And:

The ioprio argument given to ioprio_set() is a bit mask that specifies both the scheduling class and the priority to be assigned to the target process(es). The following macros are used for assembling and dissecting ioprio values:

  • IOPRIO_PRIO_VALUE(class, data)

    Given a scheduling class and priority (data), this macro combines the two values to produce an ioprio value, which is returned as the result of the macro.

This sounds good, except that these macros do not exist.

The end of the manpage sheepishly admits:

Bugs

glibc does not yet provide a suitable header file defining the function prototypes and macros described on this page. Suitable definitions can be found in linux/ioprio.h.

That header includes the necessary info; now we only have to translate it to Python:

/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* ... */

#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_CLASS_MASK   0x07
#define IOPRIO_PRIO_MASK    ((1UL << IOPRIO_CLASS_SHIFT) - 1)
/* ... */

#define IOPRIO_PRIO_VALUE(class, data)  \
    ((((class) & IOPRIO_CLASS_MASK) << IOPRIO_CLASS_SHIFT) | \
     ((data) & IOPRIO_PRIO_MASK))

/*
 * ... IDLE is the idle scheduling class, it is only
 * served when no one else is using the disk.
 */
enum {
    IOPRIO_CLASS_NONE,
    IOPRIO_CLASS_RT,
    IOPRIO_CLASS_BE,
    IOPRIO_CLASS_IDLE,
};
/* ... */

enum {
    IOPRIO_WHO_PROCESS = 1,
    IOPRIO_WHO_PGRP,
    IOPRIO_WHO_USER,
};

That means:

  • which = IOPRIO_WHO_PROCESS = 1 to select the "process" mode for the who argument
  • who = 0 to modify I/O priority of the current process
  • ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0) = 3 << 13 to select "idle" priority, which is the lowest priority level.

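Incidentally, the which and who arguments work much like those of the setpriority() syscall, which controls CPU priority ("nice" value) – but as far as I can tell, that relationship is not documented anywhere. Beware that the numeric constants are not interchangeable: setpriority() uses PRIO_PROCESS = 0, whereas IOPRIO_WHO_PROCESS = 1.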
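A direct Python translation of the header excerpt might look like the sketch below. The constant names mirror linux/ioprio.h; the lower-case function names are my own, and the two "dissecting" helpers follow the IOPRIO_PRIO_CLASS and IOPRIO_PRIO_DATA macros that the manpage describes but that are elided from the excerpt above.

```python
# Constants transcribed from linux/ioprio.h.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_MASK = 0x07
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1

# The first enum starts at 0, the second explicitly at 1.
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = range(4)
IOPRIO_WHO_PROCESS, IOPRIO_WHO_PGRP, IOPRIO_WHO_USER = range(1, 4)


def ioprio_prio_value(ioclass: int, data: int) -> int:
    """Combine a scheduling class and a priority level into one ioprio value."""
    return ((ioclass & IOPRIO_CLASS_MASK) << IOPRIO_CLASS_SHIFT) | (
        data & IOPRIO_PRIO_MASK
    )


def ioprio_prio_class(ioprio: int) -> int:
    """Extract the scheduling class from an ioprio value."""
    return (ioprio >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK


def ioprio_prio_data(ioprio: int) -> int:
    """Extract the priority level from an ioprio value."""
    return ioprio & IOPRIO_PRIO_MASK


if __name__ == "__main__":
    idle = ioprio_prio_value(IOPRIO_CLASS_IDLE, 0)
    print(idle, hex(idle))  # 24576 0x6000, i.e. 3 << 13
```

Spelling out the constants costs a few lines, but it makes the magic number 3 << 13 traceable back to the header.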

Assembling the Python code

In the end, my Python function to drop CPU + IO priority looks like this:

import ctypes
import os

def drop_priority() -> None:
    """Drop Linux scheduling priority (CPU + IO) as far as possible."""

    syscall = get_syscalls()
    # use_errno=True makes errno available via ctypes.get_errno()
    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    who = 0  # current process

    # CPU scheduling priority:
    # high value -> very low priority
    # Note: PRIO_PROCESS is 0, unlike IOPRIO_WHO_PROCESS below.
    prio = 19
    if libc.setpriority(os.PRIO_PROCESS, who, prio) != 0:
        raise OSError(ctypes.get_errno(), "setpriority() failed")

    # IO scheduling priority: ioclass=IDLE, see linux/ioprio.h
    which = 1  # IOPRIO_WHO_PROCESS, see linux/ioprio.h
    ioprio = 3 << 13
    if libc.syscall(syscall["ioprio_set"], which, who, ioprio) != 0:
        raise OSError(ctypes.get_errno(), "ioprio_set() failed")

There's a package for that

Some time later, it occurred to me that I could not have been the first person to attempt this – but it can be difficult to find a package if you don't already know exactly what you're looking for.

In this case, the solution was the psutil Python package (psutil on GitHub, psutil on PyPI, ionice() documentation). With it, the code simplifies quite a bit:

import psutil

def drop_priority() -> None:
    """Drop Linux scheduling priority (CPU + IO) as far as possible."""

    p = psutil.Process()

    # CPU scheduling priority:
    # high value -> very low priority
    p.nice(19)

    # IO scheduling priority:
    # Set ioclass=IDLE even though that might starve the process.
    p.ionice(ioclass=psutil.IOPRIO_CLASS_IDLE)

But how does it work? The answer is arguably a bit boring: the psutil Python package is implemented as a C extension, so it has access to the headers with the syscall numbers. But it does re-implement the IOPRIO macros, avoiding the need to have the Linux headers installed locally.

Excerpt from the code in psutil/arch/linux/proc.c

/*
 * Copyright (c) 2009, Giampaolo Rodola'. All rights reserved.
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */

/* ... */
#include <sys/syscall.h>
/* ... */

static inline int
ioprio_set(int which, int who, int ioprio) {
    return syscall(__NR_ioprio_set, which, who, ioprio);
}

#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_MASK ((1UL << IOPRIO_CLASS_SHIFT) - 1)
/* ... */
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | data)

While it is fun that we could solve this ourselves with Python-only code, it's definitely more convenient when someone else has already gone through the effort of packaging this up in a Python API.