** Attachment added: "test.gdb" https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1876600/+attachment/5366780/+files/test.gdb
** Description changed:

  [Impact]
  Long-running services overflow the sd_bus->cookie counter, causing
  further communication with org.freedesktop.systemd1 to stall.

  [Description]
  Systemd dbus messages include a "cookie" value to uniquely identify them
  in their bus context. This value is obtained from the bus header, and
  incremented for each exchanged message in the same bus object. For
  services that run for longer periods of time and keep communicating
  through dbus, it's possible to overflow the cookie value, causing
  further messages to the org.freedesktop.systemd1 dbus to fail. This can
  lead to these services becoming unresponsive, as they get stuck trying
  to communicate with invalid bus cookie values.

  This issue has been fixed upstream by the commit below:
    sd-bus: deal with cookie overruns (1f82f5bb4237)

  $ git describe --contains 1f82f5bb4237
  v242-rc1~228

  $ rmadison systemd
   systemd | 229-4ubuntu4     | xenial          | source, ...
   systemd | 229-4ubuntu21.27 | xenial-security | source, ...
   systemd | 229-4ubuntu21.27 | xenial-updates  | source, ...
   systemd | 229-4ubuntu21.28 | xenial-proposed | source, ...
   systemd | 237-3ubuntu10    | bionic          | source, ...
   systemd | 237-3ubuntu10.38 | bionic-security | source, ...
   systemd | 237-3ubuntu10.39 | bionic-updates  | source, ...
   systemd | 237-3ubuntu10.40 | bionic-proposed | source, ... <----
   systemd | 242-7ubuntu3     | eoan            | source, ...

  Releases starting with Eoan already have this fix.
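
  The overflow described in [Description] can be illustrated with a small
  sketch. This is not the upstream patch: the function and macro names
  below are hypothetical, and it assumes the counter is kept in a 64-bit
  integer while the message header only has room for 32 bits. It contrasts
  a naive increment, which runs past (1<<32), with an increment that wraps
  into the upper half of the 32-bit range and uses the high-order bit as a
  "has wrapped" marker.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not the upstream identifiers. */
#define COOKIE_CYCLED (UINT32_C(1) << 31) /* high-order bit: counter has wrapped */

/* Naive increment: the 64-bit counter simply runs past the 32-bit range. */
static uint64_t cookie_inc_naive(uint64_t cookie) {
        return cookie + 1;
}

/* Wrapping increment: stay within 32 bits and never return to zero.
 * After the first wrap, only the values with the high bit set are used. */
static uint64_t cookie_inc_wrapping(uint64_t cookie) {
        if (cookie >= UINT32_MAX)
                return COOKIE_CYCLED;
        return cookie + 1;
}

/* Assumption: a cookie is only usable if it fits a 32-bit header field. */
static int cookie_fits_32bit(uint64_t cookie) {
        return cookie <= UINT32_MAX;
}

/* Small self-check of the behaviour sketched above. */
static void cookie_self_check(void) {
        /* The naive counter leaves the 32-bit range at the overflow point. */
        assert(!cookie_fits_32bit(cookie_inc_naive(UINT32_MAX)));

        /* The wrapping counter restarts at the high-order bit instead. */
        assert(cookie_inc_wrapping(UINT32_MAX) == COOKIE_CYCLED);
        assert(cookie_inc_wrapping(COOKIE_CYCLED) == COOKIE_CYCLED + 1);

        /* Below the limit, both behave like a plain increment. */
        assert(cookie_inc_wrapping(0xffffff00u) == 0xffffff01u);
}
```

  Calling cookie_self_check() from a main() exercises the three cases. The
  sketch leaves out the check that a reused cookie is not still awaiting a
  reply, which a real implementation also needs once the counter has
  wrapped.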

  [Test Case]
  There doesn't seem to be an easy test case for this, as the cookie
  values start at zero and won't overflow until (1<<32). There have been
  reports from users hitting this on Kubernetes clusters continuously
  running for longer periods (~5 months). Using GDB, we can construct an
  artificial test case to test the cookie overflow.

  The test case below performs the following steps:
  1. Create a new system bus object through sd_bus_default_system()
  2. Allocate and append a new method_call message to the bus
  3. Send the message through sd_bus_call()
  4. Handle the response message and free up the message objects

- This is done continuously, to keep incrementing the bus cookie value. We
- step in with GDB when it reaches 0x10000, and set its value to
- 0xffffff00 which then causes the test program to fail shortly
- afterwards. An example test run of an impacted system:
+ It's essentially the example code from the
+ sd_bus_message_new_method_call() manpage, with minor modifications: this
+ is done continuously, to keep incrementing the bus cookie value. We step
+ in with GDB when it reaches 0x10000, and set its value to 0xffffff00
+ which then causes the test program to fail shortly afterwards. An
+ example test run of an impacted system:
+
  ubuntu@bionic:~$ gcc -Wall test.c -o cookie -lsystemd -g
  ubuntu@bionic:~$ gdb --batch --command=test.gdb --args ./cookie
  Breakpoint 1 at 0xe61: file test.c, line 38.
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  (16s) cookie: 0x00010000 reply-cookie: 0x00010000
  Breakpoint 1, print_unit_path (bus=0x555555757290) at test.c:38
  38              r = sd_bus_message_new_method_call(bus, &m,
  $1 = 0x10000
  $2 = 0xffffff00
  Call failed: Operation not supported
  Sleeping and retrying...
  Call failed: Invalid argument
  Assertion 'm->n_ref > 0' failed at ../src/libsystemd/sd-bus/bus-message.c:934, function sd_bus_message_unref(). Aborting.

  Program received signal SIGABRT, Aborted.
  __GI_raise (sig=sig@entry=0x6) at ../sysdeps/unix/sysv/linux/raise.c:51
  51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

  To compile and debug the test case above, libsystemd-dev and
  libsystemd0-dbgsym are required. Both test.c and test.gdb source code
  are attached to this LP bug.

  [Regression Potential]
  This fix introduces some changes in the way cookie incrementation is
  handled. We now have a reduced number of available values, since the
  patch makes use of a high order bit to indicate whether we have
  overflowed or not. Potential issues could arise from two distinct
  messages repeating the cookie value, or from us not handling the cookie
  reuse properly. In practice, this shouldn't cause serious problems as
  most dbus messages should not stall long enough for a possible overlap
  in the 2^31 space. The patch has been present in other stable Ubuntu
  Series and upstream, and has been validated and tested through the
  systemd test suite and autopkgtests.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1876600

Title:
  cookie overruns can cause org.freedesktop.systemd1 dbus to hang

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1876600/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs