I prepared two bionic instances to run over the weekend. One is running auditd from bionic, and the other is running the SRU proposed package.
I have auditd being restarted via this script in both (just the email message is different, to say which package it was): #!/bin/bash result=0 while /bin/true; do date sudo systemctl restart auditd || result=$? if [ "$result" -ne "0" ]; then echo "FAILED, result=$result" break fi pid=$(pidof auditd) || result=$? if [ "$result" -ne "0" ]; then echo "FAILED, auditd not running" break fi echo "auditd pid = $pid" sleep 2 echo done mail -s "ALERT: audit orig test failed" andr...@canonical.com <<EOF $(date) Hello, the audit test failed with the orig package result = $result EOF The one running the SRU version hasn't failed yet, but the one running the original auditd pkg from bionic has failed after a few hours, with the error message that this bug is addressing. I saved the logs, and kicked another run, just to try to get another failure. If it fails again, I'll update the package there to the SRU one, and let it run again. I'll attach logs and a summary after this experiment is over, hopefully later today if the failure repeats itself in the same amount of time. Here is a glimpse: Fri Jan 22 19:20:29 UTC 2021 auditd pid = 25215 Fri Jan 22 19:20:31 UTC 2021 auditd pid = 25255 Fri Jan 22 19:20:33 UTC 2021 auditd pid = 25334 ... Fri Jan 22 22:43:47 UTC 2021 auditd pid = 23985 Fri Jan 22 22:43:49 UTC 2021 auditd pid = 24022 Fri Jan 22 22:43:51 UTC 2021 Job for auditd.service failed because a timeout was exceeded. See "systemctl status auditd.service" and "journalctl -xe" for details. FAILED, result=1 And from syslog: ... Jan 22 22:43:51 orig-audit-bionic systemd[1]: Stopping Security Auditing Service... Jan 22 22:43:51 orig-audit-bionic auditd[24022]: The audit daemon is exiting. Jan 22 22:43:51 orig-audit-bionic kernel: [13955.899540] audit: type=1305 audit(1611355431.494:81546): audit_pid=0 old=24022 auid=4294967295 ses=4294967295 res=1 Jan 22 22:43:51 orig-audit-bionic systemd[1]: Stopped Security Auditing Service. Jan 22 22:43:51 orig-audit-bionic kernel: [13955.901464] audit: type=1131 audit(1611355431.498:81547): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=auditd comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jan 22 22:43:51 orig-audit-bionic systemd[1]: Starting Security Auditing Service... Jan 22 22:43:51 orig-audit-bionic auditd[24058]: Started dispatcher: /sbin/audispd pid: 24060 Jan 22 22:43:51 orig-audit-bionic audispd: No plugins found, exiting Jan 22 22:45:21 orig-audit-bionic systemd[1]: auditd.service: Start operation timed out. Terminating. Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: State 'stop-sigterm' timed out. Killing. Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Killing process 24057 (auditd) with signal SIGKILL. Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Killing process 24058 (auditd) with signal SIGKILL. Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Control process exited, code=killed status=9 Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Failed with result 'timeout'. We can see that the message "dispatcher <pid> reaped" isn't shown, which is exactly the bug: auditd hangs while trying to log that message inside a signal handler. So, looking good. Let's see if I can get another failure. -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to audit in Ubuntu. https://bugs.launchpad.net/bugs/1848330 Title: Installing auditd sometimes fails in post-inst Status in audit package in Ubuntu: Fix Released Status in audit source package in Bionic: Fix Committed Status in audit package in Debian: New Bug description: [Impact] Sometimes, auditd will get stuck when starting up, causing systemd to kill it after a while since it (systemd) never got the start notification. Upstream troubleshooted this to be caused by calling a syslog() function inside a signal handler. [Test Case] There is no reliable test case to reproduce the bug, other than trying the fixed packages on an affected system where the hang occurs more frequently. Basically: sudo systemctl stop auditd sudo systemctl start auditd should work reliably. Do not run that in a tight loop, however, as that will trigger a it's-restarting-too-frequently failure. [Where problems could occur] - if auditd fails to start, then the first fallback is syslog, and if that is not picking up the audit messages, the last resort is the kernel buffer, which can fill up. In the case it fills up, audit logs will be lost. - it's possible to configure the audit system to panic() the machine if audit messages are lost or otherwise not able to be recorded (auditctl -f 2; default is 1 which is printk()) - the update restarts auditd as expected. Misconfiguration on very very busy systems could mean that audit logs would be lost during the brief moment the service is restarted. If that's the case, this update would just be one more way to trigger it, but not be the root cause of the problem - similarly, as is usual with updates that restart services, it's possible than an incorrect configuration for auditd is present, but was never loaded before. The restart will load the config, and will fail in such a case. - this update removes a logging statement that occurs during startup: ("dispatcher %d reaped", pid) It's unlikely, but possible, that some monitoring software could be looking for that message in the logs. It won't be there anymore after this update. [Other Info] The patch is committed upstream and part of the 2.8.5 release, which is present in Focal and later. The real fix for this bug is just dropping the audit_msg() call in the signal handler code. But the original reporter of the bug, who is also who came up with the fix (see https://bugzilla.redhat.com/show_bug.cgi?id=1587995#c4) stated that with the 3 changes in the patch the startup hang didn't happen to him anymore. Since this bug is difficult to reproduce elsewhere (either you have it, or you don't), I chose to keep the 3 changes instead of just the removal of the audit_msg() call. [Original Description] This happens sometimes when installing auditd on Ubuntu 18.04.2, most installations work successfully, though. Re-running the install also fixes the issue, but the failure breaks our automation. The log from the failure looks like this: # apt install auditd ... Setting up auditd (1:2.8.2-1ubuntu1) ... Created symlink /etc/systemd/system/multi-user.target.wants/auditd.service → /lib/systemd/system/auditd.service. Job for auditd.service failed because a timeout was exceeded. See "systemctl status auditd.service" and "journalctl -xe" for details. invoke-rc.d: initscript auditd, action "start" failed. ● auditd.service - Security Auditing Service Loaded: loaded (/lib/systemd/system/auditd.service; enabled; vendor preset: enabled) Active: failed (Result: timeout) since Tue 2019-09-17 18:43:06 UTC; 11ms ago Docs: man:auditd(8) https://github.com/linux-audit/audit-documentation Process: 9702 ExecStart=/sbin/auditd (code=killed, signal=KILL) Sep 17 18:40:06 compute-node21 systemd[1]: Starting Security Auditing Service... Sep 17 18:40:06 compute-node21 auditd[9703]: Started dispatcher: /sbin/audispd pid: 9705 Sep 17 18:40:06 compute-node21 audispd[9705]: No plugins found, exiting Sep 17 18:41:36 compute-node21 systemd[1]: auditd.service: Start operation timed out. Terminating. Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: State 'stop-sigterm' timed out. Killing. Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Killing process 9702 (auditd) with signal SIGKILL. Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Killing process 9703 (auditd) with signal SIGKILL. Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Control process exited, code=killed status=9 Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Failed with result 'timeout'. Sep 17 18:43:06 compute-node21 systemd[1]: Failed to start Security Auditing Service. dpkg: error processing package auditd (--configure): installed auditd package post-installation script subprocess returned error exit status 1 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/audit/+bug/1848330/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp