I prepared two bionic instances to run over the weekend.

One is running auditd from bionic, and the other is running the SRU
proposed package.

I have auditd being restarted via this script in both (just the email message 
is different, to say which package it was):
#!/bin/bash

result=0

while /bin/true; do
    date
    sudo systemctl restart auditd || result=$?
    if [ "$result" -ne "0" ]; then
        echo "FAILED, result=$result"
        break
    fi
    pid=$(pidof auditd) || result=$?
    if [ "$result" -ne "0" ]; then
        echo "FAILED, auditd not running"
        break
    fi
    echo "auditd pid = $pid"
    sleep 2
    echo
done
mail -s "ALERT: audit orig test failed" andr...@canonical.com <<EOF
$(date)
Hello, the audit test failed with the orig package
result = $result
EOF

The one running the SRU version hasn't failed yet, but the one running
the original auditd pkg from bionic has failed after a few hours, with
the error message that this bug is addressing. I saved the logs, and
kicked another run, just to try to get another failure. If it fails
again, I'll update the package there to the SRU one, and let it run
again.

I'll attach logs and a summary after this experiment is over, hopefully
later today if the failure repeats itself in the same amount of time.

Here is a glimpse:
Fri Jan 22 19:20:29 UTC 2021
auditd pid = 25215

Fri Jan 22 19:20:31 UTC 2021
auditd pid = 25255

Fri Jan 22 19:20:33 UTC 2021
auditd pid = 25334
...
Fri Jan 22 22:43:47 UTC 2021
auditd pid = 23985

Fri Jan 22 22:43:49 UTC 2021
auditd pid = 24022

Fri Jan 22 22:43:51 UTC 2021
Job for auditd.service failed because a timeout was exceeded.
See "systemctl status auditd.service" and "journalctl -xe" for details.
FAILED, result=1


And from syslog:
...
Jan 22 22:43:51 orig-audit-bionic systemd[1]: Stopping Security Auditing 
Service...
Jan 22 22:43:51 orig-audit-bionic auditd[24022]: The audit daemon is exiting.
Jan 22 22:43:51 orig-audit-bionic kernel: [13955.899540] audit: type=1305 
audit(1611355431.494:81546): audit_pid=0 old=24022 auid=4294967295 
ses=4294967295 res=1
Jan 22 22:43:51 orig-audit-bionic systemd[1]: Stopped Security Auditing Service.
Jan 22 22:43:51 orig-audit-bionic kernel: [13955.901464] audit: type=1131 
audit(1611355431.498:81547): pid=1 uid=0 auid=4294967295 ses=4294967295 
msg='unit=auditd comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? 
terminal=? res=success'
Jan 22 22:43:51 orig-audit-bionic systemd[1]: Starting Security Auditing 
Service...
Jan 22 22:43:51 orig-audit-bionic auditd[24058]: Started dispatcher: 
/sbin/audispd pid: 24060
Jan 22 22:43:51 orig-audit-bionic audispd: No plugins found, exiting
Jan 22 22:45:21 orig-audit-bionic systemd[1]: auditd.service: Start operation 
timed out. Terminating.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: State 
'stop-sigterm' timed out. Killing.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Killing process 
24057 (auditd) with signal SIGKILL.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Killing process 
24058 (auditd) with signal SIGKILL.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Control process 
exited, code=killed status=9
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Failed with 
result 'timeout'.

We can see that the message "dispatcher <pid> reaped" isn't shown, which
is exactly the bug: auditd hangs while trying to log that message inside
a signal handler.

So, looking good. Let's see if I can get another failure.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to audit in Ubuntu.
https://bugs.launchpad.net/bugs/1848330

Title:
  Installing auditd sometimes fails in post-inst

Status in audit package in Ubuntu:
  Fix Released
Status in audit source package in Bionic:
  Fix Committed
Status in audit package in Debian:
  New

Bug description:
  [Impact]

  Sometimes, auditd will get stuck when starting up, causing systemd to
  kill it after a while since it (systemd) never got the start
  notification.

  Upstream troubleshooted this to be caused by calling a syslog()
  function inside a signal handler.

  [Test Case]
  There is no reliable test case to reproduce the bug, other than trying the 
fixed packages on an affected system where the hang occurs more frequently.

  Basically:
  sudo systemctl stop auditd
  sudo systemctl start auditd

  should work reliably. Do not run that in a tight loop, however, as
  that will trigger a it's-restarting-too-frequently failure.

  [Where problems could occur]
  - if auditd fails to start, then the first fallback is syslog, and if that is 
not picking up the audit messages, the last resort is the kernel buffer, which 
can fill up. In the case it fills up, audit logs will be lost.

  - it's possible to configure the audit system to panic() the machine
  if audit messages are lost or otherwise not able to be recorded
  (auditctl -f 2; default is 1 which is printk())

  - the update restarts auditd as expected. Misconfiguration on very
  very busy systems could mean that audit logs would be lost during the
  brief moment the service is restarted. If that's the case, this update
  would just be one more way to trigger it, but not be the root cause of
  the problem

  - similarly, as is usual with updates that restart services, it's
  possible than an incorrect configuration for auditd is present, but
  was never loaded before. The restart will load the config, and will
  fail in such a case.

  - this update removes a logging statement that occurs during startup:

      ("dispatcher %d reaped", pid)

  It's unlikely, but possible, that some monitoring software could be
  looking for that message in the logs. It won't be there anymore after
  this update.

  [Other Info]
  The patch is committed upstream and part of the 2.8.5 release, which is 
present in Focal and later.
  The real fix for this bug is just dropping the audit_msg() call in the signal 
handler code. But the original reporter of the bug, who is also who came up 
with the fix (see https://bugzilla.redhat.com/show_bug.cgi?id=1587995#c4) 
stated that with the 3 changes in the patch the startup hang didn't happen to 
him anymore. Since this bug is difficult to reproduce elsewhere (either you 
have it, or you don't), I chose to keep the 3 changes instead of just the 
removal of the audit_msg() call.

  [Original Description]

  This happens sometimes when installing auditd on Ubuntu 18.04.2, most
  installations work successfully, though. Re-running the install also
  fixes the issue, but the failure breaks our automation. The log from
  the failure looks like this:

  # apt install auditd
  ...
  Setting up auditd (1:2.8.2-1ubuntu1) ...
  Created symlink /etc/systemd/system/multi-user.target.wants/auditd.service → 
/lib/systemd/system/auditd.service.
  Job for auditd.service failed because a timeout was exceeded.
  See "systemctl status auditd.service" and "journalctl -xe" for details.
  invoke-rc.d: initscript auditd, action "start" failed.
  ● auditd.service - Security Auditing Service
     Loaded: loaded (/lib/systemd/system/auditd.service; enabled; vendor 
preset: enabled)
     Active: failed (Result: timeout) since Tue 2019-09-17 18:43:06 UTC; 11ms 
ago
       Docs: man:auditd(8)
             https://github.com/linux-audit/audit-documentation
    Process: 9702 ExecStart=/sbin/auditd (code=killed, signal=KILL)

  Sep 17 18:40:06 compute-node21 systemd[1]: Starting Security Auditing 
Service...
  Sep 17 18:40:06 compute-node21 auditd[9703]: Started dispatcher: 
/sbin/audispd pid: 9705
  Sep 17 18:40:06 compute-node21 audispd[9705]: No plugins found, exiting
  Sep 17 18:41:36 compute-node21 systemd[1]: auditd.service: Start operation 
timed out. Terminating.
  Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: State 
'stop-sigterm' timed out. Killing.
  Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Killing process 
9702 (auditd) with signal SIGKILL.
  Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Killing process 
9703 (auditd) with signal SIGKILL.
  Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Control process 
exited, code=killed status=9
  Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Failed with result 
'timeout'.
  Sep 17 18:43:06 compute-node21 systemd[1]: Failed to start Security Auditing 
Service.
  dpkg: error processing package auditd (--configure):
   installed auditd package post-installation script subprocess returned error 
exit status 1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/audit/+bug/1848330/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to