[Bug 1896575] Comment bridged from LTC Bugzilla

bugproxy Thu, 22 Oct 2020 09:18:16 -0700

------- Comment From mario.alberto.gali...@ibm.com 2020-10-22 11:54 EDT-------
(In reply to comment #44)
> Thanks for all the attachments. This is Rafael from the Ubuntu Server team.
> I have gone through all the logs and found I/O errors related to a full
> disk (/var/log/syslog could not be written), but that was a long time ago
> and I'm assuming this is not the case here.
>
> There are 2 (possibly unrelated) issues I could see:
>
> 1) lots of CDB inquiry SCSI errors:
>
> sd 2:0:0:4: [sdj] tag#76 CDB: Inquiry 12 01 c9 00 fe 00
> sd 2:0:0:4: [sdj] tag#76 Sense Key : Illegal Request [current]
> sd 2:0:0:4: [sdj] tag#76 Add. Sense: Invalid field in cdb
> sd 2:0:0:5: [sdl] tag#65 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 2:0:0:5: [sdl] tag#65 CDB: Inquiry 12 01 c9 00 fe 00
> sd 2:0:0:5: [sdl] tag#65 Sense Key : Illegal Request [current]
> sd 2:0:0:5: [sdl] tag#65 Add. Sense: Invalid field in cdb
> sd 3:0:0:1: [sdr] tag#108 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 3:0:0:1: [sdr] tag#108 CDB: Inquiry 12 01 c9 00 fe 00
> sd 3:0:0:1: [sdr] tag#108 Sense Key : Illegal Request [current]
> sd 3:0:0:1: [sdr] tag#108 Add. Sense: Invalid field in cdb
> sd 3:0:0:2: [sdt] tag#96 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 3:0:0:2: [sdt] tag#96 CDB: Inquiry 12 01 c9 00 fe 00
> sd 3:0:0:2: [sdt] tag#96 Sense Key : Illegal Request [current]
> sd 3:0:0:2: [sdt] tag#96 Add. Sense: Invalid field in cdb
> sd 3:0:0:3: [sdu] tag#103 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 3:0:0:3: [sdu] tag#103 CDB: Inquiry 12 01 c9 00 fe 00
>
> likely because the storage server is not accepting those inquiries on
> non-existent LUNs (were they removed from the system?). The storage server
> admin can tell you why your INQ CDB was denied, but since this is an INQ
> command AND there are no further I/O errors, I think it is safe to consider
> this non-fatal and unrelated to this bug as well.
>
> 2) I have tried to reproduce the issue locally but was not able to. I'm
> sorry for the back and forth; if we could reproduce it here it would be way
> faster. With that said, it is unclear to me whether you have tried editing
> "/usr/lib/rsyslog/rsyslog-rotate" and changing:
>
> systemctl kill -s HUP rsyslog.service
>
> for
>
> systemctl restart rsyslog.service
>
> to see if this mitigates the issue? That will help isolate the problem and
> tell us whether it is related to the HUP signal handling feature in rsyslog
> (which is responsible for cleaning up open file descriptors and might be
> causing this in some situations). By always restarting the service we do a
> full initialization of descriptors, which could be a good indicator of
> whether that is the problem.
>
> That is the suggestion right now because from all the log files, the only
> messages coming out of rsyslog were related to the HUP signal handler. This
> may also be a "hotfix" if it works, because there are some fixes in between
> the rsyslog you're using and the latest, one of them being:
>
> commit 723f6fdfa
> Author: John Brooks <jbro...@ciena.com>
> Date:   Wed Jul 3 15:10:30 2019
>
> rsyslogd: Fix race between signals and main loop timeout
>
> The main loop sleeps in a select() call for a long interval in order to
> periodically run housekeeping tasks. The main loop is also responsible for
> responding to flags set by signal handlers, so this sleeping should be
> interrupted by signals so that it can check those flags.
>
> However, a signal could be delivered between when the flags are checked and
> when select() is called. In which case the main loop will block for the
> full interval (currently 10 minutes) before handling the signal. If this
> occurs, it could take up to 10 minutes for rsyslogd to terminate after a
> SIGTERM or respond to SIGHUP.
>
> Fix this by blocking signals before checking the flags and using pselect()
> to unblock the signals while waiting. This is recommended by the select(2)
> manual page to avoid this very issue.
>
> Signed-off-by: John Brooks <jbro...@ciena.com>
>
> among others.


The CDB inquiry SCSI errors are for devices other than the one that holds
the /var/log directory on both systems:

iSCSI system:
dasda                                94:0    0  21.4G  0 disk
├─dasda1                             94:1    0     1G  0 part  /boot
└─dasda2                             94:2    0  20.4G  0 part
  └─ubuntu--vg-ubuntu--lv           253:0    0  20.4G  0 lvm   /

FCP/DASD system:
sdai                                         66:32   0   50G  0 disk
├─sdai1                                      66:33   0    1G  0 part
├─sdai2                                      66:34   0   49G  0 part
└─36005076400818089b0000000000001a6         253:8    0   50G  0 mpath
  ├─36005076400818089b0000000000001a6-part1 253:10   0    1G  0 part  /boot
  └─36005076400818089b0000000000001a6-part2 253:11   0   49G  0 part
    └─ubuntu--vg-ubuntu--lv                 253:12   0   34G  0 lvm   /
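
For anyone who wants to repeat this comparison, here is a minimal sketch. It
assumes the kernel messages have been saved to a file in the format quoted
above; the function names are hypothetical, and findmnt from util-linux is
assumed to be available to resolve what backs /var/log.

```shell
#!/bin/sh
# List the unique sd device names (e.g. sdj) that logged an Inquiry CDB
# error. Reads kernel log lines from the file given as $1.
inquiry_error_devices() {
    grep 'CDB: Inquiry' "$1" | sed -n 's/.*\[\(sd[a-z]*\)].*/\1/p' | sort -u
}

# Print the source (block device or mapper path) of the filesystem that
# contains /var/log.
var_log_source() {
    findmnt -n -o SOURCE -T /var/log
}
```

Running inquiry_error_devices on a saved kernel log and comparing the result
with the lsblk tree for the device reported by var_log_source should show
whether the two sets overlap.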

As suggested, I have just updated the rsyslog-rotate file and restarted the
service. I will report back on whether the change helps to mitigate the
problem.
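
For reference, a sketch of how the suggested change could be applied. The
function name is hypothetical, and it assumes GNU sed (for -i) and that the
script contains the "systemctl kill -s HUP rsyslog.service" line exactly as
quoted above. A backup copy is kept next to the original.

```shell
#!/bin/sh
# Replace the HUP signal with a full service restart in an rsyslog-rotate
# script. $1 is the path to the script; a backup is saved as "$1.orig".
use_restart_instead_of_hup() {
    cp -p "$1" "$1.orig" &&
    sed -i 's/systemctl kill -s HUP rsyslog\.service/systemctl restart rsyslog.service/' "$1"
}
```

It would be run as root, for example as
"use_restart_instead_of_hup /usr/lib/rsyslog/rsyslog-rotate". Since the file
lives under /usr/lib, a package upgrade will most likely overwrite the change.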

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1896575

Title:
  [UBUNTU 20.04] syslog daemon stop running unexpectedly