------- Comment From mario.alberto.gali...@ibm.com 2020-10-22 11:54 EDT-------
(In reply to comment #44)
> Thanks for all the attachments. This is Rafael from the Ubuntu Server team.
> I have gone through all the logs and I could find the I/O errors related to
> disk full (/var/log/syslog cannot be write) but that was a long time ago and
> I'm assuming this is not the case here.
>
> There are 2 (possible unrelated) issues I could see:
>
> 1) lots of CDB inquiry SCSI errors:
>
> sd 2:0:0:4: [sdj] tag#76 CDB: Inquiry 12 01 c9 00 fe 00
> sd 2:0:0:4: [sdj] tag#76 Sense Key : Illegal Request [current]
> sd 2:0:0:4: [sdj] tag#76 Add. Sense: Invalid field in cdb
> sd 2:0:0:5: [sdl] tag#65 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 2:0:0:5: [sdl] tag#65 CDB: Inquiry 12 01 c9 00 fe 00
> sd 2:0:0:5: [sdl] tag#65 Sense Key : Illegal Request [current]
> sd 2:0:0:5: [sdl] tag#65 Add. Sense: Invalid field in cdb
> sd 3:0:0:1: [sdr] tag#108 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 3:0:0:1: [sdr] tag#108 CDB: Inquiry 12 01 c9 00 fe 00
> sd 3:0:0:1: [sdr] tag#108 Sense Key : Illegal Request [current]
> sd 3:0:0:1: [sdr] tag#108 Add. Sense: Invalid field in cdb
> sd 3:0:0:2: [sdt] tag#96 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 3:0:0:2: [sdt] tag#96 CDB: Inquiry 12 01 c9 00 fe 00
> sd 3:0:0:2: [sdt] tag#96 Sense Key : Illegal Request [current]
> sd 3:0:0:2: [sdt] tag#96 Add. Sense: Invalid field in cdb
> sd 3:0:0:3: [sdu] tag#103 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
> sd 3:0:0:3: [sdu] tag#103 CDB: Inquiry 12 01 c9 00 fe 00
>
> likely because storage server is not accepting those inquiries on non
> existent LUNs (were they removed from the system ? The storage server admin
> can tell you why your INQ CDB was denied.. but, since this is an INQ command
> AND there are no further I/O errors, I think it is safe to consider this as
> non-fatal and unrelated to this bug as well.
>
> 2) I have tried to reproduce the issue locally but was not able to. I'm
> sorry for the back and forth, if we could reproduce it here it would way
> faster. With that said, it is unclear to me if have you tried editing
> "/usr/lib/rsyslog/rsyslog-rotate" and changing:
>
> systemctl kill -s HUP rsyslog.service
>
> for
>
> systemctl restart rsyslog.service
>
> to see if this mitigates the issue ? That will help isolate the problem and
> allow us to know if the problem is related to the HUP signal handling
> feature in rsyslog (which is responsible for cleaning up opened file
> descriptors and might be causing this in some situation). By always
> restarting the service we will do a full initialization of descriptors and
> could be a good indicator if that is the problem.
>
> That is the suggestion right now because from all the log files, the only
> messages coming out of rsyslog were related to the HUP signal handler. This
> may also be a "hotfix" if it works, because there are some fixes in between
> the rsyslog you're using and the latest, one of them being:
>
> commit 723f6fdfa
> Author: John Brooks <jbro...@ciena.com>
> Date: Wed Jul 3 15:10:30 2019
>
> rsyslogd: Fix race between signals and main loop timeout
>
> The main loop sleeps in a select() call for a long interval in order to
> periodically run housekeeping tasks. The main loop is also responsible for
> responding to flags set by signal handlers, so this sleeping should be
> interrupted by signals so that it can check those flags.
>
> However, a signal could be delivered between when the flags are checked and
> when select() is called. In which case the main loop will block for the
> full interval (currently 10 minutes) before handling the signal. If this
> occurs, it could take up to 10 minutes for rsyslogd to terminate after a
> SIGTERM or respond to SIGHUP.
>
> Fix this by blocking signals before checking the flags and using pselect()
> to unblock the signals while waiting. This is recommended by the select(2)
> manual page to avoid this very issue.
>
> Signed-off-by: John Brooks <jbro...@ciena.com>
>
> among others.
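
For reference, the race described in the quoted commit message can be illustrated with a small Python sketch. This is not rsyslog's code: rsyslog is written in C and, according to the commit message, the fix uses pselect(). Python's standard library has no pselect(), so the sketch swaps in signal.sigtimedwait() (Linux) to show the same idea: block the signal first, so that one arriving between "check the flags" and "start waiting" stays pending and is picked up by the wait instead of being missed until the timeout expires. The function names below (block, wait_blocked, simulate_race) are hypothetical and exist only for this illustration.

```python
import signal
import threading


def block(signals):
    """Block `signals` in the calling thread and return the previous mask."""
    return signal.pthread_sigmask(signal.SIG_BLOCK, signals)


def wait_blocked(signals, timeout):
    """Wait up to `timeout` seconds for one of the already-blocked `signals`.

    Returns the signal that was pending or arrived, or None on timeout.
    """
    info = signal.sigtimedwait(signals, timeout)
    return None if info is None else signal.Signals(info.si_signo)


def simulate_race(sig=signal.SIGHUP, timeout=5.0):
    """Deliver `sig` in the window between 'check flags' and 'wait'."""
    previous = block({sig})
    try:
        # Stand-in for the race window: the signal arrives after the flags
        # were checked but before the wait starts. Because it is blocked,
        # it stays pending instead of being handled and forgotten.
        signal.pthread_kill(threading.get_ident(), sig)
        # The wait sees the pending signal and returns without sleeping
        # for the full timeout.
        return wait_blocked({sig}, timeout)
    finally:
        # Restore the mask that was in effect before the demonstration.
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)
```

Calling simulate_race() should return the signal promptly rather than after the full timeout, which is the behaviour the commit message aims for with SIGHUP and SIGTERM in the rsyslogd main loop.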

The CDB inquiry SCSI errors are for devices other than the one that holds the
/var/log directory on both systems:

iSCSI system:

dasda                        94:0    0 21.4G  0 disk
├─dasda1                     94:1    0    1G  0 part  /boot
└─dasda2                     94:2    0 20.4G  0 part
  └─ubuntu--vg-ubuntu--lv   253:0    0 20.4G  0 lvm   /

FCP/DASD system:

sdai                                          66:32   0   50G  0 disk
├─sdai1                                       66:33   0    1G  0 part
├─sdai2                                       66:34   0   49G  0 part
└─36005076400818089b0000000000001a6          253:8    0   50G  0 mpath
  ├─36005076400818089b0000000000001a6-part1  253:10   0    1G  0 part  /boot
  └─36005076400818089b0000000000001a6-part2  253:11   0   49G  0 part
    └─ubuntu--vg-ubuntu--lv                  253:12   0   34G  0 lvm   /

As suggested, I have just updated the rsyslog-rotate file and restarted the
service. I will update this bug if that change helps to mitigate the problem.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1896575

Title:
  [UBUNTU 20.04] syslog daemon stop running unexpectedly

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1896575/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs