I ran the CentOS 7.3.1611 update over the holidays and my DRBD + NFS + IMAP active-passive pair locked up again. This has now happened consistently for at least three kernel updates. This time I had enough consoles open to run fuser and lsof, though.
The procedure:

1. pcs cluster standby <secondary>
2. yum up && reboot <secondary>
3. pcs cluster unstandby <secondary>

Fine so far.

4. pcs cluster standby <primary>

results in:

> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Running stop for /dev/drbd0 on /raid
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Trying to unmount /raid
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:45 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:46 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:48 ERROR: Couldn't unmount /raid, giving up!
> Dec 23 17:36:48 [1138] zebrafish.bmrb.wisc.edu lrmd: notice: operation_finished: drbd_filesystem_stop_0:18277:stderr [ umount: /raid: target is busy. ]

...and so on until the system is powered down. Before powering it down I ran lsof, which hung, and fuser:

> # fuser -vum /raid
>                      USER        PID ACCESS COMMAND
> /raid:               root     kernel mount  (root)/raid

After running yum up on the primary and rebooting it,

5. pcs cluster unstandby <primary>

causes the same failed-unmount loop on the secondary, which then has to be powered down until the primary recovers.

Hopefully I'm doing something wrong; please, someone, tell me what it is. Anyone? Bueller?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
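P.S. For reference, here's the whole sequence scripted out, roughly as I run it. Node names are placeholders for our actual hosts, and the /proc/drbd check is just how I eyeball sync state on DRBD 8.4; adjust for your own setup:

    # from a cluster node: move resources off the secondary, then update it
    pcs cluster standby node2
    ssh node2 'yum -y update && reboot'

    # once node2 is back, let it rejoin and resync
    pcs cluster unstandby node2
    # wait until this shows Connected and UpToDate/UpToDate
    ssh node2 cat /proc/drbd

    # only then fail over off the primary -- this is the step that hangs
    pcs cluster standby node1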
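P.P.S. Since fuser only shows a kernel-space holder ("kernel mount"), next time I plan to capture the following on the stuck node before power-cycling it, on the (unproven) theory that something in-kernel, most likely the NFS server, is still pinning /raid. Commands assume CentOS 7 service names and our export path:

    # is /raid still exported when the Filesystem RA tries the stop?
    exportfs -v
    # is knfsd still up with threads running?
    systemctl status nfs-server
    cat /proc/fs/nfsd/threads
    # what the kernel thinks is mounted there
    findmnt /raid
    # any userspace holders that fuser -m missed
    lsof +f -- /raid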
