I ran the CentOS 7.3.1611 update over the holidays and my DRBD + NFS + IMAP active-passive pair locked up again. This has now happened consistently for at least three kernel updates. This time I had enough consoles open to run fuser and lsof, though.
The procedure:

1. pcs cluster standby <secondary>
2. yum up && reboot <secondary>
3. pcs cluster unstandby <secondary>

Fine so far.

4. pcs cluster standby <primary>

results in:

> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Running stop for /dev/drbd0 on /raid
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Trying to unmount /raid
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:45 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:46 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:48 ERROR: Couldn't unmount /raid, giving up!
> Dec 23 17:36:48 [1138] zebrafish.bmrb.wisc.edu lrmd: notice: operation_finished: drbd_filesystem_stop_0:18277:stderr [ umount: /raid: target is busy. ]

...and so on until the system is powered down. Before powering it down I ran lsof, which hung, and fuser:

> # fuser -vum /raid
>                      USER        PID ACCESS COMMAND
> /raid:               root     kernel mount  (root)/raid

After running yum up on the primary and rebooting it,

5. pcs cluster unstandby <primary>

causes the same failed-unmount loop on the secondary, which then has to be powered down until the primary recovers.

Hopefully I'm doing something wrong; please, someone, tell me what it is. Anyone? Bueller?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
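P.S. For reference, here's the whole sequence scripted out, roughly as I run it. Node names are placeholders for our actual hosts, and the /proc/drbd check is just how I eyeball sync state on DRBD 8.4; adjust for your own setup:

    # from a cluster node: move resources off the secondary, then update it
    pcs cluster standby node2
    ssh node2 'yum -y update && reboot'

    # once node2 is back, let it rejoin and resync
    pcs cluster unstandby node2
    # wait until this shows Connected and UpToDate/UpToDate
    ssh node2 cat /proc/drbd

    # only then fail over off the primary -- this is the step that hangs
    pcs cluster standby node1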
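P.P.S. Since fuser only shows a kernel-space holder ("kernel mount"), next time I plan to capture the following on the stuck node before power-cycling it, on the (unproven) theory that something in-kernel, most likely the NFS server, is still pinning /raid. Commands assume CentOS 7 service names and our export path:

    # is /raid still exported when the Filesystem RA tries the stop?
    exportfs -v
    # is knfsd still up with threads running?
    systemctl status nfs-server
    cat /proc/fs/nfsd/threads
    # what the kernel thinks is mounted there
    findmnt /raid
    # any userspace holders that fuser -m missed
    lsof +f -- /raid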
