My suspicion here is that the RA is messing up the monitoring action. I'd suggest trying with just one of the drbd clones and seeing if that works.
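To make the "target: 8 vs. rc: 0" lines in the quoted log easier to read, here is a minimal sketch that translates OCF return codes into their symbolic names. The function names ocf_rc_name and explain_status_line are hypothetical helpers made up for this illustration, not part of any cluster tool. The table assumes the standard OCF codes 0-7 plus the master/slave extensions 8 and 9. It only decodes the numbers and does not tell you why the probe returned what it did.

```shell
#!/bin/sh
# Map an OCF return code to its symbolic name.
# 0-7 are the standard OCF resource agent codes; 8 and 9 are the
# master/slave extensions (assumed here, check your installed version).
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Hypothetical helper: extract the expected ("target") and actual ("rc")
# codes from a crmd status_from_rc log line and print both names.
explain_status_line() {
    pattern='.*target: \([0-9][0-9]*\) vs\. rc: \([0-9][0-9]*\).*'
    target=$(printf '%s\n' "$1" | sed -n "s/$pattern/\1/p")
    rc=$(printf '%s\n' "$1" | sed -n "s/$pattern/\2/p")
    echo "expected $(ocf_rc_name "$target"), got $(ocf_rc_name "$rc")"
}

# Example with one of the lines from the quoted log.
explain_status_line 'status_from_rc: Action 16 (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error'
```

Read this way, the probe was expected to report a running master (8) but was recorded as plain success (0), i.e. running but not promoted.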

On Wed, Nov 12, 2008 at 13:19, Raoul Bhatia [IPAX] <[EMAIL PROTECTED]> wrote:
> hi,
>
> i have a cluster with several resources.
>
> i issued crm_resource -P and now have got the cluster in some strange
> state, which it cannot resolve by itself:
>
>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
> ...
>> Master/Slave Set: ms_drbd_www
>> drbd_www:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>> drbd_www:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
> ...
>> Master/Slave Set: ms_drbd_mysql
>> drbd_mysql:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>> drbd_mysql:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>
> failed actions:
>> Failed actions:
>> drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
>> drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
>> drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
>> drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete
>
> those monitoring failures repeat continuously. in the logfiles i find:
> ...
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16
>> (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph:
>> __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op,
>> id=drbd_www:0_monitor_0,
>> magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority
>> upgraded from 0 to 1
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action
>> done superceeded by restart
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action
>> drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17
>> (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph:
>> __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op,
>> id=drbd_www:1_monitor_0,
>> magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action
>> drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
> ...
>
> i put some debug information into the drbd ocf ra:
>> #!/bin/sh
>> echo "----" >> /tmp/lalala
>
> but /tmp/lalala stays empty. if i manually call the drbd ra with
> all parameters i get the expected rc 8.
>
> hb_report http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
> (it's kinda big as a lot of actions failed)
>
> cheers,
> raoul
>
> ps: i already tried to revoke the crm_standby, but this does not
> resolve the error messages and does not call the drbd ocf ra.
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc.          email.          [EMAIL PROTECTED]
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.          [EMAIL PROTECTED]
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> ____________________________________________________________________
>
> _______________________________________________
> Pacemaker mailing list
> [email protected]
> http://list.clusterlabs.org/mailman/listinfo/pacemaker
