Re: [Pacemaker] monitoring action fails

Andrew Beekhof Wed, 19 Nov 2008 05:26:02 -0800

My suspicion here is that the RA is mishandling the monitor action.
I'd suggest trying with just one of the drbd clones to see if that works.
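
For anyone reading the quoted log below: the "target: 8 vs. rc: 0" lines mean the cluster expected OCF rc 8 (running as master) but recorded rc 0 (success, which for a master/slave resource means running as slave). The same numbers appear in the magic string. Here is a rough sketch for pulling them out, assuming the layout is op_status:op_rc;action_id:transition_id:target_rc:uuid, which matches the values in this log but is an assumption on my part rather than something stated in this thread. The function names ocf_rc_name and decode_magic are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map an OCF return code to its conventional name.
# Only the codes relevant to this thread are listed.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        *) echo "rc=$1" ;;
    esac
}

# Hypothetical helper: split a magic string of the ASSUMED form
#   op_status:op_rc;action_id:transition_id:target_rc:uuid
decode_magic() {
    status_rc=${1%%;*}   # part before ';', e.g. "0:0"
    key=${1#*;}          # part after ';', e.g. "16:670:8:<uuid>"
    op_rc=${status_rc#*:}

    # Split the key on ':' into positional parameters.
    old_ifs=$IFS
    IFS=:
    set -- $key
    IFS=$old_ifs
    action_id=$1
    target_rc=$3

    echo "action $action_id: expected $(ocf_rc_name "$target_rc") ($target_rc), got $(ocf_rc_name "$op_rc") ($op_rc)"
}

decode_magic "0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321"
```

Run against the magic value for drbd_www:0_monitor_0, this prints one line with the expected and the recorded rc, which makes the mismatch easier to spot than in the raw crmd output.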


On Wed, Nov 12, 2008 at 13:19, Raoul Bhatia [IPAX] <[EMAIL PROTECTED]> wrote:
> hi,
>
> i have a cluster with several resources.
>
> i issued crm_resource -P and now have got the cluster in some strange
> state, which it cannot resolve by itself:
>
>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
> ...
>> Master/Slave Set: ms_drbd_www
>>     drbd_www:0  (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>>     drbd_www:1  (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
> ...
>> Master/Slave Set: ms_drbd_mysql
>>     drbd_mysql:0        (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>>     drbd_mysql:1        (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>
> failed actions:
>> Failed actions:
>>     drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
>>     drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
>>     drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
>>     drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete
>
> those monitoring failures repeat continuously. in the logfiles i find:
> ...
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16 
>> (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: 
>> __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, 
>> id=drbd_www:0_monitor_0, 
>> magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority 
>> upgraded from 0 to 1
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action 
>> done superceeded by restart
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action 
>> drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17 
>> (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: 
>> __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, 
>> id=drbd_www:1_monitor_0, 
>> magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action 
>> drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
> ...
>
> i put some debug information into the drbd ocf ra:
>> #!/bin/sh
>> echo "----" >> /tmp/lalala
>
> but /tmp/lalala stays empty. if i manually call the drbd ra with
> all parameters i get the expected rc 8.
>
> hb_report http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
> (it's kinda big as a lot of actions failed)
>
> cheers,
> raoul
>
> ps: i already tried to revoke the crm_standby, but this does not
> resolve the error messages and does not call the drbd ocf ra.
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc.          email.          [EMAIL PROTECTED]
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.            [EMAIL PROTECTED]
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> ____________________________________________________________________
>
> _______________________________________________
> Pacemaker mailing list
> [email protected]
> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>

