Hi,

On Thu, Oct 16, 2008 at 11:45:07AM +0200, Raoul Bhatia [IPAX] wrote:
> hi,
>
> i wanted to stop heartbeat to update install a new cib.xml.
> i (hopefully) killed (!) all relevant processes on node2 (wc02)
> and issued "/etc/init.d/heartbeat stop" on wc01.
>
> all resources stopped fine. all, with the exception of "stonith":
> > Clone Set: DoFencing
> >     stonith_rackpdu:0 (stonith:external/rackpdu): Started wc01
> >     stonith_rackpdu:1 (stonith:external/rackpdu): Stopped
>
>
> looking into the logfile, i find
> > Oct 16 11:40:53 wc01 pengine: [4617]: WARN: process_pe_message: Transition
> > 359: WARNINGs found during PE processing. PEngine Input stored in:
> > /var/lib/heartbeat/pengine/pe-warn-388.bz2
> > Oct 16 11:40:53 wc01 pengine: [4617]: info: process_pe_message:
> > Configuration WARNINGs found during PE processing. Please run "crm_verify
> > -L" to identify issues.
>
> and crm_verify -L the pe-warn files, i get
> > crm_verify[29970]: 2008/10/16_11:41:13 notice: StopRsc: wc01 Stop
> > stonith_rackpdu:0
> > crm_verify[29970]: 2008/10/16_11:41:13 WARN: stage6: Scheduling Node wc02
> > for STONITH

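The quoted output shows that wc02 gets scheduled for STONITH, but not what
triggered it. As a rough sketch for narrowing that down (this is not a cluster
tool; the helper name lines_about_node, the SAMPLE list and the choice to look
only at WARN/ERROR lines are my own assumptions), something like the following
pulls the warnings that mention a given node out of log lines in the format
shown above:

```python
import re

# Severity tags as they appear in the quoted log lines, e.g.
# "pengine: [4617]: WARN: ...". Looking only at WARN and ERROR is an
# assumption made for this sketch.
SEVERITY = re.compile(r"\b(WARN|ERROR):")

# Sample lines taken from the quoted excerpt, with the mail wrapping undone.
SAMPLE = [
    "Oct 16 11:40:53 wc01 pengine: [4617]: WARN: process_pe_message: "
    "Transition 359: WARNINGs found during PE processing. PEngine Input "
    "stored in: /var/lib/heartbeat/pengine/pe-warn-388.bz2",
    "crm_verify[29970]: 2008/10/16_11:41:13 notice: StopRsc: wc01 Stop "
    "stonith_rackpdu:0",
    "crm_verify[29970]: 2008/10/16_11:41:13 WARN: stage6: Scheduling Node "
    "wc02 for STONITH",
]


def lines_about_node(lines, node):
    """Return the WARN/ERROR lines that mention `node` as a whole word."""
    node_re = re.compile(r"\b" + re.escape(node) + r"\b")
    return [ln.rstrip("\n") for ln in lines
            if SEVERITY.search(ln) and node_re.search(ln)]


if __name__ == "__main__":
    for hit in lines_about_node(SAMPLE, "wc02"):
        print(hit)
```

On the three sample lines this keeps only the stage6 line. To use it on a real
log, pass an open file object instead of SAMPLE and look at what was logged
about the node shortly before that point.
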
I don't see why the cluster wants to shoot wc02. The logs should
say.

> i recently changed my stonith configuration so the currently active
> one is not working anymore. thats one reason i want to update my
> cib.xml.
>
> in particular, stonith_rackpdu's hostlist must be updated to reflect
> the change from "wc0X" to "wc0X-neu".

Right. In general, if the environment changes in such a way that
some resources may fail, then one should put those resources into
unmanaged mode beforehand. Actually, it'd be best to first stop
those resources, make whatever changes you have to make, reconfigure
the resources appropriately, then start them again.

> anyways, is there any reason to avoid a shutdown and wait for stonith
> to succeed?

It depends, I guess. You can take a look at the pengine
transitions.

> is the stonith agent broken?

Hard to say without logs, but probably not.

> is pacemaker broken? is any
> other part broken? is this behavior intended?

Yes, insofar as, if a node needs to be fenced, the cluster won't
budge until that's done. Whether it is sensible to fence that node
is another matter. If you find this behaviour unexpected, then
please open a bugzilla and attach a full report (hb_report).

Thanks,

Dejan

> cheers,
> raoul
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc.          email.    [EMAIL PROTECTED]
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG         web.      http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.    [EMAIL PROTECTED]
> 1190 Wien                           tel.      +43 1 3670030
> FN 277995t HG Wien                  fax.      +43 1 3670030 15
> ____________________________________________________________________
>
> _______________________________________________
> Pacemaker mailing list
> [email protected]
> http://list.clusterlabs.org/mailman/listinfo/pacemaker
