Hi Satomi-san,

On Tue, Oct 21, 2008 at 05:35:03PM +0900, Satomi TANIGUCHI wrote:
> Hi Dejan,
>
>
> Dejan Muhamedagic wrote:
>> Hi Satomi-san,
>>
>> On Thu, Oct 16, 2008 at 03:43:36PM +0900, Satomi TANIGUCHI wrote:
>>> Hi Dejan,
>>>
>>>
>>> Dejan Muhamedagic wrote:
>>>> Hi Satomi-san,
>>>>
>>>> On Tue, Oct 14, 2008 at 07:07:00PM +0900, Satomi TANIGUCHI wrote:
>>>>> Hi,
>>>>>
>>>>> I found that there are 2 problems when the DC node is STONITH'ed.
>>>>> (1) The STONITH operation is executed twice.
>>>> This has been discussed at length in bugzilla, see
>>>>
>>>> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1904
>>>>
>>>> which was resolved with WONTFIX. In short, it was deemed too risky
>>>> to implement a remedy for this problem. Of course, if you think
>>>> you can add more to the discussion, please go ahead.
>>> Sorry, I missed it.
>>
>> Well, you couldn't have known about it :)
>>
>>> Thank you for pointing it out!
>>> I understand how it came about.
>>>
>>> Ideally, when the DC node is going to be STONITH'ed,
>>> a new DC would be elected and it would STONITH the ex-DC;
>>> then these problems would not occur.
>>> But maybe that is not a good approach in an emergency,
>>> because the ex-DC should be STONITH'ed as soon as possible.
>>
>> Yes, you're right about this.
>>
>>> Anyway, I understand this is expected behavior, thanks!
>>> But then, it seems that the tengine still has to keep a timeout while
>>> waiting for stonithd's result, and a long cluster-delay is still required.
>>
>> If I understood Andrew correctly, the tengine will wait forever,
>> until stonithd sends a message. Or dies which, let's hope, won't
>> happen.
> My perception is the same as yours.
>
>>
>>> Because the second STONITH is requested on that transition timeout.
>>> I'm afraid that I misunderstood the true meaning of what Andrew said.
>>
>> In the bugzilla? If so, please reopen and voice your concerns.
> I asked him again in bugzilla, thanks!
>
>>
>>>>> (2) The timeout for which stonithd on the DC node waits for the
>>>>> result of a STONITH op from another node is
>>>>> always set to "stonith-timeout" in <cluster_property_set>.
>>>>> [...]
>>>>> The case (2):
>>>>> When this timeout occurs in stonithd on the DC
>>>>> while a non-DC node's stonithd is trying to reset the DC,
>>>>> the DC's stonithd will send a request to another node,
>>>>> and two or more STONITH plugins are executed in parallel.
>>>>> This is a troublesome problem.
>>>>> The most suitable value for this timeout might be
>>>>> the sum total of the "stonith-timeout" of the STONITH plugins on the
>>>>> node which is going to receive the STONITH request from the DC, I think.
>>>> This would probably be very difficult for the CRM to get.
>>> Right, I agree with you.
>>> I meant "it is difficult because stonithd on the DC can't know the values
>>> of stonith-timeout on other nodes" with the following sentence,
>>> "But DC node can't know that...".
>>>>> But the DC node can't know that...
>>>>> I would like to hear your opinions.
>>>> Sorry, but I couldn't exactly follow. Could you please describe
>>>> it in terms of actions.
>>> Sorry, I'll restate what I meant.
>>> The timeout for which stonithd on the DC waits for the return of another
>>> node's stonithd should, by all rights, be longer than the sum total of
>>> the "stonith-timeout" of the STONITH plugins on that node.
>>> But it is very difficult for the DC's stonithd to get those values.
>>> So I would like to hear your opinion about what is a suitable and
>>> practical value for this timeout, which is set in
>>> insert_into_executing_queue().
>>> I hope I conveyed to you what I want to say.
>>
>> OK, I suppose I understand now. You're talking about the timeouts
>> for remote fencing operations, right? And the originating
> Exactly!
>
>> stonithd hasn't got a clue how long the remote fencing
>> operation may take. Well, that could be a problem. I can't think
>> of anything to resolve that completely, not without "rewiring"
>> stonithd.
>> stonithd broadcasts the request, so there's no way for
>> it to know who's doing what, and when, and how long it can take.
>>
>> The only workaround I can think of is to use the global (cluster
>> property) stonith-timeout, which should be set to the maximum sum
>> of stonith timeouts for a node.
> All right.
> I misunderstood the role of the global stonith-timeout.
> I considered it just the default value for each plugin's stonith-timeout,
> as default-action-timeout is for each operation.
> To use stonith-timeout correctly (without troublesome timeouts),
> we should keep to the following, right?
> - set stonith-timeout for every STONITH plugin;
> - set the global stonith-timeout to the maximum sum of stonith timeouts
>   for a node;
> - (set cluster-delay longer than the global stonith-timeout,
>   at least at present.)
Right.

>> Now, back to reality ;-) Timeouts are important, of course, but
>> one should usually leave a generous margin on top of the expected
>> duration. For instance, if the normal timeout for an operation on
>> a device is 30 seconds, there's nothing wrong in setting it to,
>> say, one or two minutes. The consequences of an operation ending
>> prematurely are much more serious than if one waits a bit longer.
>> After all, if there's something really wrong, it is usually
>> detected early and the error reported immediately. Of course,
>> one shouldn't follow this advice blindly. Know your cluster!
> Understood!
>
>>
>>> For reference, I attached logs from when the aforesaid timeout occurred.
>>> The cluster has 3 nodes.
>>> When the DC was going to be STONITH'ed, the DC sent a request to all of
>>> the non-DC nodes, and all of them tried to shut down the DC.
>>
>> No, the tengine (running on the DC) always talks to the local
>> stonithd.
> I meant "stonithd on the DC broadcast a request"
> with the sentence "DC sent a request all of non-DC nodes".
> I'm sorry for being ambiguous.
>
>>
>>> And when the timeout in the DC's stonithd occurred, it sent the same
>>> request again, and then two or more STONITH plugins worked in parallel
>>> on every non-DC node.
>>> (Please see sysstats.txt.)
>>>
>>> I want to make clear whether the current behavior is expected or a bug.
>>
>> That's actually wrong, but could be considered a configuration
>> problem:
>>
>> <cluster_property_set id="cib-bootstrap-options">
>> ...
>> <nvpair id="nvpair.id2000009" name="stonith-timeout" value="260s"/>
>> ...
>> <primitive id="prmStonithN1" class="stonith" type="external/ssh">
>> ...
>> <nvpair id="nvpair.id2000602" name="stonith-timeout" value="390s"/>
>>
>> The stonithd initiator (the one running on the DC) times out
>> before the remote fencing operation. On a retry a second remote
>> fencing operation is started. That's why you see two of them.
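In other words, the global stonith-timeout must be at least as large as the biggest per-plugin stonith-timeout (ideally the sum of the per-plugin timeouts on one node, plus some margin). A minimal sketch of how the fragment above could be fixed; the values here are illustrative, not taken from any real cluster:

```xml
<cluster_property_set id="cib-bootstrap-options">
  ...
  <!-- global stonith-timeout: at least the sum of the per-plugin
       stonith-timeouts on any one node, with a generous margin -->
  <nvpair id="nvpair.id2000009" name="stonith-timeout" value="450s"/>
  ...
<primitive id="prmStonithN1" class="stonith" type="external/ssh">
  ...
  <!-- per-plugin timeout; must stay below the global value -->
  <nvpair id="nvpair.id2000602" name="stonith-timeout" value="390s"/>
```

With 450s > 390s the initiator no longer gives up while the remote fencing operation is still legitimately running, so no second operation is started.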
> I set these values because I wanted to know what would happen
> when the timeout for a remote fencing op occurs, and I intended to
> bring it up on the mailing list if curious behavior appeared. ;)
>
>>
>> Anyway, you can open a bugzilla for this, because the stonithd on
>> a remote host should know that there's already one operation
>> running. Unfortunately, I'm busy with more urgent matters right
>> now, so it may take a few weeks until I take a look at it.
>> As usual, patches are welcome :)
> I posted it into bugzilla.
> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1983
> I'm sorry to bother you.

Thanks for filing this.

Cheers,

Dejan

> Best Regards,
> Satomi TANIGUCHI
>
>
>>
>> Thanks,
>>
>> Dejan
>>
>>> But I consider that the root of every problem is that the node which
>>> sends the STONITH request and waits for completion of the op is killed.
>>>
>>>
>>> Regards,
>>> Satomi TANIGUCHI
>>>
>>>
>>>> Thanks,
>>>>
>>>> Dejan
>>>>
>>>>> Best Regards,
>>>>> Satomi TANIGUCHI
>>>>
>>>>> _______________________________________________________
>>>>> Linux-HA-Dev: [EMAIL PROTECTED]
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>>>>> Home Page: http://linux-ha.org/

_______________________________________________
Pacemaker mailing list
[EMAIL PROTECTED]
http://list.clusterlabs.org/mailman/listinfo/pacemaker
