On Sep 17, 2008, at 8:12 AM, Satomi TANIGUCHI wrote:

Hi Andrew,

Thank you for your opinion!

I have considered modifying each RA, of course.
But there are the following problems.

(1) Can't support the case where a timeout occurs.
   Only lrmd can detect that a timeout has occurred; an RA can't.
   One of the main purposes of this function is to avoid an unnecessary F/O
   when a sudden high load occurs for a very short time. In that case, the
   monitor may time out, so the RA is not a suitable place to implement this
   function.
(2) A lack of generality.
   With that approach, every RA a user needs would have to implement this
   function. It is really troublesome.

Add a common function (or two) to .ocf-shellfuncs


(3) The RA has to hold its failure information outside itself.
   If you implement the same specification in an RA, the information about a
   resource's failures has to be held outside the RA, for example in a file
   under /var/run/heartbeat/rsctmp.
   That can cause unnecessary F/O (for example, when the file is deleted or
   modified by hand, or a read/write fails, etc.).

Unnecessary but rare.
Remember the first rule of optimization: "Don't do it."
   http://en.wikipedia.org/wiki/Optimization_(computer_science)#Quotes

The fact is that the resource shouldn't be failing (or even worse, appearing to fail when they haven't) in the first place. Seriously.

I can't help but feel this is all a work-around for badly written RAs and/or overly aggressive timeouts. There's nothing wrong with setting large timeouts... if you set 1 hour and the op returns in 1 second, then we don't wait around doing nothing for the other 59 minutes and 59 seconds.


But if you really really only want to report an error if N monitors fail in M seconds (I still think this is crazy, but whatever), then simply implement monitor_loop() which calls monitor() up to N times looking for $OCF_SUCCESS and add:

  <op id=... name="monitor_loop" timeout="M" interval=... />

instead of a regular monitor op. Or even in addition to a regular monitor op with on_fail=ignore if you want.
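Configured alongside a regular monitor op, that two-op variant might look roughly like this in the resource's operations section (the ids, intervals, and timeouts here are purely illustrative, not from any patch):

```xml
<operations>
  <!-- Regular monitor: results are recorded, but failures don't trigger recovery -->
  <op id="rsc1-mon" name="monitor" interval="10s" timeout="30s" on_fail="ignore"/>
  <!-- monitor_loop: retries internally, so only N failures within M seconds escalate -->
  <op id="rsc1-mon-loop" name="monitor_loop" interval="60s" timeout="120s"/>
</operations>
```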


So, a total of about 5 lines in .ocf-shellfuncs (since monitor_loop is generic) and 1 per RA that you want/need to support this. And because it doesn't modify the existing RA behavior or code paths, including it shouldn't be a problem.

And the lrmd still handles all the timeouts.
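For illustration, a minimal sketch of such a monitor_loop helper, assuming a POSIX shell: the stub monitor() and the FAKE_RESOURCE_STATE variable exist only to make the demo self-contained, and the return codes are hard-coded stand-ins for the real OCF definitions.

```shell
# Assumed OCF return codes, hard-coded so the demo is self-contained.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Stub monitor for the demo: "running" means healthy.
# A real RA would perform its usual status check here instead.
monitor() {
    [ "${FAKE_RESOURCE_STATE:-running}" = "running" ] && return "$OCF_SUCCESS"
    return "$OCF_ERR_GENERIC"
}

# Call monitor() up to N times, returning success as soon as one attempt
# succeeds. No sleep is needed: the op's timeout (M) bounds the whole loop,
# and we return immediately on the first success anyway.
monitor_loop() {
    retries=${1:-3}           # N: attempts before reporting the failure
    i=0
    rc=$OCF_ERR_GENERIC
    while [ "$i" -lt "$retries" ]; do
        monitor
        rc=$?
        [ "$rc" -eq "$OCF_SUCCESS" ] && return "$OCF_SUCCESS"
        i=$((i + 1))
    done
    return "$rc"              # last failure code after N failed attempts
}
```

An RA's case statement would then dispatch the monitor_loop action to this helper, while the lrmd's timeout on the monitor_loop op still bounds the total time taken.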


(4) Monitor may take a long time.

We don't care.  Really.
Take as much time as you need to decide the resource is really dead and you want us to do something about it.

The LRMd doesn't even tell the CRM about every single monitor result - only when the RA returns something different from the last result.


I considered another specification: to check whether a resource is running
several times (with a sleep between checks) within one monitor function,
   and to return NG only when all results of the checks are NG.

NG?


   (I think this specification is similar to yours...)

Almost.  No need for the sleep.
Just return after the first success.

   But then, the monitor function comes to take a long time,

Again, this isn't a bad thing.

You don't want the CRM to know about the failure until the resource has been dead for M seconds anyway, what does it matter if the monitor op takes M seconds to complete?

Besides, the monitor op will only take that long when the resource isn't working properly.

and that influences the setting of the monitor's timeout.

Not badly.




For the above reasons, I judged it better to implement this function in lrmd.


Best Regards,
Satomi TANIGUCHI




Andrew Beekhof wrote:
Personally, I think that this is simply the wrong approach.
If a resource doesn't want the cluster to react to a failure, then the RA just shouldn't report one. Problem solved.
On Sep 11, 2008, at 9:06 AM, Satomi Taniguchi wrote:
Hi Lars,

Thank you for your reply.


Lars Marowsky-Bree wrote:
On 2008-09-09T18:37:31, Satomi Taniguchi <[EMAIL PROTECTED] > wrote:
[...snip...]
(2) lrmd counts each resource's monitor-op failures per period-length, and it
   ignores a resource's failures until the count exceeds the threshold
   (max-failures-per-period).
This means that this policy is enforced by the LRM; I'm not sure that's
perfect. Should this not be handled by the PE?

At first, I also tried to implement this function in PE.
But there were some problems.
(1) PE has no way to clear fail-count.
   When the PE learns of a resource's failure, the rsc's fail-count has already
   increased. So it would be natural to treat fail-count as the failure counter
   for this new function, if it were implemented in the PE.
   But, for example, when the period is over, the fail-count needs to be cleared.
   At present, the PE has no way to request changes to the CIB.
   The PE's role is to create a graph based on the current CIB, not to change it,
   as far as I understand.
   And users may be confused if fail-count is cleared suddenly.
(2) After a resource has failed once, lrmd doesn't notify crmd even if it
   fails again.
   With the new function, the PE has to know about a resource's failures even
   when they occur consecutively. But normally, the result of a monitor
   operation is reported only when it changes.
   In addition, even if lrmd always notified crmd of the resource's failures,
   the rsc's fail-count would not increase because the magic-number doesn't
   change. That is to say, the PE can't detect consecutive failures.
   I tried cancelling the failed resource's monitor operation and setting the
   same op again. But in this way, the new monitor operation runs immediately,
   so the interval of monitor operations is no longer constant.

So, I considered it more proper to implement the new function in lrmd.

(3) If the value of period-length is 0, lrmd calculates the suitable length of
[...snip...]
In addition, I added a function to lrmadmin to show the following information:
   i) the time when the specified resource's period-length started.
  ii) the value of the specified resource's failure counter.
This is the third patch.
This means that the full cluster state is no longer reflected in the
CIB. I don't really like that at all.

I see what you mean.
If it is possible, I want to gather the whole state of the cluster in the CIB, too.
For that purpose, I tried to implement this function in the PE at first.
But it seems _not_ to be possible, for the above reasons...

+    op_type = ha_msg_value(op->msg, F_LRM_OP);
+
+    if (STRNCMP_CONST(op_type, "start") == 0) {
+        /* initialize the counter of failures. */
+        rsc->t_failed = 0;
+        rsc->failcnt_per_period = 0;
+    }
What about a resource being promoted to master state, or demoted again?
Should the counter not be reset then too?

Exactly.
Thank you for pointing that out.

(The functions are also getting verrry long; maybe factor some code out
into smaller functions?)

All right.
I will do so.

Regards,
  Lars

Best Regards,
Satomi TANIGUCHI

_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker