Hello,

While setting these various parameters, I couldn't find documentation or details about them. Below are some questions.
Considering the watchdog module used on a server is set up with a 30s timer (let's call it the wdt, the "watchdog timer"), how should "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be set?

Here is my thinking so far:

"SBD_WATCHDOG_TIMEOUT < wdt"

The sbd daemon should reset the timer before the wdt expires so the server stays alive. The values usually found in online resources and defaults are "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to reset the timer multiple times (e.g. because of excessive load, a swap storm, etc.)? The server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt, right?

"stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT"

I'm not quite sure what stonith-watchdog-timeout is. Is it the maximum time stonithd waits, after it has asked for a node to be fenced, before it considers that the watchdog was actually triggered and the node reset, even with no confirmation? I suppose "stonith-watchdog-timeout" is mostly useful to stonithd, right?

"stonith-watchdog-timeout < stonith-timeout"

I understand the stonith action timeout should be at least greater than the wdt, so that stonithd will not raise a timeout before the wdt has had a chance to expire and reset the node. Is that right?

Any other comments?

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo
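
PS: to make the orderings above concrete, here is a minimal Python sketch of the relations I am assuming. It only encodes my current guesses, not documented sbd or Pacemaker behaviour. The function name check_timeouts is made up for illustration, and the 10s and 60s values in the usage line are arbitrary examples (only 5s and 30s come from the values quoted above).

```python
def check_timeouts(wdt, sbd_watchdog_timeout,
                   stonith_watchdog_timeout, stonith_timeout):
    """Return the assumed orderings that do NOT hold (all values in seconds).

    These orderings are the hypotheses from this mail, not confirmed rules.
    """
    # Each entry pairs a human-readable ordering with whether it holds.
    assumed = [
        ("SBD_WATCHDOG_TIMEOUT < wdt",
         sbd_watchdog_timeout < wdt),
        ("stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT",
         stonith_watchdog_timeout > sbd_watchdog_timeout),
        ("stonith-watchdog-timeout < stonith-timeout",
         stonith_watchdog_timeout < stonith_timeout),
        ("stonith-timeout > wdt",
         stonith_timeout > wdt),
    ]
    return [label for label, holds in assumed if not holds]


# wdt=30s and SBD_WATCHDOG_TIMEOUT=5s as quoted above; 10s and 60s are
# arbitrary illustrative values.
violations = check_timeouts(wdt=30, sbd_watchdog_timeout=5,
                            stonith_watchdog_timeout=10, stonith_timeout=60)
print(violations or "all assumed orderings hold")
```

If any of these four relations is wrong or missing, that is exactly what I would like to have corrected.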
