On Thu, 8 Dec 2016 11:47:20 +0100 Jehan-Guillaume de Rorthais <[email protected]> wrote:
> Hello,
>
> While setting these various parameters, I couldn't find documentation or
> details about them. Below are some questions.
>
> Considering the watchdog module used on a server is set up with a 30s timer
> (let's call it the wdt, the "watchdog timer"), how should
> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be
> set?
>
> Here is my thinking so far:
>
> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the
> wdt expires so the server stays alive. Online resources and default values are
> usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to
> reset the timer multiple times (e.g. because of excessive load, a swap storm,
> etc.)? The server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt,
> right?
>
> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what
> stonith-watchdog-timeout is. Is it the maximum time stonithd waits after
> it asked for a node to be fenced before it considers that the watchdog was
> actually triggered and the node reset, even with no confirmation? I suppose
> "stonith-watchdog-timeout" is mostly useful to stonithd, right?
>
> "stonith-watchdog-timeout < stonith-timeout". I understand the stonith action
> timeout should be at least greater than the wdt so stonithd will not raise a
> timeout before the wdt has had a chance to expire and reset the node. Is that
> right?

Does anyone have answers to these questions?

I am currently writing some more doc/cookbook material for the PAF project[1], and I would prefer to be sure of what is written there :)

[1] http://dalibo.github.io/PAF/documentation.html

Regards,
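P.S. To make the three orderings I am asking about concrete, here is a minimal Python sketch that only encodes my current guesses. The function name `check_timeouts` and its arguments are hypothetical, invented for illustration; nothing here is taken from sbd or Pacemaker, and the inequalities are exactly the unconfirmed ones from my message, not documented behaviour.

```python
# Hypothetical helper: checks the three orderings guessed in this thread.
# These rules are the poster's assumptions, NOT confirmed sbd/Pacemaker semantics.
def check_timeouts(wdt, sbd_watchdog_timeout,
                   stonith_watchdog_timeout, stonith_timeout):
    """Return a list of violated assumptions (all values in seconds)."""
    problems = []
    # Guess 1: sbd must reset the timer before the hardware watchdog expires.
    if not sbd_watchdog_timeout < wdt:
        problems.append("SBD_WATCHDOG_TIMEOUT should be < wdt")
    # Guess 2: stonithd waits longer than the sbd watchdog timeout.
    if not stonith_watchdog_timeout > sbd_watchdog_timeout:
        problems.append("stonith-watchdog-timeout should be > SBD_WATCHDOG_TIMEOUT")
    # Guess 3: the overall stonith action timeout is the largest of the three.
    if not stonith_watchdog_timeout < stonith_timeout:
        problems.append("stonith-watchdog-timeout should be < stonith-timeout")
    return problems


if __name__ == "__main__":
    # Example values: wdt=30s and SBD_WATCHDOG_TIMEOUT=5s come from the message;
    # 10s and 60s are made-up values chosen only to satisfy the guesses.
    for problem in check_timeouts(30, 5, 10, 60):
        print(problem)
```

With the example values above, the three guessed orderings all hold, so nothing is printed. Whether these orderings are actually the right ones is precisely what I would like confirmed.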
