Hello,

earlier this year I complained on the heartbeat mailing list about huge startup times when deadtime is large (due to initdead >= deadtime):
http://www.mail-archive.com/linux-ha%40lists.linux-ha.org/msg07801.html

Finally I found the time to look into this issue in more detail. It is rather easy to convince heartbeat to go online: basically just remove this condition from check_comm_isup():

    if (config->rtjoinconfig != HB_JOIN_NONE && !init_deadtime_passed) {
        return;
    }

But then the trouble is with crm: it still refuses to select any of the nodes as designated coordinator (DC), so nothing will go online after a system-wide heartbeat shutdown.

The reason is quite simple: crm uses a simple timer to trigger the initial DC selection. As its timeout it uses getenv(ENV_PREFIX "initdead"), which is set by heartbeat. See the setting and usage of election_trigger->period_ms in do_startup() and config_query_callback().

IMHO using such a simple timer is plain wrong. Heartbeat should tell crm when all cluster nodes have been found, and the DC should then be selected immediately. Well, actually we could keep the timer, but crm would additionally need to be informed by heartbeat when all cluster nodes are online. Then the timer could be stopped and the DC selection done immediately.

Is there already a callback from heartbeat when all nodes are online?

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
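
P.S. To make the proposal a bit more concrete, here is a rough, self-contained sketch of the "keep the timer, but stop it early" idea. It is not actual crm or heartbeat code: the type election_trigger_t and the functions trigger_init(), trigger_tick() and trigger_membership_update() are hypothetical names made up for illustration, and the sketch simply assumes that heartbeat could report the number of online and configured nodes through some callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for crm's election trigger (not the real type). */
typedef struct {
    unsigned period_ms;        /* timeout, e.g. derived from initdead */
    unsigned elapsed_ms;       /* time waited since startup */
    bool     armed;            /* true while still waiting for nodes */
    bool     election_started; /* true once DC selection was kicked off */
} election_trigger_t;

/* Arm the trigger with the initdead-based timeout. */
void trigger_init(election_trigger_t *t, unsigned initdead_ms)
{
    assert(t != NULL);
    t->period_ms = initdead_ms;
    t->elapsed_ms = 0;
    t->armed = true;
    t->election_started = false;
}

/* Fallback path: advance the timer; start the DC selection on expiry.
 * Returns true if this call started the selection. */
bool trigger_tick(election_trigger_t *t, unsigned delta_ms)
{
    assert(t != NULL);
    if (!t->armed) {
        return false;
    }
    if (delta_ms >= t->period_ms - t->elapsed_ms) {
        t->elapsed_ms = t->period_ms;
        t->armed = false;
        t->election_started = true;
        return true;
    }
    t->elapsed_ms += delta_ms;
    return false;
}

/* Fast path (assumed callback): all configured nodes are online, so
 * stop the timer and start the DC selection immediately.
 * Returns true if this call started the selection. */
bool trigger_membership_update(election_trigger_t *t,
                               unsigned online, unsigned configured)
{
    assert(t != NULL);
    if (!t->armed || online < configured) {
        return false;
    }
    t->armed = false;
    t->election_started = true;
    return true;
}
```

With a 120 s initdead and three configured nodes, trigger_membership_update(&t, 3, 3) would start the DC selection as soon as the third node shows up, while trigger_tick() only matters if some node never appears.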
