Great, thanks!

On Wednesday 05 November 2008 16:46:59 Andrew Beekhof wrote:
> On second thoughts - patch applied.
> Having two levels of defaults makes no sense.
>
> On Wed, Nov 5, 2008 at 16:36, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > On Nov 5, 2008, at 3:11 PM, Bernd Schubert wrote:
> >>> at the cluster summit in Prague we also agreed on a "black box"
> >>> recorder that should help too.
> >>> this way we can log tracing details there and only dump it into the
> >>> logs (or recover it from core files) when needed.
> >>>
> >>> but this will live in corosync, so it won't help people running on
> >>> heartbeat.
> >>
> >> Well, if openais + corosync are better, we can try to switch to it.
> >
> > note the future tense there though... it's not implemented yet.
> >
> >>>> Then after I found the code in pacemaker, I already tested setting
> >>>> dc_deadtime, but during my initial test that didn't change anything.
> >>>> While Lustre installations need a heartbeat deadtime > 10min, I set
> >>>> it on my test systems to 180s.
> >>>> Now after your suggestion I tested it again, with deadtime=20min, but
> >>>> dc_deadtime=10s and, quite oddly, crm still needs about 3min to set
> >>>> the nodes online (syslog attached). With the code removed it is only
> >>>> 10s.
> >>>
> >>> Hmmm - that's odd - i'll take a look.
> >>
> >> Thanks, I will also try to find some time to look at it again.
> >>
> >>>> Since openais doesn't seem to support the code below at all and
> >>>> since it is wrong when used together with heartbeat, I still think
> >>>> removing these lines is right. Please correct me if I'm wrong.
> >>>
> >>> I'd prefer to fix the logic (if it's broken) since it's likely that
> >>> we'd add an equivalent default mechanism for CoroSync eventually.
> >>
> >> I just don't understand why we need that mechanism at all. I mean if
> >> heartbeat/corosync/openais detect everything
> >
> > Especially with autojoin, it doesn't know that "everything" is online.
> > There could be some extra nodes about to start/join the cluster.
> >
> > Remember, this is only supposed to supply a default value.
> > Advanced users are free to set it as low as they like.
> >
> > Of course they need to know they can - that's a documentation issue
> > which can be easily rectified.
> >
> >> is online, why does pacemaker need its own start timeout again?
> >
> > because it needs to give any existing DC a chance to contact it rather
> > than needlessly causing another DC election.
> >
> >> Shouldn't it try to online everything as soon as it is started?
> >> Well, ok it needs a timeout to detect if other nodes already have
> >> a DC.
> >
> > exactly. so any value should only be used by the first node to come up.
> > is that what you're seeing?
> >
> >> But then the DC detection timeout is not related at all to
> >> node deadtime detection, is it?
> >
> > at the time it was felt that they were related enough that it made the
> > basis of a good default.
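
For readers following the thread, the DC-wait behaviour described in the quoted text can be pictured with a small toy model. This is only a sketch under stated assumptions, not Pacemaker's actual code: the function name `seconds_until_online` and both parameters are hypothetical, and the time an election itself takes is ignored. It shows why the timeout should only matter for the first node to come up.

```python
from typing import Optional


def seconds_until_online(dc_deadtime: float,
                         dc_contact_after: Optional[float]) -> float:
    """Toy model (hypothetical, not Pacemaker code).

    dc_deadtime:      how long a freshly started node waits for an
                      existing DC to contact it.
    dc_contact_after: seconds until an existing DC makes contact, or
                      None when no DC exists yet (first node up).
    Returns the seconds spent waiting before the node can proceed.
    """
    if dc_contact_after is not None and dc_contact_after <= dc_deadtime:
        # An existing DC reached us within the window: join it,
        # no new election is needed.
        return dc_contact_after
    # Nobody made contact in time: the wait runs to the full timeout,
    # after which the node would trigger a DC election.
    return dc_deadtime
```

In this model, `seconds_until_online(10, None)` gives the full 10s wait for the first node of a cluster, while `seconds_until_online(180, 2)` gives 2s for a node joining a cluster whose DC answers quickly. A wait of about 3min with the timeout set to 10s, as reported in the quoted text, would not fit this picture.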
-- 
Bernd Schubert
Q-Leap Networks GmbH
