On second thoughts - patch applied. Having two levels of defaults makes no sense.
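
For anyone reading this later, here is a rough sketch of what "two levels of defaults" means in this thread. It is not the actual Pacemaker code: the function names, the built-in value of 10s, and the assumption that the derived default is simply the stack's node deadtime are all hypothetical and only there to illustrate the idea.

```python
from typing import Optional

# Hypothetical built-in fallback; the real value is not stated in this thread.
BUILTIN_DC_DEADTIME_S = 10.0


def dc_deadtime_two_levels(explicit_s: Optional[float],
                           stack_deadtime_s: Optional[float]) -> float:
    """Old scheme: explicit setting, else a stack-derived default, else built-in.

    Assumption: the derived default equals the messaging stack's node
    deadtime. The real derivation may well differ.
    """
    if explicit_s is not None:
        return explicit_s
    if stack_deadtime_s is not None:
        return stack_deadtime_s
    return BUILTIN_DC_DEADTIME_S


def dc_deadtime_single_level(explicit_s: Optional[float]) -> float:
    """Simplified scheme: explicit setting, else the one built-in default."""
    if explicit_s is not None:
        return explicit_s
    return BUILTIN_DC_DEADTIME_S


if __name__ == "__main__":
    # Node deadtime of 20min, nothing set explicitly for the DC timeout.
    print(dc_deadtime_two_levels(None, 1200.0))
    print(dc_deadtime_single_level(None))
```

With only one level, a long node deadtime (as needed for Lustre) can no longer stretch the DC wait unless someone sets it explicitly.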

On Wed, Nov 5, 2008 at 16:36, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
>
> On Nov 5, 2008, at 3:11 PM, Bernd Schubert wrote:
>
>>>
>>> at the cluster summit in prague we also agreed on a "black box"
>>> recorder that should help too.
>>> this way we can log tracing details there and only dump it into the
>>> logs (or recover it from core files) when needed.
>>>
>>> but this will live in corosync, so it wont help people running on
>>> heartbeat.
>>
>> Well, if openais + corosync are better, we can try to switch to it.
>
> note the future tense there though... its not implemented yet.
>
>>
>>>> Then after I found the code in pacemaker, I already tested setting
>>>> dc_deadtime, but during my initial test that didn't change anything.
>>>> While we need for Lustre installations a heartbeat deadtime > 10min,
>>>> I set it on my test systems to 180s.
>>>> Now after your suggestion I tested it again, with deadtime=20min, but
>>>> dc_deadtime=10s and quite odd, crm still needs about 3min to set the
>>>> nodes online (syslog attached). With the code removed it is only 10s.
>>>
>>> Hmmm - thats odd - i'll take a look.
>>
>> Thanks, I will also try to find some time to look at it again.
>>
>>>
>>>> Since openais doesn't seem to support the code below at all and
>>>> since it is wrong when used together with heartbeat, I still think
>>>> removing these lines is right. Please correct me if I'm wrong.
>>>
>>> I'd prefer to fix the logic (if it's broken) since it's likely that
>>> we'd add an equivalent default mechanism for CoroSync eventually.
>>
>> I just don't understand why we need that mechanism at all. I mean if
>> heartbeat/corosync/openais detect everything
>
> Especially with autojoin, it doesn't know that "everything" is online.
> There could be some extra nodes about to start/join the cluster.
>
> Remember, this is only supposed to supply a default value.
> Advanced users are free to set it as low as they like.
>
> Of course they need to know they can - thats a documentation issue
> which can be easily rectified.
>
>> is online, why does pacemaker need its own start timeout again?
>
> because it needs to give any existing DC a chance to contact it rather
> than needlessly causing another DC election.
>
>> Shouldn't it try to online everything as soon as it is started?
>> Well, ok it needs a timeout to detect if other nodes already have
>> a DC.
>
> exactly. so any value should only be used by the first node to come up.
> is that what you're seeing?
>
>> But then the DC detection timeout is not related at all to
>> node deadtime detection, is it?
>
> at the time it was felt that they were related enough that it made the
> basis of a good default.
