Hello Jan,

Thanks very much for your help :). I will try to read the patches you posted.
Emmanuel

2014-05-05 16:14 GMT+02:00 Jan Friesse <[email protected]>:

> Emmanuel,
>
> emmanuel segura wrote:
> > Hello Jan,
> >
> > I'm using corosync+pacemaker on SLES 11 SP1 and this is a critical
> > system,
>
> Oh, ok.
>
> > I don't think I'll get authorization to upgrade the system, but I would
> > like to know if there is any bug about this issue in my current corosync
> > release.
>
> This is hard to say. The SUSE guys probably included many patches, so it
> would make sense to try to contact SUSE support.
>
> After a very, very quick look at git, the following patches may be related:
> 559d4083ed8355fe83f275e53b9c8f52a91694b2,
> 02c5dffa5bb8579c223006fa1587de9ba7409a3d,
> 64d0e5ace025cc929e42896c5d6beb3ef75b8244,
> 6fae42ba72006941c1fde99616ea30f4f10ebb38,
> c7e686181bcd0e975b09725502bef02c7d0c338a.
>
> But still keep in mind that between the latest 1.3.6 (which I believe is more
> or less what you are using) and the current origin/flatiron there are 118
> patches...
>
> Regards,
>   Honza
>
> > Thanks
> > Emmanuel
> >
> > 2014-04-30 17:07 GMT+02:00 Jan Friesse <[email protected]>:
> >
> >> Emmanuel,
> >>
> >> emmanuel segura wrote:
> >>> Hello Jan,
> >>>
> >>> Thanks for the explanation, but I saw this in my log.
> >>>
> >>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>
> >>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership
> >>> messages.
> >>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership
> >>> messages.
> >>> corosync [TOTEM ] A processor failed, forming new configuration.
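The patch hashes Honza lists can be inspected in a clone of the corosync repository (a sketch; it assumes a checkout of https://github.com/corosync/corosync with the flatiron history available, and only prints the `git show` commands rather than running them):

```shell
# The five commit hashes copied from the mail above.
PATCHES="559d4083ed8355fe83f275e53b9c8f52a91694b2
02c5dffa5bb8579c223006fa1587de9ba7409a3d
64d0e5ace025cc929e42896c5d6beb3ef75b8244
6fae42ba72006941c1fde99616ea30f4f10ebb38
c7e686181bcd0e975b09725502bef02c7d0c338a"

# Print the 'git show' command for each patch; run these inside the
# corosync clone to read each commit message and diff.
for h in $PATCHES; do
    echo "git show $h"
done
```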
> >>> corosync [CLM ] CLM CONFIGURATION CHANGE
> >>> corosync [CLM ] New Configuration:
> >>> corosync [CLM ] r(0) ip(10.xxx.xxx.xxx)
> >>> corosync [CLM ] Members Left:
> >>> corosync [CLM ] r(0) ip(10.xxx.xxx.xxx)
> >>> corosync [CLM ] Members Joined:
> >>> corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event
> >>> on ring 6904: memb=1, new=0, lost=1
> >>> corosync [pcmk ] info: pcmk_peer_update: memb: node01 891257354
> >>> corosync [pcmk ] info: pcmk_peer_update: lost: node02 874480
> >>>
> >>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>
> >>> When this happens, does corosync need to retransmit the token?
> >>> From what I understood, the token needs to be retransmitted, but in my
> >>> case a new configuration was formed.
> >>>
> >>> This is my corosync version:
> >>>
> >>> corosync-1.3.3-0.3.1
> >>
> >> 1.3.3 has been unsupported for ages. Please upgrade to the newest 1.4.6
> >> (if you are using cman) or 2.3.3 (if you are not using cman). Also please
> >> change your pacemaker to not use the plugin (upgrading to 2.3.3 will solve
> >> that automatically, because plugins in corosync 2.x are no longer
> >> supported).
> >>
> >> Regards,
> >>   Honza
> >>
> >>> Thanks
> >>>
> >>> 2014-04-30 9:42 GMT+02:00 Jan Friesse <[email protected]>:
> >>>
> >>>> Emmanuel,
> >>>> there is no need to trigger fencing on "Process pause detected...".
> >>>>
> >>>> Also, fencing is not triggered if the membership didn't change. So let's
> >>>> say the token was lost but during the gather state all nodes replied;
> >>>> then there is no change of membership and no need to fence.
> >>>>
> >>>> I believe your situation was:
> >>>> - one node was a little overloaded
> >>>> - token lost
> >>>> - overload over
> >>>> - gather state
> >>>> - every node is alive
> >>>> -> no fencing
> >>>>
> >>>> Regards,
> >>>>   Honza
> >>>>
> >>>> emmanuel segura wrote:
> >>>>> Hello Jan,
> >>>>>
> >>>>> Forget the last mail:
> >>>>>
> >>>>> Hello Jan,
> >>>>>
> >>>>> I found this problem on two HP blade systems and the strange thing is
> >>>>> the fencing was not triggered :(, but it's enabled
> >>>>>
> >>>>> 2014-04-25 18:36 GMT+02:00 emmanuel segura <[email protected]>:
> >>>>>
> >>>>>> Hello Jan,
> >>>>>>
> >>>>>> I found this problem on two HP blade systems and the strange thing is
> >>>>>> the fencing was triggered :(
> >>>>>>
> >>>>>> 2014-04-25 9:27 GMT+02:00 Jan Friesse <[email protected]>:
> >>>>>>
> >>>>>>> Emmanuel,
> >>>>>>>
> >>>>>>> emmanuel segura wrote:
> >>>>>>>
> >>>>>>>> Hello List,
> >>>>>>>>
> >>>>>>>> I have these two lines in my cluster logs; can somebody help me
> >>>>>>>> understand what they mean?
> >>>>>>>>
> >>>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>>
> >>>>>>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing
> >>>>>>>> membership messages.
> >>>>>>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing
> >>>>>>>> membership messages.
> >>>>>>>
> >>>>>>> Corosync internally checks the gap between member join messages. If
> >>>>>>> such a gap is > token/2, it means that corosync was not scheduled to
> >>>>>>> run by the kernel for too long, and it should discard membership
> >>>>>>> messages.
> >>>>>>>
> >>>>>>> The original intent was to detect a paused process. If a pause is
> >>>>>>> detected, it's better to discard old membership messages and initiate
> >>>>>>> a new query than to send an outdated view.
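The token/2 scheduling-gap check described above can be sketched roughly as follows (a minimal Python illustration, not corosync's actual C implementation; the class name, threshold handling, and values are invented for the example):

```python
import time

class PauseDetector:
    """Sketch of the check: if the gap since we last ran exceeds
    token/2, assume the process was paused (not scheduled by the
    kernel) and tell the caller to flush stale membership messages."""

    def __init__(self, token_ms=1000):
        # token_ms stands in for the 'token' totem timeout
        self.threshold_ms = token_ms / 2
        self.last_run_ms = time.monotonic() * 1000.0

    def check(self):
        now_ms = time.monotonic() * 1000.0
        gap = now_ms - self.last_run_ms
        self.last_run_ms = now_ms
        if gap > self.threshold_ms:
            print(f"Process pause detected for {gap:.0f} ms, "
                  "flushing membership messages.")
            return True   # caller should discard queued membership messages
        return False
```

With a token of 100 ms, a 200 ms stall trips the detector while back-to-back calls do not, which matches the behaviour the log lines report.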
> >>>>>>>
> >>>>>>> So there are various reasons why this is triggered, but today it's
> >>>>>>> usually a VM with an overloaded host machine.
> >>>>>>>
> >>>>>>>> corosync [TOTEM ] A processor failed, forming new configuration.
> >>>>>>>>
> >>>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>>
> >>>>>>>> I know the "corosync [TOTEM ] A processor failed, forming new
> >>>>>>>> configuration" message appears when the token packet is definitely
> >>>>>>>> lost.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>   Honza
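Both symptoms discussed above are governed by the totem timeouts in corosync.conf; the fragment below shows the relevant knobs (values are illustrative only, not tuning advice; defaults differ per release, so check your distribution's shipped configuration):

```
totem {
        version: 2

        # token: time (ms) to wait without receiving the token before
        # declaring it lost ("A processor failed, forming new
        # configuration."); the pause check compares scheduling gaps
        # against token/2
        token: 5000

        # retransmit attempts before the token is considered lost
        token_retransmits_before_loss_const: 10

        # time (ms) to wait for consensus before starting a new
        # membership round
        consensus: 6000
}
```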
--
esta es mi vida e me la vivo hasta que dios quiera
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
