You probably need to set enable_startup_fencing = 0 instead of enable_fencing = 0.
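If it helps, here is a minimal dlm.conf sketch of what I mean (assuming the stock /etc/dlm/dlm.conf location used by dlm_controld packaging; adjust the path to your distribution):

```
# /etc/dlm/dlm.conf
#
# Skip only the *startup* fencing of nodes that dlm_controld has not yet
# seen as cluster members, while leaving normal runtime fencing of lost
# members enabled (the default).
enable_startup_fencing=0

# enable_fencing=1 is the default; disabling it makes dlm proceed without
# waiting for a lost node to be fenced, which can mask real problems.
```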

Best regards, Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: [email protected]

www.feldhost.cz - FeldHost™ – Hosting services tailored to you. Do you have
specific requirements? We can handle them.

FELDSAM s.r.o.
V Chotejně 765/15
Praha 10 – Hostivař, PSČ 102 00
Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
File no. C 200350, registered with the Municipal Court in Prague

Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010 0000 0024 0033 0446

> On 1 Oct 2018, at 18:55, Patrick Whitney <[email protected]> wrote:
> 
>> Fencing in clustering is always required, but unlike pacemaker that lets
>> you turn it off and take your chances, DLM doesn't.
> 
> As a matter of fact, DLM has a setting "enable_fencing=0|1", for what
> that's worth.
> 
>> You must have
>> working fencing for DLM (and anything using it) to function correctly.
> 
> We do have fencing enabled in the cluster; we've tested both node level 
> fencing and resource fencing; DLM behaved identically in both scenarios, 
> until we set it to 'enable_fencing=0' in the dlm.conf file. 
>  
>> Basically, cluster config changes (node declared lost), dlm informed and
>> blocks, fence attempt begins and loops until it succeeds, on success,
>> informs DLM, dlm reaps locks held by the lost node and normal operation
>> continues.
> 
> This isn't quite what I was seeing in the logs.  The "failed" node would be
> fenced off, and pacemaker appeared to be sane, reporting services running on
> the surviving nodes, but once the failed node was seen as missing by dlm
> (dlm_controld), dlm would request fencing again, as far as I can tell from
> the log.  Here is an example of the suspect log entry:
> Sep 26 09:41:35 pcmk-test-1 dlm_controld[837]: 38 fence request 2 pid 1446 
> startup time 1537969264 fence_all dlm_stonith
>  
>> This isn't a question of node count or other configuration concerns.
>> It's simply that you must have proper fencing for DLM.
> 
> Can you speak more to what "proper fencing" is for DLM?
> 
> Best,
> -Pat
> 
> 
> On Mon, Oct 1, 2018 at 12:30 PM Digimer <[email protected] 
> <mailto:[email protected]>> wrote:
> On 2018-10-01 12:04 PM, Ferenc Wágner wrote:
> > Patrick Whitney <[email protected] <mailto:[email protected]>> 
> > writes:
> > 
> >> I have a two node (test) cluster running corosync/pacemaker with DLM
> >> and CLVM.
> >>
> >> I was running into an issue where when one node failed, the remaining node
> >> would appear to do the right thing, from the pcmk perspective, that is.
> >> It would  create a new cluster (of one) and fence the other node, but
> >> then, rather surprisingly, DLM would see the other node offline, and it
> >> would go offline itself, abandoning the lockspace.
> >>
> >> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> >> our tests are now working as expected.
> > 
> > I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that
> > is, they are started by systemd, not by Pacemaker).  I've seen weird DLM
> > fencing behavior, but not what you describe above (though I ran with
> > more than two nodes from the very start).  Actually, I don't even
> > understand how it occurred to you to disable DLM fencing to fix that...
> 
> Fencing in clustering is always required, but unlike pacemaker that lets
> you turn it off and take your chances, DLM doesn't. You must have
> working fencing for DLM (and anything using it) to function correctly.
> 
> Basically, cluster config changes (node declared lost), dlm informed and
> blocks, fence attempt begins and loops until it succeeds, on success,
> informs DLM, dlm reaps locks held by the lost node and normal operation
> continues.
> 
> This isn't a question of node count or other configuration concerns.
> It's simply that you must have proper fencing for DLM.
> 
> >> I'm a little concerned that I have masked an issue by doing this, as in all
> >> of the tutorials and docs I've read, there is no mention of having to
> >> configure DLM whatsoever.
> > 
> > Unfortunately it's very hard to come by any reliable info about DLM.  I
> > had a couple of enlightening exchanges with David Teigland (its primary
> > author) on this list, he is very helpful indeed, but I'm still very far
> > from having a working understanding of it.
> > 
> > But I've been running with --enable_fencing=0 for years without issues,
> > leaving all fencing to Pacemaker.  Note that manual cLVM operations are
> > the only users of DLM here, so delayed fencing does not cause any
> > problems, the cluster services do not depend on DLM being operational (I
> > mean it can stay frozen for several days -- as it happened in a couple
> > of pathological cases).  GFS2 would be a very different thing, I guess.
> > 
> 
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/ <https://alteeve.com/w/>
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
> 
> 
> -- 
> Patrick Whitney
> DevOps Engineer -- Tools
> _______________________________________________
> Users mailing list: [email protected]
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
