> > > > }
> > > > handlers {
> > > > fence-peer "/usr/lib/drbd/rhcs_fence";
> > > > }
> > > > }
> > > >
> > > >
> > > rhcs_fence is the wrong fence-peer utility. You should use
> > > /usr/lib/drbd/crm-fence-peer.sh and
> > > /usr/lib/drbd/crm-unfence-peer.sh instead.
> >
> > But my understanding (probably wrong) was that the fence-peer handler is
> > meant to be called for STONITH, not for "usual" promotions/demotions
> > to/from Primary/Secondary.
> >
> > If I use the aforementioned pair of handlers (crm-*.sh) for
> > fence/unfence, do I still get STONITH behavior for "split brain cases"?
> >
>
> Correct. The 'rhcs_fence' handler passes fence calls on to cman, which
> you have set to redirect on to pacemaker. This isn't what it was
> designed for, and hasn't been tested. It was meant to be an updated
> replacement for obliterate-peer.sh in cman+rgmanager clusters directly
> (no pacemaker).

Well, since it is a CMAN cluster after all, and rhcs_fence relies only (besides
/proc/drbd) on cman_tool and fence_node (both of which should be working
correctly), I thought it would be the right fence script to use. I will of
course accept your suggestion and use the crm-* scripts instead.
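
If I read the DRBD documentation correctly, the relevant part of my resource
definition would then look roughly like the sketch below. The resource name
"r0" is only a placeholder, and the fencing policy and the use of
after-resync-target for unfencing are my assumptions from the documentation,
not something I have tested here:

    resource r0 {
      disk {
        # "resource-only" just places a constraint in the CIB;
        # "resource-and-stonith" additionally freezes I/O until
        # the fence-peer handler returns.
        fencing resource-only;
      }
      handlers {
        # adds a location constraint that keeps the peer from being promoted
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # removes that constraint once the resync has finished
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }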

Anyway, I'm afraid that the real problem lurks elsewhere since, as I stated
before, a simple master/slave promotion/demotion should not lead to fencing.

As suggested by Nikita Staroverov, I have pasted what I hope are the relevant
log excerpts from the first node (the one that survived the STONITH) at the
time of one "stonith fest" :), which happened just after I committed a CIB
update with new resources:

http://pastebin.com/0eQqsftb

I recall that, seconds before being shot, the second node "lost contact" with
the cluster (I was running "pcs status" and "crm_mon -Arf1" from an SSH
session and the output suddenly changed to "cluster not connected" or
something like that).

Maybe, apart from the improper use of rhcs_fence mentioned above, there are
issues with some timeout settings on cluster/DRBD operations. The nodes almost
certainly also have problems with their clocks (I am still looking for a
reasonable/reachable NTP source), but I do not know whether either of these
is relevant.

Many thanks again for your suggestions.

Regards,
Giuseppe