On 10 Jul 2014, at 10:59 am, Giuseppe Ragusa <[email protected]> wrote:
> On Thu, Jul 10, 2014, at 00:00, Andrew Beekhof wrote:
>>
>> On 9 Jul 2014, at 10:43 pm, Giuseppe Ragusa <[email protected]> wrote:
>>
>>> On Tue, Jul 8, 2014, at 06:06, Andrew Beekhof wrote:
>>>>
>>>> On 5 Jul 2014, at 1:00 am, Giuseppe Ragusa <[email protected]> wrote:
>>>>
>>>>> From: [email protected]
>>>>> Date: Fri, 4 Jul 2014 22:50:28 +1000
>>>>> To: [email protected]
>>>>> Subject: Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels
>>>>>
>>>>> On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa <[email protected]> wrote:
>>>>>
>>>>>>>> Hi all,
>>>>>>>> while creating a cloned stonith resource
>>>>>>>
>>>>>>> Any particular reason you feel the need to clone it?
>>>>>>
>>>>>> In the end, I suppose it's only a "purist mindset" :) because it is a PDU whose power outlets control both nodes, so its resource "should be" active (and monitored) on both nodes "independently".
>>>>>> I understand that it would work anyway if left uncloned and without location constraints, just as regular, "dedicated" stonith devices don't need to be location-constrained, right?
>>>>>>
>>>>>>>> for multi-level STONITH on a fully-up-to-date CentOS 6.5 (pacemaker-1.1.10-14.el6_5.3.x86_64):
>>>>>>>>
>>>>>>>> pcs cluster cib stonith_cfg
>>>>>>>> pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
>>>>>>>>   ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
>>>>>>>>   pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
>>>>>>>>   pcmk_host_check="static-list" \
>>>>>>>>   pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" \
>>>>>>>>   op monitor interval="240s"
>>>>>>>> pcs -f stonith_cfg resource clone pdu1 pdu1Clone
>>>>>>>> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1Clone
>>>>>>>> pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1Clone
>>>>>>>>
>>>>>>>> the last two lines do not succeed unless I add the option "--force", and even then I still get errors when issuing verify:
>>>>>>>>
>>>>>>>> [root@cluster1 ~]# pcs stonith level verify
>>>>>>>> Error: pdu1Clone is not a stonith id
>>>>>>>
>>>>>>> If you check, I think you'll find there is no such resource as 'pdu1Clone'.
>>>>>>> I don't believe pcs lets you decide what the clone name is.
>>>>>>
>>>>>> You're right! (obviously ;> )
>>>>>> It's been automatically named pdu1-clone.
>>>>>>
>>>>>> I suppose there's still too much crmsh in my memory :)
>>>>>>
>>>>>> Anyway, removing the stonith level (to start from scratch) and using the correct clone name does not change the result:
>>>>>>
>>>>>> [root@cluster1 etc]# pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1-clone
>>>>>> Error: pdu1-clone is not a stonith id (use --force to override)
>>>>>
>>>>> I bet we didn't think of that.
>>>>> What if you just do:
>>>>>
>>>>> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
>>>>>
>>>>> Does that work?
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> Yes, no errors at all and verify successful.
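(For the archives: fencing levels want the primitive stonith id, not the clone id, and the level is honoured even while the resource stays cloned. Untested here, but the whole corrected sequence should look something like this, reusing the names above:

  pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
  pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1
  # push the staged file back to the live CIB
  # ("pcs cluster push cib stonith_cfg" on older pcs builds)
  pcs cluster cib-push stonith_cfg
  pcs stonith level verify

If verify is happy, the level 1 devices get tried first and the PDU only comes into play when they fail.)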
>>>
>>> At first this passed by as a simple sanity check, but now, on second read, I think you were suggesting that I could clone as usual and then configure the level with the primitive resource (which I usually avoid when working with regular clones), and it would automatically use the clone "at runtime", correct?
>>
>> right.  but also consider not cloning it at all :)
>
> I understand that in your opinion there's almost no added value in cloned stonith resources, so I suppose that, should a PDU-type resource happen to be running on the very node it must now fence, it would be migrated first or something like that (since I understand that stonith resources cannot fence the node they are running on), right?

Nope. There is no requirement that the fencing resource first be running a) anywhere or b) on a different node in order for a node to be fenced.
Wherever possible we will try to avoid having a node fence itself, but this is unrelated to where the fencing resource is running.

> If that is so and there's no adverse effect whatsoever (not even a significant delay), I will promptly remove the clone and configure my second levels using the primitive PDU stonith resource; but if, on the contrary, you think there could after all be some "legitimate" use for such clones, I could open an RFE in Bugzilla for them to be recognized as stonith resources and accepted in levels (if you suggest so).
>
> Anyway, many thanks for your advice and insight, obviously :)
>
>>>>> Remember that a full real test (to verify actual second-level functionality in the presence of a first-level failure) is still pending for both the plain and the cloned setup.
>>>>>
>>>>> Apropos: I read in the list archives that stonith resources (being resources, after all) could themselves cause fencing (!) when failing (start, monitor, stop)
>>>>
>>>> stop just unsets a flag in stonithd.
>>>> start does perform a monitor op though, which could fail.
>>>>
>>>> but by default only stop failure would result in fencing.
>>>
>>> I thought that start-failure-is-fatal was true by default, but maybe not for stonith resources.
>>
>> fatal in the sense of "won't attempt to run it there again", not the "fence the whole node" way
>
> Ah right, I remember now all the suggestions I found about migration-threshold, failure-timeout and the cluster-recheck-interval... sorry for the confusion and thank you for pointing it out!
>
> Regards,
> Giuseppe
>
>>>>> and that an ad-hoc on-fail setting could be used to prevent that.
>>>>> Maybe my aforementioned naive testing procedure (pulling the iLO cable) could provoke that?
>>>>
>>>> _shouldn't_ do so
>>>>
>>>>> Would you suggest configuring such an on-fail option?
>>>>
>>>> again, shouldn't be necessary
>>>
>>> Thanks again.
>>>
>>> Regards,
>>> Giuseppe
>>>
>>>>> Many thanks again for your help (and all your valuable work, of course!).
>>>>>
>>>>> Regards,
>>>>> Giuseppe
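P.S. If you do drop the clone, I _think_ a recent enough pcs can undo it in place (untested, check "pcs resource help" on your version):

  # turn pdu1-clone back into the plain pdu1 primitive;
  # the level 2 entries reference the primitive id, so they should be unaffected
  pcs resource unclone pdu1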
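P.P.S. Since migration-threshold/failure-timeout came up: purely as an illustration, with arbitrary values to tune to taste, something like

  # tolerate 3 monitor failures before banning the device from a node,
  # and expire recorded failures after 10 minutes
  pcs resource meta pdu1 migration-threshold=3 failure-timeout=600
  # failure-timeout expiry is only acted on at the recheck interval
  pcs property set cluster-recheck-interval=5min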
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
