Hi Ken,

Did you get a chance to go through the logs? Do you need any more details?
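
If a consolidated log bundle for the incident window would be easier to work with than the attached files, I believe crm_report can generate one (the timestamps and output path below are only examples):

$ crm_report -f "2017-05-16 12:00:00" -t "2017-05-16 12:30:00" /tmp/res3-delete-report
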
Regards,
Aswathi

On Tue, May 16, 2017 at 3:04 PM, Anu Pillai <[email protected]> wrote:
> Hi,
>
> Please find attached debug logs for the stated problem as well as the crm_mon
> command outputs.
> In this case we are trying to remove/delete res3 and the system/node
> (0005B94238BC) from the cluster.
>
> *Test reproduction steps*
>
> Current configuration of the cluster:
> 0005B9423910 - res2
> 0005B9427C5A - res1
> 0005B94238BC - res3
>
> *crm_mon output:*
>
> Defaulting to one-shot mode
> You need to have curses available at compile time to enable console mode
> Stack: corosync
> Current DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with quorum
> Last updated: Tue May 16 12:21:23 2017
> Last change: Tue May 16 12:13:40 2017 by root via crm_attribute on 0005B9423910
>
> 3 nodes and 3 resources configured
>
> Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ]
>
> res2 (ocf::redundancy:RedundancyRA): Started 0005B9423910
> res1 (ocf::redundancy:RedundancyRA): Started 0005B9427C5A
> res3 (ocf::redundancy:RedundancyRA): Started 0005B94238BC
>
>
> Trigger the delete operation for res3 and node 0005B94238BC.
>
> The following commands were applied from node 0005B94238BC:
> $ pcs resource delete res3 --force
> $ crm_resource -C res3
> $ pcs cluster stop --force
>
> The following command was applied from the DC (0005B9423910):
> $ crm_node -R 0005B94238BC --force
>
>
> *crm_mon output:*
>
> Defaulting to one-shot mode
> You need to have curses available at compile time to enable console mode
> Stack: corosync
> Current DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with quorum
> Last updated: Tue May 16 12:21:27 2017
> Last change: Tue May 16 12:21:26 2017 by root via cibadmin on 0005B94238BC
>
> 3 nodes and 2 resources configured
>
> Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ]
>
>
> The observation is that the remaining two resources, res2 and res1, were
> stopped and started again.
>
>
> Regards,
> Aswathi
>
> On Mon, May 15, 2017 at 8:11 PM, Ken Gaillot <[email protected]> wrote:
>
>> On 05/15/2017 06:59 AM, Klaus Wenninger wrote:
>> > On 05/15/2017 12:25 PM, Anu Pillai wrote:
>> >> Hi Klaus,
>> >>
>> >> Please find attached cib.xml as well as corosync.conf.
>>
>> Maybe you're only setting this while testing, but having
>> stonith-enabled=false and no-quorum-policy=ignore is highly dangerous in
>> any kind of network split.
>>
>> FYI, default-action-timeout is deprecated in favor of setting a timeout
>> in op_defaults, but it doesn't hurt anything.
>>
>> > Why wouldn't you keep placement-strategy at the default
>> > to keep things simple? You aren't using any load-balancing
>> > anyway, as far as I understood it.
>>
>> It looks like the intent is to use placement-strategy to limit each node
>> to 1 resource. The configuration looks good for that.
>>
>> > Haven't used resource-stickiness=INF. No idea which strange
>> > behavior that triggers. Try to have it just higher than what
>> > the other scores might sum up to.
>>
>> Either way would be fine. Using INFINITY ensures that no other
>> combination of scores will override it.
>>
>> > I might have overlooked something in your scores, but otherwise
>> > there is nothing obvious to me.
>> >
>> > Regards,
>> > Klaus
>>
>> I don't see anything obvious either. If you have logs around the time of
>> the incident, that might help.
>>
>> >> Regards,
>> >> Aswathi
>> >>
>> >> On Mon, May 15, 2017 at 2:46 PM, Klaus Wenninger <[email protected]
>> >> <mailto:[email protected]>> wrote:
>> >>
>> >> On 05/15/2017 09:36 AM, Anu Pillai wrote:
>> >> > Hi,
>> >> >
>> >> > We are running a pacemaker cluster for managing our resources. We have 6
>> >> > systems running 5 resources, and one is acting as standby. We have a
>> >> > restriction that only one resource can run on one node. But our
>> >> > observation is that whenever we add or delete a resource from the
>> >> > cluster, all the remaining resources in the cluster are stopped and
>> >> > started back.
>> >> >
>> >> > Can you please guide us on whether this is normal behavior, or whether
>> >> > we are missing any configuration that is leading to this issue?
>> >>
>> >> It should definitely be possible to prevent this behavior.
>> >> If you share your config with us we might be able to
>> >> track that down.
>> >>
>> >> Regards,
>> >> Klaus
>> >>
>> >> >
>> >> > Regards
>> >> > Aswathi
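
For what it's worth, if we adopt the suggestions above, I believe the changes would be applied with pcs roughly as follows (the timeout value is only an example, and the exact syntax may differ slightly between pcs versions):

$ pcs property set stonith-enabled=true
$ pcs property set no-quorum-policy=stop
$ pcs resource op defaults timeout=120s   # instead of the deprecated default-action-timeout
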
_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
