Thanks Ken for the detailed response. I suppose I could even use some of the pcs/crm CLI commands then. Cheers.
On Wed, Mar 16, 2016 at 8:27 PM, Ken Gaillot <[email protected]> wrote:

> On 03/16/2016 05:22 AM, Nikhil Utane wrote:
> > I see following info gets updated in CIB. Can I use this or there is better
> > way?
> >
> > <node_state id="*node1*" uname="node1" in_ccm="false" crmd="offline"
> > crm-debug-origin="peer_update_callback" join="*down*" expected="member">
>
> in_ccm/crmd/join reflect the current state of the node (as known by the
> partition that you're looking at the CIB on), so if the node went down
> and came back up, it won't tell you anything about being down.
>
> - in_ccm indicates that the node is part of the underlying cluster layer
> (heartbeat/cman/corosync)
>
> - crmd indicates that the node is communicating at the pacemaker layer
>
> - join indicates what phase of the join process the node is at
>
> There's not a direct way to see what node went down after the fact.
> There are ways however:
>
> - if the node was running resources, those will be failed, and those
> failures (including node) will be shown in the cluster status
>
> - the logs show all node membership events; you can search for logs such
> as "state is now lost" and "left us"
>
> - "stonith -H $NODE_NAME" will show the fence history for a given node,
> so if the node went down due to fencing, it will show up there
>
> - you can configure an ocf:pacemaker:ClusterMon resource to run crm_mon
> periodically and run a script for node events, and you can write the
> script to do whatever you want (email you, etc.) (in the upcoming 1.1.15
> release, built-in notifications will make this more reliable and easier,
> but any script you use with ClusterMon will still be usable with the new
> method)
>
> > On Wed, Mar 16, 2016 at 12:40 PM, Nikhil Utane <[email protected]>
> > wrote:
> >
> >> Hi Ken,
> >>
> >> Sorry about the long delay. This activity was de-focussed but now it's
> >> back on track.
> >>
> >> One part of question that is still not answered is on the newly active
> >> node, how to find out which was the node that went down?
> >> Anything that gets updated in the status section that can be read and
> >> figured out?
> >>
> >> Thanks.
> >> Nikhil
> >>
> >> On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot <[email protected]> wrote:
> >>
> >>> On 01/08/2016 11:13 AM, Nikhil Utane wrote:
> >>>>> I think stickiness will do what you want here. Set a stickiness higher
> >>>>> than the original node's preference, and the resource will want to stay
> >>>>> where it is.
> >>>>
> >>>> Not sure I understand this. Stickiness will ensure that resources don't
> >>>> move back when original node comes back up, isn't it?
> >>>> But in my case, I want the newly standby node to become the backup node for
> >>>> all other nodes. i.e. it should now be able to run all my resource groups
> >>>> albeit with a lower score. How do I achieve that?
> >>>
> >>> Oh right. I forgot to ask whether you had an opt-out
> >>> (symmetric-cluster=true, the default) or opt-in
> >>> (symmetric-cluster=false) cluster. If you're opt-out, every node can run
> >>> every resource unless you give it a negative preference.
> >>>
> >>> Partly it depends on whether there is a good reason to give each
> >>> instance a "home" node. Often, there's not. If you just want to balance
> >>> resources across nodes, the cluster will do that by default.
> >>>
> >>> If you prefer to put certain resources on certain nodes because the
> >>> resources require more physical resources (RAM/CPU/whatever), you can
> >>> set node attributes for that and use rules to set node preferences.
> >>>
> >>> Either way, you can decide whether you want stickiness with it.
> >>>
> >>>> Also can you answer, how to get the values of node that goes active and the
> >>>> node that goes down inside the OCF agent? Do I need to use notification or
> >>>> some simpler alternative is available?
> >>>> Thanks.
> >>>>
> >>>>
> >>>> On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot <[email protected]> wrote:
> >>>>
> >>>>> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
> >>>>>> Would like to validate my final config.
> >>>>>>
> >>>>>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
> >>>>>> standby server.
> >>>>>> The standby server should take up the role of active that went down. Each
> >>>>>> active has some unique configuration that needs to be preserved.
> >>>>>>
> >>>>>> 1) So I will create total 5 groups. Each group has a "heartbeat::IPaddr2
> >>>>>> resource (for virtual IP) and my custom resource.
> >>>>>> 2) The virtual IP needs to be read inside my custom OCF agent, so I will
> >>>>>> make use of attribute reference and point to the value of IPaddr2 inside my
> >>>>>> custom resource to avoid duplication.
> >>>>>> 3) I will then configure location constraint to run the group resource on 5
> >>>>>> active nodes with higher score and lesser score on standby.
> >>>>>> For e.g.
> >>>>>> Group        Node     Score
> >>>>>> ---------------------------------------------
> >>>>>> MyGroup1     node1    500
> >>>>>> MyGroup1     node6    0
> >>>>>>
> >>>>>> MyGroup2     node2    500
> >>>>>> MyGroup2     node6    0
> >>>>>> ..
> >>>>>> MyGroup5     node5    500
> >>>>>> MyGroup5     node6    0
> >>>>>>
> >>>>>> 4) Now if say node1 were to go down, then stop action on node1 will first
> >>>>>> get called. Haven't decided if I need to do anything specific here.
> >>>>>
> >>>>> To clarify, if node1 goes down intentionally (e.g. standby or stop),
> >>>>> then all resources on it will be stopped first. But if node1 becomes
> >>>>> unavailable (e.g. crash or communication outage), it will get fenced.
> >>>>>
> >>>>>> 5) But when the start action of node 6 gets called, then using crm command
> >>>>>> line interface, I will modify the above config to swap node 1 and node 6.
> >>>>>> i.e.
> >>>>>> MyGroup1     node6    500
> >>>>>> MyGroup1     node1    0
> >>>>>>
> >>>>>> MyGroup2     node2    500
> >>>>>> MyGroup2     node1    0
> >>>>>>
> >>>>>> 6) To do the above, I need the newly active and newly standby node names to
> >>>>>> be passed to my start action. What's the best way to get this information
> >>>>>> inside my OCF agent?
> >>>>>
> >>>>> Modifying the configuration from within an agent is dangerous -- too
> >>>>> much potential for feedback loops between pacemaker and the agent.
> >>>>>
> >>>>> I think stickiness will do what you want here. Set a stickiness higher
> >>>>> than the original node's preference, and the resource will want to stay
> >>>>> where it is.
> >>>>>
> >>>>>> 7) Apart from node name, there will be other information which I plan to
> >>>>>> pass by making use of node attributes. What's the best way to get this
> >>>>>> information inside my OCF agent? Use crm command to query?
> >>>>>
> >>>>> Any of the command-line interfaces for doing so should be fine, but I'd
> >>>>> recommend using one of the lower-level tools (crm_attribute or
> >>>>> attrd_updater) so you don't have a dependency on a higher-level tool
> >>>>> that may not always be installed.
> >>>>>
> >>>>>> Thank You.
> >>>>>>
> >>>>>> On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks to you Ken for giving all the pointers.
> >>>>>>> Yes, I can use service start/stop which should be a lot simpler. Thanks
> >>>>>>> again. :)
> >>>>>>>
> >>>>>>> On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
> >>>>>>>>> I have prepared a write-up explaining my requirements and current solution
> >>>>>>>>> that I am proposing based on my understanding so far.
> >>>>>>>>> Kindly let me know if what I am proposing is good or there is a better way
> >>>>>>>>> to achieve the same.
> >>>>>>>>>
> >>>>>>>>> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing
> >>>>>>>>>
> >>>>>>>>> Let me know if you face any issue in accessing the above link. Thanks.
> >>>>>>>>
> >>>>>>>> This looks great. Very well thought-out.
> >>>>>>>>
> >>>>>>>> One comment:
> >>>>>>>>
> >>>>>>>> "8. In the event of any failover, the standby node will get notified
> >>>>>>>> through an event and it will execute a script that will read the
> >>>>>>>> configuration specific to the node that went down (again using
> >>>>>>>> crm_attribute) and become active."
> >>>>>>>>
> >>>>>>>> It may not be necessary to use the notifications for this. Pacemaker
> >>>>>>>> will call your resource agent with the "start" action on the standby
> >>>>>>>> node, after ensuring it is stopped on the previous node. Hopefully the
> >>>>>>>> resource agent's start action has (or can have, with configuration
> >>>>>>>> options) all the information you need.
> >>>>>>>>
> >>>>>>>> If you do end up needing notifications, be aware that the feature will
> >>>>>>>> be disabled by default in the 1.1.14 release, because changes in syntax
> >>>>>>>> are expected in further development. You can define a compile-time
> >>>>>>>> constant to enable them.
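
For anyone reading this thread later, the node_state discussion above can be sketched roughly as follows. This is only an illustration, not something posted by Ken or Nikhil: it assumes the status section of the CIB has already been saved as XML (for example from a cibadmin query), the sample document and the function name nodes_currently_down are made up, and the node6 values are assumed. As Ken points out, these attributes only reflect the current state, so a node that went down and has since rejoined will not be reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical status excerpt. The node1 entry mirrors the attributes quoted
# in the thread (minus the asterisks); the node6 entry is an assumed example
# of a healthy node.
SAMPLE_STATUS = """
<status>
  <node_state id="node1" uname="node1" in_ccm="false" crmd="offline"
              crm-debug-origin="peer_update_callback" join="down"
              expected="member"/>
  <node_state id="node6" uname="node6" in_ccm="true" crmd="online"
              crm-debug-origin="peer_update_callback" join="member"
              expected="member"/>
</status>
"""


def nodes_currently_down(status_xml):
    """Return unames of nodes whose node_state looks down right now.

    A node is treated as down if it has left the cluster layer
    (in_ccm="false"), is not communicating at the pacemaker layer
    (crmd="offline"), or has join="down".
    """
    root = ET.fromstring(status_xml)
    down = []
    for state in root.iter("node_state"):
        if (state.get("in_ccm") == "false"
                or state.get("crmd") == "offline"
                or state.get("join") == "down"):
            down.append(state.get("uname"))
    return down


print(nodes_currently_down(SAMPLE_STATUS))
```

For history rather than current state, the logs, fence history, or a ClusterMon script mentioned in the thread remain the better source.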
