Thank you Ulrich and Ken, that was exactly the solution! Much appreciated. --Marc
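
P.S. For anyone hitting the same timeout later: the fix Ken describes below
is simply giving the demote operation an explicit timeout in the primitive,
since the 60-second value in the RA metadata is only a suggestion and is
never applied on its own. A sketch of the resulting primitive, with the
120-second value purely as an example (use whatever covers how long your
devices take to finish blocking):

--snip--
primitive p_scst_zfs_vols ocf:esos:scst \
  params alua=true device_group=zfs_vols local_tgt_grp=zfs_vols_local \
    remote_tgt_grp=zfs_vols_remote m_alua_state=active \
    s_alua_state=unavailable use_trans_state=true set_dev_active=true \
  op monitor interval=10 role=Master \
  op monitor interval=20 role=Slave \
  op start interval=0 timeout=120 \
  op stop interval=0 timeout=90 \
  op demote interval=0 timeout=120
--snip--

Promote (and any other action whose metadata timeout you were relying on)
can be given an explicit "op" line the same way.
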
On Wed, Jan 10, 2018 at 12:02 PM, Ken Gaillot <[email protected]> wrote:
> On Wed, 2018-01-10 at 16:48 +0100, Ulrich Windl wrote:
>> Hi!
>>
>> Common pitfall: The default parameters in the RA's metadata are not
>> the defaults being configured when you don't specify a value; instead
>> they are suggestions for you when configuring (don't ask me why!).
>> Instead, there is a global default timeout that is used when you don't
>> specify one.
>> I hope I put that correctly. You could verify by manually adding the
>> default values from the metadata to "demote".
>>
>> Regards,
>> Ulrich
>
> Yep. That would be in the section of the configuration with "op start
> interval=0 timeout=120" ... you want "op demote interval=0 timeout="
> with the desired value.
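
For the record, the "global default timeout" Ulrich mentions is the
cluster-wide fallback of 20 seconds that applies to any operation without
its own timeout, which matches the 20-second window in the log below.
If you'd rather raise that fallback for every operation instead of setting
per-operation values, a crmsh op_defaults entry can do it (the 60s here is
just an example):

--snip--
op_defaults timeout=60s
--snip--

An explicit per-operation timeout is the more targeted fix, though.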

>>
>> >>> Marc Smith <[email protected]> wrote on 10.01.2018 at 16:26 in message
>> <CAKdCJ==ABfaKgsL4awK=vy_90pmamhrek6exxnazdplgx2a...@mail.gmail.com>:
>> > Hi,
>> >
>> > I'm experiencing a timeout on a demote operation, and I'm not sure
>> > which parameter / attribute needs to be updated to extend the
>> > timeout window.
>> >
>> > I'm using Pacemaker 1.1.16 and Corosync 2.4.2.
>> >
>> > Here is the set of log lines that show the issue (shutdown initiated,
>> > then the demote times out after 20 seconds):
>> > --snip--
>> > Jan 10 09:08:13 tgtnode2 pacemakerd[1096]: notice: Caught 'Terminated' signal
>> > Jan 10 09:08:13 tgtnode2 crmd[1104]: notice: Caught 'Terminated' signal
>> > Jan 10 09:08:13 tgtnode2 crmd[1104]: notice: State transition S_IDLE -> S_POLICY_ENGINE
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Scheduling Node tgtnode2.parodyne.com for shutdown
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Promote p_scst_zfs_vols:0^I(Slave -> Master tgtnode1.parodyne.com)
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Demote p_scst_zfs_vols:1^I(Master -> Stopped tgtnode2.parodyne.com)
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Stop p_dlm:1^I(tgtnode2.parodyne.com)
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Migrate p_dummy_g_zfs^I(Started tgtnode2.parodyne.com -> tgtnode1.parodyne.com)
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Move p_zfs_pool_one^I(Started tgtnode2.parodyne.com -> tgtnode1.parodyne.com)
>> > Jan 10 09:08:13 tgtnode2 pengine[1103]: notice: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-1441.bz2
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17449]: DEBUG: scst_notify() -> Received a 'pre' / 'demote' notification.
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17449]: DEBUG: p_scst_zfs_vols notify returned: 0
>> > Jan 10 09:08:13 tgtnode2 crmd[1104]: notice: Result of notify operation for p_scst_zfs_vols on tgtnode2.parodyne.com: 0 (ok)
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_monitor() -> SCST version: 3.3.0-rc
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_monitor() -> Resource is running.
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_monitor() -> SCST local target group state: active
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_demote() -> Resource is currently running as Master.
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: INFO: Blocking all 'zfs_vols' devices...
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: Waiting for devices to finish blocking...
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_demote() -> Setting target group 'zfs_vols_local' ALUA state to 'transitioning'...
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: INFO: Collecting current configuration: done. -> Making requested changes. -> Setting Target Group attribute 'state' to value 'transitioning' for target group 'zfs_vols/zfs_vols_local': done. -> Done, 1 change(s) made. All done.
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_demote() -> Setting target group 'zfs_vols_local' ALUA state to 'unavailable'...
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: INFO: Collecting current configuration: done. -> Making requested changes. -> Setting Target Group attribute 'state' to value 'unavailable' for target group 'zfs_vols/zfs_vols_local': done. -> Done, 1 change(s) made. All done.
>> > Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: scst_demote() -> Changing the group's devices to inactive...
>> > Jan 10 09:08:33 tgtnode2 lrmd[1101]: warning: p_scst_zfs_vols_demote_0 process (PID 17473) timed out
>> > Jan 10 09:08:33 tgtnode2 crmd[1104]: notice: Transition aborted by operation p_scst_zfs_vols_demote_0 'modify' on tgtnode2.parodyne.com: Event failed
>> > Jan 10 09:08:33 tgtnode2 crmd[1104]: notice: Transition aborted by status-2-fail-count-p_scst_zfs_vols doing create fail-count-p_scst_zfs_vols=1: Transient attribute change
>> > --snip--
>> >
>> > So I'm getting a "timed out" after 20 seconds of waiting in the demote
>> > operation, with this line: Jan 10 09:08:33 tgtnode2 lrmd[1101]:
>> > warning: p_scst_zfs_vols_demote_0 process (PID 17473) timed out
>> >
>> > The 20-second timeout is consistent when testing this, so I'm sure
>> > it's just a configuration thing, but it's not obvious to me which
>> > parameter/attribute/setting needs to be modified.
>> >
>> > The relevant metadata section from the RA referenced above:
>> > --snip--
>> > <actions>
>> >   <action name="meta-data" timeout="5" />
>> >   <action name="start" timeout="120" />
>> >   <action name="stop" timeout="90" />
>> >   <action name="monitor" timeout="20" depth="0" interval="10" role="Master" />
>> >   <action name="monitor" timeout="20" depth="0" interval="20" role="Slave" />
>> >   <action name="notify" timeout="20" />
>> >   <action name="promote" timeout="60" />
>> >   <action name="demote" timeout="60" />
>> >   <action name="reload" timeout="20" />
>> >   <action name="validate-all" timeout="20" />
>> > </actions>
>> > --snip--
>> >
>> > And the actual cluster configuration (primitive and multi-state clone)
>> > for the referenced resource:
>> > --snip--
>> > primitive p_scst_zfs_vols ocf:esos:scst \
>> >   params alua=true device_group=zfs_vols local_tgt_grp=zfs_vols_local remote_tgt_grp=zfs_vols_remote m_alua_state=active s_alua_state=unavailable use_trans_state=true set_dev_active=true \
>> >   op monitor interval=10 role=Master \
>> >   op monitor interval=20 role=Slave \
>> >   op start interval=0 timeout=120 \
>> >   op stop interval=0 timeout=90
>> > ms ms_scst_zfs_vols p_scst_zfs_vols \
>> >   meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true interleave=true
>> > --snip--
>> >
>> > I see a few values in the RA's metadata action section with "20
>> > seconds", and the interval parameter for the primitive, but I'm not
>> > sure which might be affecting this demote timeout setting. Any help
>> > would be greatly appreciated.
>> >
>> > Thanks so much for your time! And thank you for a great software
>> > product!
>> >
>> > --Marc
>
> --
> Ken Gaillot <[email protected]>

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
