Eli Mesika has uploaded a new change for review.

Change subject: core: retry fencing operation in case of failure
......................................................................

core: retry fencing operation in case of failure

Retry the fencing operation in case of a temporary failure.

Until now, if 'unknown' was returned while waiting for a certain status
('on' / 'off'), the while loop exited with a warning message,
breaking the reboot sequence.
In some cases a fencing operation may fail temporarily and succeed
after a while.
This patch tolerates up to 3 'unknown' results before giving up,
sleeping for the configured delay between status checks.

Change-Id: Iab5dadd5cb07f2c61dfdf6d75e9fccca790ba483
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1140098
Signed-off-by: Eli Mesika <emes...@redhat.com>
---
M backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
1 file changed, 9 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/ovirt-engine refs/changes/86/33486/1

diff --git a/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java b/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
index 015c58b..f35ea78 100644
--- a/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
+++ b/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
@@ -511,7 +511,9 @@
     protected boolean waitForStatus(String vdsName, FenceActionType actionType, FenceAgentOrder order) {
         final String FENCE_CMD = (actionType == FenceActionType.Start) ? "on" : "off";
         final String ACTION_NAME = actionType.name().toLowerCase();
+        final int UNKNOWN_RESULT_ALLOWED = 3;
         int i = 1;
+        int j = 1;
         boolean statusReached = false;
         log.infoFormat("Waiting for vds {0} to {1}", vdsName, ACTION_NAME);
 
@@ -531,6 +533,13 @@
                     if (returnValue != null && returnValue.getReturnValue() != null) {
                         FenceStatusReturnValue value = (FenceStatusReturnValue) returnValue.getReturnValue();
                         if (value.getStatus().equalsIgnoreCase("unknown")) {
+                            // Allow command to fail temporarily
+                            if (j <= UNKNOWN_RESULT_ALLOWED && i <= getRerties()) {
+                                ThreadUtils.sleep(getDelayInSeconds() * 1000);
+                                i++;
+                                j++;
+                                continue;
+                            }
                             // No need to retry , agent definitions are corrupted
                             log.warnFormat("Host {0} {1} PM Agent definitions are corrupted, Waiting for Host to {2} aborted.", vdsName, order.name(), actionType.name());
                             break;
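
The retry behaviour described in the commit message can be sketched in isolation, outside the engine. This is only an illustration under stated assumptions, not the code of FenceVdsBaseCommand: the class FenceStatusPoller, the scripted() helper and the Supplier-based status source are hypothetical names introduced here, and the handling of statuses other than the desired one or 'unknown' (sleep, then check again) is assumed, since that part of the loop is not shown in the diff.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical stand-alone sketch of the polling loop; not the engine code.
class FenceStatusPoller {
    // Mirrors UNKNOWN_RESULT_ALLOWED from the patch.
    static final int UNKNOWN_RESULT_ALLOWED = 3;

    // Polls statusSource until it reports the desired status, the retry
    // budget is used up, or more than UNKNOWN_RESULT_ALLOWED 'unknown'
    // results have been seen.
    static boolean waitForStatus(Supplier<String> statusSource, String desired,
                                 int retries, long delayMillis) {
        int unknownSeen = 0;
        for (int attempt = 1; attempt <= retries; attempt++) {
            String status = statusSource.get();
            if (desired.equalsIgnoreCase(status)) {
                return true;
            }
            if ("unknown".equalsIgnoreCase(status)) {
                if (unknownSeen >= UNKNOWN_RESULT_ALLOWED) {
                    // Too many 'unknown' results: stop waiting.
                    return false;
                }
                // Allow the command to fail temporarily.
                unknownSeen++;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Hypothetical helper: returns the given statuses in order, then keeps
    // repeating the last one.
    static Supplier<String> scripted(String... statuses) {
        Iterator<String> it = Arrays.asList(statuses).iterator();
        String[] last = { "unknown" };
        return () -> {
            if (it.hasNext()) {
                last[0] = it.next();
            }
            return last[0];
        };
    }

    public static void main(String[] args) {
        // Two temporary failures, then the desired status: prints true
        System.out.println(waitForStatus(scripted("unknown", "unknown", "off"), "off", 10, 0));
        // Four 'unknown' results in a row exceed the allowance: prints false
        System.out.println(waitForStatus(
                scripted("unknown", "unknown", "unknown", "unknown", "off"), "off", 10, 0));
    }
}
```

As in the patch, each tolerated 'unknown' result also consumes one attempt from the overall retry budget, so the allowance never extends the total waiting time.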


-- 
To view, visit http://gerrit.ovirt.org/33486
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iab5dadd5cb07f2c61dfdf6d75e9fccca790ba483
Gerrit-PatchSet: 1
Gerrit-Project: ovirt-engine
Gerrit-Branch: master
Gerrit-Owner: Eli Mesika <emes...@redhat.com>
