Eli Mesika has uploaded a new change for review.
Change subject: core: retry fencing operation in case of failure
......................................................................
core: retry fencing operation in case of failure
Retry the fencing operation in case of a temporary failure.
Up to now, if 'unknown' was returned while waiting for a certain status
('on' / 'off'), the while loop was exited with a proper message,
breaking the reboot sequence.
It seems that there are some cases in which a fencing operation may fail
temporarily and succeed after a while.
This patch retries up to 3 times to reach the desired status before giving
up, waiting the configured delay between status checks.
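
The retry behaviour described above can be sketched in isolation roughly as
follows. This is a minimal illustration only, not the engine code: the class
name, the Supplier-based status check, and the retries/delayMillis parameters
are hypothetical stand-ins for the fence status call and the configured
retries and delay, and the handling of a status that is neither the desired
one nor 'unknown' is an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-alone sketch; names do not come from ovirt-engine.
class FenceStatusRetrySketch {
    // Number of 'unknown' results tolerated before giving up.
    static final int UNKNOWN_RESULT_ALLOWED = 3;

    // Polls statusCheck until it reports the desired status, the overall
    // retry budget is used up, or 'unknown' is seen more than
    // UNKNOWN_RESULT_ALLOWED times.
    static boolean waitForStatus(Supplier<String> statusCheck, String desired,
                                 int retries, long delayMillis) {
        int unknownSeen = 0;
        for (int attempt = 1; attempt <= retries; attempt++) {
            String status = statusCheck.get();
            if (desired.equalsIgnoreCase(status)) {
                return true;
            }
            if ("unknown".equalsIgnoreCase(status)) {
                unknownSeen++;
                if (unknownSeen > UNKNOWN_RESULT_ALLOWED) {
                    // Too many 'unknown' results: treat as a permanent failure.
                    return false;
                }
            }
            try {
                // Wait before the next status check.
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two temporary 'unknown' results, then the desired status.
        Iterator<String> statuses = List.of("unknown", "unknown", "on").iterator();
        System.out.println(waitForStatus(statuses::next, "on", 10, 0L));
    }
}
```

In this sketch a fourth 'unknown' result ends the wait, which mirrors the
intent of tolerating a temporary failure without retrying indefinitely.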
Change-Id: Iab5dadd5cb07f2c61dfdf6d75e9fccca790ba483
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1140098
Signed-off-by: Eli Mesika <[email protected]>
---
M backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
1 file changed, 9 insertions(+), 0 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/ovirt-engine refs/changes/86/33486/1
diff --git a/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java b/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
index 015c58b..f35ea78 100644
--- a/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
+++ b/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
@@ -511,7 +511,9 @@
protected boolean waitForStatus(String vdsName, FenceActionType
actionType, FenceAgentOrder order) {
final String FENCE_CMD = (actionType == FenceActionType.Start) ? "on"
: "off";
final String ACTION_NAME = actionType.name().toLowerCase();
+ final int UNKNOWN_RESULT_ALLOWED = 3;
int i = 1;
+ int j = 1;
boolean statusReached = false;
log.infoFormat("Waiting for vds {0} to {1}", vdsName, ACTION_NAME);
@@ -531,6 +533,13 @@
if (returnValue != null && returnValue.getReturnValue() !=
null) {
FenceStatusReturnValue value =
(FenceStatusReturnValue) returnValue.getReturnValue();
if (value.getStatus().equalsIgnoreCase("unknown")) {
+ // Allow command to fail temporarily
+ if (j <= UNKNOWN_RESULT_ALLOWED && i <=
getRerties()) {
+ ThreadUtils.sleep(getDelayInSeconds() * 1000);
+ i++;
+ j++;
+ continue;
+ }
// No need to retry , agent definitions are
corrupted
log.warnFormat("Host {0} {1} PM Agent definitions
are corrupted, Waiting for Host to {2} aborted.", vdsName, order.name(),
actionType.name());
break;
--
To view, visit http://gerrit.ovirt.org/33486
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iab5dadd5cb07f2c61dfdf6d75e9fccca790ba483
Gerrit-PatchSet: 1
Gerrit-Project: ovirt-engine
Gerrit-Branch: master
Gerrit-Owner: Eli Mesika <[email protected]>