Eli Mesika has uploaded a new change for review.

Change subject: core: retry fencing operation in case of failure
......................................................................
core: retry fencing operation in case of failure

Retry the fencing operation in case of a temporary failure.

Up to now, if 'unknown' was returned while waiting for a certain status
('on' / 'off'), the while loop exited with a warning, breaking the reboot
sequence.

It seems that in some cases a fencing operation may fail temporarily and
succeed after a while.

This patch retries up to 3 times to reach the desired status before giving
up, waiting the configured delay between status checks.

Change-Id: Iab5dadd5cb07f2c61dfdf6d75e9fccca790ba483
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1140098
Signed-off-by: Eli Mesika <emes...@redhat.com>
---
M backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
1 file changed, 9 insertions(+), 0 deletions(-)


git pull ssh://gerrit.ovirt.org:29418/ovirt-engine refs/changes/86/33486/1

diff --git a/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java b/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
index 015c58b..f35ea78 100644
--- a/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
+++ b/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java
@@ -511,7 +511,9 @@ protected boolean waitForStatus(String vdsName, FenceActionType actionType, FenceAgentOrder order) {
         final String FENCE_CMD = (actionType == FenceActionType.Start) ?
                 "on" : "off";
         final String ACTION_NAME = actionType.name().toLowerCase();
+        final int UNKNOWN_RESULT_ALLOWED = 3;
         int i = 1;
+        int j = 1;
         boolean statusReached = false;
 
         log.infoFormat("Waiting for vds {0} to {1}", vdsName, ACTION_NAME);
@@ -531,6 +533,13 @@ if (returnValue != null && returnValue.getReturnValue() != null) {
                 FenceStatusReturnValue value = (FenceStatusReturnValue) returnValue.getReturnValue();
                 if (value.getStatus().equalsIgnoreCase("unknown")) {
+                    // Allow command to fail temporarily
+                    if (j <= UNKNOWN_RESULT_ALLOWED && i <= getRerties()) {
+                        ThreadUtils.sleep(getDelayInSeconds() * 1000);
+                        i++;
+                        j++;
+                        continue;
+                    }
                     // No need to retry , agent definitions are corrupted
                     log.warnFormat("Host {0} {1} PM Agent definitions are corrupted, Waiting for Host to {2} aborted.", vdsName, order.name(), actionType.name());
                     break;

-- 
To view, visit http://gerrit.ovirt.org/33486
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iab5dadd5cb07f2c61dfdf6d75e9fccca790ba483
Gerrit-PatchSet: 1
Gerrit-Project: ovirt-engine
Gerrit-Branch: master
Gerrit-Owner: Eli Mesika <emes...@redhat.com>
_______________________________________________
Engine-patches mailing list
Engine-patches@ovirt.org
http://lists.ovirt.org/mailman/listinfo/engine-patches
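The retry rule in this patch (tolerate a bounded number of 'unknown' answers, sleeping the configured delay between checks, while still respecting the overall attempt budget) can be illustrated with a small standalone sketch. This is not oVirt engine code: the class `FenceStatusPoller`, the `Outcome` enum, the `Supplier<String>` status probe standing in for the fence-status command, and all parameter names are hypothetical. The separate `unknownSeen` counter plays the role of `j` in the diff, and `attempt` plays the role of `i`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified illustration of the patch's retry idea.
public class FenceStatusPoller {

    /** Result of waiting for a power-management status. */
    public enum Outcome { REACHED, TIMED_OUT, AGENT_ERROR }

    /**
     * Polls statusProbe until it reports the desired status.
     * Up to unknownAllowed "unknown" answers are treated as transient and
     * retried; one more "unknown" aborts the wait as an agent error.
     */
    public static Outcome waitForStatus(Supplier<String> statusProbe,
                                        String desired,
                                        int maxAttempts,
                                        int unknownAllowed,
                                        long delayMillis) throws InterruptedException {
        int unknownSeen = 0; // counts transient failures (like j in the diff)
        for (int attempt = 1; attempt <= maxAttempts; attempt++) { // like i
            String status = statusProbe.get();
            if (desired.equalsIgnoreCase(status)) {
                return Outcome.REACHED;
            }
            if ("unknown".equalsIgnoreCase(status)) {
                unknownSeen++;
                if (unknownSeen > unknownAllowed) {
                    // Too many 'unknown' answers: stop retrying.
                    return Outcome.AGENT_ERROR;
                }
            }
            if (attempt < maxAttempts) {
                // Wait the configured delay before the next status check.
                Thread.sleep(delayMillis);
            }
        }
        return Outcome.TIMED_OUT;
    }

    public static void main(String[] args) throws InterruptedException {
        // Scripted agent answers: two transient "unknown" results, then "on".
        Iterator<String> answers = List.of("unknown", "unknown", "on").iterator();
        Outcome outcome = waitForStatus(answers::next, "on", 5, 3, 10);
        System.out.println(outcome); // prints REACHED
    }
}
```

Keeping the 'unknown' count separate from the attempt count means a flaky agent gets a few grace retries, while an agent that keeps answering 'unknown' is still reported as an error instead of consuming the whole retry budget.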