Ori Liel has posted comments on this change. Change subject: core: Fix skip fencing if host connected to storage ......................................................................
Patch Set 2: (5 comments) http://gerrit.ovirt.org/#/c/36581/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java File backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/FenceVdsBaseCommand.java: Line 137: if (wasSkippedDueToPolicy(result)) { Line 138: // when fencing is skipped due to policy we want to suppress command result logging, because Line 139: // we fire an alert in VdsNotRespondingTreatment Line 140: setCommandShouldBeLogged(false); Line 141: setSucceeded(false); > There's a problem in current result of fence verb, because we can return sk 1) I understood from Eli while working on the refactoring that skipping due to policy is not a failure (it is a success). He was very explicit about that. If this is indeed so, then this change is incorrect. 2) I agree with you, Martin, that the fact that VDSM will return skipped=true for both of these different cases is problematic, and that is what lead to the confusing code structure. But if you take a close look, I have found a way to tell apart these two situations (notice that the difference in these methods is == as opposed to !=): /** * in the specific scenario where this stop/start command is executed in the context of a restart, we interpret * 'skipped' as having occurred to due a fencing policy violation. */ private boolean wasSkippedDueToPolicy(VDSFenceReturnValue result) { return result != null && result.isSkipped() && getParameters().getParentCommand() == VdcActionType.RestartVds; } /** * if stop/start command returned with status=skipped, and the command was NOT run in the context of a restart then * we interpret the skip as having occurred because the host is already at the required state. */ private boolean wasSkippedDueToStatus(VDSFenceReturnValue result) { return result != null && result.isSkipped() && getParameters().getParentCommand() != VdcActionType.RestartVds; } Line 142: } else { Line 143: setSucceeded(result.getSucceeded()); Line 144: } Line 145: setActionReturnValue(result); Line 207: fenceAgent.getId()); Line 208: } else if (wasSkippedDueToPolicy(fenceExecutionResult)) { Line 209: // fencing execution was skipped due to fencing policy Line 210: return fenceExecutionResult; Line 211: } else { Agreed, this does look like a bug :) Line 212: if (fenceExecutionResult.getSucceeded()) { Line 213: boolean requiredStatusAchieved = waitForStatus(); Line 214: int i = 0; Line 215: while (!requiredStatusAchieved && i < retries) { http://gerrit.ovirt.org/#/c/36581/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/RestartVdsCommand.java File backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/RestartVdsCommand.java: Line 51: public class RestartVdsCommand<T extends FenceVdsActionParameters> extends VdsCommand<T> { Line 52: Line 53: private static final String INTERNAL_FENCE_USER = "Engine"; Line 54: Line 55: protected boolean skippedDueToFencingPolicy; I think something is wrong with the use of variable. Only the subclass VdsNotRespondingTreatmentCommand reads it, but its value is set here in executeCommand() - which is overriden in VdsNotRespondingTreatmentCommand. The way it's written (unless I'm missing something) the value of skippedDueToFencingPolicy will never be true in VdsNotRespondingTreatmentCommand, making it redundant. I see that in the current state of things in VdsNotRespondingTreatmentCommand this is a method variable set to 'false' in the beginning of executeCommand(), never changed to true, and then it's value is checked. This appears to be a bug, but I'm not sure that what you did here solves the bug. Can you please explain what you meant to do here? Line 56: Line 57: /** Line 58: * Constructor for command creation when compensation is applied on startup Line 59: * Line 211: if (fenceReturnValue.getReturnValue() instanceof FenceStatusReturnValue) { Line 212: skipped = ((FenceStatusReturnValue) fenceReturnValue.getReturnValue()).getIsSkipped(); Line 213: } Line 214: } Line 215: return skipped; Indeed, looks like a bug Line 216: } Line 217: Line 218: @Override Line 219: public String getUserName() { http://gerrit.ovirt.org/#/c/36581/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/StopVdsCommand.java File backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/StopVdsCommand.java: Line 160: VDSFenceReturnValue result = getMostRelevantResult(results); Line 161: Line 162: if (result != null) { Line 163: if (result.getSucceeded()) { Line 164: int numOfSkipped = countSkippedOperations(results); > Well, it's agent agnostic, but concurrent agents make things complicated :- I tend to agree with Oved, I don't think this situation can ever happen (one agent skipped, another didn't skip). Maybe the old code tried to unnecessarily defend against an impossible situation? Line 165: if (numOfSkipped == 0) { Line 166: // no agent reported that stop operation was skipped, continue with stop operation Line 167: handleSpecificCommandActions(); Line 168: } else { -- To view, visit http://gerrit.ovirt.org/36581 To unsubscribe, visit http://gerrit.ovirt.org/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84fa56e75adf988ce36b1154269dab8974cf05bf Gerrit-PatchSet: 2 Gerrit-Project: ovirt-engine Gerrit-Branch: master Gerrit-Owner: Martin Peřina <mper...@redhat.com> Gerrit-Reviewer: Eli Mesika <emes...@redhat.com> Gerrit-Reviewer: Martin Peřina <mper...@redhat.com> Gerrit-Reviewer: Ori Liel <ol...@redhat.com> Gerrit-Reviewer: Oved Ourfali <oourf...@redhat.com> Gerrit-Reviewer: automat...@ovirt.org Gerrit-Reviewer: oVirt Jenkins CI Server Gerrit-HasComments: Yes _______________________________________________ Engine-patches mailing list Engine-patches@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-patches