Martin Peřina has posted comments on this change. Change subject: core: Skip fencing if host has connectivity issues ......................................................................
Patch Set 9: (6 comments) I would also prefer the logic to execute or not to be part of VdsNotRespondingTreatment.canDoAction, because IMO it's logical part of this command and not part of VdsEventListener. http://gerrit.ovirt.org/#/c/31303/9/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/VdsEventListener.java File backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/VdsEventListener.java: Line 236: log.infoFormat("ResourceManager::vdsNotResponding entered for Host {0}, {1}", Line 237: vds.getId(), Line 238: vds.getHostName()); Line 239: Line 240: boolean shouldExecRealFencing = isBelowConnectivityBrokenThreshold(vds); I would prefer isConnectivityBrokenThresholdReached() name and negation here Line 241: Line 242: if (executeSshSoftFencing) { Line 243: VdcReturnValueBase retVal = Line 244: Backend.getInstance().runInternalAction(VdcActionType.SshSoftFencing, Line 266: }); Line 267: } Line 268: private boolean isBelowConnectivityBrokenThreshold(VDS vds) { Line 269: VDSGroup cluster = DbFacade.getInstance().getVdsGroupDao().get(vds.getVdsGroupId()); Line 270: double percents=0; double percents = 0; Line 271: boolean result = true; Line 272: if (cluster.getFencingPolicy().isSkipFencingIfConnectivityBroken()) { Line 273: List<VDS> hosts = DbFacade.getInstance().getVdsDao().getAllForVdsGroup(cluster.getId()); Line 274: double hostsNumber = hosts.size(); Line 282: double hostsWithBrokenConnectivityNumber = hostsWithBrokenConnectivity.size(); Line 283: percents = (hostsWithBrokenConnectivityNumber/hostsNumber)*100.0; Line 284: result = (percents < cluster.getFencingPolicy().getHostsWithBrokenConnectivityThreshold()); Line 285: } Line 286: if (! result) { Please remove space: if (!result) { Line 287: logAlert(vds, percents); Line 288: } Line 289: return result; Line 290: } Line 288: } Line 289: return result; Line 290: } Line 291: Line 292: private void logAlert(VDS host, Double percents) { Why new method when the code is called only from one place? Line 293: AuditLogableBase auditLogable = new AuditLogableBase(); Line 294: auditLogable.addCustomValue("Percents", percents.toString()); Line 295: auditLogable.setVdsId(host.getId()); Line 296: AuditLogDirector.log(auditLogable, AuditLogType.VDS_ALERT_FENCE_OPERATION_SKIPPED_BROKEN_CONNECTIVITY); http://gerrit.ovirt.org/#/c/31303/9/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/FencingPolicy.java File backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/FencingPolicy.java: Line 15: hostsWithBrokenConnectivityThreshold Please initialize hostsWithBrokenConnectivityThreshold to 50 and don't rely only UI code. http://gerrit.ovirt.org/#/c/31303/9/backend/manager/modules/dal/src/main/resources/bundles/AuditLogMessages.properties File backend/manager/modules/dal/src/main/resources/bundles/AuditLogMessages.properties: Line 615: VDS_ALERT_SECONDARY_AGENT_USED_FOR_FENCE_OPERATION=Secondary fence agent was used to ${Operation} Host ${VdsName} Line 616: VDS_ALERT_PM_HEALTH_CHECK_FAILED_FOR_CON_PRIMARY_AGENT=Health check failed on Host ${VdsName} primary concurrent agent, future fence operations may fail on this Host. Line 617: VDS_ALERT_PM_HEALTH_CHECK_FAILED_FOR_SEQ_PRIMARY_AGENT=Health check failed on Host ${VdsName} primary sequential agent, future fence operations may fail is secondary agent if not defined properly. Line 618: VDS_ALERT_PM_HEALTH_CHECK_FAILED_FOR_CON_SECONDARY_AGENT=Health check failed on Host ${VdsName} secondary concurrent agent, future fence operations may fail on this Host. Line 619: VDS_ALERT_FENCE_OPERATION_SKIPPED_BROKEN_CONNECTIVITY=Host ${VdsName} became non responsive and was not restarted due to Fencing Policy : ${Percents} percents of the Hosts in Cluster have connectivity issues. Please remove space before ':' ... restarted due to Fencing Policy: ${Percents} ... Line 620: VDS_ALERT_PM_HEALTH_CHECK_FAILED_FOR_SEQ_SECONDARY_AGENT=Health check failed on Host ${VdsName} secondary sequential agent, future fence operations may fail is primary agent if not defined properly. Line 621: VDS_HOST_NOT_RESPONDING_CONNECTING=Host ${VdsName} is not responding. It will stay in Connecting state for a grace period of ${Seconds} seconds and after that an attempt to fence the host will be issued. Line 622: TASK_STOPPING_ASYNC_TASK=Stopping async task ${CommandName} that started at ${Date} Line 623: REFRESH_REPOSITORY_IMAGE_LIST_FAILED=Refresh image list failed for domain(s): ${imageDomains}. Please check domain activity. -- To view, visit http://gerrit.ovirt.org/31303 To unsubscribe, visit http://gerrit.ovirt.org/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a9c7db43b50421414ce9596137767b00cbfc2ae Gerrit-PatchSet: 9 Gerrit-Project: ovirt-engine Gerrit-Branch: master Gerrit-Owner: Eli Mesika <emes...@redhat.com> Gerrit-Reviewer: Eli Mesika <emes...@redhat.com> Gerrit-Reviewer: Martin Peřina <mper...@redhat.com> Gerrit-Reviewer: Oved Ourfali <oourf...@redhat.com> Gerrit-Reviewer: automat...@ovirt.org Gerrit-Reviewer: oVirt Jenkins CI Server Gerrit-HasComments: Yes _______________________________________________ Engine-patches mailing list Engine-patches@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-patches