goiri commented on code in PR #4779:
URL: https://github.com/apache/hadoop/pull/4779#discussion_r971031212
##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java:
##########
@@ -385,6 +385,20 @@ public void run() {
// TODO: Do it only once per NodeManager.
ContainerId containerID = event.getContainerID();
+ // If the container failed to launch earlier (due to dead node for
example),
+ // it has been marked as FAILED and removed from containers during
+ // CONTAINER_REMOTE_LAUNCH event handling.
+ // Skip kill() such container during CONTAINER_REMOTE_CLEANUP as
+ // it is not necessary and could cost 15 minutes delay if the node is
dead.
+ if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP &&
Review Comment:
This should be done in the switch case CONTAINER_REMOTE_CLEANUP
##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java:
##########
@@ -385,6 +385,20 @@ public void run() {
// TODO: Do it only once per NodeManager.
ContainerId containerID = event.getContainerID();
+ // If the container failed to launch earlier (due to dead node for
example),
+ // it has been marked as FAILED and removed from containers during
+ // CONTAINER_REMOTE_LAUNCH event handling.
+ // Skip kill() such container during CONTAINER_REMOTE_CLEANUP as
+ // it is not necessary and could cost 15 minutes delay if the node is
dead.
+ if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP &&
+ !containers.containsKey(containerID)) {
+ LOG.info("Skip cleanup of already-removed container " + containerID);
Review Comment:
Use logger {}
##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java:
##########
@@ -385,6 +385,20 @@ public void run() {
// TODO: Do it only once per NodeManager.
ContainerId containerID = event.getContainerID();
+ // If the container failed to launch earlier (due to dead node for
example),
+ // it has been marked as FAILED and removed from containers during
+ // CONTAINER_REMOTE_LAUNCH event handling.
+ // Skip kill() such container during CONTAINER_REMOTE_CLEANUP as
+ // it is not necessary and could cost 15 minutes delay if the node is
dead.
+ if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP &&
+ !containers.containsKey(containerID)) {
+ LOG.info("Skip cleanup of already-removed container " + containerID);
Review Comment:
As we are at it, fix this in line 382 too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]