[
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268769#comment-17268769
]
Bilwa S T commented on MAPREDUCE-7314:
--------------------------------------
Hi [~epayne],
This happens on branch MR-6749 when container reuse is enabled.
> Job will hang if NM is restarted while it's running
> ---------------------------------------------------
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
>
> This is due to three different reasons:
> # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
> # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't kill
> the current attempt that is assigned to the container. That is because the task
> attempt is not updated in the ContainerLauncherImpl#Container class.
> # A container gets assigned to a task attempt even after the container has
> stopped running, i.e. after the container completed event has been processed.
> This is because we add the reuse container map to the allocated list.
> makeRemoteRequest gets the same container in allocationResponse, whereas the RM
> has sent the same container in the finished container list. To avoid this, we
> need to make sure the allocated list doesn't contain any finished containers.
> Test credits : [~Rajshree]
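
The third point above can be sketched roughly as follows. This is only an illustration of the idea and not the actual patch: the class AllocatedContainerFilter and the method dropFinished are hypothetical names, and container IDs are modelled as plain strings instead of the YARN container types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the Hadoop code base. */
class AllocatedContainerFilter {

    /**
     * Merges newly allocated and reusable container IDs into one allocated
     * list, skipping every ID that the RM reported as finished in the same
     * allocation response.
     */
    static List<String> dropFinished(List<String> newlyAllocated,
                                     List<String> reusable,
                                     Set<String> finished) {
        List<String> allocated = new ArrayList<>();
        for (String id : newlyAllocated) {
            if (!finished.contains(id)) {
                allocated.add(id);
            }
        }
        for (String id : reusable) {
            // A reusable container that has already completed must not be
            // handed to a task attempt again.
            if (!finished.contains(id) && !allocated.contains(id)) {
                allocated.add(id);
            }
        }
        return allocated;
    }
}
```

With newly allocated containers c1 and c2, reusable containers c3 and c4, and c3 reported as finished, only c1, c2 and c4 would remain in the allocated list.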