[
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794711#comment-16794711
]
Bilwa S T edited comment on MAPREDUCE-7169 at 3/18/19 4:47 AM:
---------------------------------------------------------------
Hi [~uranus]
Same as what [~bibinchundatt] had suggested: we can skip the nodes where a
previous attempt was launched.
1. In TaskImpl#addAndScheduleAttempt, whenever the Avataar is SPECULATIVE,
update the previous attempts' nodes and racks.
2. In TaskAttemptImpl#RequestContainerTransition, when the Avataar is
SPECULATIVE, we can skip the previous attempts' nodes and racks. We also need
to keep a record of the blacklisted nodes.
3. During allocation for mappers, i.e. in
RMContainerAllocator#assignMapsWithLocality, we have three types of resource
requests:
   1. Hosts - we have already updated the datahosts for the task attempt.
   2. Racks - this is also taken care of in TaskAttemptImpl.
   3. Any - in this case we need the blacklisted-node record that we updated
   earlier, so that we can check whether the node is blacklisted for the task.
   If it is blacklisted, we skip the allocation and the request is retried
   with another container.
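
The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: the class and method names (SpeculativePlacementSketch, recordAttemptNode, filterHosts, canAssign) are hypothetical and are not part of the Hadoop code base, node and task identifiers are plain strings, and rack handling is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Hadoop.
class SpeculativePlacementSketch {

    // Per-task record of the nodes on which earlier attempts were launched.
    private final Map<String, Set<String>> usedNodesByTask = new HashMap<>();

    // Step 1: remember the node of an attempt when it is launched.
    void recordAttemptNode(String taskId, String node) {
        usedNodesByTask.computeIfAbsent(taskId, k -> new HashSet<>()).add(node);
    }

    private Set<String> usedNodes(String taskId) {
        return usedNodesByTask.getOrDefault(taskId, Collections.<String>emptySet());
    }

    // Step 2: when requesting a container for a speculative attempt,
    // drop the hosts that previous attempts of the same task already used.
    List<String> filterHosts(String taskId, List<String> dataHosts) {
        Set<String> used = usedNodes(taskId);
        List<String> remaining = new ArrayList<>();
        for (String host : dataHosts) {
            if (!used.contains(host)) {
                remaining.add(host);
            }
        }
        return remaining;
    }

    // Step 3 ("Any" requests): refuse a container whose node is
    // blacklisted for this task, so the allocator can try another one.
    boolean canAssign(String taskId, String containerNode) {
        return !usedNodes(taskId).contains(containerNode);
    }
}
```

In this sketch the per-task set doubles as the blacklist record mentioned in step 2; a real change would keep that state inside the task attempt and consult it from the allocator.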
was (Author: bilwast):
Hi [~uranus]
Same as what [~bibinchundatt] had suggested. We can skip nodes where previous
attempt was launched.
1. In TaskImpl#addAndScheduleAttempt whenever Avataar is SPECULATIVE
> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: yarn
> Affects Versions: 2.7.2
> Reporter: Lee chen
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: image-2018-12-03-09-54-07-859.png
>
>
> I found that in all versions of YARN, speculative execution may place the
> speculative task on the same node as the original task. From what I have
> read, it only tries to launch one more task attempt; I haven't seen any
> place mentioning that it should not run on the same node. This is
> unreasonable: if the node has problems that make task execution very slow,
> then placing the speculative task on the same node cannot help the
> problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears
> almost every day.
> !image-2018-12-03-09-54-07-859.png!