Thanks, again, Liyin.

On Sat, Aug 4, 2012 at 6:59 AM, 梁李印 <[email protected]> wrote:

> The optimization you mentioned is reduce-task locality-aware.
> Unfortunately,
> the current scheduler doesn't consider the reduce task's data locality. So
> a
> reduce task can be scheduled to any node with free slots.
> The following jira is discussing this problem:
> https://issues.apache.org/jira/browse/MAPREDUCE-2038
>
> Liyin Liang
> -----邮件原件-----
> 发件人: Satheesh Kumar [mailto:[email protected]]
> 发送时间: 2012年8月4日 1:47
> 收件人: [email protected]
> 主题: Re: 答复: MapReduce shuffle question
>
> Thank you. One more follow up question:
>
> Are there any optimizations to run map and reduces on the same nodes so
> that data is not transported across the network? Generally, how often and
> what % of map output is actually transferred over the network to reduce
> nodes?
>
> Thanks,
> Satheesh
>
> On Fri, Aug 3, 2012 at 7:33 AM, 梁李印 <[email protected]> wrote:
>
> > When a map task is done, its output is always flushed to the disk and
> > merged
> > to one file.
> > The benefit is that if the reducer is failed, the map need not to re-run.
> >
> > Liyin Liang
> >
> > -----邮件原件-----
> > 发件人: Satheesh Kumar [mailto:[email protected]]
> > 发送时间: 2012年8月3日 21:23
> > 收件人: [email protected]
> > 主题: MapReduce shuffle question
> >
> > Team, can someone please clarify the following question?
> >
> > In the map phase, the map output is written to the local disk. And in the
> > shuffle phase, the map output partitions are transferred to reduce nodes
> > using http. So, my question is assuming there are no spills (data set is
> > small enough to accommodate this), will the map output be transferred
> > directly from memory to the reduce nodes using http without a disk access
> > to write the map output? Or, is the map output always flushed to the disk
> > before transferred to reduce nodes?
> >
> > Appreciate the help.
> >
> > Thanks,
> > Satheesh
> >
> >
>
>

Reply via email to