What's the map task capacity of each node?

On Tue, Jul 12, 2011 at 6:15 PM, Virajith Jalaparti <[email protected]> wrote:
> Hi,
>
> I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input
> data using a 20-node cluster. HDFS is configured to use a 128MB block size
> (so 1600 maps are created) and a replication factor of 1. All 20 nodes are
> also HDFS datanodes. The bandwidth between nodes is limited to 50Mbps
> (configured using Linux "tc"). I see that around 90% of the map tasks are
> reading their data over the network, i.e. most of the map tasks are not
> being scheduled on the nodes where the data they process is located.
>
> My understanding was that Hadoop tries to schedule as many data-local maps
> as possible, but in this situation that does not seem to happen. Any reason
> why this is happening? And is there a way to configure Hadoop to ensure the
> maximum possible node locality?
>
> Any help regarding this is very much appreciated.
>
> Thanks,
> Virajith
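A quick back-of-the-envelope check of the numbers in the question, and of why replication factor 1 makes locality fragile. This is only a sketch using the figures stated above (200GB input, 128MB blocks, 20 nodes, replication 1); the "random scheduler" baseline is an assumption for comparison, not a description of Hadoop's actual scheduler.

```python
# Figures taken from the question (assuming GB/MB mean GiB/MiB,
# as HDFS block sizes conventionally do).
input_size_mib = 200 * 1024  # 200 GiB of input, in MiB
block_size_mib = 128         # HDFS block size, in MiB
nodes = 20
replication = 1

# Number of map tasks = number of input blocks.
num_maps = input_size_mib // block_size_mib
print(num_maps)  # 1600, matching the 1600 maps reported

# With replication 1, each block lives on exactly one node. A scheduler
# that ignored locality entirely would place a map on its block's node
# only replication/nodes of the time:
random_locality = replication / nodes
print(f"{random_locality:.0%}")  # 5%

# The reported ~10% data-local maps (90% remote reads) is only slightly
# better than this random baseline: with a single replica per block, a
# free map slot rarely coincides with the one node holding the data.
```

This suggests the low locality is largely a consequence of replication factor 1: raising the replication factor multiplies the chances that some node with a free slot also holds a replica of a pending block.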
