What's the map task capacity of each node?

On Tue, Jul 12, 2011 at 6:15 PM, Virajith Jalaparti <[email protected]> wrote:
> Hi,
>
> I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input
> data using a 20-node cluster. HDFS is configured to use a 128MB block size
> (so 1600 maps are created) and a replication factor of 1. All 20 nodes are
> also HDFS datanodes. The bandwidth between nodes is limited to 50Mbps
> (configured using Linux "tc"). I see that around 90% of the map tasks are
> reading their data over the network, i.e. most of the map tasks are not
> being scheduled on the nodes where the data they process is located.
>
> My understanding was that Hadoop tries to schedule as many data-local maps
> as possible, but in this situation that does not seem to happen. Any reason
> why this is happening? And is there a way to configure Hadoop to ensure the
> maximum possible node locality?
>
> Any help regarding this is very much appreciated.
>
> Thanks,
> Virajith
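A quick back-of-the-envelope check of the numbers in the question, and of why replication factor 1 makes locality fragile. This is only a sketch using the figures stated above (200GB input, 128MB blocks, 20 nodes, replication 1); the "random scheduler" baseline is an assumption for comparison, not a description of Hadoop's actual scheduler.

```python
# Figures taken from the question (assuming GB/MB mean GiB/MiB,
# as HDFS block sizes conventionally do).
input_size_mib = 200 * 1024  # 200 GiB of input, in MiB
block_size_mib = 128         # HDFS block size, in MiB
nodes = 20
replication = 1

# Number of map tasks = number of input blocks.
num_maps = input_size_mib // block_size_mib
print(num_maps)  # 1600, matching the 1600 maps reported

# With replication 1, each block lives on exactly one node. A scheduler
# that ignored locality entirely would place a map on its block's node
# only replication/nodes of the time:
random_locality = replication / nodes
print(f"{random_locality:.0%}")  # 5%

# The reported ~10% data-local maps (90% remote reads) is only slightly
# better than this random baseline: with a single replica per block, a
# free map slot rarely coincides with the one node holding the data.
```

This suggests the low locality is largely a consequence of replication factor 1: raising the replication factor multiplies the chances that some node with a free slot also holds a replica of a pending block.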
