Sorry, I meant: have you set the mapred.jobtracker.taskScheduler property in your mapred-site.xml file? If not, you're using the default FIFO scheduler. The default scheduler doesn't do data-local scheduling, but the fair scheduler and capacity scheduler do. To switch, set mapred.jobtracker.taskScheduler to either org.apache.hadoop.mapred.FairScheduler (for the fair scheduler) or org.apache.hadoop.mapred.CapacityTaskScheduler (for the capacity scheduler), and then restart the JobTracker. You can read about the two schedulers here:
http://hadoop.apache.org/common/docs/current/fair_scheduler.html
http://hadoop.apache.org/common/docs/current/capacity_scheduler.html

-Joey

On Sat, Mar 3, 2012 at 6:32 PM, Hassen Riahi <[email protected]> wrote:
> The jobtracker is running in another machine (node C)
>
> Hassen
>
>
>> Which scheduler are you using?
>>
>> -Joey
>>
>> On Mar 3, 2012, at 18:52, Hassen Riahi <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> We tried using mapreduce to execute a simple map code which read a txt
>>> file stored in HDFS and write then the output.
>>> The file to read is a very small one. It was not split and written
>>> entirely and only in a single datanode (node A). This node is configured
>>> also as a tasktracker node
>>> While we was expecting that the location of the map execution is node A
>>> (since the input is stored there), from log files, we see that the map was
>>> executed in another tasktracker (node B) of the cluster.
>>> Am I missing something?
>>>
>>> Thanks for the help!
>>> Hassen
>>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
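
For reference, the mapred.jobtracker.taskScheduler setting described in this message might look roughly like the following in mapred-site.xml on the JobTracker node. This is only a minimal sketch: it assumes a Hadoop 1.x (MRv1) cluster and that the fair scheduler jar is already on the JobTracker's classpath, neither of which is confirmed in the thread. To try the capacity scheduler instead, the value would be org.apache.hadoop.mapred.CapacityTaskScheduler.

```xml
<!-- mapred-site.xml on the JobTracker node (sketch; assumes Hadoop 1.x / MRv1) -->
<configuration>
  <property>
    <!-- Property named in the message; selects the JobTracker's task scheduler -->
    <name>mapred.jobtracker.taskScheduler</name>
    <!-- Fair scheduler class; use org.apache.hadoop.mapred.CapacityTaskScheduler
         for the capacity scheduler instead -->
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
</configuration>
```

The JobTracker has to be restarted for the change to take effect, as noted in the message.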
