Thanks, Leo. What is the configuration of a typical data node in a Hadoop cluster - cores, storage capacity, and connectivity (SATA?)? How many task slots are scheduled per core, in general?
Is there a best practices guide somewhere?

Thanks,
Satheesh

On Fri, May 11, 2012 at 10:48 AM, Leo Leung <[email protected]> wrote:
> Nope, you must tune the config on that specific super node to have more
> M/R slots (this is for 1.0.x).
> This does not mean the JobTracker will be eager to stuff that super node
> with all the M/R jobs at hand.
>
> It still goes through the scheduler; the Capacity Scheduler is most likely
> what you have (check your config).
>
> IMO, if the data locality is not going to be there, your cluster is going
> to suffer from network I/O.
>
> -----Original Message-----
> From: Satheesh Kumar [mailto:[email protected]]
> Sent: Friday, May 11, 2012 9:51 AM
> To: [email protected]
> Subject: Question on MapReduce
>
> Hi,
>
> I am a newbie to Hadoop and have a quick question on optimal compute vs.
> storage resources for MapReduce.
>
> If I have a multiprocessor node with 4 processors, will Hadoop schedule
> a higher number of map or reduce tasks on that node than on a uni-processor
> node? In other words, does Hadoop detect denser systems and schedule
> more tasks on multiprocessor systems?
>
> If yes, does that imply it makes sense to attach higher-capacity storage,
> holding more blocks, to systems with dense compute?
>
> Any insights will be very useful.
>
> Thanks,
> Satheesh
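For reference, the per-node slot tuning Leo describes for 1.0.x is done in mapred-site.xml on the TaskTracker node. A minimal sketch, with illustrative values (a common starting point is roughly one slot per core, split between map and reduce); the scheduler property on the JobTracker side selects the Capacity Scheduler he mentions:

```xml
<!-- mapred-site.xml on the "super node" TaskTracker (Hadoop 1.0.x).
     Slot counts are illustrative, not a recommendation. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- e.g. 8 map slots on a dense multi-core node -->
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>

<!-- JobTracker side: pick the Capacity Scheduler explicitly -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
```

As Leo notes, raising the slot counts only raises the ceiling on that node; the scheduler and data locality still decide where tasks actually land.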
