Thanks, Leo. What is the configuration of a typical data node in a Hadoop cluster - cores, storage capacity, and connectivity (SATA?)? How many task slots are scheduled per core, in general?
Is there a best practices guide somewhere?

Thanks,
Satheesh

On Fri, May 11, 2012 at 10:48 AM, Leo Leung <[email protected]> wrote:
> Nope, you must tune the config on that specific super node to have more
> M/R slots (this is for 1.0.x).
> This does not mean the JobTracker will be eager to stuff that super node
> with all the M/R jobs at hand.
>
> It still goes through the scheduler; the Capacity Scheduler is most likely
> what you have (check your config).
>
> IMO, if the data locality is not going to be there, your cluster is going
> to suffer from network I/O.
>
> -----Original Message-----
> From: Satheesh Kumar [mailto:[email protected]]
> Sent: Friday, May 11, 2012 9:51 AM
> To: [email protected]
> Subject: Question on MapReduce
>
> Hi,
>
> I am a newbie to Hadoop and have a quick question on optimal compute vs.
> storage resources for MapReduce.
>
> If I have a multiprocessor node with 4 processors, will Hadoop schedule
> a higher number of map or reduce tasks on that node than on a uni-processor
> node? In other words, does Hadoop detect denser systems and schedule
> more tasks on multiprocessor systems?
>
> If yes, does that imply it makes sense to attach higher-capacity storage,
> holding more blocks, to systems with dense compute?
>
> Any insights will be very useful.
>
> Thanks,
> Satheesh
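For reference, the per-node slot tuning Leo describes for 1.0.x is done in mapred-site.xml on the TaskTracker node. A minimal sketch, with illustrative values (a common starting point is roughly one slot per core, split between map and reduce); the scheduler property on the JobTracker side selects the Capacity Scheduler he mentions:

```xml
<!-- mapred-site.xml on the "super node" TaskTracker (Hadoop 1.0.x).
     Slot counts are illustrative, not a recommendation. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- e.g. 8 map slots on a dense multi-core node -->
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>

<!-- JobTracker side: pick the Capacity Scheduler explicitly -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
```

As Leo notes, raising the slot counts only raises the ceiling on that node; the scheduler and data locality still decide where tasks actually land.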
