Harsh,

Sorry for my poor English. I have one more question.
As an administrator, I can set the maximum number of maps/reduces that run on a datanode, so what do I set the number of slots for? What is the difference between these attributes?
In my opinion, the number of slots depends on hardware, while the number of maps/reduces depends on software.
Assume that only one job is running, for example a benchmark such as the PI computation.

Thanks!
Tan Jun

From: Harsh J
Date: 2011-12-12 13:33
To: mapreduce-user; tanjun_2525
Subject: Re: Re: About slots of tasktracker and munber of map taskers

Tan,

As an admin, I can even choose to set the configuration to 100 slots on a 4-core node, if I feel like burning the box. There is no hardware auto-detection, and the slot limit is entirely controlled by the mapred-site.xml for that TaskTracker.

The book is merely telling you that you need to set these maximum slot settings based on your knowledge of the hardware on each node; TaskTrackers do nothing of that sort on their own.

Some CPU/memory considerations are taken into account by a variety of non-default Schedulers in the JobTracker, but your slot limits per TaskTracker are entirely controlled by configuration.

2011/12/12 Tan Jun <[email protected]>:
> Hi Harsh,
> Now I know the number of maps and reduces run simultaneously is set by the
> administrator in mapred-site.xml, with a default value of 2.
> But I can't get the point about the number of slots.
> My understanding so far is that the number of slots is decided by the
> hardware and the administrator cannot change it.
> Is that right?
>
> ________________________________
> Tan Jun
>
> From: Harsh J
> Date: 2011-12-12 ?2:22
> To: mapreduce-user
> Subject: Re: About slots of tasktracker and munber of map taskers
>
> Hi Tan,
>
> On 12-Dec-2011, at 8:48 AM, Tan Jun wrote:
>
> Hi,
> I don't really understand the meaning of these sentences in "The Definitive
> Guide" (page 155):
>
> Tasktrackers have a fixed number of slots for map tasks and for reduce
> tasks: for example, a tasktracker may be able to run two map tasks and two
> reduce tasks simultaneously. (The precise number depends on the number of
> cores and the amount of memory on the tasktracker; see "Memory" on
> page?54.)
>
> Does that mean the number of slots is fixed and the number of maps run
> simultaneously is set by the user?
>
>
> Not by the user, but by the administrator. Each tasktracker is configured in
> production with a 'task slot' upper limit - say, 8 maps and 4 reducers for a
> 12-core machine. This is not auto-configured (unless you use auto cluster
> setup+configuration tools that determine it for you [0]), and has to be set
> when configuring Hadoop daemons.
>
> The book means to imply that you need to set these, based on the memory and
> CPU configuration of your machines. By default, tasktrackers have limits of
> 2+2.
>
> See http://wiki.apache.org/hadoop/LimitingTaskSlotUsage
>
> [0] - http://www.cloudera.com/products-services/tools/ is one.

--
Harsh J
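
To make the point about configuration concrete, here is a minimal Python sketch that reads the per-TaskTracker slot limits out of the text of a mapred-site.xml and falls back to the 2+2 default when they are not set. The property names are not quoted anywhere in this thread; they are, to my knowledge, the Hadoop 1.x names for these limits, so check them against your own release. The function name read_slot_limits and the sample file contents (8 map slots and 4 reduce slots, as in the 12-core example) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed Hadoop 1.x property names for the per-TaskTracker slot limits
# (not stated in the thread; verify against your version's mapred-default.xml).
MAP_SLOTS_KEY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_SLOTS_KEY = "mapred.tasktracker.reduce.tasks.maximum"
DEFAULT_SLOTS = 2  # the "2+2" default mentioned in the thread


def read_slot_limits(mapred_site_xml):
    """Return (map_slots, reduce_slots) from the text of a mapred-site.xml."""
    root = ET.fromstring(mapred_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    # Nothing here inspects cores or memory: the limits are whatever the
    # administrator wrote, or the default when nothing was written.
    return (
        int(props.get(MAP_SLOTS_KEY, DEFAULT_SLOTS)),
        int(props.get(REDUCE_SLOTS_KEY, DEFAULT_SLOTS)),
    )


# Hypothetical config for the 12-core example: 8 map slots, 4 reduce slots.
EXAMPLE_SITE = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(read_slot_limits(EXAMPLE_SITE))
```

The sketch only mirrors what the thread says: the slot counts come from the file on each node, and an empty configuration yields the default of 2 map slots and 2 reduce slots.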
