Also, could you tell us more about your task statuses? You might also have failed tasks; failed attempts are re-executed, which would add to the total task count...
Bertrand

On Thu, Aug 16, 2012 at 11:01 PM, Bertrand Dechoux <[email protected]> wrote:

> Well, there is speculative execution too.
>
> http://developer.yahoo.com/hadoop/tutorial/module4.html
>
>> *Speculative execution:* One problem with the Hadoop system is that by
>> dividing the tasks across many nodes, it is possible for a few slow nodes
>> to rate-limit the rest of the program. For example, if one node has a slow
>> disk controller, then it may be reading its input at only 10% of the speed
>> of all the other nodes. So when 99 map tasks are already complete, the
>> system is still waiting for the final map task to check in, which takes
>> much longer than all the other nodes.
>>
>> By forcing tasks to run in isolation from one another, individual tasks
>> do not know *where* their inputs come from. Tasks trust the Hadoop
>> platform to just deliver the appropriate input. Therefore, the same input
>> can be processed *multiple times in parallel*, to exploit differences in
>> machine capabilities. As most of the tasks in a job are coming to a close,
>> the Hadoop platform will schedule redundant copies of the remaining tasks
>> across several nodes which do not have other work to perform. This process
>> is known as *speculative execution*. When tasks complete, they announce
>> this fact to the JobTracker. Whichever copy of a task finishes first
>> becomes the definitive copy. If other copies were executing speculatively,
>> Hadoop tells the TaskTrackers to abandon the tasks and discard their
>> outputs. The Reducers then receive their inputs from whichever Mapper
>> completed successfully, first.
>>
>> Speculative execution is enabled by default. You can disable speculative
>> execution for the mappers and reducers by setting the
>> mapred.map.tasks.speculative.execution and
>> mapred.reduce.tasks.speculative.execution JobConf options to false,
>> respectively.
>
> Can you tell us your configuration with regards to those parameters?
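A minimal driver sketch of those two JobConf options, against the old `org.apache.hadoop.mapred` API that CDH3 uses (it needs the Hadoop jars on the classpath, so treat it as a configuration fragment rather than a standalone program; the class and method names are illustrative, not part of Hadoop):

```java
import org.apache.hadoop.mapred.JobConf;

// Sketch: disabling speculative execution for an old-API (CDH3-era) job.
// Only the two properties quoted above are changed; everything else is
// left at its default.
public class DisableSpeculation {
    public static JobConf configure(JobConf conf) {
        // Equivalent to setting the quoted JobConf options to false:
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        return conf;
    }
}
```

With both options at their default of `true`, redundant attempts of the slowest tasks are expected, and each attempt shows up in the job's task counts.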
> Regards
> Bertrand
>
> On Thu, Aug 16, 2012 at 8:36 PM, in.abdul <[email protected]> wrote:
>
>> Hi Gaurav,
>> The number of maps does not depend on the number of blocks; it really
>> depends on the number of input splits. If you have 100 GB of data and it
>> yields 10 splits, then you will see only 10 maps.
>>
>> Please correct me if I am wrong.
>>
>> Thanks and regards,
>> Syed Abdul Kather
>>
>> On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <[email protected]> wrote:
>>
>>> Hi users,
>>>
>>> I am working on a CDH3 cluster of 12 nodes (TaskTrackers running on all
>>> 12 nodes and 1 node running the JobTracker).
>>> In order to perform a WordCount benchmark test, I did the following:
>>>
>>> - Executed "RandomTextWriter" first to create 100 GB of data (note that
>>>   I changed only the "test.randomtextwrite.total_bytes" parameter; all
>>>   the rest are kept at their defaults).
>>> - Next, executed the "WordCount" program on that 100 GB dataset.
>>>
>>> The "Block Size" in "hdfs-site.xml" is set to 128 MB. Now, according to
>>> my calculation, the total number of maps executed by the WordCount job
>>> should be 100 GB / 128 MB, i.e. 102400 MB / 128 MB = 800.
>>> But when I execute the job, it runs a total of 900 maps, i.e. 100 extra.
>>> So why this extra number of maps? My job does complete successfully
>>> without any error.
>>>
>>> Again, if I don't execute the "RandomTextWriter" job to create the data
>>> for my WordCount, but instead put my own 100 GB text file in HDFS and
>>> run "WordCount", the number of maps matches my calculation, i.e. 800.
>>>
>>> Can anyone tell me why Hadoop behaves this oddly regarding the number of
>>> maps for WordCount only when the dataset is generated by
>>> RandomTextWriter? And what is the purpose of these extra maps?
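The 800-map estimate above can be sketched as a ceiling division: with the default FileInputFormat behaviour, one map task is scheduled per input split, and the split size defaults to the HDFS block size. The class and method names below are illustrative, not part of any Hadoop API.

```java
// Minimal sketch of the map-count arithmetic: one map per input split,
// split size assumed equal to the HDFS block size.
public class SplitCount {

    // Ceiling division: a trailing partial block still needs its own map.
    static long expectedSplits(long totalBytes, long splitSizeBytes) {
        return (totalBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long totalBytes = 100L * 1024 * 1024 * 1024; // 100 GB of input
        long blockSize  = 128L * 1024 * 1024;        // 128 MB block size
        // 102400 MB / 128 MB = 800 splits -> 800 map tasks expected
        System.out.println(expectedSplits(totalBytes, blockSize)); // prints 800
    }
}
```

Note this only estimates the lower bound: speculative attempts, failed-and-retried attempts, or an input format that produces more splits than blocks (e.g. many small files, one split per file) all raise the observed task count.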
>>> Regards,
>>> Gaurav Dasgupta
>>
>> -----
>> THANKS AND REGARDS,
>> SYED ABDUL KATHER
>>
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Number-of-Maps-running-more-than-expected-tp4001631p4001683.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
> --
> Bertrand Dechoux

--
Bertrand Dechoux
