Also, could you tell us more about your task statuses? You might also have failed tasks; failed attempts are re-executed, which would add to the total task count...
Bertrand

On Thu, Aug 16, 2012 at 11:01 PM, Bertrand Dechoux <[email protected]> wrote:

> Well, there is speculative execution too.
>
> http://developer.yahoo.com/hadoop/tutorial/module4.html
>
>> *Speculative execution:* One problem with the Hadoop system is that by
>> dividing the tasks across many nodes, it is possible for a few slow nodes
>> to rate-limit the rest of the program. For example, if one node has a slow
>> disk controller, then it may be reading its input at only 10% of the speed
>> of all the other nodes. So when 99 map tasks are already complete, the
>> system is still waiting for the final map task to check in, which takes
>> much longer than all the other nodes.
>>
>> By forcing tasks to run in isolation from one another, individual tasks
>> do not know *where* their inputs come from. Tasks trust the Hadoop
>> platform to just deliver the appropriate input. Therefore, the same input
>> can be processed *multiple times in parallel*, to exploit differences in
>> machine capabilities. As most of the tasks in a job are coming to a close,
>> the Hadoop platform will schedule redundant copies of the remaining tasks
>> across several nodes which do not have other work to perform. This process
>> is known as *speculative execution*. When tasks complete, they announce
>> this fact to the JobTracker. Whichever copy of a task finishes first
>> becomes the definitive copy. If other copies were executing speculatively,
>> Hadoop tells the TaskTrackers to abandon the tasks and discard their
>> outputs. The Reducers then receive their inputs from whichever Mapper
>> completed successfully, first.
>>
>> Speculative execution is enabled by default. You can disable speculative
>> execution for the mappers and reducers by setting the
>> mapred.map.tasks.speculative.execution and
>> mapred.reduce.tasks.speculative.execution JobConf options to false,
>> respectively.
>
> Can you tell us your configuration with regards to those parameters?
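A minimal driver sketch of those two JobConf options, against the old `org.apache.hadoop.mapred` API that CDH3 uses (it needs the Hadoop jars on the classpath, so treat it as a configuration fragment rather than a standalone program; the class and method names are illustrative, not part of Hadoop):

```java
import org.apache.hadoop.mapred.JobConf;

// Sketch: disabling speculative execution for an old-API (CDH3-era) job.
// Only the two properties quoted above are changed; everything else is
// left at its default.
public class DisableSpeculation {
    public static JobConf configure(JobConf conf) {
        // Equivalent to setting the quoted JobConf options to false:
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        return conf;
    }
}
```

With both options at their default of `true`, redundant attempts of the slowest tasks are expected, and each attempt shows up in the job's task counts.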
> Regards
> Bertrand
>
> On Thu, Aug 16, 2012 at 8:36 PM, in.abdul <[email protected]> wrote:
>
>> Hi Gaurav,
>> The number of maps does not depend on the number of blocks; it really
>> depends on the number of input splits. If you have 100 GB of data and it
>> yields 10 splits, then you will see only 10 maps.
>>
>> Please correct me if I am wrong.
>>
>> Thanks and regards,
>> Syed Abdul Kather
>>
>> On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <[email protected]> wrote:
>>
>>> Hi users,
>>>
>>> I am working on a CDH3 cluster of 12 nodes (TaskTrackers running on all
>>> 12 nodes and 1 node running the JobTracker).
>>> In order to perform a WordCount benchmark test, I did the following:
>>>
>>> - Executed "RandomTextWriter" first to create 100 GB of data (note that
>>>   I changed only the "test.randomtextwrite.total_bytes" parameter; all
>>>   the rest are kept at their defaults).
>>> - Next, executed the "WordCount" program on that 100 GB dataset.
>>>
>>> The "Block Size" in "hdfs-site.xml" is set to 128 MB. Now, according to
>>> my calculation, the total number of maps executed by the WordCount job
>>> should be 100 GB / 128 MB, i.e. 102400 MB / 128 MB = 800.
>>> But when I execute the job, it runs a total of 900 maps, i.e. 100 extra.
>>> So why this extra number of maps? My job does complete successfully
>>> without any error.
>>>
>>> Again, if I don't execute the "RandomTextWriter" job to create the data
>>> for my WordCount, but instead put my own 100 GB text file in HDFS and
>>> run "WordCount", the number of maps matches my calculation, i.e. 800.
>>>
>>> Can anyone tell me why Hadoop behaves this oddly regarding the number of
>>> maps for WordCount only when the dataset is generated by
>>> RandomTextWriter? And what is the purpose of these extra maps?
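The 800-map estimate above can be sketched as a ceiling division: with the default FileInputFormat behaviour, one map task is scheduled per input split, and the split size defaults to the HDFS block size. The class and method names below are illustrative, not part of any Hadoop API.

```java
// Minimal sketch of the map-count arithmetic: one map per input split,
// split size assumed equal to the HDFS block size.
public class SplitCount {

    // Ceiling division: a trailing partial block still needs its own map.
    static long expectedSplits(long totalBytes, long splitSizeBytes) {
        return (totalBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long totalBytes = 100L * 1024 * 1024 * 1024; // 100 GB of input
        long blockSize  = 128L * 1024 * 1024;        // 128 MB block size
        // 102400 MB / 128 MB = 800 splits -> 800 map tasks expected
        System.out.println(expectedSplits(totalBytes, blockSize)); // prints 800
    }
}
```

Note this only estimates the lower bound: speculative attempts, failed-and-retried attempts, or an input format that produces more splits than blocks (e.g. many small files, one split per file) all raise the observed task count.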
>>> Regards,
>>> Gaurav Dasgupta
>>
>> -----
>> THANKS AND REGARDS,
>> SYED ABDUL KATHER
>>
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Number-of-Maps-running-more-than-expected-tp4001631p4001683.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
> --
> Bertrand Dechoux

--
Bertrand Dechoux
