Thank you very much for the detailed explanation, Anirudh. I think my question about node/VM came from a lack of knowledge (I'm just starting to learn the Hadoop environment). As for the configuration of the nodes and clusters, that is not something I do myself. We have a dedicated team that manages the Hadoop cluster, and I'll ask them.
I think my question should have been: how many instances of the 'helper' class will be created in a single VM? As I understand it, since I am creating the helper in the setup/configure method, there would be one. And as long as it's stateless, I'm good.

Thanks again,
Eyal

Eyal Golan
[email protected]

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

Save a tree. Please don't print this e-mail unless it's really necessary.

On Sat, Dec 31, 2011 at 1:36 PM, Anirudh <[email protected]> wrote:

> I just wanted to confirm where exactly you were planning to put the
> instantiation code, as it was not mentioned in your previous post. The
> location would have made a difference. As you are doing it in the setup of
> the mapper/reducer, you are good.
>
> I was referring to the Task JVM Reuse option:
>
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+JVM+Reuse
>
> It states that if the option to reuse JVMs is enabled, the same task JVM
> will execute multiple tasks (i.e. map/reduce). I am not sure how this is
> implemented, whether a new Mapper/Reducer is created for each task or they
> too are reused.
> If a new instance is created each time, then the mapper/reducer and all
> its references will become eligible for garbage collection and you would be
> good. If the Mapper/Reducer instances are reused, then setup should be
> called again, creating another instance of your helper class.
>
> In my opinion the latter does not make sense, and the implementation would
> follow the former approach, i.e. creation of a new Mapper/Reducer for each
> task. But it would be interesting to check.
>
> As the classes in question are helper classes (stateless), you may not be
> affected in terms of functionality.
>
> I am not clear on one of your statements:
>
> *How many map tasks will be created?
> One per split or one per VM (node)?*
> *Are you suggesting that although there would be one Mapper in the node*
> ...
>
> Have you configured your node to have a single slot for map/reduce tasks?
> If yes, then there will be one Mapper/Reducer task on the node. If not,
> there could be more than one mapper/reducer on the node, depending on many
> other parameters, e.g. the number of mapper/reducer slots allocated on the
> node, the number of input splits, etc. If the node is configured to run more
> than one Mapper/Reducer task, the scheduler may choose to run more than one
> task on the same node. The default is 2 map and 2 reduce tasks per node. For
> each task a new JVM is launched unless the JVM reuse option is enabled.
>
> Thanks,
> Anirudh
>
>
> On Sat, Dec 31, 2011 at 1:28 AM, Eyal Golan <[email protected]> wrote:
>
>> My idea is to create that class in the setup/configure method (depending
>> on which Mapper/Reducer I inherit from).
>>
>> I don't understand the 'reuse' option you are referring to.
>> How many map tasks will be created? One per split or one per VM (node)?
>> Are you suggesting that although there would be one Mapper in the node,
>> each new operator (or reflective instantiation) will create a new instance,
>> thus making lots of instances?
>>
>> BTW,
>> these helper classes I want to create are of course not going to be
>> stateful. They are definitely 'helper' classes that contain some logic.
>>
>> Thanks,
>>
>> Eyal
>>
>>
>> On Sat, Dec 31, 2011 at 6:50 AM, Anirudh <[email protected]> wrote:
>>
>>> Where are you creating this new class? If it is in the map function,
>>> then it will create a new object for each record in the split.
>>>
>>> Also, you may need to see how the JVM reuse option works.
>>> I am not too sure of this, and you may want to look at the code. If the
>>> option for JVM reuse is set, then my understanding is that for every task
>>> a new Map task would be created, and in that case the "new" operator will
>>> create another instance even if this statement is not in the map function.
>>>
>>>
>>> On Fri, Dec 30, 2011 at 6:22 AM, Eyal Golan <[email protected]> wrote:
>>>
>>>> Great news!!
>>>> Thanks for the info.
>>>>
>>>> So using reflection, I can inject different implementations of
>>>> interfaces (services) into the mapper (or reducer).
>>>> This way I can test a mapper (or reducer) just by instantiating a stub
>>>> via reflection instead of a real implementation.
>>>>
>>>> Thanks,
>>>>
>>>> Eyal Golan
>>>>
>>>>
>>>> On Fri, Dec 30, 2011 at 2:50 PM, Harsh J <[email protected]> wrote:
>>>>
>>>>> Eyal,
>>>>>
>>>>> Yes, it is right to think of each task attempt as one individual
>>>>> JVM running on any added node. Multiple slots would mean multiple
>>>>> JVMs in parallel as well. Yes, your use of reflection to build your
>>>>> objects will work just fine: it's all user-side Java code that is
>>>>> executed.
>>>>>
>>>>> On 30-Dec-2011, at 4:42 PM, Eyal Golan wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I want to understand a basic concept in MR.
>>>>>
>>>>> If a mapper creates an instance of some class (using the 'new'
>>>>> operator), then the created instance exists ONCE in the VM of this
>>>>> node, for each node.
>>>>> Correct?
>>>>>
>>>>> Now,
>>>>> what if, instead of using the 'new' operator, the class is instantiated
>>>>> using reflection?
>>>>> Is that valid in MR?
>>>>> Will only one instance of the created class exist on that node?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eyal
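To make the "how many instances" question concrete, here is a minimal plain-Java sketch (not Hadoop code) of the lifecycle discussed in the thread: each task gets its own mapper, setup() runs once per task, and map() runs once per record of the task's split. It loosely mirrors the shape of `org.apache.hadoop.mapreduce.Mapper`, whose `run()` calls `setup()` once and then `map()` for each input record. `Helper`, `SketchMapper` and `LifecycleDemo` are hypothetical names, and the sketch assumes (as Anirudh expected) that every task, JVM reuse or not, gets a fresh mapper instance.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stateless helper; the static counter exists only to count instantiations.
class Helper {
    static int created = 0;

    Helper() { created++; }

    String normalize(String line) { return line.trim(); }
}

// Plain-Java stand-in for a mapper (not the Hadoop class): setup() once per task,
// map() once per record.
class SketchMapper {
    private final boolean createPerRecord;
    private Helper helper;

    SketchMapper(boolean createPerRecord) { this.createPerRecord = createPerRecord; }

    void setup() {
        if (!createPerRecord) {
            helper = new Helper(); // one instance for the whole task
        }
    }

    String map(String record) {
        // Creating the helper here allocates one object per input record.
        Helper h = createPerRecord ? new Helper() : helper;
        return h.normalize(record);
    }
}

class LifecycleDemo {
    // Simulates one task per split: a fresh mapper, one setup() call, then one
    // map() call per record. Returns how many helpers were created in total.
    static int runTasks(List<List<String>> splits, boolean createPerRecord) {
        Helper.created = 0;
        for (List<String> split : splits) {
            SketchMapper mapper = new SketchMapper(createPerRecord);
            mapper.setup();
            for (String record : split) {
                mapper.map(record);
            }
        }
        return Helper.created;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList(" a ", " b ", " c "),
                Arrays.asList(" d ", " e "));
        System.out.println("created in setup(): " + runTasks(splits, false)); // 2
        System.out.println("created in map():   " + runTasks(splits, true));  // 5
    }
}
```

With two splits of three and two records, creating the helper in setup() yields one instance per task (2), while creating it in map() yields one per record (5), which is the difference Anirudh points out.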

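Harsh J confirms that building the objects via reflection works because it is all user-side Java code, and Eyal wants to use that to swap in a stub when testing a mapper. A minimal plain-Java sketch of that idea, assuming the concrete class name comes from a configuration map: `LineService`, `RealLineService`, `StubLineService`, `ServiceMapper` and the key `example.line.service.class` are all hypothetical. In real Hadoop job code the same idea could likely be expressed with `Configuration.getClass(...)` plus `ReflectionUtils.newInstance(...)`; plain `Class.forName` is used here so the sketch runs without Hadoop.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical service interface the mapper depends on.
interface LineService {
    String process(String line);
}

// "Real" stateless implementation used in production runs.
class RealLineService implements LineService {
    public String process(String line) { return line.toUpperCase(Locale.ROOT); }
}

// Stub used in unit tests; returns an easily asserted value.
class StubLineService implements LineService {
    public String process(String line) { return "stub:" + line; }
}

// Plain-Java stand-in for a mapper that builds its dependency once, in setup().
class ServiceMapper {
    // Hypothetical configuration key naming the implementation class.
    static final String SERVICE_KEY = "example.line.service.class";

    private LineService service;

    void setup(Map<String, String> conf) {
        String className = conf.getOrDefault(SERVICE_KEY, RealLineService.class.getName());
        try {
            // Reflection instead of 'new': the concrete class is chosen at runtime,
            // but it is still created only once per task.
            service = Class.forName(className)
                    .asSubclass(LineService.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    String map(String line) {
        return service.process(line);
    }
}
```

The mapper only knows the interface; a test sets the key to `StubLineService`'s class name, while a production run falls back to the real implementation. Because instantiation still happens in setup(), reflection does not change the instance count compared with using `new` in the same place.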