I don't see why you'd have to use WholeFileInputFormat for such a task. Your task is very similar to a join; see the section "General reducer-side join" in Ricky's article at http://horicky.blogspot.in/2010/08/designing-algorithmis-for-map-reduce.html for what your overall logic should look like.
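As a rough illustration of that reducer-side join, here is a minimal sketch in plain Java (no Hadoop classes, so the logic is visible without the classpath). The tag prefixes "F:"/"H:", the class and method names, and the example keys are all made up for illustration; the byte ranges are the ones from the thread. In the real job, the text-file mapper would emit (joinKey, "F:" + fields), the HBase-side mapper would emit (joinKey, "H:" + columns), and the reducer would compare the tagged values it receives under each key:

```java
import java.util.*;

// Hadoop-free sketch of the reduce-side join logic. In the actual job,
// MultipleInputs would route the text file and the HBase scan to two
// different mappers, both keyed on the join field; the reducer then
// sees both tagged values grouped under one key and compares them.
public class ReduceSideJoinSketch {

    // Extract the fixed byte ranges (2-13, 20-25, 32-38) from one line,
    // mimicking what the text-file mapper would do per record.
    static String extractFields(String line) {
        return line.substring(2, 13) + "|"
             + line.substring(20, 25) + "|"
             + line.substring(32, 38);
    }

    // Reducer-side comparison: for each key, pick out the file-side
    // ("F:") and HBase-side ("H:") values and record whether they match.
    static Map<String, Boolean> join(Map<String, List<String>> grouped) {
        Map<String, Boolean> matches = new HashMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String fileVal = null;
            String hbaseVal = null;
            for (String v : e.getValue()) {
                if (v.startsWith("F:")) fileVal = v.substring(2);
                else if (v.startsWith("H:")) hbaseVal = v.substring(2);
            }
            matches.put(e.getKey(),
                        fileVal != null && fileVal.equals(hbaseVal));
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = new HashMap<>();
        grouped.put("row1", Arrays.asList("F:abc|12345|xyzxyz",
                                          "H:abc|12345|xyzxyz"));
        grouped.put("row2", Arrays.asList("F:abc|12345|xyzxyz",
                                          "H:def|99999|qqqqqq"));
        System.out.println(join(grouped)); // row1 matches, row2 does not
    }
}
```

The same tagging idea carries over directly: in the driver you'd wire the two inputs with MultipleInputs (the HBase side needs a table-scan input format), and the mappers prepend a source tag so the reducer can tell the two value streams apart.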
On Tue, Jul 10, 2012 at 7:46 PM, Mohammad Tariq <[email protected]> wrote:
> Hello Harsh,
>
> Thank you so much for the quick response. Actually I have a
> use case wherein I have to compare values coming from 2 mappers
> in one reducer. For that I am planning to use the MultipleInputs
> class. In one mapper I have a text file (these files may contain
> 1,00,000 to 2,00,000 lines), and I have to extract bytes 2-13,
> 20-25, 32-38 and so on from each line of this file. In the second
> mapper I have to read values from an HBase table. The columns of
> this table correspond to the fields I am reading from the text
> file in the first mapper.
> In the reducer I have to compare the results coming from both
> mappers and generate the final result. Need your guidance. Many
> thanks.
>
> Regards,
> Mohammad Tariq
>
>
> On Tue, Jul 10, 2012 at 6:55 PM, Harsh J <[email protected]> wrote:
>> It depends on what you need. If your file is not splittable, or if you
>> need to read the whole file from a single mapper itself (i.e. you do
>> not _want_ it to be split), then use WholeFileInputFormat. Otherwise,
>> you get more parallelism with regular splitting.
>>
>> On Tue, Jul 10, 2012 at 6:31 PM, Mohammad Tariq <[email protected]> wrote:
>>> Hello list,
>>>
>>> What could be the approximate maximum size of the files that
>>> can be handled using WholeFileInputFormat?? I mean, if the file
>>> is very big, is it feasible to use WholeFileInputFormat, as the
>>> entire load will go to one mapper?? Many thanks.
>>>
>>> Regards,
>>> Mohammad Tariq
>>
>>
>>
>> --
>> Harsh J

--
Harsh J
