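Hi,

It's possible by setting the number of reduce tasks to 1. Based on your example, it looks like you need to group your records on "Date, counter1 and counter2", so those three fields should form the key of your map output. Both mappers then emit the same key for matching records, and the reducer receives CV1 and CV5 together and can compute CV6. The grouping comes from the key; the single reduce task is what gives you one output file.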
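To make the key idea concrete, here is a rough sketch in plain Python that only simulates the map, shuffle and reduce steps in memory. It is not Hadoop code: the function names, the "CV1"/"CV5" tags and the sample rows are all made up for illustration. In an actual job you would probably wire one mapper class per input path (Hadoop's MultipleInputs is the usual way to do that) and set the number of reduce tasks to 1.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column layouts come from the thread.
FILE1 = ["2011-09-07,a,b,50,7"]      # Date,counter1,counter2,CV1,CV2
FILE2 = ["2011-09-07,a,b,1,2,40"]    # Date,counter1,counter2,CV3,CV4,CV5


def map_file1(line):
    """Mapper for File1: key is (Date, counter1, counter2), value is tagged CV1."""
    date, c1, c2, cv1, _cv2 = line.split(",")
    return (date, c1, c2), ("CV1", float(cv1))


def map_file2(line):
    """Mapper for File2: same key, value is tagged CV5."""
    date, c1, c2, _cv3, _cv4, cv5 = line.split(",")
    return (date, c1, c2), ("CV5", float(cv5))


def shuffle(pairs):
    """Stand-in for the framework's shuffle: collect tagged values per key."""
    groups = defaultdict(dict)
    for key, (tag, value) in pairs:
        groups[key][tag] = value
    return groups


def reduce_group(key, values):
    """Reducer: emit Date,counter1,counter2,CV6 when both sides are present."""
    if "CV1" not in values or "CV5" not in values:
        return None  # assumption: keys seen in only one file are dropped
    cv6 = values["CV1"] * values["CV5"] / 100
    return ",".join(key) + "," + str(cv6)


def run(file1, file2):
    """Run both mappers, group by key, then apply the single reducer."""
    pairs = [map_file1(l) for l in file1] + [map_file2(l) for l in file2]
    groups = shuffle(pairs)
    lines = (reduce_group(key, groups[key]) for key in sorted(groups))
    return [line for line in lines if line is not None]


if __name__ == "__main__":
    for line in run(FILE1, FILE2):
        print(line)
```

The tag on each value is what lets the reducer tell which file a value came from, since both mappers emit under the same key. What to do with a key that appears in only one file is your call; the sketch simply skips it.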
Thanks,
Sudhan S

On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat <[email protected]> wrote:
> Hi,
>
> I understand that given a file, the file is split across 'n' mapper
> instances, which is the normal case.
>
> The scenario i have is :
> 1. Two files which are not totally identical in terms of number of columns
> (but have data that is similar in a few columns) need to be processed and
> after computation a single output file has to be generated.
>
> Note : CV - computedvalue
>
> File1 belonging to one dataset has data for :
> Date,counter1,counter2, CV1,CV2
>
> File2 belonging to another dataset has data for :
> Date,counter1,counter2,CV3,CV4,CV5
>
> Computation to be carried out on these two files is :
> CV6 =(CV1*CV5)/100
>
> And the final emitted output file should have data in the sequence:
> Date,counter1,counter2,CV6
>
> The idea is to have two mappers (not instances) run on each of the file,
> and a single reducer that emits the final result file.
>
> Thanks,
> Sahana
>
> On Wed, Sep 7, 2011 at 2:40 PM, Harsh J <[email protected]> wrote:
>
>> Sahana,
>>
>> Yes. But, isn't that how it is normally? What makes you question this
>> capability?
>>
>> On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat <[email protected]> wrote:
>> > Hi,
>> > Is it possible to have multiple mappers where each mapper is
>> > operating on a different input file and whose result (which is a key
>> > value pair from different mappers) is processed by a single reducer?
>> > Regards,
>> > Sahana
>>
>>
>>
>> --
>> Harsh J
>>
>
