Sorry, but the OP said he had a map/reduce job with multiple reducers and wanted to combine their output into a single file. While you could merge the output files afterwards, you could also use a combiner and then an identity reducer, all within the same M/R job.
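
To make the data flow concrete, here is a rough plain-Java model of that idea. It uses no Hadoop classes; `CombinerSketch`, `combine`, `identityReduce`, and `runJob` are names made up for illustration, and the per-task concatenation is only an assumed stand-in for whatever the real combiner would do. It shows a combiner collapsing each map task's records locally, so that the single identity reducer only passes through one record per map task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    // Hypothetical stand-in for a combiner: collapses one map task's
    // records into a single record by concatenating them in order.
    public static String combine(List<String> mapTaskOutput) {
        return String.join("", mapTaskOutput);
    }

    // Identity reducer: passes every record it receives through unchanged.
    public static List<String> identityReduce(List<String> shuffled) {
        return new ArrayList<>(shuffled);
    }

    // Models one job: the combiner runs once per map task (an assumption of
    // this sketch), then a single identity reducer emits the combined records.
    public static List<String> runJob(List<List<String>> mapOutputs) {
        List<String> shuffled = new ArrayList<>();
        for (List<String> taskOutput : mapOutputs) {
            if (!taskOutput.isEmpty()) {
                shuffled.add(combine(taskOutput));
            }
        }
        return identityReduce(shuffled);
    }

    public static void main(String[] args) {
        // Three made-up map tasks emitting six records in total.
        List<List<String>> mapOutputs = Arrays.asList(
            Arrays.asList("a1\n", "a2\n"),
            Arrays.asList("b1\n"),
            Arrays.asList("c1\n", "c2\n", "c3\n"));

        List<String> reduced = runJob(mapOutputs);
        System.out.println("records reaching the reducer: " + reduced.size());
        System.out.print(String.join("", reduced));
    }
}
```

With the three sample map tasks, six records become three before they reach the reducer. In a real job Hadoop may run the combiner zero or more times, so the reducer has to be correct without it; an identity reducer is, since it only passes records through.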

On Jul 31, 2012, at 10:10 AM, Raj Vishwanathan <[email protected]> wrote:

> Is there a requirement for the final reduce file to be sorted? If not,
> wouldn't a map-only job (+ a combiner) and a merge-only job provide the
> answer?
>
> Raj
>
>> ________________________________
>> From: Michael Segel <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, July 31, 2012 5:24 AM
>> Subject: Re: Merge Reducers Output
>>
>> You really don't want to run a single reducer unless you know that you don't
>> have a lot of mappers.
>>
>> As long as the output data types and structure are the same as the input,
>> you can run your code as the combiner, and then run it again as the reducer.
>> Problem solved with one or two lines of code.
>> If your input and output don't match, then you can use the existing code as
>> a combiner, and then write a new reducer. It could as easily be an identity
>> reducer too. (Don't know the exact problem.)
>>
>> So here's a silly question. Why wouldn't you want to run a combiner?
>>
>>
>> On Jul 31, 2012, at 12:08 AM, Jay Vyas <[email protected]> wrote:
>>
>>> It's not clear to me that you need custom input formats....
>>>
>>> 1) Getmerge might work or
>>>
>>> 2) Simply run a SINGLE reducer job (have mappers output static final int
>>> key=1, or specify numReducers=1).
>>>
>>> In this case, only one reducer will be called, and it will read through all
>>> the values.
>>>
>>> On Tue, Jul 31, 2012 at 12:30 AM, Bejoy KS <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> Why not use 'hadoop fs -getmerge <outputFolderInHdfs>
>>>> <targetFileNameInLfs>' while copying files out of HDFS for the end users to
>>>> consume? This will merge all the files in 'outputFolderInHdfs' into one
>>>> file and put it in LFS.
>>>>
>>>> Regards
>>>> Bejoy KS
>>>>
>>>> Sent from handheld, please excuse typos.
>>>>
>>>> -----Original Message-----
>>>> From: Michael Segel <[email protected]>
>>>> Date: Mon, 30 Jul 2012 21:08:22
>>>> To: <[email protected]>
>>>> Reply-To: [email protected]
>>>> Subject: Re: Merge Reducers Output
>>>>
>>>> Why not use a combiner?
>>>>
>>>> On Jul 30, 2012, at 7:59 PM, Mike S wrote:
>>>>
>>>>> As has been asked several times, I need to merge my reducers' output files.
>>>>> Imagine I have many reducers which will generate 200 files. Now, to
>>>>> merge them together, I have written another map reduce job where each
>>>>> mapper reads a complete file fully into memory and outputs it, and
>>>>> then only one reducer has to merge them together. To do so, I had to
>>>>> write a custom fileinputreader that reads the complete file into
>>>>> memory and then another custom fileoutputfileformat to append each
>>>>> reducer item's bytes together. This is what my mapper and reducer look
>>>>> like:
>>>>>
>>>>> public static class MapClass extends Mapper<NullWritable,
>>>>> BytesWritable, NullWritable, BytesWritable>
>>>>> {
>>>>>     // Passes each whole-file record through unchanged.
>>>>>     @Override
>>>>>     public void map(NullWritable key, BytesWritable value, Context context)
>>>>>             throws IOException, InterruptedException
>>>>>     {
>>>>>         context.write(key, value);
>>>>>     }
>>>>> }
>>>>>
>>>>> public static class Reduce extends Reducer<NullWritable,
>>>>> BytesWritable, NullWritable, BytesWritable>
>>>>> {
>>>>>     // Writes every file's bytes to the single output, one after another.
>>>>>     @Override
>>>>>     public void reduce(NullWritable key, Iterable<BytesWritable> values,
>>>>>             Context context) throws IOException, InterruptedException
>>>>>     {
>>>>>         for (BytesWritable value : values)
>>>>>         {
>>>>>             context.write(NullWritable.get(), value);
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> I still have to use a single reducer, and that is a bottleneck. Please
>>>>> note that I must do this merging, as the users of my MR job are outside
>>>>> my Hadoop environment and need the result as one file.
>>>>>
>>>>> Is there a better way to merge reducers' output files?
>>>>>
>>>>
>>>
>>> --
>>> Jay Vyas
>>> MMSB/UCHC
>>
