It's not clear to me that you need custom input formats...

1) getmerge might work, or
2) Simply run a SINGLE-reducer job (have the mappers output a static final int key=1, or specify numReducers=1). In this case, only one reducer will be called, and it will read through all the values.

On Tue, Jul 31, 2012 at 12:30 AM, Bejoy KS <[email protected]> wrote:
> Hi
>
> Why not use 'hadoop fs -getmerge <outputFolderInHdfs> <targetFileNameInLfs>'
> while copying files out of hdfs for the end users to consume? This will
> merge all the files in 'outputFolderInHdfs' into one file and put it in lfs.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Michael Segel <[email protected]>
> Date: Mon, 30 Jul 2012 21:08:22
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: Re: Merge Reducers Output
>
> Why not use a combiner?
>
> On Jul 30, 2012, at 7:59 PM, Mike S wrote:
>
> > As asked several times, I need to merge my reducers' output files.
> > Imagine I have many reducers which will generate 200 files. To merge
> > them together, I have written another map reduce job where each mapper
> > reads a complete file into memory and outputs it, and then only one
> > reducer has to merge them together. To do so, I had to write a custom
> > FileInputFormat that reads the complete file into memory, and another
> > custom FileOutputFormat to append each record's bytes together. This is
> > how my mapper and reducer look:
> >
> > public static class MapClass extends Mapper<NullWritable,
> >         BytesWritable, NullWritable, BytesWritable>
> > {
> >     @Override
> >     public void map(NullWritable key, BytesWritable value,
> >             Context context) throws IOException, InterruptedException
> >     {
> >         context.write(key, value);
> >     }
> > }
> >
> > public static class Reduce extends Reducer<NullWritable,
> >         BytesWritable, NullWritable, BytesWritable>
> > {
> >     @Override
> >     public void reduce(NullWritable key, Iterable<BytesWritable> values,
> >             Context context) throws IOException, InterruptedException
> >     {
> >         for (BytesWritable value : values)
> >         {
> >             context.write(NullWritable.get(), value);
> >         }
> >     }
> > }
> >
> > I still have to have one reducer, and that is a bottleneck. Please note
> > that I must do this merging, as the users of my MR job are outside my
> > hadoop environment and need the result as one file.
> >
> > Is there a better way to merge reducers' output files?

-- Jay Vyas MMSB/UCHC
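For option 2, a minimal driver sketch of the single-reducer merge pass, assuming the per-reducer outputs are SequenceFiles of NullWritable/BytesWritable pairs (the class name MergeJob and the argument handling are illustrative, not Mike's actual setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Merge pass: no mapper or reducer class is set, so Hadoop's identity
// Mapper/Reducer pass records straight through; forcing a single reduce
// task makes all records land in one output file (part-r-00000).
public class MergeJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "merge reducer outputs");
        job.setJarByClass(MergeJob.class);

        job.setNumReduceTasks(1);  // the key line: one reducer, one file

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));   // dir with the part files
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[1])); // merged output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note the tradeoff Mike already hit: the single reducer is inherently serial, so this buys simplicity (no custom formats) rather than speed.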

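For option 1, getmerge is also available programmatically. A sketch using FileUtil.copyMerge, which concatenates every file under an HDFS directory into one file on a destination filesystem (both paths here are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyMergeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Concatenate all part files in the job output directory into a
        // single file on the local filesystem, leaving the source intact.
        // The last argument is an optional separator written between files.
        FileUtil.copyMerge(hdfs, new Path("/user/mike/job-output"),
                           local, new Path("/tmp/merged.out"),
                           false /* deleteSource */, conf, null);
    }
}

This is the same operation 'hadoop fs -getmerge' performs from the shell, so it fits Bejoy's suggestion when the merge has to happen inside a workflow rather than by hand.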