Sorry, but the OP was saying he had a map/reduce job with multiple reducers 
whose output he then wanted to combine into a single file. 
While you could merge the output files afterwards, you could also use a combiner 
and then an identity reducer, all within the same M/R job.
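
As a sketch, the driver wiring for that approach might look like the following. This is illustrative only, under the assumption of a mapper that emits (NullWritable, BytesWritable) pairs like the OP's; the class names (MergeDriver, IdentityReduce) and argument layout are my own, not from the thread:

```java
// Sketch: one M/R job that runs an identity reduce both as a combiner
// (map-side) and as a single reducer, so the job itself emits one file.
// Class names and paths are illustrative assumptions, not the OP's code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeDriver {

    // Identity reducer: passes every value through unchanged.
    public static class IdentityReduce
            extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable> {
        @Override
        public void reduce(NullWritable key, Iterable<BytesWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            for (BytesWritable value : values) {
                context.write(key, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "merge-output");
        job.setJarByClass(MergeDriver.class);
        // Assumes a mapper emitting (NullWritable, BytesWritable), set here
        // via job.setMapperClass(...) in the real job.
        job.setCombinerClass(IdentityReduce.class); // same code runs map-side...
        job.setReducerClass(IdentityReduce.class);  // ...and again as the reducer
        job.setNumReduceTasks(1);                   // one task -> one output file
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(BytesWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The combiner collapses each mapper's output before the shuffle, so the single reducer sees far less data than it would otherwise.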


On Jul 31, 2012, at 10:10 AM, Raj Vishwanathan <[email protected]> wrote:

> Is there a requirement for the final reduce file to be sorted? If not, 
> wouldn't a map-only job (plus a combiner) and a merge-only job provide the 
> answer?
> 
> Raj
> 
> 
> 
>> ________________________________
>> From: Michael Segel <[email protected]>
>> To: [email protected] 
>> Sent: Tuesday, July 31, 2012 5:24 AM
>> Subject: Re: Merge Reducers Output
>> 
>> You really don't want to run a single reducer unless you know that you don't 
>> have a lot of mappers. 
>> 
>> As long as the output data types and structure are the same as the input, 
>> you can run your code as the combiner, and then run it again as the reducer. 
>> Problem solved with one or two lines of code. 
>> If your input and output don't match, then you can use the existing code as 
>> a combiner and write a new reducer. It could just as easily be an identity 
>> reducer, too. (I don't know the exact problem.) 
>> 
>> So here's a silly question. Why wouldn't you want to run a combiner? 
>> 
>> 
>> On Jul 31, 2012, at 12:08 AM, Jay Vyas <[email protected]> wrote:
>> 
>>> It's not clear to me that you need custom input formats....
>>> 
>>> 1) Getmerge might work or
>>> 
>>> 2) Simply run a SINGLE reducer job (have mappers output static final int
>>> key=1, or specify numReducers=1).
>>> 
>>> In this case, only one reducer will be called, and it will read through all
>>> the values.
>>> 
>>> On Tue, Jul 31, 2012 at 12:30 AM, Bejoy KS <[email protected]> wrote:
>>> 
>>>> Hi
>>>> 
>>>> Why not use 'hadoop fs -getmerge <outputFolderInHdfs>
>>>> <targetFileNameInLfs>' while copying files out of HDFS for the end users to
>>>> consume? This will merge all the files in 'outputFolderInHdfs' into one
>>>> file and put it in the local file system.
>>>> 
>>>> Regards
>>>> Bejoy KS
>>>> 
>>>> Sent from handheld, please excuse typos.
>>>> 
>>>> -----Original Message-----
>>>> From: Michael Segel <[email protected]>
>>>> Date: Mon, 30 Jul 2012 21:08:22
>>>> To: <[email protected]>
>>>> Reply-To: [email protected]
>>>> Subject: Re: Merge Reducers Output
>>>> 
>>>> Why not use a combiner?
>>>> 
>>>> On Jul 30, 2012, at 7:59 PM, Mike S wrote:
>>>> 
>>>>> As asked several times before, I need to merge my reducers' output files.
>>>>> Imagine I have many reducers which will generate 200 files. Now, to
>>>>> merge them together, I have written another map reduce job where each
>>>>> mapper reads a complete file into memory and outputs it, and
>>>>> then only one reducer has to merge them together. To do so, I had to
>>>>> write a custom FileInputFormat that reads the complete file into
>>>>> memory, and then another custom FileOutputFormat to append each
>>>>> reducer item's bytes together. This is how my mapper and reducer
>>>>> look:
>>>>> 
>>>>> public static class MapClass extends Mapper<NullWritable,
>>>>> BytesWritable, NullWritable, BytesWritable>
>>>>> {
>>>>>     @Override
>>>>>     public void map(NullWritable key, BytesWritable value,
>>>>>             Context context) throws IOException, InterruptedException
>>>>>     {
>>>>>         context.write(key, value);
>>>>>     }
>>>>> }
>>>>> 
>>>>> public static class Reduce extends Reducer<NullWritable,
>>>>> BytesWritable, NullWritable, BytesWritable>
>>>>> {
>>>>>     @Override
>>>>>     public void reduce(NullWritable key, Iterable<BytesWritable> values,
>>>>>             Context context) throws IOException, InterruptedException
>>>>>     {
>>>>>         for (BytesWritable value : values)
>>>>>         {
>>>>>             context.write(NullWritable.get(), value);
>>>>>         }
>>>>>     }
>>>>> }
>>>>> 
>>>>> I still have to have one reducer, and that is a bottleneck. Please
>>>>> note that I must do this merging, as the users of my MR job are outside
>>>>> my Hadoop environment and need the result as one file.
>>>>> 
>>>>> Is there a better way to merge reducers' output files?
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Jay Vyas
>>> MMSB/UCHC
>> 
>> 
>> 
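
As a footnote to the `hadoop fs -getmerge` suggestion earlier in the thread: the same merge can also be done programmatically via `FileUtil.copyMerge`, which exists through Hadoop 2.x (it was removed in Hadoop 3.0). A minimal sketch, with paths assumed for illustration:

```java
// Sketch: concatenate all part files under an HDFS output directory into
// a single target file using FileUtil.copyMerge (Hadoop 1.x/2.x only;
// removed in Hadoop 3.0). Paths are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path srcDir = new Path(args[0]);   // the job's output folder
        Path dstFile = new Path(args[1]);  // single merged target file

        // 'false' keeps the source files; the final null adds no separator
        // string between the concatenated files.
        FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, null);
    }
}
```

Like `-getmerge`, this is a straight byte-level concatenation, so it only makes sense when the reducer output files can simply be appended to one another.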
