Re: OOM Error Map output copy.

Niranjan Balasubramanian Sat, 10 Dec 2011 17:32:18 -0800

All

Thanks for your inputs. Fortunately, our requirements for this job
changed, allowing me to use a combiner that ended up reducing the data
that was being pumped to the reducers and the problem went away.
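
For anyone finding this thread later, here is a minimal sketch of why a combiner shrinks the shuffle. It is plain Java rather than the Hadoop API, and the class and method names are hypothetical. It assumes a count-style job where the reduce operation is associative and commutative, which is the case where a combiner is safe to use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a combiner; this is not the Hadoop API.
class CombinerSketch {

    // Collapse one map task's output (one record per key occurrence)
    // into a single (key, count) record per distinct key.
    static Map<String, Long> combine(List<String> mapOutputKeys) {
        Map<String, Long> combined = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            combined.merge(key, 1L, Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Made-up map output: 1000 records spread over 10 distinct keys.
        List<String> mapOutputKeys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            mapOutputKeys.add("key" + (i % 10));
        }
        Map<String, Long> combined = combine(mapOutputKeys);
        System.out.println("records before combine: " + mapOutputKeys.size());
        System.out.println("records after combine:  " + combined.size());
    }
}
```

The reducers then copy one record per key per map task instead of one record per occurrence, so the saving depends on how often keys repeat within a single map's output.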


@Arun -- You are probably right that the number of reducers is rather
small, given the amount of data that is being reduced. We are currently
using the Fair scheduler and it fits our other use cases very well. The
newer versions of Fair scheduler appear to support limits on the reducer
pools. So this solution has to wait until we can make a switch.
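
To put numbers on "rather small", here is a rough sketch of the per-reducer arithmetic from Arun's reply quoted below. It uses his figures (5000 maps, an assumed average of 0.5 GB of compressed output per map) and assumes the map output is spread evenly across reducers, which real key distributions may not honor. The class and method names are made up for illustration.

```java
// Hypothetical helper; the inputs are the figures quoted in this thread.
class ReducerLoadSketch {

    // Compressed shuffle input per reducer in GB, assuming an even spread.
    static double perReducerGb(int maps, double avgMapOutputGb, int reducers) {
        return maps * avgMapOutputGb / reducers;
    }

    public static void main(String[] args) {
        int maps = 5000;            // "nearly 5000 mappers"
        double avgMapOutputGb = 0.5; // assumed average, per Arun's estimate
        for (int reducers : new int[] {12, 500, 1000}) {
            System.out.printf("%4d reducers -> %.1f GB compressed per reducer%n",
                    reducers, perReducerGb(maps, avgMapOutputGb, reducers));
        }
    }
}
```

With 12 reducers this comes to a little over 200 GB compressed per reducer, in line with the rough figure in Arun's reply. With 1000 reducers it is 2.5 GB.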


~ Niranjan.

On Dec 9, 2011, at 8:51 PM, Chandraprakash Bhagtani wrote:

> Hi Niranjan,
> 
> Your issue looks similar to
> https://issues.apache.org/jira/browse/MAPREDUCE-1182 . Which hadoop version
> are you using?
> 
> 
> On Sat, Dec 10, 2011 at 12:17 AM, Prashant Kommireddi
> <[email protected]>wrote:
> 
>> Arun, I faced the same issue and increasing the # of reducers fixed the
>> problem.
>> 
>> I was initially under the impression that the MR framework spills to disk if
>> the data is too large to keep in memory. However, on extraordinarily large
>> reduce inputs this was not the case, and the job failed while trying to
>> allocate the in-memory buffer:
>> 
>> private MapOutput shuffleInMemory(MapOutputLocation mapOutputLoc,
>>                                       URLConnection connection,
>>                                       InputStream input,
>>                                       int mapOutputLength,
>>                                       int compressedLength)
>>     throws IOException, InterruptedException {
>>       // Reserve ram for the map-output
>>       ...
>> 
>>       // Copy map-output into an in-memory buffer
>>       byte[] shuffleData = new byte[mapOutputLength];
>> 
>> 
>> -Prashant Kommireddi
>> 
>> On Fri, Dec 9, 2011 at 10:29 AM, Arun C Murthy <[email protected]>
>> wrote:
>> 
>>> Moving to mapreduce-user@, bcc common-user@. Please use project specific
>>> lists.
>>> 
>>> Niranjan,
>>> 
>>> If you average as 0.5G output per-map, it's 5000 maps *0.5G -> 2.5TB over
>>> 12 reduces i.e. nearly 250G per reduce - compressed!
>>> 
>>> If you think you have 4:1 compression you are doing nearly a Terabyte per
>>> reducer... which is way too high!
>>> 
>>> I'd recommend you bump to somewhere along 1000 reduces to get to 2.5G
>>> (compressed) per reducer for your job. If your compression ratio is 2:1,
>>> try 500 reduces and so on.
>>> 
>>> If you are worried about other users, use the CapacityScheduler and submit
>>> your job to a queue with a small capacity and max-capacity to restrict your
>>> job to 10 or 20 concurrent reduces at a given point.
>>> 
>>> Arun
>>> 
>>> On Dec 7, 2011, at 10:51 AM, Niranjan Balasubramanian wrote:
>>> 
>>>> All
>>>> 
>>>> I am encountering the following out-of-memory error during the reduce
>>>> phase of a large job.
>>>> 
>>>> Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1669)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1529)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1378)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1310)
>>>>
>>>> I tried increasing the memory available using mapred.child.java.opts, but
>>>> that only helps a little. The reduce task eventually fails again. Here are
>>>> some relevant job configuration details:
>>>> 
>>>> 1. The input to the mappers is about 2.5 TB (LZO compressed). The
>>>> mappers filter out a small percentage of the input (less than 1%).
>>>>
>>>> 2. I am currently using 12 reducers and I can't increase this count by
>>>> much to ensure availability of reduce slots for other users.
>>>> 
>>>> 3. mapred.child.java.opts --> -Xms512M -Xmx1536M -XX:+UseSerialGC
>>>> 
>>>> 4. mapred.job.shuffle.input.buffer.percent    --> 0.70
>>>> 
>>>> 5. mapred.job.shuffle.merge.percent   --> 0.66
>>>> 
>>>> 6. mapred.inmem.merge.threshold       --> 1000
>>>> 
>>>> 7. I have nearly 5000 mappers which are supposed to produce LZO
>>>> compressed outputs. The logs seem to indicate that the map outputs range
>>>> between 0.3 GB and 0.8 GB.
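
Items 3 to 5 above imply a rough memory budget for the in-memory shuffle. The sketch below assumes that mapred.job.shuffle.input.buffer.percent is the fraction of the maximum heap set aside for holding map outputs during the shuffle, and that mapred.job.shuffle.merge.percent is the fill level of that buffer at which an in-memory merge starts. It also treats -Xmx1536M as exactly 1536 MB of usable heap, although the JVM typically reports somewhat less. The class and method names are hypothetical.

```java
// Hypothetical helper; the inputs are the job settings listed above.
class ShuffleBudgetSketch {

    // Heap reserved for in-memory map outputs during the shuffle, in MB.
    static double shuffleBufferMb(double maxHeapMb, double inputBufferPercent) {
        return maxHeapMb * inputBufferPercent;
    }

    // Buffer fill level, in MB, at which an in-memory merge would start.
    static double mergeTriggerMb(double shuffleBufferMb, double mergePercent) {
        return shuffleBufferMb * mergePercent;
    }

    public static void main(String[] args) {
        double maxHeapMb = 1536.0;         // -Xmx1536M
        double inputBufferPercent = 0.70;  // mapred.job.shuffle.input.buffer.percent
        double mergePercent = 0.66;        // mapred.job.shuffle.merge.percent

        double buffer = shuffleBufferMb(maxHeapMb, inputBufferPercent);
        double trigger = mergeTriggerMb(buffer, mergePercent);
        System.out.printf("shuffle buffer: about %.0f MB%n", buffer);
        System.out.printf("merge trigger:  about %.0f MB%n", trigger);
    }
}
```

Under these assumptions the shuffle buffer is about 1075 MB of the 1536 MB heap and the merge would start at about 710 MB, which leaves under 500 MB of heap for everything else in the reduce task.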
>>>> 
>>>> Does anything here seem amiss? I'd appreciate any input on what settings
>>>> to try. I can try different reduced values for the input buffer percent and
>>>> the merge percent. Given that the job runs for about 7-8 hours before
>>>> crashing, I would like to make some informed choices if possible.
>>>> 
>>>> Thanks.
>>>> ~ Niranjan.
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Thanks & Regards,
> Chandra Prakash Bhagtani,
> Nokia India Pvt. Ltd.
