[
https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931994#comment-16931994
]
Zhihua Deng commented on MAPREDUCE-7241:
----------------------------------------
Changing listLocatedStatus like that is hard to debug. [[email protected]],
can we make a copy here when listing status? Since new BlockLocation(location)
yields a copy of the _location_ instance without the LocatedBlock info, this
seems to be the easiest way.
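
A minimal sketch of why the copy helps, using simplified stand-in classes
rather than the real Hadoop types. It assumes the HDFS-specific location is a
subclass of BlockLocation that additionally holds the LocatedBlock, so copying
through the base-class copy constructor leaves that reference behind. The names
ShrinkSketch and copyWithoutLocatedBlock are hypothetical.

```java
// Stand-in for org.apache.hadoop.fs.BlockLocation (simplified, not the real class).
class BlockLocation {
    private final String[] hosts;
    private final long offset;
    private final long length;

    BlockLocation(String[] hosts, long offset, long length) {
        this.hosts = hosts;
        this.offset = offset;
        this.length = length;
    }

    // Copy constructor: copies only the fields declared on this base class.
    BlockLocation(BlockLocation that) {
        this(that.hosts, that.offset, that.length);
    }

    String[] getHosts() { return hosts; }
    long getOffset() { return offset; }
    long getLength() { return length; }
}

// Stand-in for the HDFS subclass that also keeps the heavy LocatedBlock.
class HdfsBlockLocation extends BlockLocation {
    private final Object locatedBlock; // placeholder for the real LocatedBlock

    HdfsBlockLocation(String[] hosts, long offset, long length, Object locatedBlock) {
        super(hosts, offset, length);
        this.locatedBlock = locatedBlock;
    }

    Object getLocatedBlock() { return locatedBlock; }
}

class ShrinkSketch {
    // Returns plain BlockLocation copies; none of them references a LocatedBlock.
    static BlockLocation[] copyWithoutLocatedBlock(BlockLocation[] locations) {
        BlockLocation[] copies = new BlockLocation[locations.length];
        for (int i = 0; i < locations.length; i++) {
            copies[i] = new BlockLocation(locations[i]);
        }
        return copies;
    }

    public static void main(String[] args) {
        BlockLocation[] original = {
            new HdfsBlockLocation(new String[] {"dn1"}, 0L, 128L, new byte[1024])
        };
        BlockLocation[] shrunk = copyWithoutLocatedBlock(original);
        // The copy is a base-class instance, so the placeholder payload is not retained.
        System.out.println(shrunk[0].getClass().getSimpleName());
    }
}
```

Once the original array is no longer referenced, the stand-in LocatedBlock
payload becomes eligible for garbage collection, while hosts, offset and length
remain available for computing splits.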
> FileInputFormat listStatus causes oom when there are lots of files in HDFS
> --------------------------------------------------------------------------
>
> Key: MAPREDUCE-7241
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Affects Versions: 2.6.1
> Reporter: Zhihua Deng
> Priority: Major
> Attachments: filestatus.png
>
>
> This case is sometimes seen in Hive when a user mistakenly issues a query over
> all partitions. The file statuses cached while listing status can accumulate to
> over 3 GB. After digging into the memory dump, we found that LocatedBlock occupies
> about 50% (sometimes over 60%) of the memory retained by LocatedFileStatus, as
> shown below:
> !filestatus.png!
> Right now we only extract the block location info from LocatedFileStatus;
> the datanode infos (types) and block tokens are not taken into account. So there
> is no need to cache LocatedBlock, and we can do something like this:
> // Copy the locations so the retained status no longer references LocatedBlock.
> BlockLocation[] blockLocations = dup(stat.getBlockLocations());
> LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
> private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>   BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>   int i = 0;
>   for (BlockLocation location : blockLocations) {
>     // The BlockLocation copy constructor yields a plain BlockLocation.
>     copyLocs[i++] = new BlockLocation(location);
>   }
>   return copyLocs;
> }
>