[
https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933255#comment-16933255
]
Steve Loughran commented on MAPREDUCE-7241:
-------------------------------------------
I see, that could work.
* Can you move this to a GitHub PR? That's how we are reviewing patches.
* We will need to pull in some of the MR people to review this too.
> FileInputFormat listStatus causes oom when there are lots of files in HDFS
> --------------------------------------------------------------------------
>
> Key: MAPREDUCE-7241
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Affects Versions: 2.6.1
> Reporter: Zhihua Deng
> Priority: Major
> Attachments: MAPREDUCE-7241.01.patch, MAPREDUCE-7241.trunk.patch,
> filestatus.png
>
>
> This sometimes happens in Hive when a user mistakenly issues a query over all
> partitions. The file statuses cached while listing can accumulate to over
> 3 GB. A heap dump shows that LocatedBlock accounts for about 50% (sometimes
> over 60%) of the memory retained by LocatedFileStatus, as shown below:
> !filestatus.png!
> Right now we only extract the block location info from LocatedFileStatus;
> the datanode infos (types) and block tokens are not used. So there is no
> need to cache LocatedBlock, and the status can be shrunk like this:
> BlockLocation[] blockLocations = dup(stat.getBlockLocations());
> LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
> // Copy each location into a plain BlockLocation so LocatedBlock is not retained.
> private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>   BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>   int i = 0;
>   for (BlockLocation location : blockLocations) {
>     copyLocs[i++] = new BlockLocation(location);
>   }
>   return copyLocs;
> }
>
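The idea in the description can be illustrated without a Hadoop dependency. The following is only a rough sketch, not the attached patch: `Location` and `HeavyLocation` are hypothetical stand-ins for `BlockLocation` and for an HDFS-specific subclass that keeps a reference to a heavyweight object (the `payload` field plays the role of the retained `LocatedBlock`). It assumes that copying through the base-class copy constructor is what drops that reference.

```java
import java.util.Arrays;

class ShrinkSketch {
    /** Hypothetical stand-in for BlockLocation: hosts, offset and length only. */
    static class Location {
        final String[] hosts;
        final long offset;
        final long length;

        Location(String[] hosts, long offset, long length) {
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
        }

        /** Copy constructor: copies the base-class fields and nothing else. */
        Location(Location that) {
            this(that.hosts.clone(), that.offset, that.length);
        }
    }

    /** Hypothetical stand-in for a subclass that also retains a large object. */
    static class HeavyLocation extends Location {
        final byte[] payload; // assumption: plays the role of the retained LocatedBlock

        HeavyLocation(String[] hosts, long offset, long length, byte[] payload) {
            super(hosts, offset, length);
            this.payload = payload;
        }
    }

    /** Re-create every element as a plain Location so the payload becomes unreachable. */
    static Location[] dup(Location[] locations) {
        Location[] copy = new Location[locations.length];
        for (int i = 0; i < locations.length; i++) {
            copy[i] = new Location(locations[i]);
        }
        return copy;
    }

    public static void main(String[] args) {
        Location[] original = {
            new HeavyLocation(new String[] {"dn1", "dn2"}, 0L, 128L, new byte[1024])
        };
        Location[] slim = dup(original);
        System.out.println(slim[0] instanceof HeavyLocation); // false
        System.out.println(Arrays.toString(slim[0].hosts));   // [dn1, dn2]
    }
}
```

Once the original array is no longer referenced, the payloads become eligible for garbage collection, while the host, offset and length data needed for split computation is preserved.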