[
https://issues.apache.org/jira/browse/HBASE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-28399:
------------------------------
Fix Version/s: 4.0.0-alpha-1 (was: 3.0.0-beta-2)
> region size can be wrong from RegionSizeCalculator
> --------------------------------------------------
>
> Key: HBASE-28399
> URL: https://issues.apache.org/jira/browse/HBASE-28399
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 3.0.0-beta-1
> Reporter: ruanhui
> Assignee: ruanhui
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>
> The RegionSizeCalculator computes each region's size in bytes with the
> following expression:
> {code:java}
> private static final long MEGABYTE = 1024L * 1024L;
> long regionSizeBytes =
> ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
> {code}
> However, this loses accuracy: the cast to long truncates the megabyte value,
> dropping any fraction of a megabyte. For example, the result of
> {code:java}
> ((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE {code}
> is 0. The resulting TableInputSplit reports a length of 0 even though it
> actually holds a small amount of data.
>
> Such a TableInputSplit is then skipped when
> spark.hadoopRDD.ignoreEmptySplits is enabled.
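
The truncation described above can be reproduced without an HBase dependency. The following is only a minimal sketch: the class name and the toMegabytes helper are hypothetical stand-ins for Size.get(Size.Unit.MEGABYTE), which is assumed here to return the size as a fractional double. It is not the code from the actual patch.

```java
public class RegionSizeTruncationDemo {
    private static final long MEGABYTE = 1024L * 1024L;

    // Hypothetical stand-in for Size.get(Size.Unit.MEGABYTE):
    // expresses a byte count as a (possibly fractional) number of megabytes.
    static double toMegabytes(long bytes) {
        return (double) bytes / MEGABYTE;
    }

    // Mirrors the expression quoted in the report: the cast to long
    // discards the fractional part before scaling back to bytes.
    static long truncatedSizeBytes(long bytes) {
        return ((long) toMegabytes(bytes)) * MEGABYTE;
    }

    public static void main(String[] args) {
        long[] samples = {1L, MEGABYTE - 1, MEGABYTE + MEGABYTE / 2, 2 * MEGABYTE};
        for (long actual : samples) {
            // Print the real byte count next to the truncated estimate.
            System.out.println(actual + " bytes -> " + truncatedSizeBytes(actual) + " bytes");
        }
    }
}
```

Under these assumptions, any region smaller than 1 MB is reported as 0 bytes, and larger regions lose their sub-megabyte remainder. Reading the size in bytes directly, rather than converting through megabytes, would avoid the loss.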