[
https://issues.apache.org/jira/browse/HADOOP-19673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
AMC-team updated HADOOP-19673:
------------------------------
Description:
{{BloomMapFile.Writer#initBloomFilter(Configuration)}} computes the Bloom
filter vector size as:
{code:java}
int numKeys = conf.getInt(IO_MAPFILE_BLOOM_SIZE_KEY,
    IO_MAPFILE_BLOOM_SIZE_DEFAULT);
float errorRate = conf.getFloat(IO_MAPFILE_BLOOM_ERROR_RATE_KEY,
    IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT);
// vectorSize = ceil( -k * n / ln(1 - p^(1/k)) )
int vectorSize = (int) Math.ceil(
    (double)(-HASH_COUNT * numKeys) /
    Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT))
);
{code}
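When {{io.mapfile.bloom.error.rate}} is *≤ 0* or *≥ 1*, the expression
degenerates:
* for a negative rate, {{Math.pow(errorRate, 1.0 / HASH_COUNT)}} returns *NaN*
(negative base with a non-integer exponent); for a rate above 1 it returns a
value greater than 1, so {{Math.log}} receives a negative argument and returns
*NaN*;
* in both cases the quotient is *NaN*, and {{Math.ceil(NaN)}} cast to {{int}}
yields *0*, so {{vectorSize == 0}};
* a rate of exactly 1 gives {{Math.log(0.0)}}, which is negative infinity, so
the quotient and {{vectorSize}} are again *0*;
* a rate of exactly 0 gives {{Math.log(1.0) == 0.0}}, so the division yields
negative infinity and the cast to {{int}} yields {{Integer.MIN_VALUE}} rather
than 0 (assuming a positive key count);
* constructing {{DynamicBloomFilter}} with such a size subsequently fails, so
{{BloomMapFile.Writer}} construction fails (observed as an assertion failure in
tests).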
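The arithmetic can be reproduced outside Hadoop. The following is a minimal
sketch, not Hadoop code: the class {{BloomVectorSizeDemo}} and its
{{vectorSize}} method are hypothetical names, {{HASH_COUNT}} is hard-coded to 5
on the assumption that it matches {{BloomMapFile.HASH_COUNT}}, and the key
count is an illustrative value.

```java
// Standalone reproduction of the vector-size arithmetic; not Hadoop code.
final class BloomVectorSizeDemo {
    // Assumption: mirrors BloomMapFile.HASH_COUNT; hard-coded to keep the sketch self-contained.
    static final int HASH_COUNT = 5;

    // Same expression as in the snippet above, with numKeys and errorRate passed in.
    static int vectorSize(int numKeys, float errorRate) {
        return (int) Math.ceil(
            (double) (-HASH_COUNT * numKeys)
                / Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT)));
    }

    public static void main(String[] args) {
        int numKeys = 1024 * 1024; // illustrative key count
        // One valid rate followed by four out-of-range rates.
        float[] rates = {0.005f, -0.1f, 0.0f, 1.0f, 1.5f};
        for (float rate : rates) {
            System.out.println(
                "errorRate=" + rate + " -> vectorSize=" + vectorSize(numKeys, rate));
        }
    }
}
```

Running it prints the computed size for a valid rate next to the out-of-range
ones, which makes the unusable sizes easy to compare.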
The code does not validate {{io.mapfile.bloom.error.rate}}, which must lie
strictly within {{(0, 1)}}. With an invalid value, the arithmetic silently
degrades to NaN, 0, or a negative size, and the failure only surfaces later, at
writer construction.
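One possible shape for the missing check is sketched below. This is an
illustration under stated assumptions, not the actual fix: the class
{{BloomErrorRateValidation}} and the helper {{checkErrorRate}} are hypothetical
names, and the choice of {{IllegalArgumentException}} is an assumption about
how the failure should be reported.

```java
// Hypothetical fail-fast check for the configured error rate; not Hadoop code.
final class BloomErrorRateValidation {
    // Rejects any rate outside the open interval (0, 1).
    // The negated comparison also rejects NaN, because every comparison with NaN is false.
    static float checkErrorRate(float errorRate) {
        if (!(errorRate > 0.0f && errorRate < 1.0f)) {
            throw new IllegalArgumentException(
                "io.mapfile.bloom.error.rate must be in (0, 1), got " + errorRate);
        }
        return errorRate;
    }

    public static void main(String[] args) {
        System.out.println("accepted: " + checkErrorRate(0.005f));
        try {
            checkErrorRate(1.5f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling such a helper on the value returned by {{conf.getFloat(...)}} before
the size is computed would turn the late construction failure into an
immediate error that names the offending key.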
> BloomMapFile: invalid io.mapfile.bloom.error.rate (≤0 or ≥1) causes NaN/zero
> vector size and writer construction failure
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-19673
> URL: https://issues.apache.org/jira/browse/HADOOP-19673
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, io
> Affects Versions: 2.8.5
> Reporter: AMC-team
> Priority: Major
>