[ https://issues.apache.org/jira/browse/HADOOP-19673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AMC-team updated HADOOP-19673:
------------------------------
    Description: 
{{BloomMapFile.Writer#initBloomFilter(Configuration)}} computes the Bloom filter vector size as:
{code:java}
int numKeys = conf.getInt(IO_MAPFILE_BLOOM_SIZE_KEY,
    IO_MAPFILE_BLOOM_SIZE_DEFAULT);
float errorRate = conf.getFloat(IO_MAPFILE_BLOOM_ERROR_RATE_KEY,
    IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT);
int vectorSize = (int) Math.ceil(
  (double)(-HASH_COUNT * numKeys) /
  Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT))
);
{code}
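When {{io.mapfile.bloom.error.rate}} (written _p_ below, with _k_ = {{HASH_COUNT}}) is *≤ 0* or *≥ 1*, the formula leaves its valid domain:
 * for _p_ < 0, {{Math.pow(errorRate, 1.0 / HASH_COUNT)}} returns *NaN* (negative base with a non-integer exponent); the NaN propagates through {{Math.log}} and the division, and casting NaN to {{int}} yields *0*, so {{vectorSize == 0}};
 * for _p_ = 0, {{Math.pow}} returns 0 and {{Math.log(1.0)}} is 0, so (with a positive {{io.mapfile.bloom.size}}) the division yields -Infinity and the {{int}} cast saturates to {{Integer.MIN_VALUE}};
 * for _p_ ≥ 1, {{1.0 - Math.pow(...)}} is zero or negative, so {{Math.log}} returns -Infinity or *NaN* and {{vectorSize}} again ends up as *0*;
 * in every case the vector size is not positive, so constructing {{DynamicBloomFilter}} fails and {{BloomMapFile.Writer}} construction fails with it (observed as an assertion failure in tests).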
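
The sizing expression solves the standard false-positive approximation p ≈ (1 - e^(-kn/m))^k for the vector size m, giving m = -k·n / ln(1 - p^(1/k)), which is only positive when 0 < p < 1. The standalone sketch below reproduces the arithmetic outside Hadoop so the failure modes can be checked directly. It assumes {{HASH_COUNT}} = 5 and {{io.mapfile.bloom.size}} = 1024 * 1024, believed to be the BloomMapFile constant and the default, but both should be verified against the Hadoop version in use; the class name {{BloomVectorSizeDemo}} is hypothetical.

```java
// Hypothetical standalone reproduction of the vector-size arithmetic in
// BloomMapFile.Writer#initBloomFilter; not part of Hadoop.
final class BloomVectorSizeDemo {
    // Assumption: mirrors BloomMapFile's HASH_COUNT; verify for your Hadoop version.
    static final int HASH_COUNT = 5;

    // Same expression as the snippet above, with no input validation.
    static int vectorSize(int numKeys, float errorRate) {
        return (int) Math.ceil(
            (double) (-HASH_COUNT * numKeys)
                / Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT)));
    }

    public static void main(String[] args) {
        int numKeys = 1024 * 1024; // assumed default for io.mapfile.bloom.size
        float[] rates = {0.005f, 0.5f, 0f, -1f, -0.005f, 1.5f, 2f};
        for (float rate : rates) {
            System.out.println("errorRate=" + rate
                + " -> vectorSize=" + vectorSize(numKeys, rate));
        }
    }
}
```

Running it should show positive sizes for 0.005 and 0.5, {{Integer.MIN_VALUE}} for 0, and 0 for the negative rates and the rates above 1.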

The code lacks input validation for {{io.mapfile.bloom.error.rate}}, which must lie strictly within {{(0, 1)}}. With an out-of-range value, the arithmetic silently degrades to NaN, 0 or a negative size, and the writer only fails later at runtime.
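
One possible fix is to reject out-of-range values before the arithmetic runs. The sketch below is a minimal, hypothetical helper ({{checkErrorRate}} and {{BloomErrorRateCheck}} are not existing Hadoop names); it could be applied to the value returned by {{conf.getFloat(IO_MAPFILE_BLOOM_ERROR_RATE_KEY, IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT)}} in {{initBloomFilter}}. Writing the condition as {{!(errorRate > 0f && errorRate < 1f)}} also rejects NaN, which a plain {{errorRate <= 0f || errorRate >= 1f}} check would let through.

```java
// Hypothetical validation helper; not part of Hadoop's API.
final class BloomErrorRateCheck {
    static float checkErrorRate(float errorRate) {
        // Every comparison with NaN is false, so the negated range test
        // rejects NaN as well as values <= 0 or >= 1.
        if (!(errorRate > 0f && errorRate < 1f)) {
            throw new IllegalArgumentException(
                "io.mapfile.bloom.error.rate must be in (0, 1), got " + errorRate);
        }
        return errorRate;
    }

    public static void main(String[] args) {
        for (float rate : new float[] {0.005f, 0f, -1f, 1f, Float.NaN}) {
            try {
                System.out.println("accepted " + checkErrorRate(rate));
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

Failing fast with a message that names the key is one option; falling back to {{IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT}} with a warning would be another, and which behaviour fits is a maintainer decision.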

*Reproduction*

Injected values: {{io.mapfile.bloom.error.rate}} = {{0}}, {{-1}}

Test: {{org.apache.hadoop.io.TestBloomMapFile#testBloomMapFileConstructors}}
{code:java}
[INFO] Running org.apache.hadoop.io.TestBloomMapFile
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.358 s <<< FAILURE! - in org.apache.hadoop.io.TestBloomMapFile
[ERROR] org.apache.hadoop.io.TestBloomMapFile.testBloomMapFileConstructors  Time elapsed: 0.272 s  <<< FAILURE!
java.lang.AssertionError: testBloomMapFileConstructors error !!!
        at org.apache.hadoop.io.TestBloomMapFile.testBloomMapFileConstructors(TestBloomMapFile.java:287
{code}

  was:
{{BloomMapFile.Writer#initBloomFilter(Configuration)}} computes the Bloom filter vector size as:
{code:java}
int numKeys = conf.getInt(IO_MAPFILE_BLOOM_SIZE_KEY,
    IO_MAPFILE_BLOOM_SIZE_DEFAULT);
float errorRate = conf.getFloat(IO_MAPFILE_BLOOM_ERROR_RATE_KEY,
    IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT);
int vectorSize = (int) Math.ceil(
  (double)(-HASH_COUNT * numKeys) /
  Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT))
);
{code}
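When {{io.mapfile.bloom.error.rate}} (written _p_ below, with _k_ = {{HASH_COUNT}}) is *≤ 0* or *≥ 1*, the formula leaves its valid domain:
 * for _p_ < 0, {{Math.pow(errorRate, 1.0 / HASH_COUNT)}} returns *NaN* (negative base with a non-integer exponent); the NaN propagates through {{Math.log}} and the division, and casting NaN to {{int}} yields *0*, so {{vectorSize == 0}};
 * for _p_ = 0, {{Math.pow}} returns 0 and {{Math.log(1.0)}} is 0, so (with a positive {{io.mapfile.bloom.size}}) the division yields -Infinity and the {{int}} cast saturates to {{Integer.MIN_VALUE}};
 * for _p_ ≥ 1, {{1.0 - Math.pow(...)}} is zero or negative, so {{Math.log}} returns -Infinity or *NaN* and {{vectorSize}} again ends up as *0*;
 * in every case the vector size is not positive, so constructing {{DynamicBloomFilter}} fails and {{BloomMapFile.Writer}} construction fails with it (observed as an assertion failure in tests).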

The code lacks input validation for {{io.mapfile.bloom.error.rate}}, which must lie strictly within {{(0, 1)}}. With an out-of-range value, the arithmetic silently degrades to NaN, 0 or a negative size, and the writer only fails later at runtime.

*Reproduction*

Injected values: {{io.mapfile.bloom.error.rate}} = {{-1}}, {{-0.005}}

Test: {{org.apache.hadoop.io.TestBloomMapFile#testBloomMapFileConstructors}}
{code:java}
[INFO] Running org.apache.hadoop.io.TestBloomMapFile
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.358 s <<< FAILURE! - in org.apache.hadoop.io.TestBloomMapFile
[ERROR] org.apache.hadoop.io.TestBloomMapFile.testBloomMapFileConstructors  Time elapsed: 0.272 s  <<< FAILURE!
java.lang.AssertionError: testBloomMapFileConstructors error !!!
        at org.apache.hadoop.io.TestBloomMapFile.testBloomMapFileConstructors(TestBloomMapFile.java:287
{code}


> BloomMapFile: invalid io.mapfile.bloom.error.rate (≤0 or ≥1) causes NaN/zero vector size and writer construction failure
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19673
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19673
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, io
>    Affects Versions: 2.8.5
>            Reporter: AMC-team
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
