Method `checkMaskFromIndexSize` is called by some masked vector APIs such as
`fromArray/intoArray/fromMemorySegment/...`. It checks whether the index of any
active lane in a mask falls outside the bounds of the given array or
MemorySegment. This method should be force inlined. If C2 does not inline the
call, a VectorMask object is generated for it, which significantly hurts the
API performance.
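For readers less familiar with the masked APIs, the following is a simplified, hypothetical Java model of the check described above. It uses plain arrays instead of the Vector API: the mask is a `boolean[]` with one entry per lane, and the names `MaskedBoundsCheckModel`, `checkActiveLanesInBounds` and `maskedLoad` are made up for illustration. It is meant to mirror the documented behavior of a masked `fromArray` (only lanes whose mask bit is set need an in-bounds index, and unset lanes read as zero); it is not the JDK implementation and says nothing about how C2 compiles the real code.

```java
import java.util.Arrays;

// Simplified, hypothetical model of the masked-load bounds check; not JDK code.
// A mask is modeled as a boolean[] with one entry per lane.
public class MaskedBoundsCheckModel {

    // Throws if any active lane's index (offset + lane) is outside [0, limit).
    static void checkActiveLanesInBounds(int offset, boolean[] mask, int limit) {
        for (int lane = 0; lane < mask.length; lane++) {
            long index = (long) offset + lane; // widen to avoid int overflow
            if (mask[lane] && (index < 0 || index >= limit)) {
                throw new IndexOutOfBoundsException(
                    "lane " + lane + " -> index " + index
                    + " out of bounds for length " + limit);
            }
        }
    }

    // Models a masked load: active lanes read a[offset + lane],
    // inactive lanes are left as zero and never touch the array.
    static int[] maskedLoad(int[] a, int offset, boolean[] mask) {
        checkActiveLanesInBounds(offset, mask, a.length); // check called directly
        int[] lanes = new int[mask.length];
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                lanes[lane] = a[offset + lane];
            }
        }
        return lanes;
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40, 50, 60};

        // A 4-lane "vector" at offset 4: only lanes 0 and 1 are inside the array,
        // so a mask covering just those lanes passes the check.
        boolean[] tailMask = {true, true, false, false};
        System.out.println(Arrays.toString(maskedLoad(a, 4, tailMask))); // [50, 60, 0, 0]

        // Setting lane 2 (index 6) makes the check fail.
        boolean[] badMask = {true, true, true, false};
        try {
            maskedLoad(a, 4, badMask);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IOOBE: " + e.getMessage());
        }
    }
}
```

The partially out-of-bounds tail is exactly the case where a per-lane check like this is needed, because a simple whole-vector range test on the offset would reject a load that the mask makes legal.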

This patch changes these APIs to call the `VectorMask.checkFromIndexSize`
method directly instead of `checkMaskFromIndexSize`. Since that method is
already annotated with `@ForceInline`, C2 inlines and intrinsifies it, and the
expected vector instructions are generated. With this change, the now-unused
`checkMaskFromIndexSize` can be removed.

Some JMH benchmarks improve by up to 14x on an NVIDIA Grace CPU (AArch64 SVE2,
128-bit vectors). Similar improvements can also be observed on an Intel CPU
that supports AVX-512.

The following is the performance data on Grace (throughput in ops/ms, higher
is better; Gain = After / Before):

Benchmark                                             Mode  Cnt  Units      Before      After   Gain
LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE      thrpt   30  ops/ms  31544.304  31610.598  1.002
LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE    thrpt   30  ops/ms   3896.202   3903.249  1.001
LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE     thrpt   30  ops/ms    570.415   7174.320  12.57
LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE       thrpt   30  ops/ms    566.694   7193.520  12.69
LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE      thrpt   30  ops/ms   3899.269   3878.258  0.994
LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE     thrpt   30  ops/ms   1134.301  16053.847  14.15
StoreMaskedIOOBEBenchmark.byteStoreArrayMaskIOOBE    thrpt   30  ops/ms  26449.558  28699.480  1.085
StoreMaskedIOOBEBenchmark.doubleStoreArrayMaskIOOBE  thrpt   30  ops/ms   1922.167   5781.077  3.007
StoreMaskedIOOBEBenchmark.floatStoreArrayMaskIOOBE   thrpt   30  ops/ms   3784.190  11789.276  3.115
StoreMaskedIOOBEBenchmark.intStoreArrayMaskIOOBE     thrpt   30  ops/ms   3694.082  15633.547  4.232
StoreMaskedIOOBEBenchmark.longStoreArrayMaskIOOBE    thrpt   30  ops/ms   1966.956   6049.790  3.075
StoreMaskedIOOBEBenchmark.shortStoreArrayMaskIOOBE   thrpt   30  ops/ms   7647.309  27412.387  3.584

-------------

Commit messages:
 - 8350748: VectorAPI: Method "checkMaskFromIndexSize" should be force inlined

Changes: https://git.openjdk.org/jdk/pull/23817/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23817&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350748
  Stats: 213 lines in 7 files changed: 36 ins; 140 del; 37 mod
  Patch: https://git.openjdk.org/jdk/pull/23817.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23817/head:pull/23817

PR: https://git.openjdk.org/jdk/pull/23817
