Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-17 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains two additional commits since the last 
revision:

 - Merge branch 'master' into JDK-8354242
 - 8354242: VectorAPI: combine vector not operation with compare
   
   This patch optimizes the following patterns:
   For integer types:
   ```
   (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
   => (VectorMaskCmp src1 src2 ncond)
   (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
   => (VectorMaskCmp src1 src2 ncond)
   ```
   cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
   negative comparison of cond.
   
   For float and double types:
   ```
   (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   ```
   cond can be eq or ne.
   
   Benchmarks on Nvidia Grace machine with 128-bit SVE2:
   With option `-XX:UseSVE=2`:
   ```
   BenchmarkUnitBefore  Score Error After   
Score Error Uplift
   testCompareEQMaskNotByte ops/s   7912127.225 2677.289518 
10266136.26 8955.008548 1.29
   testCompareEQMaskNotDouble   ops/s   884737.6799 446.963779  
1179760.772 448.031844  

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-04-28 Thread erifan
On Mon, 28 Apr 2025 09:17:58 GMT, Emanuel Peter  wrote:

> > > Just a drive-by comment for now, I may review this later more fully.
> > > > I would also prefer if you added the IR restrictions rather than the 
> > > > JTREG requires.
> > > > The benefit is that we can still run the tests on all platforms, at 
> > > > least for result verification.
> > > > Imagine someone adds optimizations to a new platform, but does not know 
> > > > about this test here. They make a mistake, and there is a bug, leading 
> > > > either to a crash or wrong result. With the requires, you test would 
> > > > never even run, and we would not catch it. With the IR applyIf, we 
> > > > would catch the bug.
> > > 
> > > 
> > > Just copy pasting the IR applyIf everywhere is not that much work, and 
> > > adding in a new platform later is not really hard either.
> > 
> > 
> > Thanks! The problem is that when a new platform is added, people may not 
> > even know there is a test.
> 
> @erifan That is true. But we have that problem either way. If you use 
> `@require`, then the person does not realize there is a test AND the test is 
> not run. If you use `applyIf`, the person does not realize there is a test, 
> but it is run at least for result verifiation - and then the person MIGHT 
> realize if the test catches a wrong result / crash.

This test will run on new platforms when we use @requires. I explained the 
meaning of the @requires in the previous comment, it only excludes one case: 
when -XX:UseAVX=0 is specified on x86 platforms.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834650280


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-04-28 Thread erifan
On Mon, 28 Apr 2025 14:10:40 GMT, Emanuel Peter  wrote:

>> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237:
>> 
>>> 235: public static void testCompareEQMaskNotByte() {
>>> 236: testCompareMaskNotByte(VectorOperators.EQ);
>>> 237: }
>> 
>> Another comment: you now only have "negative" tests, where you check for 
>> count `=0`. It would be good if you also had a positive rule here, one where 
>> you do see an XOR in a similar case, where your optimization does apply.
>> 
>> This would basically be a "control" that checks that your are testing the 
>> right thing.
>
> Also: this test should vectorize on some plarforms, right? A compare, 
> correct? Would it not be good to actively check that with an IR rule?

> Another comment: you now only have "negative" tests, where you check for 
> count =0. It would be good if you also had a positive rule here, one where 
> you do see an XOR in a similar case, where your optimization does apply.
This would basically be a "control" that checks that your are testing the right 
thing.

This is not a negative test, this is a positive test. What this patch does is:

`Vector compare(NE) + not() => vector compare(EQ)`. The `not()` operation is 
removed. For details, please see the commit message and related code. So here 
we check XorV and XorVMask == 0, which I think is reasonable.

For negative tests, I think it's not necessary, because I only test what I 
optimized, and I won't write a test to say what optimization cannot be done.

> Also: this test should vectorize on some plarforms, right? A compare, 
> correct? Would it not be good to actively check that with an IR rule?

The `compare` operation is not eliminated, the patch aims to eliminate the 
`not` operation following `compare`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2065195738


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-27 Thread erifan
On Fri, 25 Apr 2025 09:48:59 GMT, Jatin Bhateja  wrote:

>> Since this is a platform independent optimization, I tend to use this 
>> `@requires` because it's simpler.  If we use `applyIfCPUFeatureOr`, we need 
>> to add the same restriction before each test. In addition, if a new 
>> architecture supports the vector node, this test may not cover it. What do 
>> you think?
>
> I don't see XorVMask implemented on all non-x86 target, like PPC etc..

This is not specifically required on x86, but because this test fails on x86 
when `-XX:UseAVX=0` is specified. When `-XX:UseAVX=0` is specified, the 
sub-graph is like this:
`(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))`
It is not an optimization pattern supported by this patch because we don't know 
what's the comparison op.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2062583790


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-04-29 Thread erifan
On Tue, 29 Apr 2025 10:22:22 GMT, Emanuel Peter  wrote:

> Yes, this discussion is down to `requires` vs `applyIf`. This is my argument 
> for `applyIf`, quoted from above, I have not yet seen an argument against it:
> 
> > If you use @require, then the person does not realize there is a test AND 
> > the test is not run. If you use applyIf, the person does not realize there 
> > is a test, but it is run at least for result verifiation - and then the 
> > person MIGHT realize if the test catches a wrong result / crash.
> 
> In my understanding, `requires` should only be used if the test really 
> **requires** a certain platform or feature. That can be because some flags 
> are only available under certain platforms for example. But for IR tests, we 
> should try to always use `applyIf`, because it allows testing on other 
> platforms.
> 
> Actually, I filed this RFE a while ago: 
> https://bugs.openjdk.org/browse/JDK-8310891 We should try to move as many 
> tests from using `requires` to `applyIf`, so that we have an increased test 
> coverage.

I see, I'll update the code. Thanks~

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2840582699


RFR: 8354242: VectorAPI: combine vector not operation with compare

2025-04-15 Thread erifan
This patch optimizes the following patterns:
For integer types:

(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
=> (VectorMaskCmp src1 src2 ncond)
(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
=> (VectorMaskCmp src1 src2 ncond)

cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
negative comparison of cond.

For float and double types:

(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))

cond can be eq or ne.

Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
`-XX:UseSVE=2`:

Benchmark   UnitBefore  Score Error After   
Score Error Uplift
testCompareEQMaskNotByteops/s   7912127.225 2677.289518 
10266136.26 8955.008548 1.29
testCompareEQMaskNotDouble  ops/s   884737.6799 446.963779  
1179760.772 448.031844  1.33
testCompareEQMaskNotFloat   ops/s   1765045.787 682.332214  
2359520.803 896.305743  1.33
testCompareEQMaskNotInt ops/s   1787221.411 977.743935  
2353952.519 960.069976  1.31
testCompareEQMaskNotLongops/s   895297.1974 673.44808   
1178449.02  323.804205  1.31
testCompareEQMaskNotShort   ops/s   3339987.002 3415.2226   
4712761.965 2110.862053 1.41
testCompareGEMaskNotByteops/s   7907615.16  4094.243652 
10251646.9  9486.699831 1.29
testCompareGEMaskNotInt ops/s   1683738.958 4233.813092 
2352855.205 1251.952546 1.39
testCompareGEMaskNotLongops/s   854496.1561 8594.598885 
1177811.493 521.12291.37
testCompareGEMaskNotShort   ops/s   3341860.309 1578.975338 
4714008.434 1681.10365  1.41
testCompareGTMaskNotByteops/s   7910823.674 2993.367032 
10245063.58 9774.75138  1.29
testCompareGTMaskNotInt ops/s   1673393.928 3153.099431 
2353654.521 1190.848583 1.4
testCompareGTMaskNotLongops/s   849405.9159 2432.858159 
1177952.041 359.96413   1.38
testCompareGTMaskNotShort   ops/s   3339509.141 3339.976585 
4711442.496 2673.364893 1.41
testCompareLEMaskNotByteops/s   7911340.004 3114.69191  
10231626.5  27134.20035 1.29
testCompareLEMaskNotInt ops/s   1675812.113 1340.969885 
2353255.341 1452.4522   1.4
testCompareLEMaskNotLongops/s   848862.8036 6564.841731 
1177763.623 539.290106  1.38
testCompareLEMaskNotShort   ops/s   3324951.54  2380.29473  
4712116.251 1544.559684 1.41
testCompareLTMaskNotByteops/s   7910390.844 2630.861436 
10239567.69 6487.441672 1.29
testCompareLTMaskNotInt ops/s   1672180.09  995.238142  
2353757.863 853.774734  1.4
testCompareLTMaskNotLongops/s   856502.2695 12276.82851 
1177671.815 496.723302  1.37
testCompareLTMaskNotShort   ops/s   3325798.025 2412.702501 
4711554.181 1779.302112 1.41
testCompareNEMaskNotByteops/s   7910002.518 2771.82477  
10245315.33 16321.93935 1.29
testCompareNEMaskNotDouble  ops/s   863754.6022 523.140788  
1179133.982 476.572178  1.36
testCompareNEMaskNotFloat   ops/s   1723321.883 2598.484803 
2358492.186 877.14011.36
testCompareNEMaskNotInt ops/s   1670288.841 751.774826  
2354158.125 835.720163  1.4
testCompareNEMaskNotLongops/s   836327.6835 410.525466  
1178178.825 308.757932  1.4
testCompareNEMaskNotShort   ops/s   3327815.841 1511.978763 
4711379.136 2336.505531 1.41
testCompareUGEMaskNotByte   ops/s   7906699.024 3200.936474 
10253843.74 15067.59401 1.29
testCompareUGEMaskNotIntops/s   1674003.923 3287.191727 
2353340.666 951.381021  1.4
testCompareUGEMaskNotLong   ops/s   852424.5562 8920.408939 
1177943.609 389.66211.38
testCompareUGEMaskNotShort  ops/s   3327255.858 1584.885143 
4711622.355 1247.215277 1.41
testCompareUGTMaskNotByte   ops/s   7909249.189 4435.283667 
10245541.34 10993.34739 1.29
testCompareUGTMaskNotIntops/s   1693713.433 20650.00213 
2353153.787 1055.343846 1.38
testCompareUGTMaskNotLong   ops/s   851022.3395 7079.065268 
1177910.677 538.604598  1.38
testCompareUGTMaskNotShort  ops/s   3327236.988 1616.886789 
4711209.865 3098.494145 1.41
testCompareULEMaskNotByte   ops/s   7909350.825 3251.262342 
10261449.03 7273.831341 1.29
testCompareULEMaskNotIntops/s   1672350.925 1545.304304 
2353

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-05-01 Thread erifan
On Tue, 29 Apr 2025 10:22:22 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains four additional commits since 
>> the last revision:
>> 
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>>testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>>testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>>testCompareGTMaskNotInt   ops/s   1673393...
>
> Yes, this discussion is down to `requires` vs `applyIf`. This is my argument 
> for `applyIf`, quoted from above, I have not yet seen an argument against it:
> 
>>  If you use @require, then the person does not realize there is a test AND 
>> the test is not run. If you use applyIf, the person does not realize there 
>> is a test, but it is run at least for result verifiation - and then the 
>> person MIGHT realize if the test catches a wrong result / crash.
> 
> In my understanding, `requires` should only be used if the test really 
> **requires** a certain platform or feature. That can be because some flags 
> are only available under certain platforms for example. But for IR tests, we 
> should try to always use `applyIf`, because it allows testing on other 
> platforms.
> 
> Actually, I filed this RFE a while ago: 
> https://bugs.openjdk.org/browse/JDK-8310891
> We should try to move as many tests from using `requires` to `applyIf`, so 
> that we have an increased test coverage.

@eme64 @jatin-bhateja I have updated the test, thanks for your suggestion.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2844256626


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4]

2025-05-01 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains six additional commits since the last 
revision:

 - Update the jtreg test
 - Merge branch 'master' into JDK-8354242
 - Addressed some review comments
   
   1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
   2. Improve code comments.
 - Merge branch 'master' into JDK-8354242
 - Merge branch 'master' into JDK-8354242
 - 8354242: VectorAPI: combine vector not operation with compare
   
   This patch optimizes the following patterns:
   For integer types:
   ```
   (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
   => (VectorMaskCmp src1 src2 ncond)
   (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
   => (VectorMaskCmp src1 src2 ncond)
   ```
   cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
   negative comparison of cond.
   
   For float and double types:
   ```
   (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   ```
   cond can be eq or ne.
   
   Benchmarks on Nvidia Grace machine with 128-bit SVE2:
   With option `-XX:UseSVE=2`:
   ```
   BenchmarkUnitBefore  Score Error After  

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-24 Thread erifan
On Wed, 23 Apr 2025 12:09:51 GMT, Jatin Bhateja  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains two additional commits since 
>> the last revision:
>> 
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>>testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>>testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>>testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>>testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>>testCompareGTMaskNotShort ops/s   3339509.141 ...
>
> src/hotspot/share/opto/vectornode.cpp line 2234:
> 
>> 2232:   // XorV/XorVMask is commutative, swap 
>> VectorMaskCmp/Op_VectorMaskCast to in1.
>> 2233:   if (in1->Opcode() != Op_VectorMaskCmp && in1->Opcode() != 
>> Op_VectorMaskCast) {
>> 2234: swap(in1, in2);
> 
> Swapping inputs like this without refreshing GVN bookkeeping is not safe. I 
> guess you wanted to use Node::swap_edges.

The edges are not swapped, but two variables in1 and in2

> src/hotspot/share/opto/vectornode.cpp line 2243:
> 
>> 2241: in1 = in1->in(1);
>> 2242:   }
>> 2243:   if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 ||
> 
> Checks on outcnt on line 2243 and 2238 can be removed. Idealization looks for 
> a specific graph palette and replaces it with a new node whose inputs are the 
> same as the inputs of the palette. GVN will do the retention job if any 
> intermediate node has users beyond the pattern being replaced.

Thanks for telling me this information. Another more important reason to check 
outcnt here is to prevent this optimization when the uses of VectorMaskCmp is 
greater than 1, because this optimization may not be worthwhile. For example:


  public static void testVectorMaskCmp() {
IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0);
IntVector av = IntVector.fromArray(I_SPECIES, ia, 0);
VectorMask m1 = av.compare(VectorOperators.NE, bv);  // two uses
VectorMask m2 =m1.not();
m1.intoArray(m, 0);
av.lanewise(VectorOper

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-24 Thread erifan
On Fri, 18 Apr 2025 01:36:10 GMT, erifan  wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>> => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>> => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> BenchmarkUnitBefore  Score Error After   
>> Score Error Uplift
>> testCompareEQMaskNotByte ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>> testCompareEQMaskNotFloatops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>> testCompareEQMaskNotInt  ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>> testCompareEQMaskNotLong ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>> testCompareEQMaskNotShortops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>> testCompareGEMaskNotByte ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>> testCompareGEMaskNotInt  ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>> testCompareGEMaskNotLong ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>> testCompareGEMaskNotShortops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>> testCompareGTMaskNotByte ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>> testCompareGTMaskNotInt  ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>> testCompareGTMaskNotLong ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>> testCompareGTMaskNotShortops/s   3339509.141 3339.976585 
>> 4711442.496 2673.364893 1.41
>> testCompareLEMaskNotByte ops/s   7911340.004 3114.69191  
>> 10231626.5  27134.20035 1.29
>> testCompareLEMaskNotInt  ops/s   1675812.113 1340.969885 
>> 2353255.341 1452.4522   1.4
>> testCompareLEMaskNotLong ops/s   848862.8036 6564.841731 
>> 1177763.623 539.290106  1.38
>> testCompareLEMaskNotShortops/s   3324951.54  2380.29473  
>> 4712116.251 1544.559684 1.41
>> testCompareLTMaskNotByte ops/s   7910390.844 2630.861436 
>> 10239567.69 6487.441672 1.29
>> testCompareLTMaskNotInt  ops/s   16721...
>
> erifan has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains two additional commits since the 
> last revision:
> 
>  - Merge branch 'master' into JDK-8354242
>  - 8354242: VectorAPI: combine vector not operation with compare
>
>This patch optimizes the following patterns:
>For integer types:
>```
>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>=> (VectorMaskCmp src1 src2 ncond)
>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>=> (VectorMaskCmp src1 src2 ncond)
>```
>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>negative comparison of cond.
>
>For float and double types:
>```
>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>```
>cond can be eq or ne.
>
>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>With option `-XX:UseSVE=2`:
>```
>Be

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-25 Thread erifan
On Thu, 24 Apr 2025 10:28:15 GMT, Andrew Haley  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains two additional commits since 
>> the last revision:
>> 
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>>testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>>testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>>testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>>testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>>testCompareGTMaskNotShort ops/s   3339509.141 ...
>
> src/hotspot/share/opto/node.cpp line 1226:
> 
>> 1224: // be added to the IGVN worklist, then the optimization will not 
>> be applied.
>> 1225: // Therefore, add this node into IGVN worklist to make the 
>> optimization happen.
>> 1226: return true;
> 
> Suggestion:
> 
>   } else if (n->Opcode() == Op_XorV || n->Opcode() == Op_XorVMask) {
> // Condition for removing an unnecessary not() following
> // a compare(...) operation.
> // The predecessor of n (this XorV or XorVMask) may also be used
> // by a useless VectorBox node which will later be eliminated by
> // RemoveUseless. Return true to ensure that subgraph
> // transformations are performed on n.
> return true;

Done. Thanks for your review.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059706930


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-25 Thread erifan
On Thu, 24 Apr 2025 09:39:58 GMT, erifan  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2265:
>> 
>>> 2263: vmcmp = new VectorMaskCastNode(phase->transform(vmcmp), 
>>> vmcast_vt);
>>> 2264:   }
>>> 2265:   return vmcmp;
>> 
>> It would be preferable if you could kindly re-factor the code such that we 
>> only call VectorNode::Ideal once at return to comply with aesthetics of 
>> other idealization routines.
>
> Ok, I'll change it in the next commit.

Done, thanks.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059702378


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-04-25 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains four additional commits since the last 
revision:

 - Addressed some review comments
   
   1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
   2. Improve code comments.
 - Merge branch 'master' into JDK-8354242
 - Merge branch 'master' into JDK-8354242
 - 8354242: VectorAPI: combine vector not operation with compare
   
   This patch optimizes the following patterns:
   For integer types:
   ```
   (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
   => (VectorMaskCmp src1 src2 ncond)
   (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
   => (VectorMaskCmp src1 src2 ncond)
   ```
   cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
   negative comparison of cond.
   
   For float and double types:
   ```
   (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   ```
   cond can be eq or ne.
   
   Benchmarks on Nvidia Grace machine with 128-bit SVE2:
   With option `-XX:UseSVE=2`:
   ```
   BenchmarkUnitBefore  Score Error After   
Score Error Uplift
   testCompareEQMaskNotByte ops/s   7912

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-28 Thread erifan
On Mon, 28 Apr 2025 06:45:58 GMT, Emanuel Peter  wrote:

>> This is not specifically required on x86, but because this test fails on x86 
>> when `-XX:UseAVX=0` is specified. When `-XX:UseAVX=0` is specified, the 
>> sub-graph is like this:
>> `(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))`
>> It is not an optimization pattern supported by this patch because we don't 
>> know what's the comparison op.
>
> I would also prefer if you added the IR restrictions rather than the JTREG 
> requires.
> The benefit is that we can still run the tests on all platforms, at least for 
> result verification.
> 
> Imagine someone adds optimizations to a new platform, but does not know about 
> this test here. They make a mistake, and there is a bug, leading either to a 
> crash or wrong result. With the requires, you test would never even run, and 
> we would not catch it. With the IR applyIf, we would catch the bug.

I can make the change, it's not complex, but it is different from what I 
thought before.

I thought that supporting vector was the default behavior, is it right? So when 
I was doing an architecture-independent feature or optimization, I should just 
exclude those unsupported cases from the test, so that all potential 
environments would be tested. If I was doing an architecture- or 
feature-dependent optimization, then I should limit the test to run only in 
supported environments.

For this case, **the current meaning of @requires is "skip this test when 
-XX:UseAVX=0 is specified on the x86 architecture, otherwise run the tests".** 
So if a new architecture (say s390) supports related vector operations in the 
future, then this test will be run on that platform by default.

If all architecture-independent tests are restricted with applyIfCPUFeatureOr, 
then when we support a new architecture, we will need to modify all tests, 
otherwise no test will run on this architecture.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2063076640


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-04-28 Thread erifan
On Mon, 28 Apr 2025 06:47:38 GMT, Emanuel Peter  wrote:

> Just a drive-by comment for now, I may review this later more fully.
> 
> > I would also prefer if you added the IR restrictions rather than the JTREG 
> > requires.
> > The benefit is that we can still run the tests on all platforms, at least 
> > for result verification.
> > Imagine someone adds optimizations to a new platform, but does not know 
> > about this test here. They make a mistake, and there is a bug, leading 
> > either to a crash or wrong result. With the requires, you test would never 
> > even run, and we would not catch it. With the IR applyIf, we would catch 
> > the bug.
> 
> Just copy pasting the IR applyIf everywhere is not that much work, and adding 
> in a new platform later is not really hard either.

Thanks! The problem is that when a new platform is added, people may not even 
know there is a test.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834280547


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

2025-04-28 Thread erifan
On Mon, 28 Apr 2025 14:12:02 GMT, Emanuel Peter  wrote:

> I suppose in that case you can assert that you NEVER get those nodes, because 
> if you have vectors not supported, they will not show up because of that, and 
> if you do support vectors, they should be optimized away.

This is expected. 

- If vectors are supported, the test checks that Vector not() is optimized away.

- If vectors are not supported, the vector IRs won't generated, so the IR check 
will pass. And the correctness verification also runs, this is used to ensure 
that the patch does not break correctness. In the future if vectors are 
supported, the test also runs without any modification.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2065211944


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-04-28 Thread erifan
On Mon, 28 Apr 2025 14:13:43 GMT, Emanuel Peter  wrote:

> > > > > Just a drive-by comment for now, I may review this later more fully.
> > > > > > I would also prefer if you added the IR restrictions rather than 
> > > > > > the JTREG requires.
> > > > > > The benefit is that we can still run the tests on all platforms, at 
> > > > > > least for result verification.
> > > > > > Imagine someone adds optimizations to a new platform, but does not 
> > > > > > know about this test here. They make a mistake, and there is a bug, 
> > > > > > leading either to a crash or wrong result. With the requires, you 
> > > > > > test would never even run, and we would not catch it. With the IR 
> > > > > > applyIf, we would catch the bug.
> > > > > 
> > > > > 
> > > > > Just copy pasting the IR applyIf everywhere is not that much work, 
> > > > > and adding in a new platform later is not really hard either.
> > > > 
> > > > 
> > > > Thanks! The problem is that when a new platform is added, people may 
> > > > not even know there is a test.
> > > 
> > > 
> > > @erifan That is true. But we have that problem either way. If you use 
> > > `@require`, then the person does not realize there is a test AND the test 
> > > is not run. If you use `applyIf`, the person does not realize there is a 
> > > test, but it is run at least for result verifiation - and then the person 
> > > MIGHT realize if the test catches a wrong result / crash.
> > 
> > 
> > This test will run on new platforms when we use @requires. I explained the 
> > meaning of the @requires in the previous comment, it only excludes one 
> > case: when -XX:UseAVX=0 is specified on x86 platforms.
> 
> I see. You should probably add a comment there, to say that you are only 
> excluding `AVX=0`. But even `UseAVX = 0` would profit from result 
> verification.

@requires is a special comment itself. I feel like it's a bit weird to add a 
comment to a comment, and I don't think the @requires is hard to understand.

If we want to verify the correctness of AVX=0, we have to use ApplyIf. This is 
back to the beginning of the question, should we use @requires or ApplyIf? 
Personally I tend to use the former. By the way, I have tested the correctness 
of AVX=0 locally.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2837292655


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-07 Thread erifan
On Wed, 7 May 2025 06:59:34 GMT, Jatin Bhateja  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains eight additional commits 
>> since the last revision:
>> 
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.156...
>
> src/hotspot/share/opto/vectornode.cpp line 2231:
> 
>> 2229:   }
>> 2230:   if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 ||
>> 2231:   !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() ||
> 
> Do you plan  to extend your testcase /  matching logic to cover following 
> equivalent patterns:
> 
> - compare.xor(maskAll(true))
> - compare.xor(VectorMask.fromLong(SPECIES, -1L))

Hi @jatin-bhateja It is feasible. But I was thinking about whether another 
solution would be better, which is to turn  `VectorMask.fromLong(SPECIES, -1L)` 
into `MaskAll(true)` in the mid-end. In this way, we don't need to check this 
pattern in this optimization. What do you think ?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077316033


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-09 Thread erifan
On Wed, 7 May 2025 02:10:56 GMT, erifan  wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>> => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>> => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> BenchmarkUnitBefore  Score Error After   
>> Score Error Uplift
>> testCompareEQMaskNotByte ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>> testCompareEQMaskNotFloatops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>> testCompareEQMaskNotInt  ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>> testCompareEQMaskNotLong ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>> testCompareEQMaskNotShortops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>> testCompareGEMaskNotByte ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>> testCompareGEMaskNotInt  ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>> testCompareGEMaskNotLong ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>> testCompareGEMaskNotShortops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>> testCompareGTMaskNotByte ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>> testCompareGTMaskNotInt  ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>> testCompareGTMaskNotLong ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>> testCompareGTMaskNotShortops/s   3339509.141 3339.976585 
>> 4711442.496 2673.364893 1.41
>> testCompareLEMaskNotByte ops/s   7911340.004 3114.69191  
>> 10231626.5  27134.20035 1.29
>> testCompareLEMaskNotInt  ops/s   1675812.113 1340.969885 
>> 2353255.341 1452.4522   1.4
>> testCompareLEMaskNotLong ops/s   848862.8036 6564.841731 
>> 1177763.623 539.290106  1.38
>> testCompareLEMaskNotShortops/s   3324951.54  2380.29473  
>> 4712116.251 1544.559684 1.41
>> testCompareLTMaskNotByte ops/s   7910390.844 2630.861436 
>> 10239567.69 6487.441672 1.29
>> testCompareLTMaskNotInt  ops/s   16721...
>
> erifan has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains eight additional commits since 
> the last revision:
> 
>  - Refactor code
>
>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>optimization, making the code more modular.
>  - Merge branch 'master' into JDK-8354242
>  - Update the jtreg test
>  - Merge branch 'master' into JDK-8354242
>  - Addressed some review comments
>
>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>2. Improve code comments.
>  - Merge branch 'master' into JDK-8354242
>  - Merge branch 'master' into JDK-8354242
>  - 8354242: VectorAPI: combine vector not operation with compare
>
>This patch optimizes the following patterns:
>For integer types:
>```
>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>=> (VectorMaskCmp src1 src2 ncond)
>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>=> (VectorMaskCmp src1 src2 ncond)
>```
>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>negative comparison of cond.
>   

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-09 Thread erifan
On Thu, 8 May 2025 01:49:45 GMT, Xiaohong Gong  wrote:

>> Yes, that's the right approach. For this PR, I think you can mix some test 
>> points covering compare, xor(maskAll(true)).
>
> Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be 
> better, and that will benefit AArch64 as well, since `MaskAll()` is much more 
> cheaper than `fromLong()` on AArch64. We can add such a transformation with 
> another PR.

Ok, I'll extend the test to xor(maskAll(true) in the next commit, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2081313613


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-07 Thread erifan
On Wed, 7 May 2025 06:59:34 GMT, Jatin Bhateja  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains eight additional commits 
>> since the last revision:
>> 
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.156...
>
> src/hotspot/share/opto/vectornode.cpp line 2231:
> 
>> 2229:   }
>> 2230:   if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 ||
>> 2231:   !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() ||
> 
> Do you plan  to extend your testcase /  matching logic to cover following 
> equivalent patterns:
> 
> - compare.xor(maskAll(true))
> - compare.xor(VectorMask.fromLong(SPECIES, -1L))

Oh, I didn't think of this case, let me try. Thanks~

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077026150


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-06 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains eight additional commits since the last 
revision:

 - Refactor code
   
   Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
   optimization, making the code more modular.
 - Merge branch 'master' into JDK-8354242
 - Update the jtreg test
 - Merge branch 'master' into JDK-8354242
 - Addressed some review comments
   
   1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
   2. Improve code comments.
 - Merge branch 'master' into JDK-8354242
 - Merge branch 'master' into JDK-8354242
 - 8354242: VectorAPI: combine vector not operation with compare
   
   This patch optimizes the following patterns:
   For integer types:
   ```
   (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
   => (VectorMaskCmp src1 src2 ncond)
   (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
   => (VectorMaskCmp src1 src2 ncond)
   ```
   cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
   negative comparison of cond.
   
   For float and double types:
   ```
   (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
   => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
   ```
   cond can

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4]

2025-05-06 Thread erifan
On Fri, 2 May 2025 06:16:03 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains six additional commits since 
>> the last revision:
>> 
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>>testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>>testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 1...
>
> src/hotspot/share/opto/vectornode.cpp line 2224:
> 
>> :   //=> (VectorMaskCmp src1 src2 ncond)
>> 2223:   // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond 
>> is the
>> 2224:   // negative comparison of cond.
> 
> Suggestion:
> 
>   // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt.
>   // ncond is the negative comparison of cond.
> 
> I was getting lost in all the commas.

Done.

> src/hotspot/share/opto/vectornode.cpp line 2248:
> 
>> 2246:   !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() ||
>> 2247:   !VectorNode::is_all_ones_vector(in2)) {
>> 2248: return VectorNode::Ideal(phase, can_reshape);
> 
> Hmm, so this is really the "else" case, if your optimization does not 
> succeed, right?
> 
> Wrapping `VectorNode::Ideal` somewhere in the middle is going to make future 
> optimizations here much harder.
> How would they check their conditions next to yours? That would be quite a 
> mess.
> 
> I suggest you do this:
> - `XorVNode::Ideal` does
>   - checks `in1 == in2` case
>   - calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it 
> succeeded, i.e. returns `nullptr`.
>   - Finally, i.e. none of the optimizations above worked: call 
> `VectorNode::Ideal`
> 
> Then you pack all your new logic here into 
> `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is 
> just what I came up with just now.
> 
> This gives us a much more **modular** design, and it is easier to add another 
> new optimization to `XorVNode::Ideal`. It is easy to change the precedence of 
> the optimizations by just changing the order, etc.

Done, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076646028
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076646366


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]

2025-05-06 Thread erifan
On Thu, 1 May 2025 07:32:22 GMT, erifan  wrote:

>> Yes, this discussion is down to `requires` vs `applyIf`. This is my argument 
>> for `applyIf`, quoted from above, I have not yet seen an argument against it:
>> 
>>>  If you use @require, then the person does not realize there is a test AND 
>>> the test is not run. If you use applyIf, the person does not realize there 
>>> is a test, but it is run at least for result verifiation - and then the 
>>> person MIGHT realize if the test catches a wrong result / crash.
>> 
>> In my understanding, `requires` should only be used if the test really 
>> **requires** a certain platform or feature. That can be because some flags 
>> are only available under certain platforms for example. But for IR tests, we 
>> should try to always use `applyIf`, because it allows testing on other 
>> platforms.
>> 
>> Actually, I filed this RFE a while ago: 
>> https://bugs.openjdk.org/browse/JDK-8310891
>> We should try to move as many tests from using `requires` to `applyIf`, so 
>> that we have an increased test coverage.
>
> @eme64 @jatin-bhateja I have updated the test, thanks for your suggestion.

> @erifan thanks for updating the tests!
> 
> Now I had a quick look at the VM code.
> 
> My biggest observation is this:
> 
> Wrapping `VectorNode::Ideal` somewhere in the middle of your new optimization 
> is going to make future optimizations here much harder. How would they check 
> their conditions next to yours? That would be quite a mess.
> 
> I suggest you do this:
> 
> * `XorVNode::Ideal` does
>   
>   * checks `in1 == in2` case
>   * calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it 
> succeeded, i.e. returns `nullptr`.
>   * ... future optimizations could go here ...
>   * Finally, i.e. none of the optimizations above worked: call 
> `VectorNode::Ideal`
> 
> Then you pack all your new logic here into 
> `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is 
> just what I came up with just now.
> 
> This gives us a much more **modular** design, and it is easier to add another 
> new optimization to `XorVNode::Ideal`. It is easy to change the precedence of 
> the optimizations by just changing the order, etc.
> 
> Examples of this "modular" design:
> 
> * `CMoveNode::Ideal` -> calls `TypeNode::Ideal` and `Ideal_minmax`.
> * `StoreBNode::Ideal` -> calls `StoreNode::Ideal_masked_input` and 
> `StoreNode::Ideal_sign_extended_input`
>   These are really nice, because you can quickly see what optimizations we 
> already have, and in which order they are checked.

Yes, this is a good idea, I have changed it like this, thanks for your 
suggestion.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2856811448


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4]

2025-05-06 Thread erifan
On Fri, 2 May 2025 06:14:33 GMT, Emanuel Peter  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2216:
>> 
>>> 2214:   in2->is_predicated_vector()) {
>>> 2215: with_predicated = true;
>>> 2216:   }
>> 
>> Suggestion:
>> 
>>   bool with_predicated = is_predicated_vector() ||
>>  in1->is_predicated_vector() ||
>>  in2->is_predicated_vector();
>
> Would that not be easier to read?

Yes, done, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076645802


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-05-13 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains 10 additional commits since the last 
revision:

 - Refactor the JTReg tests for compare.xor(maskAll)
   
   Also made a bit change to support pattern `VectorMask.fromLong()`.
 - Merge branch 'master' into JDK-8354242
 - Refactor code
   
   Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
   optimization, making the code more modular.
 - Merge branch 'master' into JDK-8354242
 - Update the jtreg test
 - Merge branch 'master' into JDK-8354242
 - Addressed some review comments
   
   1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
   2. Improve code comments.
 - Merge branch 'master' into JDK-8354242
 - Merge branch 'master' into JDK-8354242
 - 8354242: VectorAPI: combine vector not operation with compare
   
   This patch optimizes the following patterns:
   For integer types:
   ```
   (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
   => (VectorMaskCmp src1 src2 ncond)
   (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
   => (VectorMaskCmp src1 src2 ncond)
   ```
   cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
   negative comparison of cond.
   
   For float and double types:
   ```
   (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
   => (VectorMaskCast (VectorMaskCmp sr

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-13 Thread erifan
On Wed, 7 May 2025 11:13:23 GMT, Jatin Bhateja  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains eight additional commits 
>> since the last revision:
>> 
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>>testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>>testCompareGEMaskNotLong  ops/s   854496.156...
>
> test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java
>  line 40:
> 
>> 38: @Fork(jvmArgs = { "--add-modules=jdk.incubator.vector" })
>> 39: public class MaskCompareNotBenchmark {
>> 40: private static final int ARRAYLEN = 4096;
> 
> ARRAYLEN should be configurable @Param.

Done.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2087927310


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]

2025-05-13 Thread erifan
On Fri, 9 May 2025 09:37:22 GMT, erifan  wrote:

>> Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be 
>> better, and that will benefit AArch64 as well, since `MaskAll()` is much 
>> more cheaper than `fromLong()` on AArch64. We can add such a transformation 
>> with another PR.
>
> Ok, I'll extend the test to xor(maskAll(true) in the next commit, thanks!

Done, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2087927027


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-05-16 Thread erifan
On Wed, 14 May 2025 02:44:14 GMT, erifan  wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>> => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>> => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> BenchmarkUnitBefore  Score Error After   
>> Score Error Uplift
>> testCompareEQMaskNotByte ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>> testCompareEQMaskNotFloatops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>> testCompareEQMaskNotInt  ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>> testCompareEQMaskNotLong ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>> testCompareEQMaskNotShortops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>> testCompareGEMaskNotByte ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>> testCompareGEMaskNotInt  ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>> testCompareGEMaskNotLong ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>> testCompareGEMaskNotShortops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>> testCompareGTMaskNotByte ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>> testCompareGTMaskNotInt  ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>> testCompareGTMaskNotLong ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>> testCompareGTMaskNotShortops/s   3339509.141 3339.976585 
>> 4711442.496 2673.364893 1.41
>> testCompareLEMaskNotByte ops/s   7911340.004 3114.69191  
>> 10231626.5  27134.20035 1.29
>> testCompareLEMaskNotInt  ops/s   1675812.113 1340.969885 
>> 2353255.341 1452.4522   1.4
>> testCompareLEMaskNotLong ops/s   848862.8036 6564.841731 
>> 1177763.623 539.290106  1.38
>> testCompareLEMaskNotShortops/s   3324951.54  2380.29473  
>> 4712116.251 1544.559684 1.41
>> testCompareLTMaskNotByte ops/s   7910390.844 2630.861436 
>> 10239567.69 6487.441672 1.29
>> testCompareLTMaskNotInt  ops/s   16721...
>
> erifan has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains 10 additional commits since the 
> last revision:
> 
>  - Refactor the JTReg tests for compare.xor(maskAll)
>
>Also made a bit change to support pattern `VectorMask.fromLong()`.
>  - Merge branch 'master' into JDK-8354242
>  - Refactor code
>
>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>optimization, making the code more modular.
>  - Merge branch 'master' into JDK-8354242
>  - Update the jtreg test
>  - Merge branch 'master' into JDK-8354242
>  - Addressed some review comments
>
>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>2. Improve code comments.
>  - Merge branch 'master' into JDK-8354242
>  - Merge branch 'master' into JDK-8354242
>  - 8354242: VectorAPI: combine vector not operation with compare
>
>This patch optimizes the following patterns:
>For integer types:
>```
>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>=> (VectorMaskCmp src1 src2 ncond)
>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
&

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-05-29 Thread erifan
On Wed, 28 May 2025 12:12:50 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains 10 additional commits since 
>> the last revision:
>> 
>>  - Refactor the JTReg tests for compare.xor(maskAll)
>>
>>Also made a bit change to support pattern `VectorMask.fromLong()`.
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4...
>
> src/hotspot/share/opto/vectornode.cpp line 2233:
> 
>> 2231:   if (in2->Opcode() == Op_VectorMaskCast) {
>> 2232: in2 = in2->in(1);
>> 2233:   }
> 
> Wow, this seems to be an addition that is not covered in the patterns you 
> mention above, right?
> But is that even necessary?
> I suppose here `in2 = VectorMaskCast(all_ones_vector)`.
> Would we not already want to transform this pattern in 
> `VectorMaskCast::Ideal`, is that not possible and more powerful?

Oh yeah, I forgot to mention it in the above comment and commit message. 

Yes, this is for `in2 = VectorMaskCast(all_ones_vector)`.  I agree it's better 
to do this transformation in `VectorMaskCast::Ideal`. I'll remove this code 
change and do the `VectorMaskCast` optimization later. Thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113430734


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-05-29 Thread erifan
On Wed, 28 May 2025 12:16:23 GMT, Emanuel Peter  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2244:
>> 
>>> 2242:   // BoolTest doesn't support unsigned comparisons.
>>> 2243:   BoolTest::mask neg_cond =
>>> 2244:   (BoolTest::mask) (((VectorMaskCmpNode*) in1)->get_predicate() ^ 
>>> 4);
>> 
>> What is the hard-coded `^ 4` here? This whole line looks like we are looking 
>> at internals of the `VectorMaskCmpNode` or its predicate, and we should 
>> probably do that in some method there? Or maybe it should be part of the 
>> `BoolTest(::mask)` interface?
>
> Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do 
> `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use it 
> as `in1_mask_cmp` instead?

> What is the hard-coded ^ 4 here?

This is to negate the comparison condition. We can't use `BoolTest::negate()` 
here because the comparison condition may be **unsigned** comparison. Since 
there's already a `negate()` function in `BoolTest`, so I tend to add a new 
function `get_negative_predicate` for this into class `VectorMaskCmpNode`.

> Also: You now cast (VectorMaskCmpNode*) in1 twice. Can we not do 
> as_VectorMaskCmp()? Or could we at least cast it only once, and then use it 
> as in1_mask_cmp instead?

For the first cast, I think you mean

  if (in1->Opcode() != Op_VectorMaskCmp ||
  in1->outcnt() > 1 ||
  !((VectorMaskCmpNode*) in1)->predicate_can_be_negated() ||
  !VectorNode::is_all_ones_vector(in2)) {
return nullptr;
  }

To remove one cast, then we have to split the above `if` because `in1` may not 
be a `VectorMaskCmpNode`.

  if (in1->Opcode() != Op_VectorMaskCmp) {
return nullptr;
  }
  VectorMaskCmpNode* in1_as_mask_cmp = (VectorMaskCmpNode*) in1;
  if (in1->outcnt() > 1 ||
  !in1_as_mask_cmp->predicate_can_be_negated() ||
  !VectorNode::is_all_ones_vector(in2)) {
return nullptr;
  }
 BoolTest::mask neg_cond = (BoolTest::mask) (in1_as_mask_cmp->get_predicate() ^ 
4);

Does this look better to you ?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113423376


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-05-29 Thread erifan
On Fri, 16 May 2025 07:40:53 GMT, erifan  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains 10 additional commits since 
>> the last revision:
>> 
>>  - Refactor the JTReg tests for compare.xor(maskAll)
>>
>>Also made a bit change to support pattern `VectorMask.fromLong()`.
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205      1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4...
>
> Hi, I have updated the code and I'll file the patch to convert 
> `VectorMask.fromLong(SPECIES, -1)` to `maskAll()` soon, I'll cover this test 
> case in that patch. Would you please help review the patch again, thanks!

> @erifan Looks like you really improved things, nice work! I have some more 
> comments below :)

I will modify the code according to your suggestions and commit again after all 
the test passes locally. Thanks for you careful review!

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2918641646


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-06-05 Thread erifan
On Thu, 29 May 2025 01:44:49 GMT, Xiaohong Gong  wrote:

>> test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java
>>  line 49:
>> 
>>> 47: private static final VectorSpecies L_SPECIES = 
>>> LongVector.SPECIES_MAX;
>>> 48: private static final VectorSpecies F_SPECIES = 
>>> FloatVector.SPECIES_MAX;
>>> 49: private static final VectorSpecies D_SPECIES = 
>>> DoubleVector.SPECIES_MAX;
>> 
>> Are you taking `SPECIES_MAX` on purpose here, or could we take 
>> `SPECIES_PREFERRED` instead?
>> @jatin-bhateja What is the best to do in these tests? I suppose best would 
>> be to test with all vector lengths...
>
> Thanks for pointing out this @eme64 ! Per my understanding, `SPECIES_MAX` is 
> almost the same with `SPECIES_PREFERRED` in this case which are all specified 
> to the max vector size of a hardware. Since the max vector size is different 
> on different architectures, not all vector lengths are supported to be 
> intrinsified on a specified architecture like AArch64, especially the SVE 
> arch with different vector register size. Hence, just testing the max species 
> makes sense to me as this is a mid-end common transformation.

Changed to use `ofLargestShape()` because on x64 the max vector length is 
related to data types.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128383805


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-06-05 Thread erifan
On Thu, 5 Jun 2025 09:32:15 GMT, erifan  wrote:

> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { 
> > return mask(_test^4); }` I think you should use that instead :)
> 
> Indeed, I hadn't noticed that, thank you.

Oh I think we still cannot use `BoolTest::negate`, because we cannot 
instantiate a `BoolTest` object with **unsigned** comparison. 
`BoolTest::negate` is a non-static function.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943500455


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-06-05 Thread erifan
On Thu, 5 Jun 2025 09:24:10 GMT, Emanuel Peter  wrote:

> FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { 
> return mask(_test^4); }` I think you should use that instead :)

Indeed, I hadn't noticed that, thank you.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943449327


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-06-05 Thread erifan
On Thu, 29 May 2025 07:55:06 GMT, erifan  wrote:

>> Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do 
>> `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use 
>> it as `in1_mask_cmp` instead?
>
>> What is the hard-coded ^ 4 here?
> 
> This is to negate the comparison condition. We can't use `BoolTest::negate()` 
> here because the comparison condition may be **unsigned** comparison. Since 
> there's already a `negate()` function in `BoolTest`, so I tend to add a new 
> function `get_negative_predicate` for this into class `VectorMaskCmpNode`.
> 
>> Also: You now cast (VectorMaskCmpNode*) in1 twice. Can we not do 
>> as_VectorMaskCmp()? Or could we at least cast it only once, and then use it 
>> as in1_mask_cmp instead?
> 
> For the first cast, I think you mean
> 
>   if (in1->Opcode() != Op_VectorMaskCmp ||
>   in1->outcnt() > 1 ||
>   !((VectorMaskCmpNode*) in1)->predicate_can_be_negated() ||
>   !VectorNode::is_all_ones_vector(in2)) {
> return nullptr;
>   }
> 
> To remove one cast, then we have to split the above `if` because `in1` may 
> not be a `VectorMaskCmpNode`.
> 
>   if (in1->Opcode() != Op_VectorMaskCmp) {
> return nullptr;
>   }
>   VectorMaskCmpNode* in1_as_mask_cmp = (VectorMaskCmpNode*) in1;
>   if (in1->outcnt() > 1 ||
>   !in1_as_mask_cmp->predicate_can_be_negated() ||
>   !VectorNode::is_all_ones_vector(in2)) {
> return nullptr;
>   }
>  BoolTest::mask neg_cond = (BoolTest::mask) (in1_as_mask_cmp->get_predicate() 
> ^ 4);
> 
> Does this look better to you ?

For now I kept the current approach, as I feel it's a little more compact. 
Thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128358563


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-06-05 Thread erifan
On Wed, 28 May 2025 12:28:20 GMT, Emanuel Peter  wrote:

>> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237:
>> 
>>> 235: // Byte tests
>>> 236: @Test
>>> 237: @IR(counts = { IRNode.XOR_V_MASK, "= 0", IRNode.XOR_VB, "= 0" },
>> 
>> Could you still assert the presence of some other vectors, just to make sure 
>> we are indeed getting vectors here?
>
> Not testing for any present vectors makes me a little nervous: what if we 
> just don't get any vectors because inlining fails or something else silly 
> happens?

Done.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128379987


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-06-05 Thread erifan
On Wed, 28 May 2025 12:18:15 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains 10 additional commits since 
>> the last revision:
>> 
>>  - Refactor the JTReg tests for compare.xor(maskAll)
>>
>>Also made a bit change to support pattern `VectorMask.fromLong()`.
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4...
>
> src/hotspot/share/opto/vectornode.cpp line 2251:
> 
>> 2249:   predicate_node, vt);
>> 2250:   if (vmcast_vt != nullptr) {
>> 2251: // We optimized out an VectorMaskCast, and in order to ensure type
> 
> Suggestion:
> 
> // We optimized out a VectorMaskCast, and in order to ensure type

Done.

> src/hotspot/share/opto/vectornode.cpp line 2253:
> 
>> 2251: // We optimized out an VectorMaskCast, and in order to ensure type
>> 2252: // correctness, we need to regenerate one. VectorMaskCast will be 
>> encoded as
>> 2253: // empty for types with the same size.
> 
> Suggestion:
> 
> // a no-op (identity function) for types with the same size.
> 
> Or what do you mean by "empty"? `TOP`? All zeros?

I mean `no-op`. Done, thanks.

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 96:
> 
>> 94: Generator lGen = RD.uniformLongs(Long.MIN_VALUE, 
>> Long.MAX_VALUE);
>> 95: Generator fGen = RD.uniformFloats(Float.MIN_VALUE, 
>> Float.MAX_VALUE);
>> 96: Generator dGen = RD.uniformDoubles(Double.MIN_VALUE, 
>> Double.MAX_VALUE);
> 
> Are you sure you only want to draw from the uniform distribution?
> If you don't super care about the distribution, please just take 
> `RD.ints/longs/floats/doubles()`.
> That way, you get all sorts of distributions, and also some that include NaN 
> values etc. I think that would be important for your float cmp cases, no?

For float and double, we have to use the uniform distribution, because we have 
to make sure `NAN` is not generated. I added some comments about the reasons.

For other types, changed to use `RD.ints/longs`.

We have covered the special cases like +/- Inf, NaN.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128376851
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r212837
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128375032


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-06-25 Thread erifan
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter  wrote:

>>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const 
>>> > { return mask(_test^4); }` I think you should use that instead :)
>>> 
>>> Indeed, I hadn't noticed that, thank you.
>> 
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot 
>> instantiate a `BoolTest` object with **unsigned** comparison. 
>> `BoolTest::negate` is a non-static function.
>
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot 
>> instantiate a `BoolTest` object with **unsigned** comparison. 
>> `BoolTest::negate` is a non-static function.
> 
> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because 
> that is where the exact implementation of the enum values is. In that context 
> it is easier to see why `^4` does the negation. And imagine we were ever to 
> change the enum values, then it would be harder to find your code and fix it.
> 
> Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a 
> comment that both signed and unsigned is supported.

@eme64  @XiaohongGong  Your comment has been addressed, thanks for your review!

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3004213740


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-25 Thread erifan
On Wed, 11 Jun 2025 04:56:36 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request incrementally with one additional commit 
>> since the last revision:
>> 
>>   Support negating unsigned comparison for BoolTest::mask
>>   
>>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>>   negate both signed and unsigned comparison.
>
> src/hotspot/share/opto/subnode.hpp line 333:
> 
>> 331:   mask negate( ) const { return mask(_test^4); }
>> 332:   // Return the negative mask for the given mask, for both signed and 
>> unsigned comparison.
>> 333:   static mask negate_mask(mask btm) { return mask(btm^4); }
> 
> Suggestion:
> 
>   static mask negate_mask(mask btm) { return mask(btm ^ 4); }
> 
> 
> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
> 
>> Use spaces around operators, especially comparisons and assignments. 
>> (Relaxable for boolean expressions and high-precedence operators in classic 
>> math-style formulas.)

Done, thanks!

> src/hotspot/share/opto/vectornode.cpp line 2226:
> 
>> 2224: 
>> 2225:   const TypeVect* vector_mask_cast_vt = nullptr;
>> 2226:   // in1 should be single used, otherwise the optimization may be 
>> unprofitable.
> 
> Suggestion:
> 
>   // in1 should only have a single use, otherwise the optimization may be 
> unprofitable.

Done

> src/hotspot/share/opto/vectornode.cpp line 2237:
> 
>> 2235:   !VectorNode::is_all_ones_vector(in2)) {
>> 2236: return nullptr;
>> 2237:   }
> 
> Similarly here: do you have tests for these conditions, that we do not 
> optimize if any of these fail?

Added some negative tests for these conditions

> src/hotspot/share/opto/vectornode.cpp line 2239:
> 
>> 2237:   }
>> 2238: 
>> 2239:   BoolTest::mask neg_cond = 
>> BoolTest::negate_mask(((VectorMaskCmpNode*) in1)->get_predicate());
> 
> Suggestion:
> 
>   BoolTest::mask neg_cond = 
> BoolTest::negate_mask((in1->as_VectorMaskCmp())->get_predicate());
> 
> Does that compile? It would be prefereable.

Yes, done, thanks!

> src/hotspot/share/opto/vectornode.cpp line 2243:
> 
>> 2241:   const TypeVect* vt = in1->as_Vector()->vect_type();
>> 2242:   Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2),
>> 2243:   predicate_node, vt);
> 
> Suggestion:
> 
>   Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2),
> predicate_node, vt);
> 
> Alignment

Done

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 244:
> 
>> 242: testCompareMaskNotByte(VectorOperators.EQ, m -> m.not());
>> 243: testCompareMaskNotByte(VectorOperators.EQ, m -> 
>> m.xor(B_SPECIES.maskAll(true)));
>> 244: }
> 
> Could it happen that the verification is inlined in the test body?
> 
> Currently, the verification is probably inlined, but the code there is not 
> vectorized. But what if one day the auto-vectorizer is smart enough and 
> vectorizes it, and creates vectors that we currently check `count ...= 0`?
> 
> At least, you could ensure that the verification does not get inlined, with 
> `@DontInline`.
> 
> What do you think?

Make sense, done.

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 692:
> 
>> 690: TestFramework testFramework = new TestFramework();
>> 691: testFramework.addFlags("--add-modules=jdk.incubator.vector");
>> 692: testFramework.setDefaultWarmup(1);
> 
> The default is `2000` is that not enough?
> 
> Increasing it means the test runs slower, here probably about 5x.

Yes, not enough, changed to 5000.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166343933
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166344643
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346170
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346620
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346885
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166347867
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166352227


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-25 Thread erifan
On Wed, 11 Jun 2025 08:51:50 GMT, Xiaohong Gong  wrote:

>> erifan has updated the pull request incrementally with one additional commit 
>> since the last revision:
>> 
>>   Support negating unsigned comparison for BoolTest::mask
>>   
>>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>>   negate both signed and unsigned comparison.
>
> src/hotspot/share/opto/vectornode.cpp line 2237:
> 
>> 2235:   !VectorNode::is_all_ones_vector(in2)) {
>> 2236: return nullptr;
>> 2237:   }
> 
> This part can be refined more clearly:
> 
> // Swap and put all_ones_vector to right
> if (!VectorNode::is_all_ones_vector(in1)) {
>swap(in1, in2);
> }
> 
> // uncast mask
> bool need_cast = false;
> if (in1->Opcode() == Op_VectorMaskCast &&
> in1->outcnt() == 1) {
> assert(in1->bottom_type()->eq(bottom_type()), "");
> need_cast = true;
> in1 = in1->in(1);
> }
> 
> // Check mask cmp pattern
> if (in1->Opcode() != Op_VectorMaskCmp ||
> in1->outcnt() > 1 ||
> !in1->as_VectorMaskCmp()->predicate_can_be_negated()) {
> return nullptr;
> }
> 
> // Convert VectorMaskCmp + not
> 
> 
> // Cast back
> if (need_cast) {
>   res = new VectorMaskCastNode(phase->transform(res), vect_type());
> }

Done

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166351241


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-25 Thread erifan
On Wed, 11 Jun 2025 05:09:34 GMT, Emanuel Peter  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2227:
>> 
>>> 2225:   const TypeVect* vector_mask_cast_vt = nullptr;
>>> 2226:   // in1 should be single used, otherwise the optimization may be 
>>> unprofitable.
>>> 2227:   if (in1->Opcode() == Op_VectorMaskCast && in1->outcnt() == 1 && 
>>> in1->in(1)->Opcode() == Op_VectorMaskCmp) {
>> 
>> `in1->in(1)->Opcode() == Op_VectorMaskCmp`
>> Is this check here even necessary? Because we check it below again, right?
>> `in1->Opcode() != Op_VectorMaskCmp`
>
> Btw: do you have a test where `in1->outcnt() > 1`, and you check that the 
> optimization does not happen with an IR test?

Refactored the code, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166345311


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-25 Thread erifan
On Wed, 11 Jun 2025 09:07:12 GMT, erifan  wrote:

>> Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I 
>> would suggest :)
>
> Make sense!

Done

>>> > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does 
>>> > that make sense?
>> 
>>> The bottom types of float and double vector masks are casted to int and 
>>> long. Seems this is by design? So this is correct.
>> 
>> This is a `float` test. What is the bottom type for the mask here?
>
> Oh, this is a stupid copy-paste mistake. Good catch, thanks! I'll double 
> check them all.

Done

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166347264
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166349306


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-25 Thread erifan
On Wed, 11 Jun 2025 09:09:57 GMT, erifan  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2221:
>> 
>>> 2219:   // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast 
>>> to in1.
>>> 2220:   if (in2->Opcode() == Op_VectorMaskCmp ||
>>> 2221:   (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == 
>>> Op_VectorMaskCmp)) {
>> 
>> We may need to consider cases that a `VectorMaskCast` is generated between 
>> `compare + not`, such as `compare + cast + not`. For such cases, the element 
>> size maybe different for input and output of a `cast`.  Although this 
>> patch's intention is not for the latter pattern, current change have also 
>> covered it well. Could you please add more test/jmh for all kinds of `cast` 
>> pattern here? And I think the scope of this PR could be also extended to 
>> `compare + cast + not`. WDYT?
>
> Good catch, I'll add more tests and check the correctness. Thanks~

Added more tests for this, thanks!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166350143


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-09 Thread erifan
On Tue, 10 Jun 2025 02:38:29 GMT, Fei Yang  wrote:

> FYI: I submitted to testing in QEMU-system / Ubuntu 25.04 (fastdebug jdk 
> build and 256-bit RVV) and I see `compiler/vectorization`, 
> `compiler/vectorapi` and `jdk_vector` tests are passing.

Thank you very much!

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2957560924


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-06-06 Thread erifan
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter  wrote:

> > Oh I think we still cannot use `BoolTest::negate`, because we cannot 
> > instantiate a `BoolTest` object with **unsigned** comparison. 
> > `BoolTest::negate` is a non-static function.
> 
> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because 
> that is where the exact implementation of the enum values is. In that context 
> it is easier to see why `^4` does the negation. And imagine we were ever to 
> change the enum values, then it would be harder to find your code and fix it.
> 
> Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a 
> comment that both signed and unsigned is supported.

Make sense, I'll update later, thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2948297917


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-06 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request incrementally with one additional commit 
since the last revision:

  Support negating unsigned comparison for BoolTest::mask
  
  Added a static method `negate_mask(mask btm)` into BoolTest class to
  negate both signed and unsigned comparison.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/ebbcc405..f51bf722

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=06-07

  Stats: 6 lines in 3 files changed: 2 ins; 3 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-06-06 Thread erifan
On Fri, 6 Jun 2025 07:01:58 GMT, erifan  wrote:

> > > Oh I think we still cannot use `BoolTest::negate`, because we cannot 
> > > instantiate a `BoolTest` object with **unsigned** comparison. 
> > > `BoolTest::negate` is a non-static function.
> > 
> > 
> > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, 
> > because that is where the exact implementation of the enum values is. In 
> > that context it is easier to see why `^4` does the negation. And imagine we 
> > were ever to change the enum values, then it would be harder to find your 
> > code and fix it.
> > Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a 
> > comment that both signed and unsigned is supported.
> 
> Make sense, I'll update later, thanks.

@eme64 your comment is addressed, thanks for your suggestion!

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2948832452


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-06-05 Thread erifan
On Thu, 29 May 2025 08:00:05 GMT, erifan  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2233:
>> 
>>> 2231:   if (in2->Opcode() == Op_VectorMaskCast) {
>>> 2232: in2 = in2->in(1);
>>> 2233:   }
>> 
>> Wow, this seems to be an addition that is not covered in the patterns you 
>> mention above, right?
>> But is that even necessary?
>> I suppose here `in2 = VectorMaskCast(all_ones_vector)`.
>> Would we not already want to transform this pattern in 
>> `VectorMaskCast::Ideal`, is that not possible and more powerful?
>
> Oh yeah, I forgot to mention it in the above comment and commit message. 
> 
> Yes, this is for `in2 = VectorMaskCast(all_ones_vector)`.  I agree it's 
> better to do this transformation in `VectorMaskCast::Ideal`. I'll remove this 
> code change and do the `VectorMaskCast` optimization later. Thanks!

Done.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128344233


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]

2025-06-05 Thread erifan
On Wed, 28 May 2025 12:03:48 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains 10 additional commits since 
>> the last revision:
>> 
>>  - Refactor the JTReg tests for compare.xor(maskAll)
>>
>>Also made a bit change to support pattern `VectorMask.fromLong()`.
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - Merge branch 'master' into JDK-8354242
>>  - Addressed some review comments
>>
>>1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>This patch optimizes the following patterns:
>>For integer types:
>>```
>>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>=> (VectorMaskCmp src1 src2 ncond)
>>```
>>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>negative comparison of cond.
>>
>>For float and double types:
>>```
>>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>```
>>cond can be eq or ne.
>>
>>Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>>With option `-XX:UseSVE=2`:
>>```
>>Benchmark UnitBefore  Score Error After   
>> Score Error Uplift
>>testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>>testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>>testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>>testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>>testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>>testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>>testCompareGEMaskNotByte  ops/s   7907615.16  4...
>
> src/hotspot/share/opto/vectornode.cpp line 2213:
> 
>> 2211:   Node* in1 = in(1);
>> 2212:   Node* in2 = in(2);
>> 2213:   // Transformations for predicated IRs are not supported for now.
> 
> Suggestion:
> 
>   // Transformations for predicated vectors are not supported for now.

Done.

> src/hotspot/share/opto/vectornode.cpp line 2215:
> 
>> 2213:   // Transformations for predicated IRs are not supported for now.
>> 2214:   if (is_predicated_vector() || in1->is_predicated_vector() ||
>> 2215:   in2->is_predicated_vector()) {
> 
> I would either put all on the same line, or all on separate lines.

Done.

> src/hotspot/share/opto/vectornode.cpp line 2219:
> 
>> 2217:   }
>> 2218: 
>> 2219:   // XorV/XorVMask is commutative, swap 
>> VectorMaskCmp/Op_VectorMaskCast to in1.
> 
> Suggestion:
> 
>   // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1.
> 
> Would look a little cleaner, and you did also not write `Op_VectorMaskCmp` 
> either ;)

Done, thanks!

> src/hotspot/share/opto/vectornode.cpp line 2225:
> 
>> 2223:   }
>> 2224: 
>> 2225:   const TypeVect* vmcast_vt = nullptr;
> 
> Suggestion:
> 
>   const TypeVect* vector_mask_cast_vt = nullptr;
> 
> I think it would not hurt to write it out. Otherwise, the reader always has 
> to reconstruct that in their head.

Done.

> src/hotspot/share/opto/vectornode.cpp line 2230:
> 
>> 2228: vmcast_vt = in1->as_Vector()->vect_type();
>> 2229: in1 = in1->in(1);
>> 2230:   }
> 
> Add a comment why you check `in1->outcnt() == 1`.

Done.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128341063
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128340484
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128341959
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128342468
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128342908


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-06-05 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains 12 additional commits since the last 
revision:

 - Addressed some review comments
 - Merge branch 'master' into JDK-8354242
 - Refactor the JTReg tests for compare.xor(maskAll)
   
   Also made a bit change to support pattern `VectorMask.fromLong()`.
 - Merge branch 'master' into JDK-8354242
 - Refactor code
   
   Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
   optimization, making the code more modular.
 - Merge branch 'master' into JDK-8354242
 - Update the jtreg test
 - Merge branch 'master' into JDK-8354242
 - Addressed some review comments
   
   1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
   2. Improve code comments.
 - Merge branch 'master' into JDK-8354242
 - ... and 2 more: https://git.openjdk.org/jdk/compare/71938fba...ebbcc405

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/f2f71e34..ebbcc405

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=05-06

  Stats: 146911 lines in 2345 files changed: 87334 ins; 41007 del; 18570 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-10 Thread erifan
On Fri, 6 Jun 2025 10:38:11 GMT, erifan  wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>> => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>> => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> BenchmarkUnitBefore  Score Error After   
>> Score Error Uplift
>> testCompareEQMaskNotByte ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>> testCompareEQMaskNotFloatops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>> testCompareEQMaskNotInt  ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>> testCompareEQMaskNotLong ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>> testCompareEQMaskNotShortops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>> testCompareGEMaskNotByte ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>> testCompareGEMaskNotInt  ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>> testCompareGEMaskNotLong ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>> testCompareGEMaskNotShortops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>> testCompareGTMaskNotByte ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>> testCompareGTMaskNotInt  ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>> testCompareGTMaskNotLong ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>> testCompareGTMaskNotShortops/s   3339509.141 3339.976585 
>> 4711442.496 2673.364893 1.41
>> testCompareLEMaskNotByte ops/s   7911340.004 3114.69191  
>> 10231626.5  27134.20035 1.29
>> testCompareLEMaskNotInt  ops/s   1675812.113 1340.969885 
>> 2353255.341 1452.4522   1.4
>> testCompareLEMaskNotLong ops/s   848862.8036 6564.841731 
>> 1177763.623 539.290106  1.38
>> testCompareLEMaskNotShortops/s   3324951.54  2380.29473  
>> 4712116.251 1544.559684 1.41
>> testCompareLTMaskNotByte ops/s   7910390.844 2630.861436 
>> 10239567.69 6487.441672 1.29
>> testCompareLTMaskNotInt  ops/s   16721...
>
> erifan has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Support negating unsigned comparison for BoolTest::mask
>   
>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>   negate both signed and unsigned comparison.

Hi, any further comments?

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2961024225


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-11 Thread erifan
On Wed, 11 Jun 2025 08:30:51 GMT, Emanuel Peter  wrote:

>>> You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does 
>>> that make sense?
>> 
>> The bottom types of `float` and `double` vector masks are casted to `int` 
>> and `long`. Seems this is by design? So this is correct.
>> 
>> As for `control test`, yes for now I didn't add any such kind of test, 
>> because I personally think it's unnecessary. For specific code, it won't 
>> trigger this optimization now, but new optimizations in the future may cause 
>> this test to fail.
>> 
>> Anyway I've tested the case where `vectormaskcmp` is multi used locally, and 
>> this optimization won't be triggered. 
>> 
>> Do you think it's necessary to add such a control test?
>
>> > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does 
>> > that make sense?
> 
>> The bottom types of float and double vector masks are casted to int and 
>> long. Seems this is by design? So this is correct.
> 
> This is a `float` test. What is the bottom type for the mask here?

Oh, this is a stupid copy-paste mistake. Good catch, thanks! I'll double check 
them all.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139582391


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-11 Thread erifan
On Wed, 11 Jun 2025 07:43:55 GMT, Emanuel Peter  wrote:

>> `VectorOperators.XXX` is not compile time constants, we can't use `switch` 
>> here.
>
> Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I 
> would suggest :)

Make sense!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139610469


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-11 Thread erifan
On Wed, 11 Jun 2025 05:23:12 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request incrementally with one additional commit 
>> since the last revision:
>> 
>>   Support negating unsigned comparison for BoolTest::mask
>>   
>>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>>   negate both signed and unsigned comparison.
>
> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 158:
> 
>> 156: } else if (op == VectorOperators.UGT) {
>> 157: Asserts.assertEquals(compareUnsigned(a, b) <= 0, r);
>> 158: }
> 
> Please refactor it as a `switch`. And add a `default` case where you throw 
> some `RuntimeException`. just to make sure we are not missing anything :)

`VectorOperators.XXX` is not compile time constants, we can't use `switch` here.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139418759


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-11 Thread erifan
On Wed, 11 Jun 2025 08:47:56 GMT, Xiaohong Gong  wrote:

>> erifan has updated the pull request incrementally with one additional commit 
>> since the last revision:
>> 
>>   Support negating unsigned comparison for BoolTest::mask
>>   
>>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>>   negate both signed and unsigned comparison.
>
> src/hotspot/share/opto/vectornode.cpp line 2221:
> 
>> 2219:   // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast 
>> to in1.
>> 2220:   if (in2->Opcode() == Op_VectorMaskCmp ||
>> 2221:   (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == 
>> Op_VectorMaskCmp)) {
> 
> We may need to consider cases that a `VectorMaskCast` is generated between 
> `compare + not`, such as `compare + cast + not`. For such cases, the element 
> size maybe different for input and output of a `cast`.  Although this patch's 
> intention is not for the latter pattern, current change have also covered it 
> well. Could you please add more test/jmh for all kinds of `cast` pattern 
> here? And I think the scope of this PR could be also extended to `compare + 
> cast + not`. WDYT?

Good catch, I'll add more tests and check the correctness. Thanks~

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139615887


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-09 Thread erifan
On Fri, 6 Jun 2025 10:38:11 GMT, erifan  wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>> => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>> => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> BenchmarkUnitBefore  Score Error After   
>> Score Error Uplift
>> testCompareEQMaskNotByte ops/s   7912127.225 2677.289518 
>> 10266136.26 8955.008548 1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799 446.963779  
>> 1179760.772 448.031844  1.33
>> testCompareEQMaskNotFloatops/s   1765045.787 682.332214  
>> 2359520.803 896.305743  1.33
>> testCompareEQMaskNotInt  ops/s   1787221.411 977.743935  
>> 2353952.519 960.069976  1.31
>> testCompareEQMaskNotLong ops/s   895297.1974 673.44808   
>> 1178449.02  323.804205  1.31
>> testCompareEQMaskNotShortops/s   3339987.002 3415.2226   
>> 4712761.965 2110.862053 1.41
>> testCompareGEMaskNotByte ops/s   7907615.16  4094.243652 
>> 10251646.9  9486.699831 1.29
>> testCompareGEMaskNotInt  ops/s   1683738.958 4233.813092 
>> 2352855.205 1251.952546 1.39
>> testCompareGEMaskNotLong ops/s   854496.1561 8594.598885 
>> 1177811.493 521.12291.37
>> testCompareGEMaskNotShortops/s   3341860.309 1578.975338 
>> 4714008.434 1681.10365  1.41
>> testCompareGTMaskNotByte ops/s   7910823.674 2993.367032 
>> 10245063.58 9774.75138  1.29
>> testCompareGTMaskNotInt  ops/s   1673393.928 3153.099431 
>> 2353654.521 1190.848583 1.4
>> testCompareGTMaskNotLong ops/s   849405.9159 2432.858159 
>> 1177952.041 359.96413   1.38
>> testCompareGTMaskNotShortops/s   3339509.141 3339.976585 
>> 4711442.496 2673.364893 1.41
>> testCompareLEMaskNotByte ops/s   7911340.004 3114.69191  
>> 10231626.5  27134.20035 1.29
>> testCompareLEMaskNotInt  ops/s   1675812.113 1340.969885 
>> 2353255.341 1452.4522   1.4
>> testCompareLEMaskNotLong ops/s   848862.8036 6564.841731 
>> 1177763.623 539.290106  1.38
>> testCompareLEMaskNotShortops/s   3324951.54  2380.29473  
>> 4712116.251 1544.559684 1.41
>> testCompareLTMaskNotByte ops/s   7910390.844 2630.861436 
>> 10239567.69 6487.441672 1.29
>> testCompareLTMaskNotInt  ops/s   16721...
>
> erifan has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Support negating unsigned comparison for BoolTest::mask
>   
>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>   negate both signed and unsigned comparison.

Hi @RealFYang , would you mind helping test this patch on a risc-v machine with 
rvv? This optimization also applies to rvv, but I don't have a risc-v test 
environment.

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2955291395


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]

2025-06-11 Thread erifan
On Wed, 11 Jun 2025 05:31:00 GMT, Emanuel Peter  wrote:

> You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does 
> that make sense?

The bottom types of `float` and `double` vector masks are casted to `int` and 
`long`. Seems this is by design? So this is correct.

As for `control test`, yes for now I didn't add any such kind of test, because 
I personally think it's unnecessary. For specific code, it won't trigger this 
optimization now, but new optimizations in the future may cause this test to 
fail.

Anyway I've tested the case where `vectormaskcmp` is multi used locally, and 
this optimization won't be triggered. 

Do you think it's necessary to add such a control test?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139512969


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v10]

2025-07-08 Thread erifan
On Tue, 8 Jul 2025 11:42:02 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains 17 additional commits since 
>> the last revision:
>> 
>>  - Align indentation
>>  - Merge branch 'master' into JDK-8354242
>>  - Address more comments
>>
>>ATT.
>>  - Merge branch 'master' into JDK-8354242
>>  - Support negating unsigned comparison for BoolTest::mask
>>
>>Added a static method `negate_mask(mask btm)` into BoolTest class to
>>negate both signed and unsigned comparison.
>>  - Addressed some review comments
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor the JTReg tests for compare.xor(maskAll)
>>
>>Also made a bit change to support pattern `VectorMask.fromLong()`.
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/04bd77d0...db78dc43
>
> src/hotspot/share/opto/vectornode.cpp line 2241:
> 
>> 2239:   in1->outcnt() != 1 ||
>> 2240:   !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||
>> 2241: !VectorNode::is_all_ones_vector(in2)) {
> 
> Suggestion:
> 
>   !in1->as_VectorMaskCmp()->predicate_can_be_negated() ||
>   !VectorNode::is_all_ones_vector(in2)) {
> 
> Remove the indentation again, and the superfluous brackets too ;)

Done

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2194130835


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]

2025-07-08 Thread erifan
On Tue, 8 Jul 2025 11:42:18 GMT, Emanuel Peter  wrote:

>> Oh wow, my bad. I misunderstood the brackets!
>> 
>> Instead of:
>> 
>>   !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||
>>   !VectorNode::is_all_ones_vector(in2)) {
>> 
>> I read:
>> 
>>   !(in1->as_VectorMaskCmp()->predicate_can_be_negated() ||
>>   !VectorNode::is_all_ones_vector(in2))) {
>> 
>> That confused me a lot... absolutely my bad.
>> 
>> Well actually then my indentation suggestion was terrible!
>
> I made a new suggestion below.

> A code comment would be helpful for this case. 

I updated the comment above the code a bit. As for why predicate need to be 
negatable, it's straightforward, the key of this optimization is to change 
predicate condition into negative predicate condition. And in 
`predicate_can_be_negated`, there's a comment explaining when predicate can't 
be negated.

> I made a new suggestion below.

Done.

> That confused me a lot... absolutely my bad.
Well actually then my indentation suggestion was terrible!

No problem. I'm a newbie in the JDK community, so generally I think your 
suggestions are valuable.Thanks for your review!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2194130234


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v11]

2025-07-08 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request incrementally with one additional commit 
since the last revision:

  Update the code comment

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/db78dc43..04142a19

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=09-10

  Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]

2025-07-07 Thread erifan
On Mon, 7 Jul 2025 09:32:42 GMT, erifan  wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2243:
>> 
>>> 2241:   !VectorNode::is_all_ones_vector(in2)) {
>>> 2242: return nullptr;
>>> 2243:   }
>> 
>> Suggestion:
>> 
>>   if (in1->Opcode() != Op_VectorMaskCmp ||
>>   in1->outcnt() != 1 ||
>>   !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||
>> !VectorNode::is_all_ones_vector(in2)) {
>> return nullptr;
>>   }
>> 
>> Indentation for clarity.
>> 
>> Currently, you exiting if one of these is the case:
>> 1. Not `MaskCmp`
>> 2. More than one use
>> 3. predicate cannot be negated AND the vector is all ones. Can you explain 
>> this condition? Do you have tests for cases:
>>- predicate negatable and vector not all ones
>>- predircate not negatable and vector not all ones
>>- predicate negatable and vector all ones
>>- predicate not negatable and vectors all ones
>>  
>> Why do you guard against `VectorNode::is_all_ones_vector(in2)` at all?
>> 
>> The condition for 3. is easy to get wrong, so good testing is important here 
>> :)
>
> The current testing status for the conditions you listed:
>> 1. Not MaskCmp.
> 
> **No test for it, tested locally**, Because I think this condition is too 
> straightforward.
> 
>> 2. More than one use.
> 
> **Tested**, see `VectorMaskCompareNotTest.java line 1118`.
> 
>> predicate negatable and vector not all ones. 
> 
> **Tested**, see `VectorMaskCompareNotTest.java line 1126`.
> 
>> predicate not negatable and vector not all ones. 
> 
> **No test for it**, because we have tests for `predicate not negatable` or 
> `vector not all ones`. If either is `false`, return nullptr.
> 
>> predicate negatable and vector all ones. 
> 
> **A lot of tests for it**. For example `VectorMaskCompareNotTest.java line 
> 1014`.
> 
>> predicate not negatable and vectors all ones. 
> 
> **Tested**, see `VectorMaskCompareNotTest.java line 1222`.

> Indentation for clarity.

Done.

I think we have enough negative tests. Please take a look at this PR, thanks~

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2191550171


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v10]

2025-07-07 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains 17 additional commits since the last 
revision:

 - Align indentation
 - Merge branch 'master' into JDK-8354242
 - Address more comments
   
   ATT.
 - Merge branch 'master' into JDK-8354242
 - Support negating unsigned comparison for BoolTest::mask
   
   Added a static method `negate_mask(mask btm)` into BoolTest class to
   negate both signed and unsigned comparison.
 - Addressed some review comments
 - Merge branch 'master' into JDK-8354242
 - Refactor the JTReg tests for compare.xor(maskAll)
   
   Also made a bit change to support pattern `VectorMask.fromLong()`.
 - Merge branch 'master' into JDK-8354242
 - Refactor code
   
   Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
   optimization, making the code more modular.
 - ... and 7 more: https://git.openjdk.org/jdk/compare/c2a2adc8...db78dc43

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/5ebdc572..db78dc43

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=08-09

  Stats: 9269 lines in 462 files changed: 4528 ins; 2873 del; 1868 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]

2025-07-08 Thread erifan
On Tue, 8 Jul 2025 09:49:30 GMT, erifan  wrote:

>> Thanks for your answers @erifan !
>> 
>> Can you please answer these as well?
>> 
>>> predicate cannot be negated AND the vector is all ones. Can you explain 
>>> this condition?
>> 
>> A code comment would be helpful for this case. I'm a little bit struggling 
>> to understand the bracket/negation here.
>> 
>>> Why do you guard against VectorNode::is_all_ones_vector(in2) at all?
>> 
>> Is this necessary? Why?
>
>> predicate cannot be negated AND the vector is all ones. Can you explain this 
>> condition?
> 
> Ok, I'll add a comment for it.
> 
>> Why do you guard against VectorNode::is_all_ones_vector(in2) at all?
> 
> Because one of the nodes in the supported patterns by this PR needs to be 
> `MaskAll` or `Replicate`. And this function `VectorNode::is_all_ones_vector` 
> just meets our check for `MaskAll` and `Replicate`. Actually I don't quite 
> understand your question. I have two understandings:
> 1. Not all nodes that `VectorNode::is_all_ones_vector` returns true are 
> `MaskAll` or `Replicate`, but other nodes that do not meet the conditions.
> 2. Here, it does not need to be a vector with every bit 1, it only needs to 
> be an `all true` mask.
> 
> Which one do you mean? Or something else? Thanks~

The purpose of this PR is optimizing the following kinds of patterns:

XXXVector va, vb;
va.compare(EQ, vb).not()

And the generated IR of `va.compare(EQ, vb).not()` is `(XorVMask (VectorMaskCmp 
va vb EQ) (MaskAll -1))`. On platforms like aarch64 NEON, `MaskAll` is 
`Replicate`. And `MaskAll` and `Replicate` are both all ones vectors, so we do 
this check `VectorNode::is_all_ones_vector(in2)`

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2192087590


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]

2025-07-07 Thread erifan
On Mon, 7 Jul 2025 06:19:15 GMT, Emanuel Peter  wrote:

>> erifan has updated the pull request with a new target base due to a merge or 
>> a rebase. The incremental webrev excludes the unrelated changes brought in 
>> by the merge/rebase. The pull request contains 15 additional commits since 
>> the last revision:
>> 
>>  - Address more comments
>>
>>ATT.
>>  - Merge branch 'master' into JDK-8354242
>>  - Support negating unsigned comparison for BoolTest::mask
>>
>>Added a static method `negate_mask(mask btm)` into BoolTest class to
>>negate both signed and unsigned comparison.
>>  - Addressed some review comments
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor the JTReg tests for compare.xor(maskAll)
>>
>>Also made a bit change to support pattern `VectorMask.fromLong()`.
>>  - Merge branch 'master' into JDK-8354242
>>  - Refactor code
>>
>>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
>>optimization, making the code more modular.
>>  - Merge branch 'master' into JDK-8354242
>>  - Update the jtreg test
>>  - ... and 5 more: https://git.openjdk.org/jdk/compare/8e600a2f...5ebdc572
>
> src/hotspot/share/opto/vectornode.cpp line 2243:
> 
>> 2241:   !VectorNode::is_all_ones_vector(in2)) {
>> 2242: return nullptr;
>> 2243:   }
> 
> Suggestion:
> 
>   if (in1->Opcode() != Op_VectorMaskCmp ||
>   in1->outcnt() != 1 ||
>   !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||
> !VectorNode::is_all_ones_vector(in2)) {
> return nullptr;
>   }
> 
> Indentation for clarity.
> 
> Currently, you exiting if one of these is the case:
> 1. Not `MaskCmp`
> 2. More than one use
> 3. predicate cannot be negated AND the vector is all ones. Can you explain 
> this condition? Do you have tests for cases:
>- predicate negatable and vector not all ones
>- predircate not negatable and vector not all ones
>- predicate negatable and vector all ones
>- predicate not negatable and vectors all ones
>  
> Why do you guard against `VectorNode::is_all_ones_vector(in2)` at all?
> 
> The condition for 3. is easy to get wrong, so good testing is important here 
> :)

The current testing status for the conditions you listed:
> 1. Not MaskCmp.

**No test for it, tested locally**, Because I think this condition is too 
straightforward.

> 2. More than one use.

**Tested**, see `VectorMaskCompareNotTest.java line 1118`.

> predicate negatable and vector not all ones. 

**Tested**, see `VectorMaskCompareNotTest.java line 1126`.

> predicate not negatable and vector not all ones. 

**No test for it**, because we have tests for `predicate not negatable` or 
`vector not all ones`. If either is `false`, return nullptr.

> predicate negatable and vector all ones. 

**A lot of tests for it**. For example `VectorMaskCompareNotTest.java line 
1014`.

> predicate not negatable and vectors all ones. 

**Tested**, see `VectorMaskCompareNotTest.java line 1222`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2189495935


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]

2025-07-08 Thread erifan
On Tue, 8 Jul 2025 08:56:44 GMT, Emanuel Peter  wrote:

> predicate cannot be negated AND the vector is all ones. Can you explain this 
> condition?

Ok, I'll add a comment for it.

> Why do you guard against VectorNode::is_all_ones_vector(in2) at all?

Because one of the nodes in the supported patterns by this PR needs to be 
`MaskAll` or `Replicate`. And this function `VectorNode::is_all_ones_vector` 
just meets our check for `MaskAll` and `Replicate`. Actually I don't quite 
understand your question. I have two understandings:
1. Not all nodes that `VectorNode::is_all_ones_vector` returns true are 
`MaskAll` or `Replicate`, but other nodes that do not meet the conditions.
2. Here, it does not need to be a vector with every bit 1, it only needs to be 
an `all true` mask.

Which one do you mean? Or something else? Thanks~

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2192019309


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]

2025-06-25 Thread erifan
> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
> => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
> => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
> negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
> `-XX:UseSVE=2`:
> 
> Benchmark UnitBefore  Score Error After   
> Score Error Uplift
> testCompareEQMaskNotByte  ops/s   7912127.225 2677.289518 
> 10266136.26 8955.008548 1.29
> testCompareEQMaskNotDoubleops/s   884737.6799 446.963779  
> 1179760.772 448.031844  1.33
> testCompareEQMaskNotFloat ops/s   1765045.787 682.332214  
> 2359520.803 896.305743  1.33
> testCompareEQMaskNotInt   ops/s   1787221.411 977.743935  
> 2353952.519 960.069976  1.31
> testCompareEQMaskNotLong  ops/s   895297.1974 673.44808   
> 1178449.02  323.804205  1.31
> testCompareEQMaskNotShort ops/s   3339987.002 3415.2226   
> 4712761.965 2110.862053 1.41
> testCompareGEMaskNotByte  ops/s   7907615.16  4094.243652 
> 10251646.9  9486.699831 1.29
> testCompareGEMaskNotInt   ops/s   1683738.958 4233.813092 
> 2352855.205 1251.952546 1.39
> testCompareGEMaskNotLong  ops/s   854496.1561 8594.598885 
> 1177811.493 521.12291.37
> testCompareGEMaskNotShort ops/s   3341860.309 1578.975338 
> 4714008.434 1681.10365  1.41
> testCompareGTMaskNotByte  ops/s   7910823.674 2993.367032 
> 10245063.58 9774.75138  1.29
> testCompareGTMaskNotInt   ops/s   1673393.928 3153.099431 
> 2353654.521 1190.848583 1.4
> testCompareGTMaskNotLong  ops/s   849405.9159 2432.858159 
> 1177952.041 359.96413   1.38
> testCompareGTMaskNotShort ops/s   3339509.141 3339.976585 
> 4711442.496 2673.364893 1.41
> testCompareLEMaskNotByte  ops/s   7911340.004 3114.69191  
> 10231626.5  27134.20035 1.29
> testCompareLEMaskNotInt   ops/s   1675812.113 1340.969885 
> 2353255.341 1452.4522   1.4
> testCompareLEMaskNotLong  ops/s   848862.8036 6564.841731 
> 1177763.623 539.290106  1.38
> testCompareLEMaskNotShort ops/s   3324951.54  2380.29473  
> 4712116.251 1544.559684 1.41
> testCompareLTMaskNotByte  ops/s   7910390.844 2630.861436 
> 10239567.69 6487.441672 1.29
> testCompareLTMaskNotInt   ops/s   1672180.09  995.238142  
> 2353757.863 853.774734  1.4
> testCompareLTMaskNotLong  ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the 
merge/rebase. The pull request contains 15 additional commits since the last 
revision:

 - Address more comments
   
   ATT.
 - Merge branch 'master' into JDK-8354242
 - Support negating unsigned comparison for BoolTest::mask
   
   Added a static method `negate_mask(mask btm)` into BoolTest class to
   negate both signed and unsigned comparison.
 - Addressed some review comments
 - Merge branch 'master' into JDK-8354242
 - Refactor the JTReg tests for compare.xor(maskAll)
   
   Also made a bit change to support pattern `VectorMask.fromLong()`.
 - Merge branch 'master' into JDK-8354242
 - Refactor code
   
   Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
   optimization, making the code more modular.
 - Merge branch 'master' into JDK-8354242
 - Update the jtreg test
 - ... and 5 more: https://git.openjdk.org/jdk/compare/c8050dff...5ebdc572

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/f51bf722..5ebdc572

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=07-08

  Stats: 36932 lines in 979 files changed: 21941 ins; 10041 del; 4950 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-07-03 Thread erifan
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter  wrote:

>>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const 
>>> > { return mask(_test^4); }` I think you should use that instead :)
>>> 
>>> Indeed, I hadn't noticed that, thank you.
>> 
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot 
>> instantiate a `BoolTest` object with **unsigned** comparison. 
>> `BoolTest::negate` is a non-static function.
>
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot 
>> instantiate a `BoolTest` object with **unsigned** comparison. 
>> `BoolTest::negate` is a non-static function.
> 
> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because 
> that is where the exact implementation of the enum values is. In that context 
> it is easier to see why `^4` does the negation. And imagine we were ever to 
> change the enum values, then it would be harder to find your code and fix it.
> 
> Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a 
> comment that both signed and unsigned is supported.

Hi @eme64 @jatin-bhateja , would you mind taking another look of this PR, 
thanks~

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3031109432


Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]

2025-07-23 Thread erifan
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter  wrote:

>>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const 
>>> > { return mask(_test^4); }` I think you should use that instead :)
>>> 
>>> Indeed, I hadn't noticed that, thank you.
>> 
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot 
>> instantiate a `BoolTest` object with **unsigned** comparison. 
>> `BoolTest::negate` is a non-static function.
>
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot 
>> instantiate a `BoolTest` object with **unsigned** comparison. 
>> `BoolTest::negate` is a non-static function.
> 
> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because 
> that is where the exact implementation of the enum values is. In that context 
> it is easier to see why `^4` does the negation. And imagine we were ever to 
> change the enum values, then it would be harder to find your code and fix it.
> 
> Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a 
> comment that both signed and unsigned is supported.

Hi @eme64 @jatin-bhateja , could you please take a look at this patch? Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3111851722