Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` BenchmarkUnitBefore Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Mon, 28 Apr 2025 09:17:58 GMT, Emanuel Peter wrote: > > > Just a drive-by comment for now, I may review this later more fully. > > > > I would also prefer if you added the IR restrictions rather than the > > > > JTREG requires. > > > > The benefit is that we can still run the tests on all platforms, at > > > > least for result verification. > > > > Imagine someone adds optimizations to a new platform, but does not know > > > > about this test here. They make a mistake, and there is a bug, leading > > > > either to a crash or wrong result. With the requires, you test would > > > > never even run, and we would not catch it. With the IR applyIf, we > > > > would catch the bug. > > > > > > > > > Just copy pasting the IR applyIf everywhere is not that much work, and > > > adding in a new platform later is not really hard either. > > > > > > Thanks! The problem is that when a new platform is added, people may not > > even know there is a test. > > @erifan That is true. But we have that problem either way. If you use > `@require`, then the person does not realize there is a test AND the test is > not run. If you use `applyIf`, the person does not realize there is a test, > but it is run at least for result verifiation - and then the person MIGHT > realize if the test catches a wrong result / crash. This test will run on new platforms when we use @requires. I explained the meaning of the @requires in the previous comment, it only excludes one case: when -XX:UseAVX=0 is specified on x86 platforms. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834650280
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Mon, 28 Apr 2025 14:10:40 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: >> >>> 235: public static void testCompareEQMaskNotByte() { >>> 236: testCompareMaskNotByte(VectorOperators.EQ); >>> 237: } >> >> Another comment: you now only have "negative" tests, where you check for >> count `=0`. It would be good if you also had a positive rule here, one where >> you do see an XOR in a similar case, where your optimization does apply. >> >> This would basically be a "control" that checks that your are testing the >> right thing. > > Also: this test should vectorize on some plarforms, right? A compare, > correct? Would it not be good to actively check that with an IR rule? > Another comment: you now only have "negative" tests, where you check for > count =0. It would be good if you also had a positive rule here, one where > you do see an XOR in a similar case, where your optimization does apply. This would basically be a "control" that checks that your are testing the right thing. This is not a negative test, this is a positive test. What this patch does is: `Vector compare(NE) + not() => vector compare(EQ)`. The `not()` operation is removed. For details, please see the commit message and related code. So here we check XorV and XorVMask == 0, which I think is reasonable. For negative tests, I think it's not necessary, because I only test what I optimized, and I won't write a test to say what optimization cannot be done. > Also: this test should vectorize on some plarforms, right? A compare, > correct? Would it not be good to actively check that with an IR rule? The `compare` operation is not eliminated, the patch aims to eliminate the `not` operation following `compare`. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2065195738
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Fri, 25 Apr 2025 09:48:59 GMT, Jatin Bhateja wrote: >> Since this is a platform independent optimization, I tend to use this >> `@requires` because it's simpler. If we use `applyIfCPUFeatureOr`, we need >> to add the same restriction before each test. In addition, if a new >> architecture supports the vector node, this test may not cover it. What do >> you think? > > I don't see XorVMask implemented on all non-x86 target, like PPC etc.. This is not specifically required on x86, but because this test fails on x86 when `-XX:UseAVX=0` is specified. When `-XX:UseAVX=0` is specified, the sub-graph is like this: `(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))` It is not an optimization pattern supported by this patch because we don't know what's the comparison op. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2062583790
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Tue, 29 Apr 2025 10:22:22 GMT, Emanuel Peter wrote: > Yes, this discussion is down to `requires` vs `applyIf`. This is my argument > for `applyIf`, quoted from above, I have not yet seen an argument against it: > > > If you use @require, then the person does not realize there is a test AND > > the test is not run. If you use applyIf, the person does not realize there > > is a test, but it is run at least for result verifiation - and then the > > person MIGHT realize if the test catches a wrong result / crash. > > In my understanding, `requires` should only be used if the test really > **requires** a certain platform or feature. That can be because some flags > are only available under certain platforms for example. But for IR tests, we > should try to always use `applyIf`, because it allows testing on other > platforms. > > Actually, I filed this RFE a while ago: > https://bugs.openjdk.org/browse/JDK-8310891 We should try to move as many > tests from using `requires` to `applyIf`, so that we have an increased test > coverage. I see, I'll update the code. Thanks~ - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2840582699
RFR: 8354242: VectorAPI: combine vector not operation with compare
This patch optimizes the following patterns: For integer types: (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) cond can be eq or ne. Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: Benchmark UnitBefore Score Error After Score Error Uplift testCompareEQMaskNotByteops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 testCompareEQMaskNotLongops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 testCompareGEMaskNotByteops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 testCompareGEMaskNotLongops/s 854496.1561 8594.598885 1177811.493 521.12291.37 testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 testCompareGTMaskNotByteops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 testCompareGTMaskNotLongops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 testCompareLEMaskNotByteops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 testCompareLEMaskNotLongops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 testCompareLTMaskNotByteops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 testCompareLTMaskNotLongops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37 testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41 testCompareNEMaskNotByteops/s 7910002.518 2771.82477 10245315.33 16321.93935 1.29 testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36 testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.14011.36 testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4 testCompareNEMaskNotLongops/s 836327.6835 410.525466 1178178.825 308.757932 1.4 testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41 testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29 testCompareUGEMaskNotIntops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4 testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.66211.38 testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41 testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29 testCompareUGTMaskNotIntops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38 testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38 testCompareUGTMaskNotShort ops/s 3327236.988 1616.886789 4711209.865 3098.494145 1.41 testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29 testCompareULEMaskNotIntops/s 1672350.925 1545.304304 2353
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Tue, 29 Apr 2025 10:22:22 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains four additional commits since >> the last revision: >> >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >>testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >>testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >>testCompareGTMaskNotInt ops/s 1673393... > > Yes, this discussion is down to `requires` vs `applyIf`. This is my argument > for `applyIf`, quoted from above, I have not yet seen an argument against it: > >> If you use @require, then the person does not realize there is a test AND >> the test is not run. If you use applyIf, the person does not realize there >> is a test, but it is run at least for result verifiation - and then the >> person MIGHT realize if the test catches a wrong result / crash. > > In my understanding, `requires` should only be used if the test really > **requires** a certain platform or feature. That can be because some flags > are only available under certain platforms for example. But for IR tests, we > should try to always use `applyIf`, because it allows testing on other > platforms. > > Actually, I filed this RFE a while ago: > https://bugs.openjdk.org/browse/JDK-8310891 > We should try to move as many tests from using `requires` to `applyIf`, so > that we have an increased test coverage. @eme64 @jatin-bhateja I have updated the test, thanks for your suggestion. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2844256626
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` BenchmarkUnitBefore Score Error After
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Wed, 23 Apr 2025 12:09:51 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains two additional commits since >> the last revision: >> >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >>testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >>testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >>testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >>testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >>testCompareGTMaskNotShort ops/s 3339509.141 ... > > src/hotspot/share/opto/vectornode.cpp line 2234: > >> 2232: // XorV/XorVMask is commutative, swap >> VectorMaskCmp/Op_VectorMaskCast to in1. >> 2233: if (in1->Opcode() != Op_VectorMaskCmp && in1->Opcode() != >> Op_VectorMaskCast) { >> 2234: swap(in1, in2); > > Swapping inputs like this without refreshing GVN bookkeeping is not safe. I > guess you wanted to use Node::swap_edges. The edges are not swapped, but two variables in1 and in2 > src/hotspot/share/opto/vectornode.cpp line 2243: > >> 2241: in1 = in1->in(1); >> 2242: } >> 2243: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || > > Checks on outcnt on line 2243 and 2238 can be removed. Idealization looks for > a specific graph palette and replaces it with a new node whose inputs are the > same as the inputs of the palette. GVN will do the retention job if any > intermediate node has users beyond the pattern being replaced. Thanks for telling me this information. Another more important reason to check outcnt here is to prevent this optimization when the uses of VectorMaskCmp is greater than 1, because this optimization may not be worthwhile. For example: public static void testVectorMaskCmp() { IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0); IntVector av = IntVector.fromArray(I_SPECIES, ia, 0); VectorMask m1 = av.compare(VectorOperators.NE, bv); // two uses VectorMask m2 =m1.not(); m1.intoArray(m, 0); av.lanewise(VectorOper
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Fri, 18 Apr 2025 01:36:10 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option >> `-XX:UseSVE=2`: >> >> BenchmarkUnitBefore Score Error After >> Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloatops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShortops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >> testCompareGEMaskNotShortops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShortops/s 3339509.141 3339.976585 >> 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 >> 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 >> 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 >> 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShortops/s 3324951.54 2380.29473 >> 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 >> 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains two additional commits since the > last revision: > > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > >This patch optimizes the following patterns: >For integer types: >``` >(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >=> (VectorMaskCmp src1 src2 ncond) >(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >=> (VectorMaskCmp src1 src2 ncond) >``` >cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >negative comparison of cond. > >For float and double types: >``` >(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >``` >cond can be eq or ne. > >Benchmarks on Nvidia Grace machine with 128-bit SVE2: >With option `-XX:UseSVE=2`: >``` >Be
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Thu, 24 Apr 2025 10:28:15 GMT, Andrew Haley wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains two additional commits since >> the last revision: >> >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >>testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >>testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >>testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >>testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >>testCompareGTMaskNotShort ops/s 3339509.141 ... > > src/hotspot/share/opto/node.cpp line 1226: > >> 1224: // be added to the IGVN worklist, then the optimization will not >> be applied. >> 1225: // Therefore, add this node into IGVN worklist to make the >> optimization happen. >> 1226: return true; > > Suggestion: > > } else if (n->Opcode() == Op_XorV || n->Opcode() == Op_XorVMask) { > // Condition for removing an unnecessary not() following > // a compare(...) operation. > // The predecessor of n (this XorV or XorVMask) may also be used > // by a useless VectorBox node which will later be eliminated by > // RemoveUseless. Return true to ensure that subgraph > // transformations are performed on n. > return true; Done. Thanks for your review. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059706930
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Thu, 24 Apr 2025 09:39:58 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2265: >> >>> 2263: vmcmp = new VectorMaskCastNode(phase->transform(vmcmp), >>> vmcast_vt); >>> 2264: } >>> 2265: return vmcmp; >> >> It would be preferable if you could kindly re-factor the code such that we >> only call VectorNode::Ideal once at return to comply with aesthetics of >> other idealization routines. > > Ok, I'll change it in the next commit. Done, thanks. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059702378
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` BenchmarkUnitBefore Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7912
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Mon, 28 Apr 2025 06:45:58 GMT, Emanuel Peter wrote: >> This is not specifically required on x86, but because this test fails on x86 >> when `-XX:UseAVX=0` is specified. When `-XX:UseAVX=0` is specified, the >> sub-graph is like this: >> `(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))` >> It is not an optimization pattern supported by this patch because we don't >> know what's the comparison op. > > I would also prefer if you added the IR restrictions rather than the JTREG > requires. > The benefit is that we can still run the tests on all platforms, at least for > result verification. > > Imagine someone adds optimizations to a new platform, but does not know about > this test here. They make a mistake, and there is a bug, leading either to a > crash or wrong result. With the requires, you test would never even run, and > we would not catch it. With the IR applyIf, we would catch the bug. I can make the change, it's not complex, but it is different from what I thought before. I thought that supporting vector was the default behavior, is it right? So when I was doing an architecture-independent feature or optimization, I should just exclude those unsupported cases from the test, so that all potential environments would be tested. If I was doing an architecture- or feature-dependent optimization, then I should limit the test to run only in supported environments. For this case, **the current meaning of @requires is "skip this test when -XX:UseAVX=0 is specified on the x86 architecture, otherwise run the tests".** So if a new architecture (say s390) supports related vector operations in the future, then this test will be run on that platform by default. If all architecture-independent tests are restricted with applyIfCPUFeatureOr, then when we support a new architecture, we will need to modify all tests, otherwise no test will run on this architecture. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2063076640
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Mon, 28 Apr 2025 06:47:38 GMT, Emanuel Peter wrote: > Just a drive-by comment for now, I may review this later more fully. > > > I would also prefer if you added the IR restrictions rather than the JTREG > > requires. > > The benefit is that we can still run the tests on all platforms, at least > > for result verification. > > Imagine someone adds optimizations to a new platform, but does not know > > about this test here. They make a mistake, and there is a bug, leading > > either to a crash or wrong result. With the requires, you test would never > > even run, and we would not catch it. With the IR applyIf, we would catch > > the bug. > > Just copy pasting the IR applyIf everywhere is not that much work, and adding > in a new platform later is not really hard either. Thanks! The problem is that when a new platform is added, people may not even know there is a test. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834280547
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
On Mon, 28 Apr 2025 14:12:02 GMT, Emanuel Peter wrote: > I suppose in that case you can assert that you NEVER get those nodes, because > if you have vectors not supported, they will not show up because of that, and > if you do support vectors, they should be optimized away. This is expected. - If vectors are supported, the test checks that Vector not() is optimized away. - If vectors are not supported, the vector IRs won't generated, so the IR check will pass. And the correctness verification also runs, this is used to ensure that the patch does not break correctness. In the future if vectors are supported, the test also runs without any modification. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2065211944
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Mon, 28 Apr 2025 14:13:43 GMT, Emanuel Peter wrote: > > > > > Just a drive-by comment for now, I may review this later more fully. > > > > > > I would also prefer if you added the IR restrictions rather than > > > > > > the JTREG requires. > > > > > > The benefit is that we can still run the tests on all platforms, at > > > > > > least for result verification. > > > > > > Imagine someone adds optimizations to a new platform, but does not > > > > > > know about this test here. They make a mistake, and there is a bug, > > > > > > leading either to a crash or wrong result. With the requires, you > > > > > > test would never even run, and we would not catch it. With the IR > > > > > > applyIf, we would catch the bug. > > > > > > > > > > > > > > > Just copy pasting the IR applyIf everywhere is not that much work, > > > > > and adding in a new platform later is not really hard either. > > > > > > > > > > > > Thanks! The problem is that when a new platform is added, people may > > > > not even know there is a test. > > > > > > > > > @erifan That is true. But we have that problem either way. If you use > > > `@require`, then the person does not realize there is a test AND the test > > > is not run. If you use `applyIf`, the person does not realize there is a > > > test, but it is run at least for result verifiation - and then the person > > > MIGHT realize if the test catches a wrong result / crash. > > > > > > This test will run on new platforms when we use @requires. I explained the > > meaning of the @requires in the previous comment, it only excludes one > > case: when -XX:UseAVX=0 is specified on x86 platforms. > > I see. You should probably add a comment there, to say that you are only > excluding `AVX=0`. But even `UseAVX = 0` would profit from result > verification. @requires is a special comment itself. I feel like it's a bit weird to add a comment to a comment, and I don't think the @requires is hard to understand. If we want to verify the correctness of AVX=0, we have to use ApplyIf. This is back to the beginning of the question, should we use @requires or ApplyIf? Personally I tend to use the former. By the way, I have tested the correctness of AVX=0 locally. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2837292655
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
On Wed, 7 May 2025 06:59:34 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains eight additional commits >> since the last revision: >> >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.156... > > src/hotspot/share/opto/vectornode.cpp line 2231: > >> 2229: } >> 2230: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || >> 2231: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || > > Do you plan to extend your testcase / matching logic to cover following > equivalent patterns: > > - compare.xor(maskAll(true)) > - compare.xor(VectorMask.fromLong(SPECIES, -1L)) Hi @jatin-bhateja It is feasible. But I was thinking about whether another solution would be better, which is to turn `VectorMask.fromLong(SPECIES, -1L)` into `MaskAll(true)` in the mid-end. In this way, we don't need to check this pattern in this optimization. What do you think ? - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077316033
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
On Wed, 7 May 2025 02:10:56 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option >> `-XX:UseSVE=2`: >> >> BenchmarkUnitBefore Score Error After >> Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloatops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShortops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >> testCompareGEMaskNotShortops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShortops/s 3339509.141 3339.976585 >> 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 >> 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 >> 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 >> 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShortops/s 3324951.54 2380.29473 >> 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 >> 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains eight additional commits since > the last revision: > > - Refactor code > >Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > >1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > >This patch optimizes the following patterns: >For integer types: >``` >(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >=> (VectorMaskCmp src1 src2 ncond) >(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >=> (VectorMaskCmp src1 src2 ncond) >``` >cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >negative comparison of cond. >
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
On Thu, 8 May 2025 01:49:45 GMT, Xiaohong Gong wrote: >> Yes, that's the right approach. For this PR, I think you can mix some test >> points covering compare, xor(maskAll(true)). > > Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be > better, and that will benefit AArch64 as well, since `MaskAll()` is much more > cheaper than `fromLong()` on AArch64. We can add such a transformation with > another PR. Ok, I'll extend the test to xor(maskAll(true) in the next commit, thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2081313613
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
On Wed, 7 May 2025 06:59:34 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains eight additional commits >> since the last revision: >> >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.156... > > src/hotspot/share/opto/vectornode.cpp line 2231: > >> 2229: } >> 2230: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || >> 2231: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || > > Do you plan to extend your testcase / matching logic to cover following > equivalent patterns: > > - compare.xor(maskAll(true)) > - compare.xor(VectorMask.fromLong(SPECIES, -1L)) Oh, I didn't think of this case, let me try. Thanks~ - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077026150
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4]
On Fri, 2 May 2025 06:16:03 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains six additional commits since >> the last revision: >> >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >>testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >>testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 1... > > src/hotspot/share/opto/vectornode.cpp line 2224: > >> : //=> (VectorMaskCmp src1 src2 ncond) >> 2223: // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond >> is the >> 2224: // negative comparison of cond. > > Suggestion: > > // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt. > // ncond is the negative comparison of cond. > > I was getting lost in all the commas. Done. > src/hotspot/share/opto/vectornode.cpp line 2248: > >> 2246: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || >> 2247: !VectorNode::is_all_ones_vector(in2)) { >> 2248: return VectorNode::Ideal(phase, can_reshape); > > Hmm, so this is really the "else" case, if your optimization does not > succeed, right? > > Wrapping `VectorNode::Ideal` somewhere in the middle is going to make future > optimizations here much harder. > How would they check their conditions next to yours? That would be quite a > mess. > > I suggest you do this: > - `XorVNode::Ideal` does > - checks `in1 == in2` case > - calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it > succeeded, i.e. returns `nullptr`. > - Finally, i.e. none of the optimizations above worked: call > `VectorNode::Ideal` > > Then you pack all your new logic here into > `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is > just what I came up with just now. > > This gives us a much more **modular** design, and it is easier to add another > new optimization to `XorVNode::Ideal`. It is easy to change the precedence of > the optimizations by just changing the order, etc. Done, thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076646028 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076646366
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
On Thu, 1 May 2025 07:32:22 GMT, erifan wrote: >> Yes, this discussion is down to `requires` vs `applyIf`. This is my argument >> for `applyIf`, quoted from above, I have not yet seen an argument against it: >> >>> If you use @require, then the person does not realize there is a test AND >>> the test is not run. If you use applyIf, the person does not realize there >>> is a test, but it is run at least for result verifiation - and then the >>> person MIGHT realize if the test catches a wrong result / crash. >> >> In my understanding, `requires` should only be used if the test really >> **requires** a certain platform or feature. That can be because some flags >> are only available under certain platforms for example. But for IR tests, we >> should try to always use `applyIf`, because it allows testing on other >> platforms. >> >> Actually, I filed this RFE a while ago: >> https://bugs.openjdk.org/browse/JDK-8310891 >> We should try to move as many tests from using `requires` to `applyIf`, so >> that we have an increased test coverage. > > @eme64 @jatin-bhateja I have updated the test, thanks for your suggestion. > @erifan thanks for updating the tests! > > Now I had a quick look at the VM code. > > My biggest observation is this: > > Wrapping `VectorNode::Ideal` somewhere in the middle of your new optimization > is going to make future optimizations here much harder. How would they check > their conditions next to yours? That would be quite a mess. > > I suggest you do this: > > * `XorVNode::Ideal` does > > * checks `in1 == in2` case > * calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it > succeeded, i.e. returns `nullptr`. > * ... future optimizations could go here ... > * Finally, i.e. none of the optimizations above worked: call > `VectorNode::Ideal` > > Then you pack all your new logic here into > `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is > just what I came up with just now. > > This gives us a much more **modular** design, and it is easier to add another > new optimization to `XorVNode::Ideal`. It is easy to change the precedence of > the optimizations by just changing the order, etc. > > Examples of this "modular" design: > > * `CMoveNode::Ideal` -> calls `TypeNode::Ideal` and `Ideal_minmax`. > * `StoreBNode::Ideal` -> calls `StoreNode::Ideal_masked_input` and > `StoreNode::Ideal_sign_extended_input` > These are really nice, because you can quickly see what optimizations we > already have, and in which order they are checked. Yes, this is a good idea, I have changed it like this, thanks for your suggestion. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2856811448
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4]
On Fri, 2 May 2025 06:14:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2216: >> >>> 2214: in2->is_predicated_vector()) { >>> 2215: with_predicated = true; >>> 2216: } >> >> Suggestion: >> >> bool with_predicated = is_predicated_vector() || >> in1->is_predicated_vector() || >> in2->is_predicated_vector(); > > Would that not be easier to read? Yes, done, thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076645802
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Refactor the JTReg tests for compare.xor(maskAll) Also made a bit change to support pattern `VectorMask.fromLong()`. - Merge branch 'master' into JDK-8354242 - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp sr
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
On Wed, 7 May 2025 11:13:23 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains eight additional commits >> since the last revision: >> >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >>testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >>testCompareGEMaskNotLong ops/s 854496.156... > > test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java > line 40: > >> 38: @Fork(jvmArgs = { "--add-modules=jdk.incubator.vector" }) >> 39: public class MaskCompareNotBenchmark { >> 40: private static final int ARRAYLEN = 4096; > > ARRAYLEN should be configurable @Param. Done. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2087927310
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5]
On Fri, 9 May 2025 09:37:22 GMT, erifan wrote: >> Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be >> better, and that will benefit AArch64 as well, since `MaskAll()` is much >> more cheaper than `fromLong()` on AArch64. We can add such a transformation >> with another PR. > > Ok, I'll extend the test to xor(maskAll(true) in the next commit, thanks! Done, thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2087927027
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Wed, 14 May 2025 02:44:14 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option >> `-XX:UseSVE=2`: >> >> BenchmarkUnitBefore Score Error After >> Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloatops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShortops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >> testCompareGEMaskNotShortops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShortops/s 3339509.141 3339.976585 >> 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 >> 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 >> 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 >> 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShortops/s 3324951.54 2380.29473 >> 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 >> 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains 10 additional commits since the > last revision: > > - Refactor the JTReg tests for compare.xor(maskAll) > >Also made a bit change to support pattern `VectorMask.fromLong()`. > - Merge branch 'master' into JDK-8354242 > - Refactor code > >Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > >1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > >This patch optimizes the following patterns: >For integer types: >``` >(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >=> (VectorMaskCmp src1 src2 ncond) >(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) &
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Wed, 28 May 2025 12:12:50 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains 10 additional commits since >> the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >>Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4... > > src/hotspot/share/opto/vectornode.cpp line 2233: > >> 2231: if (in2->Opcode() == Op_VectorMaskCast) { >> 2232: in2 = in2->in(1); >> 2233: } > > Wow, this seems to be an addition that is not covered in the patterns you > mention above, right? > But is that even necessary? > I suppose here `in2 = VectorMaskCast(all_ones_vector)`. > Would we not already want to transform this pattern in > `VectorMaskCast::Ideal`, is that not possible and more powerful? Oh yeah, I forgot to mention it in the above comment and commit message. Yes, this is for `in2 = VectorMaskCast(all_ones_vector)`. I agree it's better to do this transformation in `VectorMaskCast::Ideal`. I'll remove this code change and do the `VectorMaskCast` optimization later. Thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113430734
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Wed, 28 May 2025 12:16:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2244: >> >>> 2242: // BoolTest doesn't support unsigned comparisons. >>> 2243: BoolTest::mask neg_cond = >>> 2244: (BoolTest::mask) (((VectorMaskCmpNode*) in1)->get_predicate() ^ >>> 4); >> >> What is the hard-coded `^ 4` here? This whole line looks like we are looking >> at internals of the `VectorMaskCmpNode` or its predicate, and we should >> probably do that in some method there? Or maybe it should be part of the >> `BoolTest(::mask)` interface? > > Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do > `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use it > as `in1_mask_cmp` instead? > What is the hard-coded ^ 4 here? This is to negate the comparison condition. We can't use `BoolTest::negate()` here because the comparison condition may be **unsigned** comparison. Since there's already a `negate()` function in `BoolTest`, so I tend to add a new function `get_negative_predicate` for this into class `VectorMaskCmpNode`. > Also: You now cast (VectorMaskCmpNode*) in1 twice. Can we not do > as_VectorMaskCmp()? Or could we at least cast it only once, and then use it > as in1_mask_cmp instead? For the first cast, I think you mean if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || !((VectorMaskCmpNode*) in1)->predicate_can_be_negated() || !VectorNode::is_all_ones_vector(in2)) { return nullptr; } To remove one cast, then we have to split the above `if` because `in1` may not be a `VectorMaskCmpNode`. if (in1->Opcode() != Op_VectorMaskCmp) { return nullptr; } VectorMaskCmpNode* in1_as_mask_cmp = (VectorMaskCmpNode*) in1; if (in1->outcnt() > 1 || !in1_as_mask_cmp->predicate_can_be_negated() || !VectorNode::is_all_ones_vector(in2)) { return nullptr; } BoolTest::mask neg_cond = (BoolTest::mask) (in1_as_mask_cmp->get_predicate() ^ 4); Does this look better to you ? - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113423376
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Fri, 16 May 2025 07:40:53 GMT, erifan wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains 10 additional commits since >> the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >>Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4... > > Hi, I have updated the code and I'll file the patch to convert > `VectorMask.fromLong(SPECIES, -1)` to `maskAll()` soon, I'll cover this test > case in that patch. Would you please help review the patch again, thanks! > @erifan Looks like you really improved things, nice work! I have some more > comments below :) I will modify the code according to your suggestions and commit again after all the test passes locally. Thanks for you careful review! - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2918641646
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Thu, 29 May 2025 01:44:49 GMT, Xiaohong Gong wrote: >> test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java >> line 49: >> >>> 47: private static final VectorSpecies L_SPECIES = >>> LongVector.SPECIES_MAX; >>> 48: private static final VectorSpecies F_SPECIES = >>> FloatVector.SPECIES_MAX; >>> 49: private static final VectorSpecies D_SPECIES = >>> DoubleVector.SPECIES_MAX; >> >> Are you taking `SPECIES_MAX` on purpose here, or could we take >> `SPECIES_PREFERRED` instead? >> @jatin-bhateja What is the best to do in these tests? I suppose best would >> be to test with all vector lengths... > > Thanks for pointing out this @eme64 ! Per my understanding, `SPECIES_MAX` is > almost the same with `SPECIES_PREFERRED` in this case which are all specified > to the max vector size of a hardware. Since the max vector size is different > on different architectures, not all vector lengths are supported to be > intrinsified on a specified architecture like AArch64, especially the SVE > arch with different vector register size. Hence, just testing the max species > makes sense to me as this is a mid-end common transformation. Changed to use `ofLargestShape()` because on x64 the max vector length is related to data types. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128383805
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Thu, 5 Jun 2025 09:32:15 GMT, erifan wrote: > > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { > > return mask(_test^4); }` I think you should use that instead :) > > Indeed, I hadn't noticed that, thank you. Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943500455
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Thu, 5 Jun 2025 09:24:10 GMT, Emanuel Peter wrote: > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { > return mask(_test^4); }` I think you should use that instead :) Indeed, I hadn't noticed that, thank you. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943449327
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Thu, 29 May 2025 07:55:06 GMT, erifan wrote: >> Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do >> `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use >> it as `in1_mask_cmp` instead? > >> What is the hard-coded ^ 4 here? > > This is to negate the comparison condition. We can't use `BoolTest::negate()` > here because the comparison condition may be **unsigned** comparison. Since > there's already a `negate()` function in `BoolTest`, so I tend to add a new > function `get_negative_predicate` for this into class `VectorMaskCmpNode`. > >> Also: You now cast (VectorMaskCmpNode*) in1 twice. Can we not do >> as_VectorMaskCmp()? Or could we at least cast it only once, and then use it >> as in1_mask_cmp instead? > > For the first cast, I think you mean > > if (in1->Opcode() != Op_VectorMaskCmp || > in1->outcnt() > 1 || > !((VectorMaskCmpNode*) in1)->predicate_can_be_negated() || > !VectorNode::is_all_ones_vector(in2)) { > return nullptr; > } > > To remove one cast, then we have to split the above `if` because `in1` may > not be a `VectorMaskCmpNode`. > > if (in1->Opcode() != Op_VectorMaskCmp) { > return nullptr; > } > VectorMaskCmpNode* in1_as_mask_cmp = (VectorMaskCmpNode*) in1; > if (in1->outcnt() > 1 || > !in1_as_mask_cmp->predicate_can_be_negated() || > !VectorNode::is_all_ones_vector(in2)) { > return nullptr; > } > BoolTest::mask neg_cond = (BoolTest::mask) (in1_as_mask_cmp->get_predicate() > ^ 4); > > Does this look better to you ? For now I kept the current approach, as I feel it's a little more compact. Thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128358563
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Wed, 28 May 2025 12:28:20 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: >> >>> 235: // Byte tests >>> 236: @Test >>> 237: @IR(counts = { IRNode.XOR_V_MASK, "= 0", IRNode.XOR_VB, "= 0" }, >> >> Could you still assert the presence of some other vectors, just to make sure >> we are indeed getting vectors here? > > Not testing for any present vectors makes me a little nervous: what if we > just don't get any vectors because inlining fails or something else silly > happens? Done. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128379987
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Wed, 28 May 2025 12:18:15 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains 10 additional commits since >> the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >>Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4... > > src/hotspot/share/opto/vectornode.cpp line 2251: > >> 2249: predicate_node, vt); >> 2250: if (vmcast_vt != nullptr) { >> 2251: // We optimized out an VectorMaskCast, and in order to ensure type > > Suggestion: > > // We optimized out a VectorMaskCast, and in order to ensure type Done. > src/hotspot/share/opto/vectornode.cpp line 2253: > >> 2251: // We optimized out an VectorMaskCast, and in order to ensure type >> 2252: // correctness, we need to regenerate one. VectorMaskCast will be >> encoded as >> 2253: // empty for types with the same size. > > Suggestion: > > // a no-op (identity function) for types with the same size. > > Or what do you mean by "empty"? `TOP`? All zeros? I mean `no-op`. Done, thanks. > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 96: > >> 94: Generator lGen = RD.uniformLongs(Long.MIN_VALUE, >> Long.MAX_VALUE); >> 95: Generator fGen = RD.uniformFloats(Float.MIN_VALUE, >> Float.MAX_VALUE); >> 96: Generator dGen = RD.uniformDoubles(Double.MIN_VALUE, >> Double.MAX_VALUE); > > Are you sure you only want to draw from the uniform distribution? > If you don't super care about the distribution, please just take > `RD.ints/longs/floats/doubles()`. > That way, you get all sorts of distributions, and also some that include NaN > values etc. I think that would be important for your float cmp cases, no? For float and double, we have to use the uniform distribution, because we have to make sure `NAN` is not generated. I added some comments about the reasons. For other types, changed to use `RD.ints/longs`. We have covered the special cases like +/- Inf, NaN. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128376851 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r212837 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128375032
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote: >>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const >>> > { return mask(_test^4); }` I think you should use that instead :) >>> >>> Indeed, I hadn't noticed that, thank you. >> >> Oh I think we still cannot use `BoolTest::negate`, because we cannot >> instantiate a `BoolTest` object with **unsigned** comparison. >> `BoolTest::negate` is a non-static function. > >> Oh I think we still cannot use `BoolTest::negate`, because we cannot >> instantiate a `BoolTest` object with **unsigned** comparison. >> `BoolTest::negate` is a non-static function. > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because > that is where the exact implementation of the enum values is. In that context > it is easier to see why `^4` does the negation. And imagine we were ever to > change the enum values, then it would be harder to find your code and fix it. > > Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a > comment that both signed and unsigned is supported. @eme64 @XiaohongGong Your comment has been addressed, thanks for your review! - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3004213740
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 04:56:36 GMT, Emanuel Peter wrote: >> erifan has updated the pull request incrementally with one additional commit >> since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. > > src/hotspot/share/opto/subnode.hpp line 333: > >> 331: mask negate( ) const { return mask(_test^4); } >> 332: // Return the negative mask for the given mask, for both signed and >> unsigned comparison. >> 333: static mask negate_mask(mask btm) { return mask(btm^4); } > > Suggestion: > > static mask negate_mask(mask btm) { return mask(btm ^ 4); } > > > https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > >> Use spaces around operators, especially comparisons and assignments. >> (Relaxable for boolean expressions and high-precedence operators in classic >> math-style formulas.) Done, thanks! > src/hotspot/share/opto/vectornode.cpp line 2226: > >> 2224: >> 2225: const TypeVect* vector_mask_cast_vt = nullptr; >> 2226: // in1 should be single used, otherwise the optimization may be >> unprofitable. > > Suggestion: > > // in1 should only have a single use, otherwise the optimization may be > unprofitable. Done > src/hotspot/share/opto/vectornode.cpp line 2237: > >> 2235: !VectorNode::is_all_ones_vector(in2)) { >> 2236: return nullptr; >> 2237: } > > Similarly here: do you have tests for these conditions, that we do not > optimize if any of these fail? Added some negative tests for these conditions > src/hotspot/share/opto/vectornode.cpp line 2239: > >> 2237: } >> 2238: >> 2239: BoolTest::mask neg_cond = >> BoolTest::negate_mask(((VectorMaskCmpNode*) in1)->get_predicate()); > > Suggestion: > > BoolTest::mask neg_cond = > BoolTest::negate_mask((in1->as_VectorMaskCmp())->get_predicate()); > > Does that compile? It would be prefereable. Yes, done, thanks! > src/hotspot/share/opto/vectornode.cpp line 2243: > >> 2241: const TypeVect* vt = in1->as_Vector()->vect_type(); >> 2242: Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2), >> 2243: predicate_node, vt); > > Suggestion: > > Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2), > predicate_node, vt); > > Alignment Done > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 244: > >> 242: testCompareMaskNotByte(VectorOperators.EQ, m -> m.not()); >> 243: testCompareMaskNotByte(VectorOperators.EQ, m -> >> m.xor(B_SPECIES.maskAll(true))); >> 244: } > > Could it happen that the verification is inlined in the test body? > > Currently, the verification is probably inlined, but the code there is not > vectorized. But what if one day the auto-vectorizer is smart enough and > vectorizes it, and creates vectors that we currently check `count ...= 0`? > > At least, you could ensure that the verification does not get inlined, with > `@DontInline`. > > What do you think? Make sense, done. > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 692: > >> 690: TestFramework testFramework = new TestFramework(); >> 691: testFramework.addFlags("--add-modules=jdk.incubator.vector"); >> 692: testFramework.setDefaultWarmup(1); > > The default is `2000` is that not enough? > > Increasing it means the test runs slower, here probably about 5x. Yes, not enough, changed to 5000. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166343933 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166344643 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346170 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346620 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346885 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166347867 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166352227
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 08:51:50 GMT, Xiaohong Gong wrote: >> erifan has updated the pull request incrementally with one additional commit >> since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. > > src/hotspot/share/opto/vectornode.cpp line 2237: > >> 2235: !VectorNode::is_all_ones_vector(in2)) { >> 2236: return nullptr; >> 2237: } > > This part can be refined more clearly: > > // Swap and put all_ones_vector to right > if (!VectorNode::is_all_ones_vector(in1)) { >swap(in1, in2); > } > > // uncast mask > bool need_cast = false; > if (in1->Opcode() == Op_VectorMaskCast && > in1->outcnt() == 1) { > assert(in1->bottom_type()->eq(bottom_type()), ""); > need_cast = true; > in1 = in1->in(1); > } > > // Check mask cmp pattern > if (in1->Opcode() != Op_VectorMaskCmp || > in1->outcnt() > 1 || > !in1->as_VectorMaskCmp()->predicate_can_be_negated()) { > return nullptr; > } > > // Convert VectorMaskCmp + not > > > // Cast back > if (need_cast) { > res = new VectorMaskCastNode(phase->transform(res), vect_type()); > } Done - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166351241
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 05:09:34 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2227: >> >>> 2225: const TypeVect* vector_mask_cast_vt = nullptr; >>> 2226: // in1 should be single used, otherwise the optimization may be >>> unprofitable. >>> 2227: if (in1->Opcode() == Op_VectorMaskCast && in1->outcnt() == 1 && >>> in1->in(1)->Opcode() == Op_VectorMaskCmp) { >> >> `in1->in(1)->Opcode() == Op_VectorMaskCmp` >> Is this check here even necessary? Because we check it below again, right? >> `in1->Opcode() != Op_VectorMaskCmp` > > Btw: do you have a test where `in1->outcnt() > 1`, and you check that the > optimization does not happen with an IR test? Refactored the code, thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166345311
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 09:07:12 GMT, erifan wrote: >> Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I >> would suggest :) > > Make sense! Done >>> > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does >>> > that make sense? >> >>> The bottom types of float and double vector masks are casted to int and >>> long. Seems this is by design? So this is correct. >> >> This is a `float` test. What is the bottom type for the mask here? > > Oh, this is a stupid copy-paste mistake. Good catch, thanks! I'll double > check them all. Done - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166347264 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166349306
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 09:09:57 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2221: >> >>> 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast >>> to in1. >>> 2220: if (in2->Opcode() == Op_VectorMaskCmp || >>> 2221: (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == >>> Op_VectorMaskCmp)) { >> >> We may need to consider cases that a `VectorMaskCast` is generated between >> `compare + not`, such as `compare + cast + not`. For such cases, the element >> size maybe different for input and output of a `cast`. Although this >> patch's intention is not for the latter pattern, current change have also >> covered it well. Could you please add more test/jmh for all kinds of `cast` >> pattern here? And I think the scope of this PR could be also extended to >> `compare + cast + not`. WDYT? > > Good catch, I'll add more tests and check the correctness. Thanks~ Added more tests for this, thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166350143
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Tue, 10 Jun 2025 02:38:29 GMT, Fei Yang wrote: > FYI: I submitted to testing in QEMU-system / Ubuntu 25.04 (fastdebug jdk > build and 256-bit RVV) and I see `compiler/vectorization`, > `compiler/vectorapi` and `jdk_vector` tests are passing. Thank you very much! - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2957560924
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote: > > Oh I think we still cannot use `BoolTest::negate`, because we cannot > > instantiate a `BoolTest` object with **unsigned** comparison. > > `BoolTest::negate` is a non-static function. > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because > that is where the exact implementation of the enum values is. In that context > it is easier to see why `^4` does the negation. And imagine we were ever to > change the enum values, then it would be harder to find your code and fix it. > > Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a > comment that both signed and unsigned is supported. Make sense, I'll update later, thanks. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2948297917
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request incrementally with one additional commit since the last revision: Support negating unsigned comparison for BoolTest::mask Added a static method `negate_mask(mask btm)` into BoolTest class to negate both signed and unsigned comparison. - Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/ebbcc405..f51bf722 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=06-07 Stats: 6 lines in 3 files changed: 2 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Fri, 6 Jun 2025 07:01:58 GMT, erifan wrote: > > > Oh I think we still cannot use `BoolTest::negate`, because we cannot > > > instantiate a `BoolTest` object with **unsigned** comparison. > > > `BoolTest::negate` is a non-static function. > > > > > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, > > because that is where the exact implementation of the enum values is. In > > that context it is easier to see why `^4` does the negation. And imagine we > > were ever to change the enum values, then it would be harder to find your > > code and fix it. > > Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a > > comment that both signed and unsigned is supported. > > Make sense, I'll update later, thanks. @eme64 your comment is addressed, thanks for your suggestion! - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2948832452
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Thu, 29 May 2025 08:00:05 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2233: >> >>> 2231: if (in2->Opcode() == Op_VectorMaskCast) { >>> 2232: in2 = in2->in(1); >>> 2233: } >> >> Wow, this seems to be an addition that is not covered in the patterns you >> mention above, right? >> But is that even necessary? >> I suppose here `in2 = VectorMaskCast(all_ones_vector)`. >> Would we not already want to transform this pattern in >> `VectorMaskCast::Ideal`, is that not possible and more powerful? > > Oh yeah, I forgot to mention it in the above comment and commit message. > > Yes, this is for `in2 = VectorMaskCast(all_ones_vector)`. I agree it's > better to do this transformation in `VectorMaskCast::Ideal`. I'll remove this > code change and do the `VectorMaskCast` optimization later. Thanks! Done. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128344233
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6]
On Wed, 28 May 2025 12:03:48 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains 10 additional commits since >> the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >>Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >>1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >>2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >>This patch optimizes the following patterns: >>For integer types: >>``` >>(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >>=> (VectorMaskCmp src1 src2 ncond) >>(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >>=> (VectorMaskCmp src1 src2 ncond) >>``` >>cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >>negative comparison of cond. >> >>For float and double types: >>``` >>(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >>=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >>``` >>cond can be eq or ne. >> >>Benchmarks on Nvidia Grace machine with 128-bit SVE2: >>With option `-XX:UseSVE=2`: >>``` >>Benchmark UnitBefore Score Error After >> Score Error Uplift >>testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >>testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >>testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >>testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >>testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >>testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >>testCompareGEMaskNotByte ops/s 7907615.16 4... > > src/hotspot/share/opto/vectornode.cpp line 2213: > >> 2211: Node* in1 = in(1); >> 2212: Node* in2 = in(2); >> 2213: // Transformations for predicated IRs are not supported for now. > > Suggestion: > > // Transformations for predicated vectors are not supported for now. Done. > src/hotspot/share/opto/vectornode.cpp line 2215: > >> 2213: // Transformations for predicated IRs are not supported for now. >> 2214: if (is_predicated_vector() || in1->is_predicated_vector() || >> 2215: in2->is_predicated_vector()) { > > I would either put all on the same line, or all on separate lines. Done. > src/hotspot/share/opto/vectornode.cpp line 2219: > >> 2217: } >> 2218: >> 2219: // XorV/XorVMask is commutative, swap >> VectorMaskCmp/Op_VectorMaskCast to in1. > > Suggestion: > > // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1. > > Would look a little cleaner, and you did also not write `Op_VectorMaskCmp` > either ;) Done, thanks! > src/hotspot/share/opto/vectornode.cpp line 2225: > >> 2223: } >> 2224: >> 2225: const TypeVect* vmcast_vt = nullptr; > > Suggestion: > > const TypeVect* vector_mask_cast_vt = nullptr; > > I think it would not hurt to write it out. Otherwise, the reader always has > to reconstruct that in their head. Done. > src/hotspot/share/opto/vectornode.cpp line 2230: > >> 2228: vmcast_vt = in1->as_Vector()->vect_type(); >> 2229: in1 = in1->in(1); >> 2230: } > > Add a comment why you check `in1->outcnt() == 1`. Done. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128341063 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128340484 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128341959 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128342468 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128342908
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Addressed some review comments - Merge branch 'master' into JDK-8354242 - Refactor the JTReg tests for compare.xor(maskAll) Also made a bit change to support pattern `VectorMask.fromLong()`. - Merge branch 'master' into JDK-8354242 - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - ... and 2 more: https://git.openjdk.org/jdk/compare/71938fba...ebbcc405 - Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/f2f71e34..ebbcc405 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=05-06 Stats: 146911 lines in 2345 files changed: 87334 ins; 41007 del; 18570 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option >> `-XX:UseSVE=2`: >> >> BenchmarkUnitBefore Score Error After >> Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloatops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShortops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >> testCompareGEMaskNotShortops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShortops/s 3339509.141 3339.976585 >> 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 >> 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 >> 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 >> 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShortops/s 3324951.54 2380.29473 >> 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 >> 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request incrementally with one additional commit > since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. Hi, any further comments? - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2961024225
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 08:30:51 GMT, Emanuel Peter wrote: >>> You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does >>> that make sense? >> >> The bottom types of `float` and `double` vector masks are casted to `int` >> and `long`. Seems this is by design? So this is correct. >> >> As for `control test`, yes for now I didn't add any such kind of test, >> because I personally think it's unnecessary. For specific code, it won't >> trigger this optimization now, but new optimizations in the future may cause >> this test to fail. >> >> Anyway I've tested the case where `vectormaskcmp` is multi used locally, and >> this optimization won't be triggered. >> >> Do you think it's necessary to add such a control test? > >> > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does >> > that make sense? > >> The bottom types of float and double vector masks are casted to int and >> long. Seems this is by design? So this is correct. > > This is a `float` test. What is the bottom type for the mask here? Oh, this is a stupid copy-paste mistake. Good catch, thanks! I'll double check them all. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139582391
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 07:43:55 GMT, Emanuel Peter wrote: >> `VectorOperators.XXX` is not compile time constants, we can't use `switch` >> here. > > Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I > would suggest :) Make sense! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139610469
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 05:23:12 GMT, Emanuel Peter wrote: >> erifan has updated the pull request incrementally with one additional commit >> since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. > > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 158: > >> 156: } else if (op == VectorOperators.UGT) { >> 157: Asserts.assertEquals(compareUnsigned(a, b) <= 0, r); >> 158: } > > Please refactor it as a `switch`. And add a `default` case where you throw > some `RuntimeException`. just to make sure we are not missing anything :) `VectorOperators.XXX` is not compile time constants, we can't use `switch` here. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139418759
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 08:47:56 GMT, Xiaohong Gong wrote: >> erifan has updated the pull request incrementally with one additional commit >> since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. > > src/hotspot/share/opto/vectornode.cpp line 2221: > >> 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast >> to in1. >> 2220: if (in2->Opcode() == Op_VectorMaskCmp || >> 2221: (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == >> Op_VectorMaskCmp)) { > > We may need to consider cases that a `VectorMaskCast` is generated between > `compare + not`, such as `compare + cast + not`. For such cases, the element > size maybe different for input and output of a `cast`. Although this patch's > intention is not for the latter pattern, current change have also covered it > well. Could you please add more test/jmh for all kinds of `cast` pattern > here? And I think the scope of this PR could be also extended to `compare + > cast + not`. WDYT? Good catch, I'll add more tests and check the correctness. Thanks~ - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139615887
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option >> `-XX:UseSVE=2`: >> >> BenchmarkUnitBefore Score Error After >> Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloatops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShortops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.12291.37 >> testCompareGEMaskNotShortops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShortops/s 3339509.141 3339.976585 >> 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 >> 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 >> 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 >> 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShortops/s 3324951.54 2380.29473 >> 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 >> 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request incrementally with one additional commit > since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. Hi @RealFYang , would you mind helping test this patch on a risc-v machine with rvv? This optimization also applies to rvv, but I don't have a risc-v test environment. - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2955291395
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
On Wed, 11 Jun 2025 05:31:00 GMT, Emanuel Peter wrote: > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does > that make sense? The bottom types of `float` and `double` vector masks are casted to `int` and `long`. Seems this is by design? So this is correct. As for `control test`, yes for now I didn't add any such kind of test, because I personally think it's unnecessary. For specific code, it won't trigger this optimization now, but new optimizations in the future may cause this test to fail. Anyway I've tested the case where `vectormaskcmp` is multi used locally, and this optimization won't be triggered. Do you think it's necessary to add such a control test? - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139512969
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v10]
On Tue, 8 Jul 2025 11:42:02 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains 17 additional commits since >> the last revision: >> >> - Align indentation >> - Merge branch 'master' into JDK-8354242 >> - Address more comments >> >>ATT. >> - Merge branch 'master' into JDK-8354242 >> - Support negating unsigned comparison for BoolTest::mask >> >>Added a static method `negate_mask(mask btm)` into BoolTest class to >>negate both signed and unsigned comparison. >> - Addressed some review comments >> - Merge branch 'master' into JDK-8354242 >> - Refactor the JTReg tests for compare.xor(maskAll) >> >>Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - ... and 7 more: https://git.openjdk.org/jdk/compare/04bd77d0...db78dc43 > > src/hotspot/share/opto/vectornode.cpp line 2241: > >> 2239: in1->outcnt() != 1 || >> 2240: !(in1->as_VectorMaskCmp())->predicate_can_be_negated() || >> 2241: !VectorNode::is_all_ones_vector(in2)) { > > Suggestion: > > !in1->as_VectorMaskCmp()->predicate_can_be_negated() || > !VectorNode::is_all_ones_vector(in2)) { > > Remove the indentation again, and the superfluous brackets too ;) Done - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2194130835
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
On Tue, 8 Jul 2025 11:42:18 GMT, Emanuel Peter wrote: >> Oh wow, my bad. I misunderstood the brackets! >> >> Instead of: >> >> !(in1->as_VectorMaskCmp())->predicate_can_be_negated() || >> !VectorNode::is_all_ones_vector(in2)) { >> >> I read: >> >> !(in1->as_VectorMaskCmp()->predicate_can_be_negated() || >> !VectorNode::is_all_ones_vector(in2))) { >> >> That confused me a lot... absolutely my bad. >> >> Well actually then my indentation suggestion was terrible! > > I made a new suggestion below. > A code comment would be helpful for this case. I updated the comment above the code a bit. As for why predicate need to be negatable, it's straightforward, the key of this optimization is to change predicate condition into negative predicate condition. And in `predicate_can_be_negated`, there's a comment explaining when predicate can't be negated. > I made a new suggestion below. Done. > That confused me a lot... absolutely my bad. Well actually then my indentation suggestion was terrible! No problem. I'm a newbie in the JDK community, so generally I think your suggestions are valuable.Thanks for your review! - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2194130234
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v11]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request incrementally with one additional commit since the last revision: Update the code comment - Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/db78dc43..04142a19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=09-10 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
On Mon, 7 Jul 2025 09:32:42 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2243: >> >>> 2241: !VectorNode::is_all_ones_vector(in2)) { >>> 2242: return nullptr; >>> 2243: } >> >> Suggestion: >> >> if (in1->Opcode() != Op_VectorMaskCmp || >> in1->outcnt() != 1 || >> !(in1->as_VectorMaskCmp())->predicate_can_be_negated() || >> !VectorNode::is_all_ones_vector(in2)) { >> return nullptr; >> } >> >> Indentation for clarity. >> >> Currently, you exiting if one of these is the case: >> 1. Not `MaskCmp` >> 2. More than one use >> 3. predicate cannot be negated AND the vector is all ones. Can you explain >> this condition? Do you have tests for cases: >>- predicate negatable and vector not all ones >>- predircate not negatable and vector not all ones >>- predicate negatable and vector all ones >>- predicate not negatable and vectors all ones >> >> Why do you guard against `VectorNode::is_all_ones_vector(in2)` at all? >> >> The condition for 3. is easy to get wrong, so good testing is important here >> :) > > The current testing status for the conditions you listed: >> 1. Not MaskCmp. > > **No test for it, tested locally**, Because I think this condition is too > straightforward. > >> 2. More than one use. > > **Tested**, see `VectorMaskCompareNotTest.java line 1118`. > >> predicate negatable and vector not all ones. > > **Tested**, see `VectorMaskCompareNotTest.java line 1126`. > >> predicate not negatable and vector not all ones. > > **No test for it**, because we have tests for `predicate not negatable` or > `vector not all ones`. If either is `false`, return nullptr. > >> predicate negatable and vector all ones. > > **A lot of tests for it**. For example `VectorMaskCompareNotTest.java line > 1014`. > >> predicate not negatable and vectors all ones. > > **Tested**, see `VectorMaskCompareNotTest.java line 1222`. > Indentation for clarity. Done. I think we have enough negative tests. Please take a look at this PR, thanks~ - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2191550171
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v10]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Align indentation - Merge branch 'master' into JDK-8354242 - Address more comments ATT. - Merge branch 'master' into JDK-8354242 - Support negating unsigned comparison for BoolTest::mask Added a static method `negate_mask(mask btm)` into BoolTest class to negate both signed and unsigned comparison. - Addressed some review comments - Merge branch 'master' into JDK-8354242 - Refactor the JTReg tests for compare.xor(maskAll) Also made a bit change to support pattern `VectorMask.fromLong()`. - Merge branch 'master' into JDK-8354242 - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - ... and 7 more: https://git.openjdk.org/jdk/compare/c2a2adc8...db78dc43 - Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/5ebdc572..db78dc43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=08-09 Stats: 9269 lines in 462 files changed: 4528 ins; 2873 del; 1868 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
On Tue, 8 Jul 2025 09:49:30 GMT, erifan wrote: >> Thanks for your answers @erifan ! >> >> Can you please answer these as well? >> >>> predicate cannot be negated AND the vector is all ones. Can you explain >>> this condition? >> >> A code comment would be helpful for this case. I'm a little bit struggling >> to understand the bracket/negation here. >> >>> Why do you guard against VectorNode::is_all_ones_vector(in2) at all? >> >> Is this necessary? Why? > >> predicate cannot be negated AND the vector is all ones. Can you explain this >> condition? > > Ok, I'll add a comment for it. > >> Why do you guard against VectorNode::is_all_ones_vector(in2) at all? > > Because one of the nodes in the supported patterns by this PR needs to be > `MaskAll` or `Replicate`. And this function `VectorNode::is_all_ones_vector` > just meets our check for `MaskAll` and `Replicate`. Actually I don't quite > understand your question. I have two understandings: > 1. Not all nodes that `VectorNode::is_all_ones_vector` returns true are > `MaskAll` or `Replicate`, but other nodes that do not meet the conditions. > 2. Here, it does not need to be a vector with every bit 1, it only needs to > be an `all true` mask. > > Which one do you mean? Or something else? Thanks~ The purpose of this PR is optimizing the following kinds of patterns: XXXVector va, vb; va.compare(EQ, vb).not() And the generated IR of `va.compare(EQ, vb).not()` is `(XorVMask (VectorMaskCmp va vb EQ) (MaskAll -1))`. On platforms like aarch64 NEON, `MaskAll` is `Replicate`. And `MaskAll` and `Replicate` are both all ones vectors, so we do this check `VectorNode::is_all_ones_vector(in2)` - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2192087590
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
On Mon, 7 Jul 2025 06:19:15 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or >> a rebase. The incremental webrev excludes the unrelated changes brought in >> by the merge/rebase. The pull request contains 15 additional commits since >> the last revision: >> >> - Address more comments >> >>ATT. >> - Merge branch 'master' into JDK-8354242 >> - Support negating unsigned comparison for BoolTest::mask >> >>Added a static method `negate_mask(mask btm)` into BoolTest class to >>negate both signed and unsigned comparison. >> - Addressed some review comments >> - Merge branch 'master' into JDK-8354242 >> - Refactor the JTReg tests for compare.xor(maskAll) >> >>Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >>Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >>optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - ... and 5 more: https://git.openjdk.org/jdk/compare/8e600a2f...5ebdc572 > > src/hotspot/share/opto/vectornode.cpp line 2243: > >> 2241: !VectorNode::is_all_ones_vector(in2)) { >> 2242: return nullptr; >> 2243: } > > Suggestion: > > if (in1->Opcode() != Op_VectorMaskCmp || > in1->outcnt() != 1 || > !(in1->as_VectorMaskCmp())->predicate_can_be_negated() || > !VectorNode::is_all_ones_vector(in2)) { > return nullptr; > } > > Indentation for clarity. > > Currently, you exiting if one of these is the case: > 1. Not `MaskCmp` > 2. More than one use > 3. predicate cannot be negated AND the vector is all ones. Can you explain > this condition? Do you have tests for cases: >- predicate negatable and vector not all ones >- predircate not negatable and vector not all ones >- predicate negatable and vector all ones >- predicate not negatable and vectors all ones > > Why do you guard against `VectorNode::is_all_ones_vector(in2)` at all? > > The condition for 3. is easy to get wrong, so good testing is important here > :) The current testing status for the conditions you listed: > 1. Not MaskCmp. **No test for it, tested locally**, Because I think this condition is too straightforward. > 2. More than one use. **Tested**, see `VectorMaskCompareNotTest.java line 1118`. > predicate negatable and vector not all ones. **Tested**, see `VectorMaskCompareNotTest.java line 1126`. > predicate not negatable and vector not all ones. **No test for it**, because we have tests for `predicate not negatable` or `vector not all ones`. If either is `false`, return nullptr. > predicate negatable and vector all ones. **A lot of tests for it**. For example `VectorMaskCompareNotTest.java line 1014`. > predicate not negatable and vectors all ones. **Tested**, see `VectorMaskCompareNotTest.java line 1222`. - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2189495935
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
On Tue, 8 Jul 2025 08:56:44 GMT, Emanuel Peter wrote: > predicate cannot be negated AND the vector is all ones. Can you explain this > condition? Ok, I'll add a comment for it. > Why do you guard against VectorNode::is_all_ones_vector(in2) at all? Because one of the nodes in the supported patterns by this PR needs to be `MaskAll` or `Replicate`. And this function `VectorNode::is_all_ones_vector` just meets our check for `MaskAll` and `Replicate`. Actually I don't quite understand your question. I have two understandings: 1. Not all nodes that `VectorNode::is_all_ones_vector` returns true are `MaskAll` or `Replicate`, but other nodes that do not meet the conditions. 2. Here, it does not need to be a vector with every bit 1, it only needs to be an `all true` mask. Which one do you mean? Or something else? Thanks~ - PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2192019309
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
> This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option > `-XX:UseSVE=2`: > > Benchmark UnitBefore Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDoubleops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 > 1177811.493 521.12291.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 > 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 > 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 > 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 > 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 > 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 > 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 > 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 > 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 > 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 > 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 > 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Address more comments ATT. - Merge branch 'master' into JDK-8354242 - Support negating unsigned comparison for BoolTest::mask Added a static method `negate_mask(mask btm)` into BoolTest class to negate both signed and unsigned comparison. - Addressed some review comments - Merge branch 'master' into JDK-8354242 - Refactor the JTReg tests for compare.xor(maskAll) Also made a bit change to support pattern `VectorMask.fromLong()`. - Merge branch 'master' into JDK-8354242 - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - ... and 5 more: https://git.openjdk.org/jdk/compare/c8050dff...5ebdc572 - Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/f51bf722..5ebdc572 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=07-08 Stats: 36932 lines in 979 files changed: 21941 ins; 10041 del; 4950 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote: >>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const >>> > { return mask(_test^4); }` I think you should use that instead :) >>> >>> Indeed, I hadn't noticed that, thank you. >> >> Oh I think we still cannot use `BoolTest::negate`, because we cannot >> instantiate a `BoolTest` object with **unsigned** comparison. >> `BoolTest::negate` is a non-static function. > >> Oh I think we still cannot use `BoolTest::negate`, because we cannot >> instantiate a `BoolTest` object with **unsigned** comparison. >> `BoolTest::negate` is a non-static function. > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because > that is where the exact implementation of the enum values is. In that context > it is easier to see why `^4` does the negation. And imagine we were ever to > change the enum values, then it would be harder to find your code and fix it. > > Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a > comment that both signed and unsigned is supported. Hi @eme64 @jatin-bhateja , would you mind taking another look of this PR, thanks~ - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3031109432
Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote: >>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const >>> > { return mask(_test^4); }` I think you should use that instead :) >>> >>> Indeed, I hadn't noticed that, thank you. >> >> Oh I think we still cannot use `BoolTest::negate`, because we cannot >> instantiate a `BoolTest` object with **unsigned** comparison. >> `BoolTest::negate` is a non-static function. > >> Oh I think we still cannot use `BoolTest::negate`, because we cannot >> instantiate a `BoolTest` object with **unsigned** comparison. >> `BoolTest::negate` is a non-static function. > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because > that is where the exact implementation of the enum values is. In that context > it is easier to see why `^4` does the negation. And imagine we were ever to > change the enum values, then it would be harder to find your code and fix it. > > Maybe it could be called `BoolTest::negate_mask(mast btm)` and explain in a > comment that both signed and unsigned is supported. Hi @eme64 @jatin-bhateja , could you please take a look at this patch? Thanks! - PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3111851722