On Wed, 9 Jul 2025 06:08:33 GMT, erifan <[email protected]> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> Benchmark                    Unit    Before          Score Error     After   
>>         Score Error     Uplift
>> testCompareEQMaskNotByte     ops/s   7912127.225     2677.289518     
>> 10266136.26     8955.008548     1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799     446.963779      
>> 1179760.772     448.031844      1.33
>> testCompareEQMaskNotFloat    ops/s   1765045.787     682.332214      
>> 2359520.803     896.305743      1.33
>> testCompareEQMaskNotInt              ops/s   1787221.411     977.743935      
>> 2353952.519     960.069976      1.31
>> testCompareEQMaskNotLong     ops/s   895297.1974     673.44808       
>> 1178449.02      323.804205      1.31
>> testCompareEQMaskNotShort    ops/s   3339987.002     3415.2226       
>> 4712761.965     2110.862053     1.41
>> testCompareGEMaskNotByte     ops/s   7907615.16      4094.243652     
>> 10251646.9      9486.699831     1.29
>> testCompareGEMaskNotInt              ops/s   1683738.958     4233.813092     
>> 2352855.205     1251.952546     1.39
>> testCompareGEMaskNotLong     ops/s   854496.1561     8594.598885     
>> 1177811.493     521.1229        1.37
>> testCompareGEMaskNotShort    ops/s   3341860.309     1578.975338     
>> 4714008.434     1681.10365      1.41
>> testCompareGTMaskNotByte     ops/s   7910823.674     2993.367032     
>> 10245063.58     9774.75138      1.29
>> testCompareGTMaskNotInt              ops/s   1673393.928     3153.099431     
>> 2353654.521     1190.848583     1.4
>> testCompareGTMaskNotLong     ops/s   849405.9159     2432.858159     
>> 1177952.041     359.96413       1.38
>> testCompareGTMaskNotShort    ops/s   3339509.141     3339.976585     
>> 4711442.496     2673.364893     1.41
>> testCompareLEMaskNotByte     ops/s   7911340.004     3114.69191      
>> 10231626.5      27134.20035     1.29
>> testCompareLEMaskNotInt              ops/s   1675812.113     1340.969885     
>> 2353255.341     1452.4522       1.4
>> testCompareLEMaskNotLong     ops/s   848862.8036     6564.841731     
>> 1177763.623     539.290106      1.38
>> testCompareLEMaskNotShort    ops/s   3324951.54      2380.29473      
>> 4712116.251     1544.559684     1.41
>> testCompareLTMaskNotByte     ops/s   7910390.844     2630.861436     
>> 10239567.69     6487.441672     1.29
>> testCompareLTMaskNotInt              ops/s   16721...
>
> erifan has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Update the code comment

Looks much better, thanks for the updates!

I have another small list of suggestions :)

src/hotspot/share/opto/vectornode.cpp line 2243:

> 2241:   if (in1->Opcode() != Op_VectorMaskCmp ||
> 2242:       in1->outcnt() != 1 ||
> 2243:       !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||

Suggestion:

      !in1->as_VectorMaskCmp()->predicate_can_be_negated() ||

Brackets are unnecessary, and rather make it harder to read.

src/hotspot/share/opto/vectornode.cpp line 2277:

> 2275:     res = VectorNode::Ideal(phase, can_reshape);
> 2276:   }
> 2277:   return res;

What if someone comes and wants to add yet another optimization before 
`VectorNode::Ideal`? Your code layout would give us deeper and deeper nesting. 
I suggest flattening it like this:
Suggestion:


  Node* res = Ideal_XorV_VectorMaskCmp(phase, can_reshape);
  if (res != nullptr) { return res; }

  return VectorNode::Ideal(phase, can_reshape);

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 911:

> 909:         testCompareMaskNotLong(L_SPECIES_FOR_CAST, VectorOperators.UGE, 
> (m) -> { return m.cast(I_SPECIES_FOR_CAST).not(); });
> 910:         verifyResultsLong(L_SPECIES_FOR_CAST, VectorOperators.UGE);
> 911:     }

You have some cast in here, and in similar tests.
Can you add an IR rule to check if we do or do not have the expected casts?

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 1007:

> 1005:         testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, 
> fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
> 1006:         verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf);
> 1007:     }

Do you have test cases for the cases other than `EQ` and `NE`? After all, we 
don't that someone accidentally messes with the logic you implemented later and 
we don't notice the bug ;)

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java 
line 351:

> 349:     public void testCompareULEMaskNotLong() {
> 350:         testCompareMaskNotLong(VectorOperators.ULE);
> 351:     }

You could consider making the operator a `@Param` next time.

There are multiple tricks to do that:
- `test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java` 
using `MethodHandles.constant`
- Some inner class that has a static final, which is initialized from the 
non-final `@Param` value.
- Probably even `StableValue` would work, but I have not yet experimented with 
it.

It would be nice if we could do the same with the primitive types, but that's 
probably not going to work as easily.

Really just an idea for next time.

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java 
line 366:

> 364:     public void testCompareNEMaskNotFloat() {
> 365:         testCompareMaskNotFloat(VectorOperators.NE);
> 366:     }

You could still add the other comparisons as well, so we can see the 
performance difference. Very optional, feel free to ignore this suggestion.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3201347660
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333480061
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333418237
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333510278
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333503735
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333545924
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333516350

Reply via email to