On Mon, 15 Dec 2025 07:30:10 GMT, Jatin Bhateja <[email protected]> wrote:

>> src/hotspot/share/opto/vectornode.cpp line 1062:
>> 
>>> 1060:     if (!in1->isa_Vector()) {
>>> 1061:       break;
>>> 1062:     }
>> 
>> Hi, @jatin-bhateja, I didn't quite understand what you meant. I'm not sure
>> if you mistook `isa_Vector` for `isa_vectormask`. Checking `isa_Vector` here
>> ensures that `in1` is a `VectorNode`, so that the subsequent `as_Vector()`
>> call is safe.
>
> Correct. I am seeing different behaviour between UseAVX=2 and UseAVX=3 for
> the following kernel. It is not related to your new code but is a side effect
> of something else. Kindly have a look.
> 
> 
>   public static final VectorSpecies<Float> FSP = FloatVector.SPECIES_PREFERRED;
> 
>   public static long micro(long ctr) {
>       VectorMask<Float> mask = VectorMask.fromLong(FSP, 15);
>       return mask.toLong();
>   }
> 
> 
> TURIN>java --add-modules=jdk.incubator.vector -XX:UseAVX=3 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=PrintIdealPhase,testmcast::micro,BEFORE_MATCHING -cp . testmcast
> CompileCommand: PrintIdealPhase testmcast.micro const char* PrintIdealPhase = 'BEFORE_MATCHING'
> AFTER: BEFORE_MATCHING
>    0  Root  === 0 368  [[ 0 1 3 25 ]] inner
>    3  Start  === 3 0  [[ 3 5 6 7 8 9 ]]  #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:long, 6:half}
>    5  Parm  === 3  [[ 368 ]] Control !jvms: testmcast::micro @ bci:-1 (line 9)
>    6  Parm  === 3  [[ 368 ]] I_O !jvms: testmcast::micro @ bci:-1 (line 9)
>    7  Parm  === 3  [[ 368 ]] Memory  Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: testmcast::micro @ bci:-1 (line 9)
>    8  Parm  === 3  [[ 368 ]] FramePtr !jvms: testmcast::micro @ bci:-1 (line 9)
>    9  Parm  === 3  [[ 368 ]] ReturnAdr !jvms: testmcast::micro @ bci:-1 (line 9)
>   25  ConL  === 0  [[ 376 ]]  #long:15
>  368  Return  === 5 6 7 8 9 returns 398  [[ 0 ]]
>  376  VectorLongToMask  === _ 25  [[ 397 ]]  #vectormask<F,16> !jvms: VectorMask::fromLong @ bci:39 (line 243) testmcast::micro @ bci:6 (line 9)
>  397  VectorMaskCast  === _ 376  [[ 398 ]]  #vectormask<I,16> !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) testmcast::micro @ bci:11 (line 10)
>  398  VectorMaskToLong  === _ 397  [[ 368 ]]  #long !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) testmcast::micro @ bci:11 (line 10)
> [time] 17ms [res] 300000000
> TURIN>java --add-modules=jdk.incubator.vector -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=PrintIdealPhase,testmcast::micro,BEFORE_MATCHING -cp . testmcast
> CompileCommand: PrintIdealPhase testmcast.micro const char* PrintIdealPhase = 'BEFORE_MATCHING'
> AFTER: BEFORE_MATCHING
>    0  Root  === 0 368  [[ 0 1 3 25 ]] inner
>    3  Start  === 3 0  [[ 3 5 6 7 8 9 ]]  #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:long, 6:half}
>    5  Parm  === 3  [[ 368 ]] Control !jvms: testmcast::micro @ bci:-1 (line 9)
>    6  Parm  === 3  [[ 368 ]] I_O !jvms: testmcast::micro @ bci:-1 (line 9)
>    7  Parm  === 3  [[ 368 ]] Memory  ...

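This is caused by the different IR shapes that C2 generates with AVX2 and AVX3.
With UseAVX=3 masks are kept in AVX-512 predicate (opmask) registers, whereas
with UseAVX=2 they are ordinary vectors, so the mask cast is wrapped in
VectorLoadMask/VectorStoreMask.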

- With AVX3 the generated IR is:
  `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`

- With AVX2 the generated IR is:
  `(VectorMaskToLong (VectorStoreMask (VectorMaskCast (VectorLoadMask (VectorLongToMask x)))))`

The following transformations are already supported:
- `(VectorStoreMask (VectorMaskCast (VectorLoadMask x))) => (x)` and
- `(VectorMaskToLong (VectorLongToMask x)) => (x)`.

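So with AVX2 the whole chain folds in two steps: the first rule removes the
Store/Cast/Load triple, leaving `(VectorMaskToLong (VectorLongToMask x))`, which
the second rule then reduces to `x`:
`(VectorMaskToLong (VectorStoreMask (VectorMaskCast (VectorLoadMask (VectorLongToMask x))))) => (x)`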
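To make the two rules concrete, here is a toy model in Java. It is not HotSpot's C++ `Ideal`/`Identity` code: the `Node`, `Leaf` and `Op` types and the `fold` helper are hypothetical, with node kinds named after the C2 classes. `fold` applies only the two rules listed above, folding inputs before the root. The AVX2 shape reduces to `x`, while the AVX3 shape is left unchanged because neither rule matches `(VectorMaskToLong (VectorMaskCast ...))`.

```java
// Toy model of two C2 mask identities; hypothetical names, NOT HotSpot code.
public class MaskFoldSketch {
    // Hypothetical IR: a leaf value or a unary node named after a C2 node class.
    interface Node {}

    record Leaf(String name) implements Node {
        @Override public String toString() { return name; }
    }

    record Op(String kind, Node in) implements Node {
        @Override public String toString() { return "(" + kind + " " + in + ")"; }
    }

    static Op op(String kind, Node in) { return new Op(kind, in); }

    // True if n is a unary node of the given kind.
    static boolean is(Node n, String kind) {
        return n instanceof Op o && o.kind().equals(kind);
    }

    // Input of a unary node; only called after is(...) has matched.
    static Node in(Node n) { return ((Op) n).in(); }

    // Tries the two supported rules at the root; returns null if neither matches.
    static Node rewriteRoot(Node n) {
        // (VectorStoreMask (VectorMaskCast (VectorLoadMask y))) => y
        if (is(n, "VectorStoreMask") && is(in(n), "VectorMaskCast")
                && is(in(in(n)), "VectorLoadMask")) {
            return in(in(in(n)));
        }
        // (VectorMaskToLong (VectorLongToMask x)) => x
        if (is(n, "VectorMaskToLong") && is(in(n), "VectorLongToMask")) {
            return in(in(n));
        }
        return null;
    }

    // Folds the input first, then retries the root until no rule applies.
    static Node fold(Node n) {
        if (!(n instanceof Op o)) {
            return n;
        }
        Node folded = op(o.kind(), fold(o.in()));
        Node rewritten = rewriteRoot(folded);
        return rewritten == null ? folded : fold(rewritten);
    }

    public static void main(String[] args) {
        Node x = new Leaf("x");
        Node avx2 = op("VectorMaskToLong", op("VectorStoreMask",
                op("VectorMaskCast", op("VectorLoadMask", op("VectorLongToMask", x)))));
        Node avx3 = op("VectorMaskToLong", op("VectorMaskCast", op("VectorLongToMask", x)));
        System.out.println(fold(avx2)); // x
        System.out.println(fold(avx3)); // (VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))
    }
}
```

Folding the input before the root means the first rule's result is already in place when the second rule is tried at the top of the AVX2 chain.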

`(VectorMaskToLong (VectorMaskCast (VectorLongToMask x))) => (x)` is a
potential optimization, which I mentioned in the commit message, but it is not
supported yet, so with AVX3 the chain is left as is.
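Whether the AVX3 chain could fold the same way comes down to the mask cast preserving every lane. The following is a hedged Java model using plain `boolean[]` arrays, not the Vector API or C2 nodes; `longToMask`, `maskCast`, `maskToLong` and `roundTripHolds` are hypothetical names. For 16 lanes, as with the `vectormask<F,16>` to `vectormask<I,16>` cast in the dump, it checks exhaustively that packing the cast mask back into a long returns the original bits. It assumes the input long has no bits set above the lane count, as with the `#long:15` constant here.

```java
// Lane-level model of toLong(cast(fromLong(x))); hypothetical names, not the Vector API.
public class MaskCastRoundTrip {
    // Model of fromLong: lane i is set iff bit i of bits is set.
    static boolean[] longToMask(long bits, int lanes) {
        boolean[] m = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            m[i] = ((bits >>> i) & 1L) != 0;
        }
        return m;
    }

    // Model of a cast between masks with the same lane count
    // (e.g. vectormask<F,16> -> vectormask<I,16>): every lane keeps its value;
    // the element type is not tracked in this model.
    static boolean[] maskCast(boolean[] m) {
        return m.clone();
    }

    // Model of toLong: lane i becomes bit i.
    static long maskToLong(boolean[] m) {
        long bits = 0;
        for (int i = 0; i < m.length; i++) {
            if (m[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Checks toLong(cast(fromLong(x))) == x for every x that fits in `lanes` bits.
    static boolean roundTripHolds(int lanes) {
        for (long x = 0; x < (1L << lanes); x++) {
            if (maskToLong(maskCast(longToMask(x, lanes))) != x) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(maskToLong(maskCast(longToMask(15, 16)))); // 15
        System.out.println(roundTripHolds(16)); // true
    }
}
```

Under this model the cast is a no-op on lane values, which is why dropping it between VectorLongToMask and VectorMaskToLong would not change the result.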

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618369569
