Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Jasmine Karthikeyan
On Sat, 19 Oct 2024 09:25:12 GMT, Jatin Bhateja wrote: >> IMO until C2 type system starts to track bitwise constant information >> ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there >> are not enough benefits to rely on IGVN here. So far, all the discussed >> patterns ar

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Jasmine Karthikeyan
On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai wrote: > I am having a similar idea that is to group those transformations together > into a `Phase` called `PhaseLowering` I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Jasmine Karthikeyan
On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ > instruction for following IR pallets. > > >MulVL ( AndV SRC1, 0x) ( AndV SRC2, 0x) >MulVL (URShiftVL SRC1 , 32) (URShif

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-15 Thread Jasmine Karthikeyan
On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift S

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-14 Thread Jasmine Karthikeyan
On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift S

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-14 Thread Jasmine Karthikeyan
On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift S

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-11 Thread Jasmine Karthikeyan
On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai wrote: > I am having a similar idea that is to group those transformations together > into a `Phase` called `PhaseLowering` I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-11 Thread Jasmine Karthikeyan
On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift S

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v2]

2024-08-14 Thread Jasmine Karthikeyan
On Mon, 12 Aug 2024 06:29:03 GMT, Jatin Bhateja wrote: > its usage in existing patch is limited to [type > comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542) Ah, that makes sense to me. I took a closer look an

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v2]

2024-08-08 Thread Jasmine Karthikeyan
On Thu, 8 Aug 2024 17:20:06 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> following new vector operators. >> >> >> . SATURATING_UADD : Saturating unsigned addition. >> . SATURATING_ADD: Saturating sig

RFR: 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long

2024-07-21 Thread Jasmine Karthikeyan
Hi all, This patch fixes `cmovL_imm_01*` instructions matching against an integer immediate 1 instead of a long immediate 1. I noticed while looking at the backend implementation of CMove that the rules specify `immI_1` instead of `immL1`, which means that the instructions can't be matched and i

Re: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

2024-07-17 Thread Jasmine Karthikeyan
On Wed, 17 Jul 2024 09:18:31 GMT, Galder Zamarreño wrote: > Do you want a microbenchmark for the performance of vectorized max/min long? Yeah, I think a simple benchmark that tests for long min/max vectorization and reduction would be good. I worry that checking performance manually like in `R

Re: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

2024-07-10 Thread Jasmine Karthikeyan
On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño wrote: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in > order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these > calls because of the fo