Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]

2024-01-25 Thread Jatin Bhateja
On Thu, 25 Jan 2024 09:15:26 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains 10 additional >> commits

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]

2024-01-25 Thread Emanuel Peter
On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]

2024-01-24 Thread Jatin Bhateja
On Tue, 23 Jan 2024 15:20:47 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains 10 additional >> commits

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]

2024-01-23 Thread Emanuel Peter
On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8]

2024-01-23 Thread Jatin Bhateja
On Tue, 23 Jan 2024 08:17:13 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolution > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5301: > >> 5299: vmovmskps(rtm

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]

2024-01-23 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8]

2024-01-23 Thread Emanuel Peter
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8]

2024-01-22 Thread Sandhya Viswanathan
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8]

2024-01-20 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-19 Thread Jatin Bhateja
On Fri, 19 Jan 2024 07:43:18 GMT, Emanuel Peter wrote: >> For long/double each permute row is 32 byte in size, so a shift by 5 to >> compute row address. > > Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`. > Because "64bit row" sounds like the whole row is only 64 bit long. It is >

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-18 Thread Emanuel Peter
On Thu, 18 Jan 2024 17:06:55 GMT, Jatin Bhateja wrote: >> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit? > > For long/double each permute row is 32 byte in size, so a shift by 5 to > compute row address. Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`. Because

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-18 Thread Jatin Bhateja
On Tue, 16 Jan 2024 07:08:57 GMT, Emanuel Peter wrote: >> Each long/double permute lane holds 64 bit value. > > @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit? For long/double each permute row is 32 byte in size, so a shift by 5 to compute row address. - PR
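
The row-address arithmetic discussed in this message can be sketched as below. This is only an illustration: the class and method names are hypothetical and not from the patch. The thread states only that a long/double permute row is 32 bytes and that the mask value is shifted by 5.

```java
// Hypothetical helper names. Only the 32-byte row size and the shift by 5
// are taken from the review thread.
final class PermuteRowOffset {
    static final int INT_ROW_BYTES  = 8 * Integer.BYTES; // 8 ints  * 4 bytes = 32 bytes
    static final int LONG_ROW_BYTES = 4 * Long.BYTES;    // 4 longs * 8 bytes = 32 bytes
    static final int ROW_SHIFT = 5;                      // 1 << 5 == 32

    // Byte offset of the table row selected by a mask value.
    static long rowByteOffset(int mask) {
        return ((long) mask) << ROW_SHIFT;
    }

    public static void main(String[] args) {
        // A 4-lane long mask has 16 possible values, so offsets span 0..480.
        for (int mask = 0; mask < 16; mask++) {
            System.out.println("mask " + mask + " -> byte offset " + rowByteOffset(mask));
        }
    }
}
```

Both row layouts come to 32 bytes, which is why one shift amount serves the int and the long case alike.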

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v6]

2024-01-18 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-15 Thread Emanuel Peter
On Tue, 16 Jan 2024 06:13:43 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309: >> >>> 5307: assert(bt == T_LONG || bt == T_DOUBLE, ""); >>> 5308: vmovmskpd(rtmp, mask, vec_enc); >>> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs) >> >> Suggestio

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-15 Thread Jatin Bhateja
On Mon, 15 Jan 2024 09:10:38 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Using emulated variable blend E-Core optimized instruction. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 53

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-15 Thread Emanuel Peter
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-14 Thread Andrey Turbanov
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-10 Thread Jatin Bhateja
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]

2024-01-09 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-09 Thread Emanuel Peter
On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja wrote: >> Yes, IF it is vectorized, then there is no difference between high and low >> density. My concern was more if vectorization is preferable over the scalar >> alternative in the low-density case, where branch prediction is more stable. > >

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4]

2024-01-09 Thread Emanuel Peter
On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-08 Thread Jatin Bhateja
On Mon, 8 Jan 2024 10:20:33 GMT, Quan Anh Mai wrote: >>> Thanks for the updates! >>> >>> One more idea: Your AVX2 solution has a lot of cost for converting the mask >>> to a permutation. Might it make sense to split this off into a separate >>> vector-node, so that it can float out of a loop i

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-08 Thread Jatin Bhateja
On Mon, 8 Jan 2024 07:55:00 GMT, Emanuel Peter wrote: >>> You are using `VectorMask pred = VectorMask.fromLong(ispecies, >>> maskctr++);`. That basically systematically iterates over all masks, which >>> is nice for a correctness test. But that would use different density inside >>> one test r

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-08 Thread Emanuel Peter
On Fri, 5 Jan 2024 09:35:34 GMT, Emanuel Peter wrote: >> Thanks for the comment addition! > > Improvement suggestion: > For a vector with 8 ints, we get `2^8 = 256` many bit patterns for the mask. > The table has a row for each `mask` value, consisting of 8 ints, which > provide the valid permu
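
A minimal scalar sketch of the table described above, assuming an 8-lane int vector: one row per mask value, each row listing the source lanes packed to the front. The class name, the -1 marker for unused slots, and the zero-filled tail are assumptions of this sketch. The snippet is truncated, so how the actual stub encodes unselected lanes is not shown here.

```java
import java.util.Arrays;

// Hypothetical illustration, not the HotSpot stub: builds a compress
// permutation table for an 8-lane int vector and applies it in scalar code.
final class CompressPermuteTable {
    static final int LANES = 8;

    // One row per mask value (2^8 = 256 rows); each row holds 8 lane indices.
    static int[][] build() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int[] row = table[mask];
            Arrays.fill(row, -1); // -1 marks "no source lane" (assumption of this sketch)
            int dst = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    row[dst++] = lane; // selected lanes are packed to the front
                }
            }
        }
        return table;
    }

    // Scalar emulation of compress: permute by the table row, zero the tail.
    static int[] compress(int[] src, int mask, int[][] table) {
        int[] out = new int[LANES];
        int[] row = table[mask & 0xFF];
        for (int i = 0; i < LANES; i++) {
            out[i] = row[i] >= 0 ? src[row[i]] : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] table = build();
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        // Mask bits 0, 2, 5 and 7 are set, so lanes 0, 2, 5 and 7 are kept.
        System.out.println(Arrays.toString(compress(src, 0b10100101, table)));
    }
}
```

The table lookup replaces a per-lane loop at run time: the mask value indexes a row directly, and a single permute then moves the selected lanes into place.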

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4]

2024-01-08 Thread Emanuel Peter
On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-08 Thread Emanuel Peter
On Mon, 8 Jan 2024 06:06:20 GMT, Jatin Bhateja wrote: >> You are using `VectorMask pred = VectorMask.fromLong(ispecies, >> maskctr++);`. >> That basically systematically iterates over all masks, which is nice for a >> correctness test. >> But that would use different density inside one test run
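
To illustrate the density concern raised here: a counter-based mask visits bit patterns whose share of set lanes varies from run to run of the loop, while a benchmark aimed at one density could draw each lane with a fixed probability. The sketch below is hypothetical and not taken from the PR's benchmark; the class name, method names, and the seeded `java.util.Random` are all assumptions.

```java
import java.util.Random;

// Hypothetical sketch: measures mask density and draws masks with a
// chosen expected density. Assumes lanes < 64.
final class MaskDensity {
    // Fraction of the low `lanes` bits that are set.
    static double density(long mask, int lanes) {
        long laneBits = (1L << lanes) - 1;
        return Long.bitCount(mask & laneBits) / (double) lanes;
    }

    // Sets each lane independently with probability p.
    static long randomMask(Random rnd, int lanes, double p) {
        long mask = 0;
        for (int lane = 0; lane < lanes; lane++) {
            if (rnd.nextDouble() < p) {
                mask |= 1L << lane;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // A counter-style mask sequence mixes many densities in one run.
        for (long ctr = 0; ctr < 4; ctr++) {
            System.out.println("counter mask " + ctr + " density " + density(ctr, 8));
        }
        // A seeded generator keeps the expected density fixed per run.
        Random rnd = new Random(42);
        System.out.println("sampled mask " + Long.toBinaryString(randomMask(rnd, 8, 0.25)));
    }
}
```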

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-08 Thread Quan Anh Mai
On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja wrote: >> Thanks for the updates! >> >> One more idea: Your AVX2 solution has a lot of cost for converting the mask >> to a permutation. Might it make sense to split this off into a separate >> vector-node, so that it can float out of a loop if th

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-07 Thread Jatin Bhateja
On Fri, 5 Jan 2024 10:02:28 GMT, Emanuel Peter wrote: > Thanks for the updates! > > One more idea: Your AVX2 solution has a lot of cost for converting the mask > to a permutation. Might it make sense to split this off into a separate > vector-node, so that it can float out of a loop if the mas

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4]

2024-01-07 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-07 Thread Jatin Bhateja
On Fri, 5 Jan 2024 09:45:11 GMT, Emanuel Peter wrote: > You are using `VectorMask pred = VectorMask.fromLong(ispecies, > maskctr++);`. That basically systematically iterates over all masks, which is > nice for a correctness test. But that would use different density inside one > test run, righ

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-05 Thread Emanuel Peter
On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-05 Thread Emanuel Peter
On Fri, 5 Jan 2024 07:05:51 GMT, Jatin Bhateja wrote: >> We do have extensive functional tests for compress/expand APIs in >> [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) > >> Could there be equivalent `expand` tests? > > Here are the

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-05 Thread Emanuel Peter
On Fri, 5 Jan 2024 09:37:55 GMT, Emanuel Peter wrote: >> This computes the byte offset from start of the table, both integer and long >> permute table have same row sizes, 8 int elements vs 4 long elements. > > Ah, I understand now. Maybe leave a comment for that? I would say something like thi

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-05 Thread Emanuel Peter
On Thu, 4 Jan 2024 13:40:19 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: > >> 955: __

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-05 Thread Emanuel Peter
On Fri, 5 Jan 2024 09:31:50 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: >> >>> 955: __ align(CodeEntryAlignment); >>> 956: StubCodeMark mark(this, "StubRoutines", stub_name); >>> 957: address start = __ pc(); >> >> Could you please add some comments

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-05 Thread Emanuel Peter
On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: >> >>> 5305: assert(bt == T_LONG || bt == T_DOUBLE, ""); >>> 5306: vmovmskpd(rtmp, mask, vec_enc); >>> 5307: shlq(rtmp, 5); >> >> Might this need to be 6? If I understan

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-04 Thread Jatin Bhateja
On Thu, 4 Jan 2024 13:41:40 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: > >> 5305:

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-04 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-04 Thread Jatin Bhateja
On Thu, 4 Jan 2024 13:30:24 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java >> line 94: >> >>> 92:IntVector vec = IntVector.fromArray(ispecies, intinCol, i); >>> 93:VectorMask pred = vec.compare(VectorOperators.GT

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-04 Thread Jatin Bhateja
On Fri, 5 Jan 2024 07:03:26 GMT, Jatin Bhateja wrote: >> And what about some result verification? Or is there another test that does >> that? > > We do have extensive functional tests for compress/expand APIs in > [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/j

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-04 Thread Jatin Bhateja
On Thu, 4 Jan 2024 13:33:08 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Updating copyright year of modified files. > > test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-04 Thread Emanuel Peter
On Thu, 4 Jan 2024 05:39:01 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-04 Thread Emanuel Peter
On Thu, 4 Jan 2024 13:09:30 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Updating copyright year of modified files. > > test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

2024-01-03 Thread Jatin Bhateja
> Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only > targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 > instruction set. > These are very frequently used operations in columnar database filter > operations. > > Implementation

RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.

2024-01-03 Thread Jatin Bhateja
Hi, Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. These are very frequently used operations in columnar database filter operations. Implementation uses a lookup table