On Thu, 25 Jan 2024 09:15:26 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The incremental webrev excludes the unrelated changes
>> brought in by the merge/rebase. The pull request contains 10 additional
>> commits
On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Tue, 23 Jan 2024 15:20:47 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The incremental webrev excludes the unrelated changes
>> brought in by the merge/rebase. The pull request contains 10 additional
>> commits
On Tue, 23 Jan 2024 08:17:13 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolution
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5301:
>
>> 5299: vmovmskps(rtm
> Hi,
>
> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2-only
> targets.
> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
> instruction set.
> These are very frequently used APIs in columnar database filter operations.
>
> Implementation uses a
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Fri, 19 Jan 2024 07:43:18 GMT, Emanuel Peter wrote:
>> For long/double, each permute row is 32 bytes in size, hence the shift by 5
>> to compute the row address.
>
> Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
> Because "64bit row" sounds like the whole row is only 64 bit long. It is
>
On Thu, 18 Jan 2024 17:06:55 GMT, Jatin Bhateja wrote:
>> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?
>
> For long/double, each permute row is 32 bytes in size, hence the shift by 5
> to compute the row address.
Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
Because
On Tue, 16 Jan 2024 07:08:57 GMT, Emanuel Peter wrote:
>> Each long/double permute lane holds 64 bit value.
>
> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit?
For long/double, each permute row is 32 bytes in size, hence the shift by 5 to
compute the row address.
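
The shift amount follows from the row size: shifting the mask value left by 5 multiplies it by 32, the byte size of one row. A minimal Java sketch of that arithmetic, assuming each table row is one 256-bit permute vector; the class and method names are hypothetical and are not taken from the patch:

```java
final class PermuteRowOffset {
    // Assumption: one row = one 256-bit permute vector = 32 bytes,
    // i.e. 8 int lanes or, equivalently, 4 long lanes.
    static final int ROW_BYTES = 8 * Integer.BYTES;

    // log2(ROW_BYTES); evaluates to 5 for 32-byte rows.
    static final int ROW_SHIFT = Integer.numberOfTrailingZeros(ROW_BYTES);

    // Byte offset of the row selected by the mask bits,
    // measured from the start of the table.
    static long rowByteOffset(int maskBits) {
        return ((long) maskBits) << ROW_SHIFT;
    }
}
```

With these assumptions, the same shift applies to both element sizes, since 8 ints and 4 longs occupy the same 32 bytes.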
On Tue, 16 Jan 2024 06:13:43 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309:
>>
>>> 5307: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5308: vmovmskpd(rtmp, mask, vec_enc);
>>> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs)
>>
>> Suggestio
On Mon, 15 Jan 2024 09:10:38 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Using emulated variable blend E-Core optimized instruction.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 53
On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja wrote:
>> Yes, if it is vectorized, then there is no difference between high and low
>> density. My concern was more whether vectorization is preferable over the
>> scalar alternative in the low-density case, where branch prediction is more
>> stable.
>
>
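
For reference, the scalar alternative mentioned here is a branchy loop that copies each qualifying element individually. A rough Java sketch, with hypothetical names and a greater-than predicate assumed purely for illustration:

```java
final class ScalarColumnFilter {
    // Copies every element of `in` that is greater than `threshold`
    // into the front of `out` and returns how many were copied.
    // Assumption: `out` is at least as long as `in`.
    static int filterGreaterThan(int[] in, int[] out, int threshold) {
        int count = 0;
        for (int value : in) {
            if (value > threshold) {   // data-dependent branch
                out[count++] = value;
            }
        }
        return count;
    }
}
```

When few elements pass the predicate, the branch is almost always not taken, which is the stable-prediction case raised in the comment above.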
On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Mon, 8 Jan 2024 10:20:33 GMT, Quan Anh Mai wrote:
>>> Thanks for the updates!
>>>
>>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>>> to a permutation. Might it make sense to split this off into a separate
>>> vector-node, so that it can float out of a loop i
On Mon, 8 Jan 2024 07:55:00 GMT, Emanuel Peter wrote:
>>> You are using `VectorMask pred = VectorMask.fromLong(ispecies,
>>> maskctr++);`. That basically systematically iterates over all masks, which
>>> is nice for a correctness test. But that would use different density inside
>>> one test r
On Fri, 5 Jan 2024 09:35:34 GMT, Emanuel Peter wrote:
>> Thanks for the comment addition!
>
> Improvement suggestion:
> For a vector with 8 ints, there are `2^8 = 256` possible bit patterns for the
> mask. The table has a row for each `mask` value, consisting of 8 ints, which
> provide the valid permu
On Mon, 8 Jan 2024 06:06:20 GMT, Jatin Bhateja wrote:
>> You are using `VectorMask pred = VectorMask.fromLong(ispecies,
>> maskctr++);`.
>> That basically systematically iterates over all masks, which is nice for a
>> correctness test.
>> But that would use different density inside one test run
On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja wrote:
>> Thanks for the updates!
>>
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>> to a permutation. Might it make sense to split this off into a separate
>> vector-node, so that it can float out of a loop if th
On Fri, 5 Jan 2024 10:02:28 GMT, Emanuel Peter wrote:
> Thanks for the updates!
>
> One more idea: Your AVX2 solution has a lot of cost for converting the mask
> to a permutation. Might it make sense to split this off into a separate
> vector-node, so that it can float out of a loop if the mas
On Fri, 5 Jan 2024 09:45:11 GMT, Emanuel Peter wrote:
> You are using `VectorMask pred = VectorMask.fromLong(ispecies,
> maskctr++);`. That basically systematically iterates over all masks, which is
> nice for a correctness test. But that would use different density inside one
> test run, righ
On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Fri, 5 Jan 2024 07:05:51 GMT, Jatin Bhateja wrote:
>> We do have extensive functional tests for compress/expand APIs in
>> [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector)
>
>> Could there be equivalent `expand` tests?
>
> Here are the
On Fri, 5 Jan 2024 09:37:55 GMT, Emanuel Peter wrote:
>> This computes the byte offset from the start of the table. The integer and
>> long permute tables have the same row size: 8 int elements vs. 4 long
>> elements.
>
> Ah, I understand now. Maybe leave a comment for that?
I would say something like thi
On Thu, 4 Jan 2024 13:40:19 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Updating copyright year of modified files.
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957:
>
>> 955: __
On Fri, 5 Jan 2024 09:31:50 GMT, Emanuel Peter wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957:
>>
>>> 955: __ align(CodeEntryAlignment);
>>> 956: StubCodeMark mark(this, "StubRoutines", stub_name);
>>> 957: address start = __ pc();
>>
>> Could you please add some comments
On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307:
>>
>>> 5305: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5306: vmovmskpd(rtmp, mask, vec_enc);
>>> 5307: shlq(rtmp, 5);
>>
>> Might this need to be 6? If I understan
On Thu, 4 Jan 2024 13:41:40 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Updating copyright year of modified files.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307:
>
>> 5305:
On Thu, 4 Jan 2024 13:30:24 GMT, Emanuel Peter wrote:
>> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java
>> line 94:
>>
>>> 92:IntVector vec = IntVector.fromArray(ispecies, intinCol, i);
>>> 93:VectorMask pred = vec.compare(VectorOperators.GT
On Fri, 5 Jan 2024 07:03:26 GMT, Jatin Bhateja wrote:
>> And what about some result verification? Or is there another test that does
>> that?
>
> We do have extensive functional tests for compress/expand APIs in
> [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/j
On Thu, 4 Jan 2024 13:33:08 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Updating copyright year of modified files.
>
> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark
On Thu, 4 Jan 2024 05:39:01 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
On Thu, 4 Jan 2024 13:09:30 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Updating copyright year of modified files.
>
> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark
> Hi,
>
> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2-only
> targets.
> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
> instruction set.
> These are very frequently used operations in columnar database filters.
>
> Implementation
Hi,
Patch optimizes non-subword vector compress and expand APIs for x86 AVX2-only
targets.
Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support the AVX2
instruction set.
These are very frequently used operations in columnar database filters.
Implementation uses a lookup table
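
As an illustration of the lookup-table idea, here is a scalar Java sketch for an 8-lane int vector. It builds one row per mask value (256 rows of 8 indices) and applies the selected row as a permutation. This is not the patch's code: the class and method names are hypothetical, and the use of -1 to mark unused lanes (which are then zeroed) is an assumption made for this sketch.

```java
final class CompressTableSketch {
    static final int LANES = 8;

    // One row per 8-bit mask value. Each row lists the source lanes whose
    // mask bit is set, in ascending order, padded with -1 (assumed marker).
    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int out = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][out++] = lane;
                }
            }
            while (out < LANES) {
                table[mask][out++] = -1;
            }
        }
        return table;
    }

    // Emulates compress: selected lanes are packed into the low lanes of
    // the result and the remaining lanes are set to zero.
    static int[] compress(int[] src, int mask, int[][] table) {
        int[] row = table[mask & 0xFF];
        int[] dst = new int[LANES];
        for (int j = 0; j < LANES; j++) {
            dst[j] = row[j] >= 0 ? src[row[j]] : 0;
        }
        return dst;
    }
}
```

In a vectorized version, the per-lane loop in `compress` would correspond to a single permute followed by a blend with zero; the table lookup itself is only an index computation on the mask bits.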