On Wed, 25 Jun 2025 09:16:48 GMT, Xiaohong Gong wrote:
>> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs
>> for X86 platforms [1]. However, the current implementation is not optimal
>> for AArch64 SVE platform, which natively supports vector instructions for
>> sub
On Fri, 30 May 2025 19:34:16 GMT, Mohamed Issa wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.cbrt() using libm. A new set of micro-benchmarks is
>> included to check the performance of specific input value ranges to help
>> prevent regression
On Thu, 29 May 2025 18:56:11 GMT, Mohamed Issa wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.cbrt() using libm. A new set of micro-benchmarks is
>> included to check the performance of specific input value ranges to help
>> prevent regression
On Wed, 28 May 2025 18:36:38 GMT, Mohamed Issa wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.cbrt() using libm. A new set of micro-benchmarks is
>> included to check the performance of specific input value ranges to help
>> prevent regression
On Tue, 6 May 2025 21:45:34 GMT, Mohamed Issa wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.cbrt() using libm. A new set of micro-benchmarks is
>> included to check the performance of specific input value ranges to help
>> prevent regressions
On Tue, 25 Feb 2025 19:06:05 GMT, Vladimir Ivanov wrote:
> Array initialization by parameter was added. Extra constant was used to align
> cycle step with used arrays.
Looks good to me.
-
Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/2
On Tue, 25 Feb 2025 19:06:05 GMT, Vladimir Ivanov wrote:
> Array initialization by parameter was added. Extra constant was used to align
> cycle step with used arrays.
test/micro/org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java
line 51:
> 49: @Setup(Level.Trial)
> 50:
On Tue, 18 Feb 2025 02:36:13 GMT, Julian Waters wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions
>
> Is anyone else getting compile failures after this was integrated? This
> weirdly se
On Tue, 11 Feb 2025 21:47:31 GMT, Volodymyr Paprotski
wrote:
>> (Also see `8319429: Resetting MXCSR flags degrades ecore`)
>>
>> This PR fixes two issues:
>> - the original issue is a crash caused by `__ warn` corrupting the stack on
>> Windows only
>> - This issue also uncovered that -Xcheck:
On Mon, 3 Feb 2025 21:43:56 GMT, Volodymyr Paprotski
wrote:
>> (Also see `8319429: Resetting MXCSR flags degrades ecore`)
>>
>> This PR fixes two issues:
>> - the original issue is a crash caused by `__ warn` corrupting the stack on
>> Windows only
>> - This issue also uncovered that -Xcheck:j
On Wed, 29 Jan 2025 06:26:41 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by
>> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes included with this patch:-
>>
>> 1. Detection
On Tue, 28 Jan 2025 06:26:11 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by
>> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes included with this patch:-
>>
>> 1. Detection
On Mon, 27 Jan 2025 08:35:44 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by
>> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes included with this patch:-
>>
>> 1. Detection
On Mon, 27 Jan 2025 08:35:44 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by
>> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes included with this patch:-
>>
>> 1. Detection
On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote:
> Hi All,
>
> This patch adds C2 compiler support for various Float16 operations added by
> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>
> Following is the summary of changes included with this patch:-
>
> 1. Detection of vario
On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote:
> Hi All,
>
> This patch adds C2 compiler support for various Float16 operations added by
> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>
> Following is the summary of changes included with this patch:-
>
> 1. Detection of vario
On Thu, 14 Nov 2024 18:24:59 GMT, Jatin Bhateja wrote:
>> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ
>> instruction for the following IR patterns.
>>
>>
>>MulVL ( AndV SRC1, 0x) ( AndV SRC2, 0x)
>>MulVL (URShiftVL SRC1 , 32) (
On Tue, 19 Nov 2024 00:29:42 GMT, Sandhya Viswanathan
wrote:
>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by
>> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes
On Tue, 19 Nov 2024 08:43:06 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/x86.ad line 11015:
>>
>>> 11013: ins_encode %{
>>> 11014: int vlen_enc = vector_length_encoding(this);
>>> 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister,
>>> $src1$$XMMRegister, vlen_enc);
>>
On Wed, 13 Nov 2024 02:43:12 GMT, Jatin Bhateja wrote:
>> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ
>> instruction for the following IR patterns.
>>
>>
>>MulVL ( AndV SRC1, 0x) ( AndV SRC2, 0x)
>>MulVL (URShiftVL SRC1 , 32) (
On Sun, 10 Nov 2024 07:36:55 GMT, Jatin Bhateja wrote:
>> Yes, this should ensure 0x.
>
> We land here only after checking if inputs are uints, didn't want redundant
> match, it's just a convenience routine for forwarding inputs. I will create a
> lambda for this.
uint check only ensures v
On Fri, 8 Nov 2024 20:25:10 GMT, Vladimir Ivanov wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Creating specialized IR to shield pattern from subsequent transforms in
>> optimization pipeline
>
> src/hotspot/sh
On Sun, 6 Oct 2024 10:24:53 GMT, Quan Anh Mai wrote:
>> Quan Anh Mai has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains one commit:
>>
>> [vectorapi] Refactor VectorShuffle implementation
>
> I have adapted the patch in accordance
On Tue, 22 Oct 2024 15:56:18 GMT, Paul Sandoz wrote:
>> Hey @eme64 ,
>>
>>> Wow this is really a very moving target - quite frustrating to review - it
>>> takes up way too much of the reviewers' bandwidth. You really need to split
>>> up your PRs as much as possible so that review is easier and
On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>
On Thu, 17 Oct 2024 15:41:58 GMT, Jatin Bhateja wrote:
> > Rather than adding more IR test functionality to this PR that requires
> > additional review my recommendation would be to follow up in another PR or
> > before hand rethink our approach.
>
> Agree, I am thinking of developing an autom
On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote:
>> I see the problem with float/double vectors. Let us do the rearrange form
>> only for Integral (byte, short, int, long) vectors then. For float/double
>> vector we could keep the code that you have currently.
>
> You will also need additi
On Thu, 3 Oct 2024 05:09:22 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote:
>> I see the problem with float/double vectors. Let us do the rearrange form
>> only for Integral (byte, short, int, long) vectors then. For float/double
>> vector we could keep the code that you have currently.
>
> You will also need additi
On Tue, 1 Oct 2024 05:09:25 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan
wrote:
> Currently the rearrange and selectFrom APIs check shuffle indices and throw
> IndexOutOfBoundsException if there is any exceptional source index in the
> shuffle. This causes the generated code to be less optimal. This PR
On Tue, 1 Oct 2024 09:51:27 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Tue, 1 Oct 2024 09:53:02 GMT, Jatin Bhateja wrote:
>> Thanks for the example. Yes, you have a point there. So we would need to do:
>> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH -
>> 1).toShuffle(), src2);
>
>> This could instead be: src1.rearrange(this.lanewise(VectorOp
On Mon, 30 Sep 2024 22:51:57 GMT, Sandhya Viswanathan
wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w
>>
On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Mon, 30 Sep 2024 21:28:22 GMT, Paul Sandoz wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java
>> line 551:
>>
>>> 549: return ((ByteVector)src1).vectorFactory(res);
>>> 550: }
>>> 551:
>>
>> This could instead be:
>> src1.rearrange(this.lan
On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa
wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.tanh() using libm
>>
>> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup
>> -- | -- | -- | --
>> MathBench.tanhDouble | 70900 | 95618 | 1.35x
On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote:
>> Sandhya Viswanathan has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Address review comments
>
> I'm a bit confused b
On Thu, 19 Sep 2024 07:29:11 GMT, Jatin Bhateja wrote:
>> Sandhya Viswanathan has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Change method name
>
> Hi @sviswa7 , some comments, overall patch looks good to
3,1)
> 0x7f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
> 0x7f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
> 0x7f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
> 0x7f40d022751f: add $0x40,%ebx
> 0x7f40d0227522: cmp %r8d,%ebx
> 0x7f40d022752
On Wed, 18 Sep 2024 17:18:42 GMT, Paul Sandoz wrote:
> Will this have any direct impact on the changes proposed by #20508 and #20634?
I think we should first get the 20508 and 20634 integrated before this one.
-
PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-235902
3,1)
> 0x7f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
> 0x7f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
> 0x7f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
> 0x7f40d022751f: add $0x40,%ebx
> 0x7f40d0227522: cmp %r8d,%ebx
> 0x7f40d022752
On Tue, 17 Sep 2024 22:29:01 GMT, Paul Sandoz wrote:
> > @PaulSandoz What do you think regarding x86-32?
>
> I don't see anything obvious in the changes of this PR that would affect
> x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some
> existing bug? @sviswa7 what do you t
On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote:
> I'm a bit confused by the name `shuffleWrapIndexes` and
> `inline_vector_shuffle_wrap_indexes`.
>
> Are you **shuffling wrap-indexes**? I don't know what that would even mean. I
> think you should name it `wrapShuffleIndexes`. Or is ther
On Tue, 17 Sep 2024 18:21:43 GMT, Paul Sandoz wrote:
> > Adding link to UTF-8 decoding use case for convenience and reminder:
> > https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java.
>
> Another related link to base 64 decoding
> https://github.com
On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote:
>>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we
>>> should stop normalizing indices when converting a `Vector` into a
>>> `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN,
>>> 0)`). Then
On Fri, 13 Sep 2024 22:30:25 GMT, Srinivas Vamsi Parasa
wrote:
>> So far, this will be the only intrinsic implementation of tanh. Therefore,
>> at the moment it is just checking the consistency of the intrinsic
>> implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp
>> ac
On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote:
>>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we
>>> should stop normalizing indices when converting a `Vector` into a
>>> `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN,
>>> 0)`). Then
3,1)
> 0x7f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
> 0x7f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
> 0x7f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
> 0x7f40d022751f: add $0x40,%ebx
> 0x7f40d0227522: cmp %r8d,%ebx
> 0x7f40d022752
On Fri, 13 Sep 2024 19:04:12 GMT, Jatin Bhateja wrote:
>> @jatin-bhateja If you could expand on this comment with specific cases it
>> will be helpful. The loadShuffle generation is needed for platform specific
>> handling of shuffles and cannot be optimized out here.
>
> Hi @sviswa7, I was sug
On Fri, 13 Sep 2024 05:30:36 GMT, Jatin Bhateja wrote:
>> Currently the rearrange and selectFrom APIs check shuffle indices and throw
>> IndexOutOfBoundsException if there is any exceptional source index in the
>> shuffle. This causes the generated code to be less optimal. This PR modifies
>>
On Fri, 13 Sep 2024 17:20:40 GMT, Quan Anh Mai wrote:
> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we
> should stop normalizing indices when converting a `Vector` into a
> `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN,
> 0)`). Then the re
On Thu, 22 Aug 2024 18:21:50 GMT, Paul Sandoz wrote:
> API shapes are good!
>
> I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes
> that are functionally equivalent to the Java expression
> `v.rearrange(this.toShuffle())`. That way we can better generate an optimal
Currently the rearrange and selectFrom APIs check shuffle indices and throw
IndexOutOfBoundsException if there is any exceptional source index in the
shuffle. This causes the generated code to be less optimal. This PR modifies
the rearrange/selectFrom Vector API methods to perform wrapIndexes in
On Wed, 11 Sep 2024 01:59:54 GMT, Joe Darcy wrote:
>>> If the test is going to use randomness, then its jtreg tags should include
>>>
>>> `@key randomness`
>>>
>>> and it is preferable to use jdk.test.lib.RandomFactory to get a Random
>>> object since that handles printing out a key so the r
On Thu, 5 Sep 2024 19:10:34 GMT, Srinivas Vamsi Parasa wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for
>> java.lang.Math.tanh() using libm
>>
>> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup
>> -- | -- | -- | --
>> MathBench.tanhDouble | 70900 | 95618 | 1.35x
>
On Fri, 6 Sep 2024 06:43:31 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Fri, 6 Sep 2024 21:15:07 GMT, Sandhya Viswanathan
wrote:
>> @vamsi-parasa don't hesitate in adding as much and explicit information
>> about the original source from where the algorithm has been picked up, even
>> though the PR explicitly mentions libm. Ad
On Wed, 4 Sep 2024 01:57:42 GMT, Jatin Bhateja wrote:
>> @theRealAph, this implementation is based on Intel libm math library and
>> meets the accuracy requirements. The algorithm is provided in the comments.
>
> @vamsi-parasa don't hesitate in adding as much and explicit information
> about t
On Fri, 6 Sep 2024 18:39:08 GMT, Sandhya Viswanathan
wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review suggestions
>
> src/jdk.incubator.vector/share/classes/jdk/incubat
On Fri, 6 Sep 2024 06:43:31 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 2 Sep 2024 12:15:10 GMT, Jatin Bhateja wrote:
>> If the aim is to reduce the number of nodes, we could merge the
>> Op_SaturatingAddVB, Op_SaturatingAddVS, Op_SaturatingAddVI, and
>> Op_SaturatingAddVL into one Op_SaturatingAddV. Likewise for unsigned
>> saturating add into Op_Saturati
On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 2 Sep 2024 12:17:08 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/x86.ad line 10656:
>>
>>> 10654: match(Set dst (SaturatingSubVI src1 src2));
>>> 10655: match(Set dst (SaturatingSubVL src1 src2));
>>> 10656: effect(TEMP ktmp);
>>
>> This needs TEMP dst as well.
>
> There is no
On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Wed, 28 Aug 2024 00:12:26 GMT, Sandhya Viswanathan
wrote:
>> Hey @jaskarth , Central idea behind introducing VectorReinterpretNode after
>> unsigned vector IR is to facilitate unboxing-boxing optimization, this
>> explicit reinterpretation ensures type compatibility b
On Thu, 15 Aug 2024 06:59:53 GMT, Jatin Bhateja wrote:
>>> its usage in existing patch is limited to [type
>>> comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542)
>>
>> Ah, that makes sense to me. I took a clos
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
On Wed, 21 Aug 2024 16:42:44 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Wed, 21 Aug 2024 18:27:09 GMT, Paul Sandoz wrote:
> Is it possible for the intrinsic to be responsible for wrapping, if needed?
> I was looking at
> [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=A
On Wed, 21 Aug 2024 16:49:40 GMT, Jatin Bhateja wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Pass explicit wrap argument to selectFrom API with default value set to
>> true.
>
> Hi @rose00 , @sviswa7 , @PaulSa
On Mon, 19 Aug 2024 07:36:15 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
>>
>>> 1331:
>>> 1332: __ cmpq(nMinusK, 32);
>>> 1333: __ jae_b(L_greaterThan32);
>>
>> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31
>> __ cmpq
On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:
>>
>>> 276: __ bind(L_nextCheck);
>>> 277: __ testq(haystack_len_p, haystack_len_p);
>>> 278: __ je(L_zeroCheckFailed);
>>
>> This check could be removed as the next
On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:
>>
>>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value.
>>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop,
>>> mask, h
On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Thu, 16 May 2024 17:08:21 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238:
>>
>>> 236: const Register needle = rdx;
>>> 237: const Register needle_len = rcx;
>>> 238:
>>
>> This is the calling convention on Linux. How is windows plat
On Thu, 16 May 2024 20:22:40 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510:
>>
>>> 1508: compare_big_haystack_to_needle(sizeKnown, size,
>>> NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength,
>>> 1509:
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark(algorithm) (dataSize) (keyLength)
>> (provider) Mode Cnt ScoreError Units
>> SignatureBench.ECDSA.signSHA256withECDSA1024 256
On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark(algorithm) (dataSize) (keyLength)
>> (provider) Mode Cnt ScoreError Units
>> SignatureBench.ECDSA.signSHA256withECDSA1024 256
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark