On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:
>>
>>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value.
>>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop,
>>> mask, h
On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:
>>
>>> 276: __ bind(L_nextCheck);
>>> 277: __ testq(haystack_len_p, haystack_len_p);
>>> 278: __ je(L_zeroCheckFailed);
>>
>> This check could be removed as the next
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
>>
>>> 1331:
>>> 1332: __ cmpq(nMinusK, 32);
>>> 1333: __ jae_b(L_greaterThan32);
>>
>> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31
>> __ cmpq
On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Mon, 19 Aug 2024 07:36:15 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support
>> for the following two new vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Wed, 21 Aug 2024 16:49:40 GMT, Jatin Bhateja wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Pass explicit wrap argument to selectFrom API with default value set to
>> true.
>
> Hi @rose00 , @sviswa7 , @PaulSa
On Wed, 21 Aug 2024 18:27:09 GMT, Paul Sandoz wrote:
> Is it possible for the intrinsic to be responsible for wrapping, if needed?
> I was looking at
> [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=A
On Wed, 21 Aug 2024 16:42:44 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support
>> for the following two new vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support
>> for the following new vector operators.
>>
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD: Saturating signed addition.
>>
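As a rough illustration of what "saturating" means for the two operators named above, here is a minimal scalar sketch in plain Java. It is not the Vector API implementation from the patch: the class name `SaturatingAddSketch` and both method names are hypothetical, and the sketch assumes 32-bit `int` lanes. The result is clamped to the representable range instead of wrapping around.

```java
// Hypothetical scalar sketch of saturating addition on 32-bit values.
// Not taken from the patch; names are invented for illustration only.
final class SaturatingAddSketch {

    // Signed saturating add: clamp to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int saturatingSignedAdd(int a, int b) {
        long sum = (long) a + (long) b;   // widen so the true sum cannot overflow
        if (sum > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        if (sum < Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;
        }
        return (int) sum;
    }

    // Unsigned saturating add: treat both ints as unsigned and clamp to
    // 0xFFFFFFFF (which is -1 when viewed as a signed int).
    static int saturatingUnsignedAdd(int a, int b) {
        int sum = a + b;                  // wraps modulo 2^32
        // If the wrapped sum is below an operand (unsigned), a carry occurred.
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }
}
```

With these definitions, adding 1 to `Integer.MAX_VALUE` stays at `Integer.MAX_VALUE` in the signed case, and adding 1 to `0xFFFFFFFF` stays at `0xFFFFFFFF` in the unsigned case.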
On Thu, 15 Aug 2024 06:59:53 GMT, Jatin Bhateja wrote:
>>> its usage in existing patch is limited to [type
>>> comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542)
>>
>> Ah, that makes sense to me. I took a clos
On Wed, 28 Aug 2024 00:12:26 GMT, Sandhya Viswanathan wrote:
>> Hey @jaskarth, the central idea behind introducing VectorReinterpretNode after
>> unsigned vector IR is to facilitate the unboxing-boxing optimization; this
>> explicit reinterpretation ensures type compatibility b
On Thu, 29 Sep 2022 18:34:41 GMT, Vladimir Kozlov wrote:
>> @vnkozlov I have addressed all review comments. Could you please run the
>> patch through your testing? Thanks a lot for all the help.
>
> @smita-kamath I have build failures. Please build and test it yourself to
> verify the changes.
>
>
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Sun, 13 Nov 2022 20:57:44 GMT, Claes Redestad wrote:
>> src/hotspot/cpu/x86/x86_64.ad line 12073:
>>
>>> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2,
>>> rRegI tmp3, rFlagsReg cr)
>>> 12072: %{
>>> 12073: predicate(UseAVX >= 2 && ((VectorizedHashCodeNode*)n)->
On Tue, 20 Dec 2022 21:11:18 GMT, Claes Redestad wrote:
>>> How far off is this ...?
>>
>> Back then it looked way too constrained (tight constraints on code shapes).
>> But I considered it as a generally applicable optimization.
>>
>>> ... do you think it'll be able to match the efficiency
On Tue, 20 Dec 2022 21:11:40 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Tue, 20 Dec 2022 19:52:34 GMT, Claes Redestad wrote:
>> src/java.base/share/classes/java/lang/StringUTF16.java line 418:
>>
>>> 416: return 0;
>>> 417: } else {
>>> 418: return ArraysSupport.vectorizedHashCode(value,
>>> ArraysSupport.UTF16);
>>
>> Special ca
On Wed, 21 Dec 2022 17:29:23 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote:
>> @cl4es Thanks for passing the constant node through, the code looks much
>> cleaner now. The attached patch should handle the signed bytes/shorts as
>> well. Please take a look.
>> [signed.patch](https://github.com/openjdk/jdk/files/10
On Mon, 9 Jan 2023 23:13:29 GMT, Claes Redestad wrote:
>> Claes Redestad has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Explicitly lea external address
>
> Explicitly loading the address to a register seems to do the trick, avoiding
>
On Wed, 25 Jan 2023 15:03:05 GMT, Scott Gibbons wrote:
> Adding a performance benchmark test for CRC32. This does exactly the same
> test as for CRC32C.
test/micro/org/openjdk/bench/java/util/TestCRC32.java line 2:
> 1: /*
> 2: * Copyright (c) 2021, 2022, 2023, Oracle and/or its affiliates.
On Wed, 25 Jan 2023 23:07:49 GMT, Scott Gibbons wrote:
>> Adding a performance benchmark test for CRC32. This does exactly the same
>> test as for CRC32C.
>
> Scott Gibbons has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Fix copyright
Ma
On Tue, 7 Feb 2023 00:12:21 GMT, Scott Gibbons wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate
>> ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error Units
>>
On Tue, 7 Feb 2023 02:49:44 GMT, Sandhya Viswanathan wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Add algorithm comments
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp l
On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate
>> ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error Units
>>
On Tue, 14 Feb 2023 15:03:49 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2658:
>>
>>> 2656: // Check for buffer too small (for algorithm)
>>> 2657: __ subl(length, 0x2c);
>>> 2658: __ jcc(Assembler::lessEqual, L_tailProc);
>>
>> This could be Assem
On Tue, 14 Feb 2023 15:19:34 GMT, Claes Redestad wrote:
>> Why? There is no performance difference and the intent is clear. Is this
>> just a "style" thing?
>
> I think with `lessEqual` we'll jump to `L_tailProc` for the final 32-byte
> chunk in inputs that are divisible by 32 (starting from
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate
>> ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error Units
>
On Tue, 14 Feb 2023 22:41:47 GMT, Claes Redestad wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Last of review comments
>
> I've started tier1-5 testing internally. Will let you know if we find any
> issues.
Th
Change the java/lang/float.java and the corresponding shared runtime constant
expression evaluation to generate QNaN.
The HW instructions generate QNaNs and not SNaNs for floating point
instructions. This happens across double, float, and float16 data types. The
most significant bit of mantissa
On Wed, 22 Feb 2023 04:03:02 GMT, David Holmes wrote:
>> Change the java/lang/float.java and the corresponding shared runtime
>> constant expression evaluation to generate QNaN.
>> The HW instructions generate QNaNs and not SNaNs for floating point
>> instructions. This happens across double, f
On Wed, 22 Feb 2023 21:21:42 GMT, Vladimir Kozlov wrote:
>>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have
>>> three mechanisms for implementing this functionality:
>>>
>>> 1. The interpreted Java code
>>>
>>> 2. The compiled non-intrinsic sharedRuntime co
On Wed, 22 Feb 2023 02:08:27 GMT, Sandhya Viswanathan wrote:
> Change the java/lang/float.java and the corresponding shared runtime constant
> expression evaluation to generate QNaN.
> The HW instructions generate QNaNs and not SNaNs for floating point
> instructions. This ha
On Tue, 28 Feb 2023 15:59:26 GMT, Eirik Bjorsnos wrote:
> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set of
> benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark serves
> as an example of how vectorization can be useful also in the area of text
>
On Tue, 28 Feb 2023 23:08:29 GMT, Eirik Bjorsnos wrote:
>> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set
>> of benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark
>> serves as an example of how vectorization can be useful also in the area of
>> t
On Mon, 6 Mar 2023 23:54:44 GMT, Vladimir Kozlov wrote:
>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
>>
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote:
> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
> met
On Tue, 7 Mar 2023 00:52:37 GMT, Vladimir Kozlov wrote:
> Note, I removed `ConvF2HFNode::Identity()` optimization because tests show
> that it produces different NaN results due to skipped conversion.
Yes, removing the Identity optimization is correct. It doesn't hold for NaN
inputs.
---
On Tue, 7 Mar 2023 01:59:25 GMT, Vladimir Kozlov wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931:
>>
>>> 3929: // For results consistency both intrinsics should be enabled.
>>> 3930: if
>>> (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
>>> 3931
On Tue, 7 Mar 2023 02:53:48 GMT, Vladimir Kozlov wrote:
>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
>>
On Fri, 5 Aug 2022 16:36:23 GMT, Smita Kamath wrote:
> 8289552: Make intrinsic conversions between bit representations of half
> precision values and floats
src/hotspot/cpu/x86/assembler_x86.cpp line 1927:
> 1925: assert(VM_Version::supports_evex(), "");
> 1926: InstructionAttr attributes(
On Fri, 5 Aug 2022 23:58:49 GMT, Joe Darcy wrote:
>> @jddarcy Thanks for your comment. I am not sure if there is a way of using
>> Java library implementation here.
>
> I was under the impression that if a platform didn't have special support for
> the functionality in question it could not hav
On Wed, 24 Aug 2022 23:48:36 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Updated copyri
On Thu, 1 Sep 2022 23:22:46 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Added missing p
On Thu, 1 Sep 2022 18:31:07 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Addressed revie
On Thu, 1 Sep 2022 18:26:52 GMT, Smita Kamath wrote:
>> src/hotspot/cpu/x86/x86_64.ad line 11330:
>>
>>> 11328: ins_pipe( pipe_slow );
>>> 11329: %}
>>> 11330:
>>
>> For HF2F, good to also add optimized rule with LoadS to benefit from
>> vcvtph2ps memory src form of instruction.
>> match(Se
On Fri, 2 Sep 2022 00:52:49 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Addressed revie
On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote:
> Backing out shuffle related overhaul done with
> [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw
> significant performance degradation in VectorAPI JMH micros and some of our
> internal benchmarks. Following two issues
On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote:
> The benchmark files below scale poorly when run on multiple threads because of
> cache contention. The patch changes the scope from benchmark level to thread
> level to fix the issue:
> - org/openjdk/bench/java/io/
On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Remove unnecessary import in Arrays.java
>
> After I fixed it Tier1 passed and I submitted other tiers.
@v
On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Clean up parameters passed to arrayPartition; update the check to load
>> library
>
> The changes to DualPivo
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote:
> In addition to the issue
> [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), this logically fixes
> the scope from benchmark to thread for the benchmark files below that have shared
> state, which also fixes a few of the benchmarks' scalabili
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Wed, 30 Aug 2023 02:01:38 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Clean up parameters passed to arrayPartition; update the check to load
>> library
>
> Good. Thank you.
@v
On Mon, 21 Aug 2023 03:50:32 GMT, Martin Stypinski wrote:
>> Added a bunch of different implementations for Vector API Matrix
>> Multiplications:
>>
>> - Baseline
>> - Blocked (Cache Local)
>> - FMA
>> - Vector API Simple Implementation
>> - Vector API Blocked Implementation
>>
>> Commit was d
On Fri, 6 Oct 2023 08:32:28 GMT, Martin Stypinski wrote:
>> Martin Stypinski has updated the pull request incrementally with two
>> additional commits since the last revision:
>>
>> - changed for consistency
>> - improved some RandomGenerator & unused imports
>
> fixed typo.
@Styp Thanks, t
On Wed, 11 Oct 2023 17:28:12 GMT, Srinivas Vamsi Parasa wrote:
>> The goal of this PR is to address the follow-up comments to the SIMD
>> accelerated sort PR (#14227) which implemented AVX512 intrinsics for
>> Arrays.sort() methods.
>> The proposed changes are:
>>
>> 1) Restriction of the AVX
On Tue, 10 Oct 2023 22:29:55 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> fix whitespace in build script
>
> Also, @forceinline in these changes only works for the case when new intrinsi
On Wed, 11 Oct 2023 09:25:15 GMT, Andrew Haley wrote:
> > Forgive me, I might be missing something very obvious, but is there any
> > particular reason to entirely disable the SIMD accelerated sort on Zen 4
> > rather than having an alternate code path for Zen 4 where it has the
> > `compresss
On Wed, 11 Oct 2023 18:31:44 GMT, Sandhya Viswanathan wrote:
>> Also, @forceinline in these changes only works for the case when the new intrinsics
>> are not used.
>> I would suggest adapting the JMH benchmark to cover all cases and to see the
>> effect of @forceinline without intri
On Wed, 11 Oct 2023 22:25:14 GMT, Erik Joelsson wrote:
>> Hi Erik (@erikj79),
>> BUILD_LIBFALLBACKLINKER is from different PR (#13079). If I understand
>> correctly, for LIB_SIMD_SORT, are you suggesting that we don't pad the lines
>> with spaces to align features into columns and instead just
On Wed, 11 Oct 2023 23:25:30 GMT, Vladimir Ivanov wrote:
>> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 157:
>>
>>> 155: @ForceInline
>>> 156: private static void sort(Class elemType, A array, long
>>> offset, int low, int high, SortOperation so) {
>>> 157:
On Wed, 11 Oct 2023 23:14:26 GMT, Vladimir Ivanov wrote:
> The proposed patch has one disadvantage: there's no way to override ergonomics
> decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding
> the JVM.
>
> For many other intrinsics there are flags which enable finer grained
On Fri, 13 Oct 2023 10:31:14 GMT, himichael wrote:
>> @himichael Please refer to [this
>> question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java)
>> for how to correctly benchmark Java code.
>
>> @himichael Please refer to [this
>> question](https
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend implementation
On Sun, 5 Nov 2023 12:58:57 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606:
>>
>>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt,
>>> XMMRegister dst, Register base, Register idx_base,
>>> 1605:
On Fri, 3 Nov 2023 22:44:39 GMT, Sandhya Viswanathan wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Restricting masked sub-word gather to AVX512 target to align with integral
>> g
Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and is thus marked
as flagless through @requires vm.flagless per
[JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
-
Commit messages:
- Mark LoadJsvmlTest.java test as flagless
Changes: https://git.openjdk.org/jd
On Thu, 9 Nov 2023 18:56:19 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend implementation
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler
On Tue, 14 Nov 2023 08:09:28 GMT, Jatin Bhateja wrote:
>> Below is baseline data collected using a modified version of the
>> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug
>> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake
>> i7-1185G7, which
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote:
> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and is thus marked
> as flagless through @requires vm.flagless per
> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
@lmesnik Could you please re
On Wed, 15 Nov 2023 01:07:23 GMT, Leonid Mesnik wrote:
>> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and is thus
>> marked as flagless through @requires vm.flagless per
>> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
>
> Marked as reviewed by lmesnik (Reviewer).
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote:
> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and is thus marked
> as flagless through @requires vm.flagless per
> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
This pull request has now been i
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan wrote:
>> match_rule_supported_vector called in the beginning will enforce these
>> checks.
>
> This method is match_rule_support_vector and it is not enforcing this check
> now. It was doing so before through fall thr
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend implementation
On Mon, 20 Nov 2023 22:50:19 GMT, Steve Dohrmann wrote:
>> Update: the XorTest::xor results shown in this message used test code from
>> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit
>> a788f066af17. The XorTest has since been updated and XorTest::copy is no
>> longe