Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]

2024-05-24 Thread Sandhya Viswanathan
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v40]

2024-05-24 Thread Sandhya Viswanathan
On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v25]

2024-05-24 Thread Sandhya Viswanathan
On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v27]

2024-05-24 Thread Sandhya Viswanathan
On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v20]

2024-05-24 Thread Sandhya Viswanathan
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v41]

2024-05-24 Thread Sandhya Viswanathan
On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]

2024-05-24 Thread Sandhya Viswanathan
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: >> >>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. >>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, >>> mask, h

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: >> >>> 276: __ bind(L_nextCheck); >>> 277: __ testq(haystack_len_p, haystack_len_p); >>> 278: __ je(L_zeroCheckFailed); >> >> This check could be removed as the next

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: >> >>> 1331: >>> 1332: __ cmpq(nMinusK, 32); >>> 1333: __ jae_b(L_greaterThan32); >> >> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 >> __ cmpq

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v47]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8338023: Support two vector selectFrom API [v2]

2024-08-19 Thread Sandhya Viswanathan
On Mon, 19 Aug 2024 07:36:15 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >>

Re: RFR: 8338023: Support two vector selectFrom API [v3]

2024-08-21 Thread Sandhya Viswanathan
On Wed, 21 Aug 2024 16:49:40 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Pass explicit wrap argument to selectFrom API with default value set to >> true. > > Hi @rose00 , @sviswa7 , @PaulSa

Re: RFR: 8338023: Support two vector selectFrom API [v3]

2024-08-21 Thread Sandhya Viswanathan
On Wed, 21 Aug 2024 18:27:09 GMT, Paul Sandoz wrote: > Is it possible for the intrinsic to be responsible for wrapping, if needed? > I was looking at > [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=A

Re: RFR: 8338023: Support two vector selectFrom API [v3]

2024-08-21 Thread Sandhya Viswanathan
On Wed, 21 Aug 2024 16:42:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >>

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]

2024-08-23 Thread Sandhya Viswanathan
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD: Saturating signed addition. >>

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]

2024-08-26 Thread Sandhya Viswanathan
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD: Saturating signed addition. >>

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]

2024-08-27 Thread Sandhya Viswanathan
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD: Saturating signed addition. >>

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]

2024-08-27 Thread Sandhya Viswanathan
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD: Saturating signed addition. >>

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v2]

2024-08-27 Thread Sandhya Viswanathan
On Thu, 15 Aug 2024 06:59:53 GMT, Jatin Bhateja wrote: >>> its usage in existing patch is limited to [type >>> comparison.](https://github.com/openjdk/jdk/pull/20507/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1542) >> >> Ah, that makes sense to me. I took a clos

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v2]

2024-08-28 Thread Sandhya Viswanathan
On Wed, 28 Aug 2024 00:12:26 GMT, Sandhya Viswanathan wrote: >> Hey @jaskarth , Central idea behind introducing VectorReinterpretNode after >> unsigned vector IR is to facilitate unboxing-boxing optimization, this >> explicit reinterpretation ensures type compatibility b

Re: RFR: 8338021: Support saturating vector operators in VectorAPI [v4]

2024-08-29 Thread Sandhya Viswanathan
On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD: Saturating signed addition. >>

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-29 Thread Sandhya Viswanathan
On Thu, 29 Sep 2022 18:34:41 GMT, Vladimir Kozlov wrote: >> @vnkozlov I have addressed all review comments. Could you please run the >> patch through your testing? Thanks a lot for all the help. > > @smita-kamath I have build failures. Please, build and test yourself to > verify changes. > >

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-29 Thread Sandhya Viswanathan
On Thu, 29 Sep 2022 18:34:41 GMT, Vladimir Kozlov wrote: >> @vnkozlov I have addressed all review comments. Could you please run the >> patch through your testing? Thanks a lot for all the help. > > @smita-kamath I have build failures. Please, build and test yourself to > verify changes. > >

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-21 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-16 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-16 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-16 Thread Sandhya Viswanathan
On Sun, 13 Nov 2022 20:57:44 GMT, Claes Redestad wrote: >> src/hotspot/cpu/x86/x86_64.ad line 12073: >> >>> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2, >>> rRegI tmp3, rFlagsReg cr) >>> 12072: %{ >>> 12073: predicate(UseAVX >= 2 && ((VectorizedHashCodeNode*)n)->

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-20 Thread Sandhya Viswanathan
On Tue, 20 Dec 2022 21:11:18 GMT, Claes Redestad wrote: >>> How far off is this ...? >> >> Back then it looked way too constrained (tight constraints on code shapes). >> But I considered it as a generally applicable optimization. >> >>> ... do you think it'll be able to match the efficiency

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v14]

2022-12-20 Thread Sandhya Viswanathan
On Tue, 20 Dec 2022 21:11:40 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-20 Thread Sandhya Viswanathan
On Tue, 20 Dec 2022 19:52:34 GMT, Claes Redestad wrote: >> src/java.base/share/classes/java/lang/StringUTF16.java line 418: >> >>> 416: return 0; >>> 417: } else { >>> 418: return ArraysSupport.vectorizedHashCode(value, >>> ArraysSupport.UTF16); >> >> Special ca

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v16]

2022-12-21 Thread Sandhya Viswanathan
On Wed, 21 Dec 2022 17:29:23 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2023-01-06 Thread Sandhya Viswanathan
On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote: >> @cl4es Thanks for passing the constant node through, the code looks much >> cleaner now. The attached patch should handle the signed bytes/shorts as >> well. Please take a look. >> [signed.patch](https://github.com/openjdk/jdk/files/10

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]

2023-01-09 Thread Sandhya Viswanathan
On Mon, 9 Jan 2023 23:13:29 GMT, Claes Redestad wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Explicitly lea external address > > Explicitly loading the address to a register seems to do the trick, avoiding >

Re: RFR: JDK-8301092 - Add benchmark for CRC32

2023-01-25 Thread Sandhya Viswanathan
On Wed, 25 Jan 2023 15:03:05 GMT, Scott Gibbons wrote: > Adding a performance benchmark test for CRC32. This does exactly the same > test as for CRC32C. test/micro/org/openjdk/bench/java/util/TestCRC32.java line 2: > 1: /* > 2: * Copyright (c) 2021, 2022, 2023, Oracle and/or its affiliates.

Re: RFR: JDK-8301092 - Add benchmark for CRC32 [v3]

2023-01-25 Thread Sandhya Viswanathan
On Wed, 25 Jan 2023 23:07:49 GMT, Scott Gibbons wrote: >> Adding a performance benchmark test for CRC32. This does exactly the same >> test as for CRC32C. > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Fix copyright Ma

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v11]

2023-02-06 Thread Sandhya Viswanathan
On Tue, 7 Feb 2023 00:12:21 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >>

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v11]

2023-02-06 Thread Sandhya Viswanathan
On Tue, 7 Feb 2023 00:12:21 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >>

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v11]

2023-02-07 Thread Sandhya Viswanathan
On Tue, 7 Feb 2023 02:49:44 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add algorithm comments > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp l

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-13 Thread Sandhya Viswanathan
On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >>

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 15:03:49 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2658: >> >>> 2656: // Check for buffer too small (for algorithm) >>> 2657: __ subl(length, 0x2c); >>> 2658: __ jcc(Assembler::lessEqual, L_tailProc); >> >> This could be Assem

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 15:19:34 GMT, Claes Redestad wrote: >> Why? There is no performance difference and the intent is clear. Is this >> just a "style" thing? > > I think with `lessEqual` we'll jump to `L_tailProc` for the final 32-byte > chunk in inputs that are divisible by 32 (starting from

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 22:41:47 GMT, Claes Redestad wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Last of review comments > > I've started tier1-5 testing internally. Will let you know if we find any > issues. Th

RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-21 Thread Sandhya Viswanathan
Change the java/lang/Float.java and the corresponding shared runtime constant expression evaluation to generate QNaN. The HW instructions generate QNaNs and not SNaNs for floating point instructions. This happens across double, float, and float16 data types. The most significant bit of mantissa

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-22 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 04:03:02 GMT, David Holmes wrote: >> Change the java/lang/float.java and the corresponding shared runtime >> constant expression evaluation to generate QNaN. >> The HW instructions generate QNaNs and not SNaNs for floating point >> instructions. This happens across double, f

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-22 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 21:21:42 GMT, Vladimir Kozlov wrote: >>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have >>> three mechanisms for implementing this functionality: >>> >>> 1. The interpreted Java code >>> >>> 2. The compiled non-intrinsic sharedRuntime co

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-23 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 02:08:27 GMT, Sandhya Viswanathan wrote: > Change the java/lang/float.java and the corresponding shared runtime constant > expression evaluation to generate QNaN. > The HW instructions generate QNaNs and not SNaNs for floating point > instructions. This ha

Withdrawn: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-23 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 02:08:27 GMT, Sandhya Viswanathan wrote: > Change the java/lang/float.java and the corresponding shared runtime constant > expression evaluation to generate QNaN. > The HW instructions generate QNaNs and not SNaNs for floating point > instructions. This ha

Re: RFR: 8303401: Add a Vector API equalsIgnoreCase micro benchmark

2023-02-28 Thread Sandhya Viswanathan
On Tue, 28 Feb 2023 15:59:26 GMT, Eirik Bjorsnos wrote: > This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set of > benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark serves > as an example of how vectorization can be useful also in the area of text >

Re: RFR: 8303401: Add a Vector API equalsIgnoreCase micro benchmark [v3]

2023-02-28 Thread Sandhya Viswanathan
On Tue, 28 Feb 2023 23:08:29 GMT, Eirik Bjorsnos wrote: >> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set >> of benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark >> serves as an example of how vectorization can be useful also in the area of >> t

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Mon, 6 Mar 2023 23:54:44 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >>

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote: > Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in > Interpreter and C1 compiler to produce the same results as C2 intrinsics on > x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java > met

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 00:52:37 GMT, Vladimir Kozlov wrote: > Note, I removed `ConvF2HFNode::Identity()` optimization because tests show > that it produces different NaN results due to skipped conversion. Yes, removing the Identity optimization is correct. It doesn't hold for NaN inputs. ---
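To illustrate why an identity fold of a float16 round trip may not hold for NaN inputs, here is a minimal sketch in plain Java bit arithmetic. It does not call the JDK conversion methods or reproduce the C2 node; the class `Float16NaNSketch` and its helpers are hypothetical, and `roundTripWithQuieting` only models the assumption (taken from this thread) that a hardware conversion turns a signaling NaN into a quiet NaN by setting the most significant mantissa bit.

```java
public class Float16NaNSketch {
    // IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 fraction bits.
    static final int EXP_MASK  = 0x7C00;
    static final int FRAC_MASK = 0x03FF;
    static final int QUIET_BIT = 0x0200; // most significant fraction bit

    // True if the 16-bit pattern encodes a NaN (exponent all ones, fraction non-zero).
    static boolean isNaN(short h) {
        int bits = h & 0xFFFF;
        return (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0;
    }

    // True if the pattern is a NaN with the quiet bit set.
    static boolean isQuiet(short h) {
        return isNaN(h) && (h & QUIET_BIT) != 0;
    }

    // Hypothetical model of a float16 -> float -> float16 round trip on hardware
    // that quiets signaling NaNs (an assumption, not the actual instruction semantics
    // in full). Non-NaN values are assumed to survive the round trip unchanged.
    static short roundTripWithQuieting(short h) {
        return isNaN(h) ? (short) ((h & 0xFFFF) | QUIET_BIT) : h;
    }

    public static void main(String[] args) {
        short signalingNaN = (short) 0x7D00; // NaN with the quiet bit clear
        short one = (short) 0x3C00;          // 1.0 in binary16

        short out = roundTripWithQuieting(signalingNaN);
        System.out.println("input  quiet? " + isQuiet(signalingNaN));
        System.out.println("output quiet? " + isQuiet(out));
        System.out.println("bits preserved for sNaN? " + (out == signalingNaN));
        System.out.println("bits preserved for 1.0?  " + (roundTripWithQuieting(one) == one));
    }
}
```

Under that assumption, folding the round trip to the original value would skip the quieting step, so the folded and unfolded paths could disagree on the NaN bit pattern while agreeing on every non-NaN input.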

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote: > Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in > Interpreter and C1 compiler to produce the same results as C2 intrinsics on > x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java > met

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 01:59:25 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931: >> >>> 3929: // For results consistency both intrinsics should be enabled. >>> 3930: if >>> (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) && >>> 3931

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v2]

2023-03-07 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 02:53:48 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >>

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats

2022-08-08 Thread Sandhya Viswanathan
On Fri, 5 Aug 2022 16:36:23 GMT, Smita Kamath wrote: > 8289552: Make intrinsic conversions between bit representations of half > precision values and floats src/hotspot/cpu/x86/assembler_x86.cpp line 1927: > 1925: assert(VM_Version::supports_evex(), ""); > 1926: InstructionAttr attributes(

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats

2022-08-08 Thread Sandhya Viswanathan
On Fri, 5 Aug 2022 23:58:49 GMT, Joe Darcy wrote: >> @jddarcy Thanks for your comment. I am not sure if there is a way of using >> Java library implementation here. > > I was under the impression that if a platform didn't have special support for > the functionality in question it could not hav

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v5]

2022-08-24 Thread Sandhya Viswanathan
On Wed, 24 Aug 2022 23:48:36 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Updated copyri

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v7]

2022-09-01 Thread Sandhya Viswanathan
On Thu, 1 Sep 2022 23:22:46 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Added missing p

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v6]

2022-09-01 Thread Sandhya Viswanathan
On Thu, 1 Sep 2022 18:31:07 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Addressed revie

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v5]

2022-09-01 Thread Sandhya Viswanathan
On Thu, 1 Sep 2022 18:26:52 GMT, Smita Kamath wrote: >> src/hotspot/cpu/x86/x86_64.ad line 11330: >> >>> 11328: ins_pipe( pipe_slow ); >>> 11329: %} >>> 11330: >> >> For HF2F, good to also add optimized rule with LoadS to benefit from >> vcvtph2ps memory src form of instruction. >> match(Se

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-01 Thread Sandhya Viswanathan
On Fri, 2 Sep 2022 00:52:49 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Addressed revie

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-20 Thread Sandhya Viswanathan
On Fri, 2 Sep 2022 00:52:49 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Addressed revie

Re: RFR: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation

2023-06-26 Thread Sandhya Viswanathan
On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote: > Backing out shuffle related overhaul done with > [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw > significant performance degradation in VectorAPI JMH micros and some of our > internal benchmarks. Following two issues

Re: RFR: 8311178: JMH tests don't scale well when sharing output buffers

2023-07-10 Thread Sandhya Viswanathan
On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and > leads to poor scaling when run on multiple threads. The patch sets the scope > from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]

2023-08-25 Thread Sandhya Viswanathan
On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove unnecessary import in Arrays.java > > After I fixed it Tier1 passed and I submitted other tiers. @v

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-08-28 Thread Sandhya Viswanathan
On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-08-29 Thread Sandhya Viswanathan
On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load >> library > > The changes to DualPivo

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-08-31 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for below benchmark files having shared > state, also which fixes few of the benchmarks scalabili

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-09-05 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for below benchmark files having shared > state, also which fixes few of the benchmarks scalabili

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Sandhya Viswanathan
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-09-25 Thread Sandhya Viswanathan
On Wed, 30 Aug 2023 02:01:38 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load >> library > > Good. Thank you. @v

Re: RFR: 8314544: Matrix multiple benchmark using Vector API

2023-10-03 Thread Sandhya Viswanathan
On Mon, 21 Aug 2023 03:50:32 GMT, Martin Stypinski wrote: >> Added a bunch of different implementations for Vector API Matrix >> Multiplications: >> >> - Baseline >> - Blocked (Cache Local) >> - FMA >> - Vector API Simple Implementation >> - Vector API Blocked Implementation >> >> Commit was d

Re: RFR: 8314544: Matrix multiply benchmark using Vector API [v2]

2023-10-06 Thread Sandhya Viswanathan
On Fri, 6 Oct 2023 08:32:28 GMT, Martin Stypinski wrote: >> Martin Stypinski has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - changed for consistency >> - improved some RandomGenerator & unused Imports > > fixed typo. @Styp Thanks, t

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v4]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 17:28:12 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the follow-up comments to the SIMD >> accelerated sort PR (#14227) which implemented AVX512 intrinsics for >> Arrays.sort() methods. >> The proposed changes are: >> >> 1) Restriction of the AVX

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v3]

2023-10-11 Thread Sandhya Viswanathan
On Tue, 10 Oct 2023 22:29:55 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix whitespace in build script > > Also @forceinline in these changes only works for case when new intrinsi

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 09:25:15 GMT, Andrew Haley wrote: > > Forgive me, I might be missing something very obvious, but is there any > > particular reason to entirely disable the SIMD accelerated sort on Zen 4 > > rather than having an alternate code path for Zen 4 where it has the > > `compresss

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v3]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 18:31:44 GMT, Sandhya Viswanathan wrote: >> Also @forceinline in these changes only works for case when new intrinsics >> are not used. >> I would suggest to adapt/update JMH benchmark to cover all cases and see >> effect @forceinline without intri

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v4]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 22:25:14 GMT, Erik Joelsson wrote: >> Hi Erik (@erikj79), >> BUILD_LIBFALLBACKLINKER is from different PR (#13079). If I understand >> correctly, for LIB_SIMD_SORT, are you suggesting that we don't pad the lines >> with spaces to align features into columns and instead just

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 23:25:30 GMT, Vladimir Ivanov wrote: >> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 157: >> >>> 155: @ForceInline >>> 156: private static void sort(Class elemType, A array, long >>> offset, int low, int high, SortOperation so) { >>> 157:

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 23:14:26 GMT, Vladimir Ivanov wrote: > Proposed patch has one disadvantage: there's no way to override ergonomics > decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding > the JVM. > > For many other intrinsics there are flags which enable finer grained

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-13 Thread Sandhya Viswanathan
On Fri, 13 Oct 2023 10:31:14 GMT, himichael wrote: >> @himichael Please refer to [this >> question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) >> for how to correctly benchmark Java code. > >> @himichael Please refer to [this >> question](https

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-02 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-03 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-03 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-06 Thread Sandhya Viswanathan
On Sun, 5 Nov 2023 12:58:57 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606: >> >>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt, >>> XMMRegister dst, Register base, Register idx_base, >>> 1605:

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-06 Thread Sandhya Viswanathan
On Fri, 3 Nov 2023 22:44:39 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Restricting masked sub-word gather to AVX512 target to align with integral >> g

RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-09 Thread Sandhya Viswanathan
Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked as flagless through @requires vm.flagless per [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). - Commit messages: - Mark LoadJsvmlTest.java test as flagless Changes: https://git.openjdk.org/jd

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

2023-11-09 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 18:56:19 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

2023-11-09 Thread Sandhya Viswanathan
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy

2023-11-14 Thread Sandhya Viswanathan
On Tue, 14 Nov 2023 08:09:28 GMT, Jatin Bhateja wrote: >> Below is baseline data collected using a modified version of the >> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug >> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake >> i7-1185G7, which

Re: RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote: > Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked > as flagless through @requires vm.flagless per > [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). @lmesnik Could you please re

Re: RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Wed, 15 Nov 2023 01:07:23 GMT, Leonid Mesnik wrote: >> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus >> marked as flagless through @requires vm.flagless per >> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). > > Marked as reviewed by lmesnik (Reviewer).

Integrated: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote: > Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked > as flagless through @requires vm.flagless per > [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). This pull request has now been i

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-15 Thread Sandhya Viswanathan
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan wrote: >> match_rule_supported_vector called in the beginning will enforce these >> checks. > > This method is match_rule_support_vector and it is not enforcing this check > now. It was doing so before through fall thr

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v7]

2023-11-15 Thread Sandhya Viswanathan
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy [v5]

2023-11-20 Thread Sandhya Viswanathan
On Mon, 20 Nov 2023 22:50:19 GMT, Steve Dohrmann wrote: >> Update: the XorTest::xor results shown in this message used test code from >> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit >> a788f066af17. The XorTest has since been updated and XorTest::copy is no >> longe
