Re: RFR: 8359424: Eliminate table lookup in Integer/Long toHexString

2025-06-13 Thread Francesco Nigro
On Tue, 7 Jan 2025 10:39:18 GMT, Shaojin Wen wrote: > In PR #22928, UUID introduced long-based vectorized hexadecimal to string > conversion, which can also be used in Integer::toHexString and > Long::toHexString to eliminate table lookups. The benefit of eliminating > table lookups is that th

Re: RFR: 8343377: Performance regression in reflective invocation of native methods

2024-11-19 Thread Francesco Nigro
On Fri, 15 Nov 2024 22:17:10 GMT, Chen Liang wrote: > When core reflection was migrated to be implemented by Method Handles, > somehow, the method handles are not used for native methods, which are > generally linkable by method handles. This causes significant performance > regressions when

Re: RFR: 8343704: Bad GC parallelism with processing Cleaner queues [v8]

2024-11-15 Thread Francesco Nigro
On Fri, 15 Nov 2024 10:05:29 GMT, Aleksey Shipilev wrote: >> See the bug for more discussion and reproducer. This PR replaces the ad-hoc >> linked list with segmented list of arrays. Arrays are easy targets for GC. There >> are possible improvements here, most glaring is parallelism that is >> curr

Re: RFR: 8343933: Add a MemorySegment::fill benchmark with varying sizes

2024-11-12 Thread Francesco Nigro
On Tue, 12 Nov 2024 10:03:50 GMT, Emanuel Peter wrote: >>> Thanks @minborg for this :) Please remember to add the misprediction count >>> if you can and avoid the bulk methods by having a `nextMemorySegment()` >>> benchmark method which makes a single fill call site to observe the >>> different

Re: RFR: 8343933: Add a MemorySegment::fill benchmark with varying sizes [v2]

2024-11-12 Thread Francesco Nigro
On Mon, 11 Nov 2024 17:23:56 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains six additional >> c

Re: RFR: 8343933: Add a MemorySegment::fill benchmark with varying sizes [v2]

2024-11-11 Thread Francesco Nigro
On Mon, 11 Nov 2024 14:50:57 GMT, Per Minborg wrote: >> This PR proposes to add a new `MemorySegment::fill` benchmark where the byte >> size of the segments varies. > > Per Minborg has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes

Re: RFR: 8343933: Add a MemorySegment::fill benchmark with varying sizes

2024-11-11 Thread Francesco Nigro
On Mon, 11 Nov 2024 11:55:27 GMT, Per Minborg wrote: > This PR proposes to add a new `MemorySegment::fill` benchmark where the byte > size of the segments varies. Thanks @minborg for this :) Please remember to add the misprediction count if you can - or just avoid the bulk methods instead i.e.

Re: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3]

2024-10-04 Thread Francesco Nigro
On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in >> order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of >> these calls because of

Re: RFR: 8338591: Improve performance of MemorySegment::copy

2024-09-12 Thread Francesco Nigro
On Thu, 12 Sep 2024 11:14:32 GMT, Per Minborg wrote: >>> Wdyt about the benchmark I have added at [#20829 >>> (comment)](https://github.com/openjdk/jdk/pull/20829#issuecomment-2326404582)? >>> Sadly I didn't yet run against this PR >> >> Let's see what we can do about this later. > >> Wdyt abo

Re: RFR: 8338591: Improve performance of MemorySegment::copy

2024-09-05 Thread Francesco Nigro
On Tue, 3 Sep 2024 07:52:44 GMT, Per Minborg wrote: > This PR proposes to handle smaller FFM copy operations with Java code rather > than transitioning to native code. This will improve performance. In this PR, > copy operations involving zero to 63 bytes will be handled by Java code. > > Here

Re: RFR: 8338591: Improve performance of MemorySegment::copy

2024-09-03 Thread Francesco Nigro
On Tue, 3 Sep 2024 15:44:34 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java >> line 642: >> >>> 640: // 0...0X00 >>> 641: if (remaining >= 4) { >>> 642: final int v = >>> SCOPED_MEMORY_A

Re: RFR: 8338591: Improve performance of MemorySegment::copy

2024-09-03 Thread Francesco Nigro
On Tue, 3 Sep 2024 07:52:44 GMT, Per Minborg wrote: > This PR proposes to handle smaller FFM copy operations with Java code rather > than transitioning to native code. This will improve performance. In this PR, > copy operations involving zero to 63 bytes will be handled by Java code. > > Here

Re: RFR: 8338591: Improve performance of MemorySegment::copy

2024-09-03 Thread Francesco Nigro
On Tue, 3 Sep 2024 07:52:44 GMT, Per Minborg wrote: > This PR proposes to handle smaller FFM copy operations with Java code rather > than transitioning to native code. This will improve performance. In this PR, > copy operations involving zero to 63 bytes will be handled by Java code. > > Here

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v10]

2024-09-03 Thread Francesco Nigro
On Tue, 3 Sep 2024 08:39:02 GMT, Per Minborg wrote: >> I found similar small improvements to be had (I wrote about them offline) >> when replacing the bitwise-based tests (e.g. `(foo & 4) != 0`) with a more >> explicit check for `remainingBytes >= 4`. Seems like bitwise operations are >> not as o

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v10]

2024-09-03 Thread Francesco Nigro
On Mon, 2 Sep 2024 09:36:40 GMT, Maurizio Cimadamore wrote: > This looks good. Again, the goal of this PR is not to squeeze every > nanosecond out - but, rather, to achieve a performance model that is > "sensible" fully agree and yep - this looks pretty good already, well done @minborg ! tha

Re: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

2024-09-03 Thread Francesco Nigro
On Tue, 27 Aug 2024 17:09:18 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in >> order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of >> these calls because of

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v10]

2024-08-30 Thread Francesco Nigro
On Fri, 30 Aug 2024 10:51:59 GMT, Per Minborg wrote: >> The performance of `MemorySegment::fill` can be improved by replacing the >> `checkAccess()` method call with a call to `checkReadOnly()` instead (as the >> bounds of the segment itself do not need to be checked). >> >> Also, smaller seg

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v5]

2024-08-30 Thread Francesco Nigro
On Fri, 30 Aug 2024 15:21:52 GMT, Maurizio Cimadamore wrote: > in this case, we can't optimize as well, because we have different branches > which get taken or not in a less predictable fashion. Exactly - It has been designed to show the case when the conditions materialize (because are take

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v5]

2024-08-30 Thread Francesco Nigro
On Fri, 30 Aug 2024 12:15:36 GMT, Per Minborg wrote: >> @minborg Hi! I didn't check the numbers with the benchmark I've written at >> https://github.com/openjdk/jdk/pull/20712#discussion_r1732802685 which is >> meant to stress the branch predictor (without enough `samples` i.e. past >> 128K

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v10]

2024-08-30 Thread Francesco Nigro
On Fri, 30 Aug 2024 10:51:59 GMT, Per Minborg wrote: >> The performance of `MemorySegment::fill` can be improved by replacing the >> `checkAccess()` method call with a call to `checkReadOnly()` instead (as the >> bounds of the segment itself do not need to be checked). >> >> Also, smaller seg

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v9]

2024-08-30 Thread Francesco Nigro
On Fri, 30 Aug 2024 09:09:57 GMT, Per Minborg wrote: >> The performance of `MemorySegment::fill` can be improved by replacing the >> `checkAccess()` method call with a call to `checkReadOnly()` instead (as the >> bounds of the segment itself do not need to be checked). >> >> Also, smaller seg

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v5]

2024-08-28 Thread Francesco Nigro
On Wed, 28 Aug 2024 09:06:48 GMT, Per Minborg wrote: >> How fast do we need to be here given we are measuring in a few nanoseconds >> per operation? >> >> What if the goal is not to regress from say explicitly filling in a small >> sized segment or a comparable array (e.g., < 8 bytes) then ma

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v4]

2024-08-27 Thread Francesco Nigro
On Tue, 27 Aug 2024 09:47:20 GMT, Per Minborg wrote: >> As discussed offline, can't we use a stable array of functions or something >> like that which can be populated lazily? That way you can access the >> function you want in a single array access, and we could put all these >> helper method

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v4]

2024-08-26 Thread Francesco Nigro
On Mon, 26 Aug 2024 12:11:29 GMT, Per Minborg wrote: >> The performance of `MemorySegment::fill` can be improved by replacing the >> `checkAccess()` method call with a call to `checkReadOnly()` instead (as the >> bounds of the segment itself do not need to be checked). >> >> Also, smaller seg

Re: RFR: 8338967: Improve performance for MemorySegment::fill [v3]

2024-08-26 Thread Francesco Nigro
On Mon, 26 Aug 2024 12:07:02 GMT, Per Minborg wrote: >> Per Minborg has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add a comment about the old switch type > > Here is what it looks like for Windows x64: > > ![image](https://github.com/

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-06 Thread Francesco Nigro
On Sun, 7 Apr 2024 01:49:01 GMT, Dean Long wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Oops > > I went ahead and tried a pure-Java implementation, and it is faster for small > sizes (up to 8) and only about 1

Re: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v14]

2024-03-23 Thread Francesco Nigro
On Sat, 23 Mar 2024 02:16:57 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v14]

2024-03-23 Thread Francesco Nigro
On Sat, 23 Mar 2024 02:16:57 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers

2023-10-16 Thread Francesco Nigro
On Sun, 17 Sep 2023 16:01:33 GMT, Shaojin Wen wrote: > @cl4es made performance optimizations for the simple specifiers of > String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the > same idea, I continued to make improvements. I made patterns like %2d %02d > also be optimize

Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v11]

2023-09-27 Thread Francesco Nigro
On Wed, 27 Sep 2023 09:35:47 GMT, 温绍锦 wrote: >> @cl4es made performance optimizations for the simple specifiers of >> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the >> same idea, I continued to make improvements. I made patterns like %2d %02d >> also be optimized. >

Re: RFR: 8291966: SwitchBootstrap.typeSwitch could be faster [v2]

2023-05-31 Thread Francesco Nigro
On Tue, 30 May 2023 09:32:02 GMT, Jan Lahoda wrote: >> @forax >> >> Hi! Sorry for this sudden message, but this one captured my attention >> >>> and subtype checks are usually fast. >> >> And I hope this PR is the right place to raise this. >> >> I was looking at this PR to better understand

Re: RFR: 8291966: SwitchBootstrap.typeSwitch could be faster [v2]

2023-05-29 Thread Francesco Nigro
On Tue, 2 May 2023 13:57:37 GMT, Rémi Forax wrote: >> Jan Lahoda has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains four additional >> commits sinc

Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

2023-03-15 Thread Francesco Nigro
On Wed, 15 Mar 2023 14:56:28 GMT, Eirik Bjorsnos wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Update StringLatin1.canEncode to sync with same test in CharacterData.of > > Just for fun, I tried with a benchmark

Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Francesco Nigro
On Wed, 15 Mar 2023 13:42:22 GMT, Eirik Bjorsnos wrote: >> Can you check what happens when adding many more inputs to the dataset, including >> non-latin chars as well, and use `-prof perfnorm` to check what `perf` reports >> re branches/branch-misses? >> >> You can use `SplittableRandom` to pre-popula

Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Francesco Nigro
On Wed, 15 Mar 2023 12:28:05 GMT, Eirik Bjorsnos wrote: >>> `if ((ch & 0xFF00) == 0) {` >> >> This seems to perform similar to baseline: >> >> >> Benchmark (codePoint) Mode Cnt Score Error Units >> Characters.isDigit 48 avgt 15 0.890 ± 0.025 ns/op >> Character

Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]

2023-02-08 Thread Francesco Nigro
On Wed, 8 Feb 2023 00:07:14 GMT, Claes Redestad wrote: >> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to >> clone arrays when `newLength` or range inputs span the input array. This >> helps eliminate range checks and has been verified to help various String >> ope

Re: RFR: 8301958: Reduce Arrays.copyOfRange overheads [v6]

2023-02-07 Thread Francesco Nigro
On Tue, 7 Feb 2023 22:43:15 GMT, Claes Redestad wrote: >> This adds a local, specialized `copyBytes` method to `String` that avoids >> certain redundant range checks and clamping that JIT has issues removing >> fully. >> >> This has a small but statistically significant effect on `String` >>

Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Francesco Nigro
On Tue, 7 Feb 2023 20:32:11 GMT, Claes Redestad wrote: >> src/java.base/share/classes/java/lang/String.java line 698: >> >>> 696: } >>> 697: >>> 698: static byte[] copyBytes(byte[] bytes, int offset, int length) { >> >> Given that the stub generated for array copy seems highly dependen

Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Francesco Nigro
On Tue, 7 Feb 2023 15:25:05 GMT, Claes Redestad wrote: >> This adds a local, specialized `copyBytes` method to `String` that avoids >> certain redundant range checks and clamping that JIT has issues removing >> fully. >> >> This has a small but statistically significant effect on `String` >>