On Tue, 7 Jan 2025 10:39:18 GMT, Shaojin Wen <s...@openjdk.org> wrote:

> In PR #22928, UUID introduced long-based vectorized hexadecimal to string 
> conversion, which can also be used in Integer::toHexString and 
> Long::toHexString to eliminate table lookups. The benefit of eliminating 
> table lookups is that the performance is better when cache misses occur.

Benchmark data from both aarch64 and x64 machines shows a speedup of roughly 
10% to 30%. On a MacBook M1 Pro, Integer.toHexString is about twice as fast 
(+108%), although the baseline run there had a large error (± 1.573 us/op).
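
To illustrate what "eliminating table lookups" can look like, here is a minimal sketch of a table-free conversion of an int to eight hex characters. The class and method names are hypothetical, the nibble spreading is done with a plain loop for clarity, and the byte-wise arithmetic is one known way to map nibbles to ASCII; it is not necessarily the exact code in this PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of the PR.
final class TableFreeHex {

    // Places nibble k of v (counted from the low end) into byte k of a long.
    static long spreadNibbles(int v) {
        long out = 0;
        for (int k = 0; k < 8; k++) {
            long nibble = (v >>> (4 * k)) & 0xF;
            out |= nibble << (8 * k);
        }
        return out;
    }

    // Converts eight bytes, each holding a value 0..15, to lowercase ASCII hex
    // digits without a lookup table.
    static long nibblesToAscii(long x) {
        // Adding 6 sets bit 4 of a byte exactly when its nibble is 10..15.
        long m = (x + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L;
        // Per byte: 0x20 + 0x08 - 0x01 = 0x27 for letters, 0 for digits.
        long letterOffset = (m << 1) + (m >> 1) - (m >> 4);
        // '0' is 0x30, and 'a' is 0x30 + 0x27 + 10.
        return x + 0x3030_3030_3030_3030L + letterOffset;
    }

    // Returns all eight hex digits of v, including leading zeros.
    static String toHex8(int v) {
        byte[] buf = new byte[8];
        // ByteBuffer is big-endian by default, so the most significant nibble
        // is written first.
        ByteBuffer.wrap(buf).putLong(nibblesToAscii(spreadNibbles(v)));
        return new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(toHex8(0x1234abcd));
    }
}
```

Unlike Integer.toHexString, this sketch always emits eight characters; trimming leading zeros would be a separate step.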

## 1. Script

git remote add wenshao g...@github.com:wenshao/jdk.git
git fetch wenshao

# baseline 91db7c0877a
git checkout 91db7c0877a68ad171da2b4501280fc24630ae83
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"

# current 1788d09787c
git checkout 1788d09787cadfe6ec23b9b10bef87a2cdc029a3
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"


## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units  (baseline 91db7c0877a)
-Integers.toHexString     500  avgt   15  4.855 ± 0.058  us/op
-Longs.toHexString        500  avgt   15  6.098 ± 0.034  us/op


+Benchmark             (size)  Mode  Cnt  Score   Error  Units  (current 1788d09787c)
+Integers.toHexString     500  avgt   15  4.105 ± 0.010  us/op +18.27%
+Longs.toHexString        500  avgt   15  4.682 ± 0.116  us/op +30.24%



## 3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString     500  avgt   15  5.158 ± 0.025  us/op
-Longs.toHexString        500  avgt   15  6.072 ± 0.020  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString     500  avgt   15  4.691 ± 0.024  us/op  +9.95%
+Longs.toHexString        500  avgt   15  4.947 ± 0.024  us/op +22.74%



## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString     500  avgt   15  5.880 ± 0.017  us/op
-Longs.toHexString        500  avgt   15  7.183 ± 0.013  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString     500  avgt   15  5.282 ± 0.012  us/op +11.32%
+Longs.toHexString        500  avgt   15  5.530 ± 0.013  us/op +29.89%




## 5. MacBook M1 Pro (aarch64)

-Benchmark             (size)  Mode  Cnt   Score   Error  Units  (baseline 91db7c0877a)
-Integers.toHexString     500  avgt   15  10.519 ± 1.573  us/op
-Longs.toHexString        500  avgt   15   5.754 ± 0.264  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units  (current 1788d09787c)
+Integers.toHexString     500  avgt   15  5.057 ± 0.015  us/op +108.00%
+Longs.toHexString        500  avgt   15  5.147 ± 0.095  us/op  +11.79%
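
The percentages in the tables above are consistent with a speedup ratio, baseline time divided by current time minus one, rather than a reduction in time per operation. A minimal sketch of that calculation, with a hypothetical class and method name, recomputes the reported figures to within about 0.01 percentage points:

```java
// Hypothetical helper for checking the percentage columns; not part of the PR.
final class SpeedupPercent {

    // Speedup in percent: how much more work per unit of time the current
    // build does compared with the baseline, given average times per operation.
    static double speedupPercent(double baselineUsPerOp, double currentUsPerOp) {
        return (baselineUsPerOp / currentUsPerOp - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores taken from the AMD EPYC Genoa table.
        System.out.printf("Integers.toHexString: %.2f%%%n", speedupPercent(4.855, 4.105));
        System.out.printf("Longs.toHexString:    %.2f%%%n", speedupPercent(6.098, 4.682));
    }
}
```

By this measure, +100% means the time per operation was halved.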

Because this algorithm is slower than the original version for small 
numbers, I have marked this PR as a draft.

Additionally, the same algorithm is used in PR #22928, [Speed up 
UUID::toString](https://github.com/openjdk/jdk/pull/22928), where Long.expand 
still causes a performance regression on older CPU architectures.


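// Method 1: spread the 8 nibbles of a 32-bit value into the 8 bytes of a
// long with Long.expand, then reverse the byte order.
i = Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));

// Method 2: the same result with manually unrolled masks and shifts
// (i is a long holding a 32-bit value).
i = ((i & 0xF0000000L) >> 28)
  | ((i & 0xF000000L) >> 16)
  | ((i & 0xF00000L) >> 4)
  | ((i & 0xF0000L) << 8)
  | ((i & 0xF000L) << 20)
  | ((i & 0xF00L) << 32)
  | ((i & 0xF0L) << 44)
  | ((i & 0xFL) << 56);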


Note: Long.reverseBytes + Long.expand (Method 1) is faster on x64 and ARMv9. 
On AArch64 with ARMv8, however, it is slower than the manually unrolled 
version in Method 2.
ARMv8 chips include Apple M1/M2 and AWS Graviton 3; ARMv9.0 chips include 
Apple M3/M4 and Aliyun Yitian 710.
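
As a sanity check that the two methods are interchangeable, the following minimal sketch compares them on random 32-bit inputs. The class and method names are hypothetical, and Long.expand requires JDK 19 or later.

```java
import java.util.Random;

// Hypothetical test harness; only Long.expand and Long.reverseBytes are JDK APIs.
final class NibbleSpreadCheck {

    // Method 1: deposit the low 32 bits of i, one nibble per byte, then
    // reverse the bytes so the most significant nibble ends up in byte 0.
    static long method1(long i) {
        return Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));
    }

    // Method 2: the same permutation written as explicit masks and shifts.
    static long method2(long i) {
        return ((i & 0xF0000000L) >> 28)
             | ((i & 0xF000000L) >> 16)
             | ((i & 0xF00000L) >> 4)
             | ((i & 0xF0000L) << 8)
             | ((i & 0xF000L) << 20)
             | ((i & 0xF00L) << 32)
             | ((i & 0xF0L) << 44)
             | ((i & 0xFL) << 56);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n = 0; n < 1_000_000; n++) {
            long i = rnd.nextInt() & 0xFFFF_FFFFL; // unsigned 32-bit value
            if (method1(i) != method2(i)) {
                throw new AssertionError("mismatch for 0x" + Long.toHexString(i));
            }
        }
        System.out.println("Method 1 and Method 2 agree on all sampled inputs");
    }
}
```

A check like this only establishes equivalence of results; the relative speed on a given CPU still has to be measured with the JMH benchmarks from the script above.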

I haven't tested this on older x64 CPUs, such as AMD Zen 1, but they may 
have the same issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2576197320
PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2578863538
