On Tue, 7 Jan 2025 10:39:18 GMT, Shaojin Wen <s...@openjdk.org> wrote:
> In PR #22928, UUID introduced long-based vectorized hexadecimal to string
> conversion, which can also be used in Integer::toHexString and
> Long::toHexString to eliminate table lookups. The benefit of eliminating
> table lookups is that the performance is better when cache misses occur.

I think that without proper assembly analysis it won't be easy to check why...
And yes, pdep is bad on old Ryzen @SirYwell :"(
It could also be a branch prediction problem (perfnorm would help), if the list of longs can produce both short and long hex strings.

src/java.base/share/classes/jdk/internal/util/HexDigits.java line 199:

> 197: 
> 198:     /**
> 199:      * Extract the least significant 8 bytes from the input integer i, convert each byte into its corresponding 2-digit

The least significant 4 bytes.

src/java.base/share/classes/jdk/internal/util/HexDigits.java line 204:

> 202:      */
> 203:     public static long hex8(long i) {
> 204:         long x = Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL);

On x86 this should use pdep, but what about aarch64?

src/java.base/share/classes/jdk/internal/util/HexDigits.java line 228:

> 226:         return ((m << 1) + (m >> 1) - (m >> 4))
> 227:                 + 0x3030_3030_3030_3030L
> 228:                 + (x & 0x0F0F_0F0F_0F0F_0F0FL);

x is already expanded with the mask 0x0F0F_0F0F_0F0F_0F0FL, so why & it again?

Another thing: I don't know how C2 does the math here, but in the assembly it should be straightforward to check whether there is a register data dependency while performing this series of additions/subtractions. x86 is usually more affected by this since it has fewer registers available.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2577178403
PR Review Comment: https://git.openjdk.org/jdk/pull/22942#discussion_r1906110618
PR Review Comment: https://git.openjdk.org/jdk/pull/22942#discussion_r1906104894
PR Review Comment: https://git.openjdk.org/jdk/pull/22942#discussion_r1906103700
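To make the review points concrete, here is a minimal standalone sketch. It is not the code from the PR. The class and helper names (`HexSketch`, `expandNibbles`, `spreadNibbles`, `decode`, `check`) are hypothetical, and the definition of `m` is an assumption because the quoted excerpt does not show it: `m = (x + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L`, which leaves 0x10 in every byte whose nibble is 10 or more. The sketch checks three things. Only the least significant 4 bytes of the input reach the result. `x & 0x0F0F_0F0F_0F0F_0F0FL == x` holds right after `Long.expand`. A portable shift-and-mask spread produces the same bytes as `Long.expand`, which is relevant if pdep is slow or missing on a given CPU. Whether that alternative is actually faster on aarch64 or old Ryzen is something only a benchmark can tell.

```java
// Hypothetical sketch, not the PR's code: the definition of m below is an assumption.
final class HexSketch {
    private static final long NIBBLES = 0x0F0F_0F0F_0F0F_0F0FL;

    // Spreads the low 8 nibbles (the least significant 4 bytes) of i into the low nibble of each byte.
    static long expandNibbles(long i) {
        return Long.expand(i, NIBBLES);
    }

    // Portable shift-and-mask alternative to Long.expand(i, NIBBLES); avoids relying on pdep.
    static long spreadNibbles(long i) {
        long x = i & 0xFFFF_FFFFL;                        // keep only the low 4 bytes
        x = (x | (x << 16)) & 0x0000_FFFF_0000_FFFFL;     // 16-bit halves -> 32-bit lanes
        x = (x | (x << 8))  & 0x00FF_00FF_00FF_00FFL;     // bytes -> 16-bit lanes
        x = (x | (x << 4))  & NIBBLES;                    // nibbles -> 8-bit lanes
        return x;
    }

    // Converts the low 4 bytes of i into 8 ASCII hex digits, one per byte (byte k holds nibble k).
    static long hex8(long i) {
        long x = expandNibbles(i);
        // Assumed: 0x10 is set in every byte whose nibble is >= 10 (nibble + 6 carries into bit 4).
        long m = (x + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L;
        // Per flagged byte: 0x20 + 0x08 - 0x01 = 0x27, the gap between '0' + 10 and 'a'.
        // x is added without another & NIBBLES: expand already cleared every other bit.
        return ((m << 1) + (m >> 1) - (m >> 4))
                + 0x3030_3030_3030_3030L
                + x;
    }

    // Reads the 8 ASCII bytes most significant digit first (byte 7 holds nibble 7).
    static String decode(long ascii) {
        char[] cs = new char[8];
        for (int k = 0; k < 8; k++) {
            cs[k] = (char) ((ascii >>> ((7 - k) * 8)) & 0xFF);
        }
        return new String(cs);
    }

    private static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    public static void main(String[] args) {
        int[] samples = {0, 9, 10, 0xCAFE_BABE, 0x1234_5678, -1};
        for (int v : samples) {
            long x = expandNibbles(v);
            check((x & NIBBLES) == x, "second mask is redundant");
            check(x == spreadNibbles(v), "portable spread matches Long.expand");
            check(decode(hex8(v)).equals(String.format("%08x", v)), "hex digits match");
            System.out.println(decode(hex8(v)));
        }
        // Only the least significant 4 bytes contribute to the result.
        check(hex8(0xFFFF_FFFF_0000_0000L) == hex8(0L), "upper 4 bytes ignored");
    }
}
```

Running it prints 00000000, 00000009, 0000000a, cafebabe, 12345678 and ffffffff, one per line. Each lane of `(m << 1) + (m >> 1) - (m >> 4)` is either 0 or 0x27, and with the 0x30 base and a nibble of at most 0x0F every byte stays below 0x100, so no carry crosses lanes.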