Use `JavaLangAccess::uncheckedNewStringNoRepl` in `MemorySegment::getString` to avoid byte[] allocation in the String constructor.
Fall back to the old code in the case of malformed input to get replacement characters as per Javadoc API specification. The existing tests in `TestStringEncoding` seem sufficient to me. I ran the tier1 test suite and it passes. For performance analysis I ran. make test TEST="micro:org.openjdk.bench.java.lang.foreign.ToJavaStringTest" MICRO="OPTIONS=-prof gc" on an AMD Ryzen 7 PRO 5750GE. These are the formatted results, the current master is the line on top, this feature branch is the line below. We can see an improvement in throughput driven by a reduction in allocation. Benchmark (size) Mode Cnt Score Error Units ToJavaStringTest.panama_readString 5 avgt 30 18.996 ± 0.044 ns/op ToJavaStringTest.panama_readString 5 avgt 30 13.851 ± 0.028 ns/op ToJavaStringTest.panama_readString 20 avgt 30 23.570 ± 0.050 ns/op ToJavaStringTest.panama_readString 20 avgt 30 18.401 ± 0.069 ns/op ToJavaStringTest.panama_readString 100 avgt 30 32.094 ± 0.207 ns/op ToJavaStringTest.panama_readString 100 avgt 30 24.427 ± 0.112 ns/op ToJavaStringTest.panama_readString 200 avgt 30 43.029 ± 0.185 ns/op ToJavaStringTest.panama_readString 200 avgt 30 31.914 ± 0.064 ns/op ToJavaStringTest.panama_readString 451 avgt 30 81.145 ± 0.403 ns/op ToJavaStringTest.panama_readString 451 avgt 30 58.975 ± 0.233 ns/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 5 avgt 30 72.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 5 avgt 30 48.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 20 avgt 30 104.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 20 avgt 30 64.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 100 avgt 30 264.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 100 avgt 30 144.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 200 avgt 30 456.001 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 200 avgt 30 240.000 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 451 avgt 30 968.001 ± 0.001 B/op ToJavaStringTest.panama_readString:gc.alloc.rate.norm 451 avgt 30 496.001 ± 0.001 B/op I looked into whether there are inlining issues with the current or proposed code. For this I ran var segment = MemorySegment.ofArray(new byte[] {'c', 'o', 'f', 'f', 'e', 'e', ' ', 'b', 'a', 'b', 'e', 0}); for (int i = 0; i < 200_000_000; i++) { var string = segment.getString(0); if (System.identityHashCode(string) == 1) { System.out.println("x"); } } with `-XX:+PrintCompilation -XX:+PrintInlining -XX:+UnlockDiagnosticVMOptions ` for the current master the simplified output is like this @ 75 jdk.internal.foreign.AbstractMemorySegmentImpl::getString (9 bytes) force inline by annotation callee changed to help.GetString::main (107 bytes) -> TypeProfile (95646/95646 counts) = jdk/internal/foreign/HeapMemorySegmentImpl$OfByte @ 5 jdk.internal.foreign.AbstractMemorySegmentImpl::getString (12 bytes) force inline by annotation @ 1 java.util.Objects::requireNonNull (14 bytes) force inline by annotation @ 8 jdk.internal.foreign.StringSupport::read (67 bytes) force inline by annotation @ 1 jdk.internal.foreign.StringSupport$CharsetKind::of (102 bytes) inline (hot) @ 4 java.lang.Enum::ordinal (5 bytes) accessor @ 45 jdk.internal.foreign.StringSupport::readByte (41 bytes) force inline by annotation @ 3 jdk.internal.foreign.AbstractMemorySegmentImpl::byteSize (5 bytes) accessor @ 6 jdk.internal.foreign.StringSupport::strlenByte (232 bytes) force inline by annotation <snip> @ 27 java.lang.foreign.MemorySegment::copy (29 bytes) force inline by annotation <snip> @ 37 java.lang.String::<init> (16 bytes) inline (hot) @ 2 java.util.Objects::requireNonNull (14 bytes) force inline by annotation @ 12 java.lang.String::<init> (86 bytes) inline (hot) @ 23 java.lang.String::utf8 (271 bytes) inline (hot) @ 9 java.lang.StringCoding::countPositives (33 bytes) (intrinsic) @ 27 java.util.Arrays::copyOfRange (82 bytes) inline (hot) @ 11 java.lang.Object::clone (0 bytes) failed to inline: native method (intrinsic) @ 31 java.lang.String::<init> (15 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) @ 82 java.lang.String::<init> (37 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) As we can see everything is inlined but since `String::utf8` explicitly asks for a copy of the `byte[]` https://github.com/openjdk/jdk/blob/3263361a28c7e8c02734cb94bc9576e9f3ba5b50/src/java.base/share/classes/java/lang/String.java#L575 this can not be optimized away. This is the compliation log with the proposed changes @ 75 jdk.internal.foreign.AbstractMemorySegmentImpl::getString (9 bytes) force inline by annotation callee changed to help.GetString::main (107 bytes) -> TypeProfile (121249/121249 counts) = jdk/internal/foreign/HeapMemorySegmentImpl$OfByte @ 5 jdk.internal.foreign.AbstractMemorySegmentImpl::getString (12 bytes) force inline by annotation @ 1 java.util.Objects::requireNonNull (14 bytes) force inline by annotation @ 8 jdk.internal.foreign.StringSupport::read (67 bytes) force inline by annotation @ 1 jdk.internal.foreign.StringSupport$CharsetKind::of (102 bytes) inline (hot) @ 4 java.lang.Enum::ordinal (5 bytes) accessor @ 45 jdk.internal.foreign.StringSupport::readByte (55 bytes) force inline by annotation @ 3 jdk.internal.foreign.AbstractMemorySegmentImpl::byteSize (5 bytes) accessor @ 6 jdk.internal.foreign.StringSupport::strlenByte (232 bytes) force inline by annotation <snip> @ 27 java.lang.foreign.MemorySegment::copy (29 bytes) force inline by annotation <snip> @ 36 java.lang.System$1::uncheckedNewStringNoRepl (6 bytes) inline (hot) @ 2 java.lang.String::newStringNoRepl (33 bytes) inline (hot) @ 2 java.lang.String::newStringNoRepl1 (284 bytes) inline (hot) @ 22 java.lang.String::newStringUTF8NoRepl (322 bytes) inline (hot) @ 4 java.lang.String::checkBoundsOffCount (10 bytes) inline (hot) @ 6 jdk.internal.util.Preconditions::checkFromIndexSize (25 bytes) inline (hot) @ 24 java.lang.StringCoding::countPositives (33 bytes) (intrinsic) @ 73 java.lang.String::<init> (15 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) everything still inlines even though some methods are larger. ------------- Commit messages: - 8362893: Improve performance for MemorySegment::getString Changes: https://git.openjdk.org/jdk/pull/26493/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26493&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8362893 Stats: 19 lines in 1 file changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26493.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26493/head:pull/26493 PR: https://git.openjdk.org/jdk/pull/26493