I checked: the mystery of fast JIT sorting is solved. It's not about memory access — C++ handles that very well. The key is function inlining. C++ does inline functions, but not recursive ones. JIT inlines recursive functions for specific cases — e.g., for 5 million elements. As an example: Java code where the warm-up calls sort(), but for a much smaller number than 5 million, and the 5-million-element sort happens only once. The result: worse 20%-25% than C++. Exactly the same as when I copied the class DualPivotQuicksort.java into my project for testing and wondered why it was no longer that fast.
```java import java.util.Arrays; import java.util.Random; import static java.lang.System.out; public class Main { public static void main(String[] args) { int nTrials = 50; long sum = 0; long best = Long.MAX_VALUE; long totalTime = 0; int testSize = 5_000_000; int warmupSize = 100; int nWarmupTrials = 6; Random rand = new Random(1234); for (int n = 0; n < nTrials; n++) { int size = n < nTrials-1?warmupSize:testSize; if (n < nWarmupTrials) System.out.print("warmup... "); System.out.printf("%d/%d%n", n, nTrials); int[] numbers = new int[size]; for (int i = 0; i < size; i++) { numbers[i] = rand.nextInt(); } long startTime = System.nanoTime(); Arrays.sort(numbers); long endTime = System.nanoTime(); long elapsed = endTime - startTime; for (int i = 0; i < size; i++) { sum += (long) i *numbers[i]; } //System.out.printf(" %f%n", (double) elapsed / 1e6); if (n < nTrials-1) continue; totalTime += elapsed; if (elapsed < best) best = elapsed; } int nEffectiveTrials = nTrials - nWarmupTrials; System.out.println("java.vm.version=" + System.getProperty("java.vm.version")); System.out.println("java.vm.name=" + System.getProperty("java.vm.name")); System.out.println("java.version=" + System.getProperty("java.version")); System.out.printf("(java) vector sort time %f ms%n", (double) best / 1e6); System.out.printf("nTrials=%d = %d (warmUp) + %d (effective)%n", nTrials, nWarmupTrials, nEffectiveTrials); System.out.printf("size=%d%n", testSize); out.println("\n\n" + sum); } } ``` output java.vm.version=24.0.1+9-30 java.vm.name=OpenJDK 64-Bit Server VM java.version=24.0.1 (java) vector sort time 432.156342 ms nTrials=50 = 6 (warmUp) + 44 (effective) size=5000000 Thanks,