I checked: the mystery of fast JIT sorting is solved.
It's not about memory access — C++ handles that very well.
The key is function inlining. C++ does inline functions, but not recursive ones.
JIT inlines recursive functions for specific cases — e.g., for 5
million elements.
As an example: Java code where the warm-up calls sort(),
but for a much smaller number than 5 million,
and the 5-million-element sort happens only once.
The result: worse 20%-25% than C++.
Exactly the same as when I copied the class DualPivotQuicksort.java
into my project for testing
and wondered why it was no longer that fast.
```java
import java.util.Arrays;
import java.util.Random;
import static java.lang.System.out;
public class Main {
public static void main(String[] args) {
int nTrials = 50;
long sum = 0;
long best = Long.MAX_VALUE;
long totalTime = 0;
int testSize = 5_000_000;
int warmupSize = 100;
int nWarmupTrials = 6;
Random rand = new Random(1234);
for (int n = 0; n < nTrials; n++) {
int size = n < nTrials-1?warmupSize:testSize;
if (n < nWarmupTrials)
System.out.print("warmup... ");
System.out.printf("%d/%d%n", n, nTrials);
int[] numbers = new int[size];
for (int i = 0; i < size; i++) {
numbers[i] = rand.nextInt();
}
long startTime = System.nanoTime();
Arrays.sort(numbers);
long endTime = System.nanoTime();
long elapsed = endTime - startTime;
for (int i = 0; i < size; i++) {
sum += (long) i *numbers[i];
}
//System.out.printf(" %f%n", (double) elapsed / 1e6);
if (n < nTrials-1)
continue;
totalTime += elapsed;
if (elapsed < best) best = elapsed;
}
int nEffectiveTrials = nTrials - nWarmupTrials;
System.out.println("java.vm.version=" +
System.getProperty("java.vm.version"));
System.out.println("java.vm.name=" +
System.getProperty("java.vm.name"));
System.out.println("java.version=" +
System.getProperty("java.version"));
System.out.printf("(java) vector sort time %f ms%n", (double)
best / 1e6);
System.out.printf("nTrials=%d = %d (warmUp) + %d (effective)%n",
nTrials, nWarmupTrials, nEffectiveTrials);
System.out.printf("size=%d%n", testSize);
out.println("\n\n" + sum);
}
}
```
output
java.vm.version=24.0.1+9-30
java.vm.name=OpenJDK 64-Bit Server VM
java.version=24.0.1
(java) vector sort time 432.156342 ms
nTrials=50 = 6 (warmUp) + 44 (effective)
size=5000000
Thanks,