I checked: the mystery of fast JIT sorting is solved.
It's not about memory access — C++ handles that very well.
The key is function inlining. C++ does inline functions, but not recursive ones.
JIT inlines recursive functions for specific cases — e.g., for 5
million elements.
As an example: Java code where the warm-up calls sort(),
but for a much smaller number than 5 million,
and the 5-million-element sort happens only once.
The result: worse 20%-25% than C++.
Exactly the same as when I copied the class DualPivotQuicksort.java
into my project for testing
and wondered why it was no longer that fast.

```java
import java.util.Arrays;
import java.util.Random;

import static java.lang.System.out;

public class Main {
    public static void main(String[] args) {
        int nTrials = 50;
        long sum = 0;
        long best = Long.MAX_VALUE;
        long totalTime = 0;
        int testSize = 5_000_000;
        int warmupSize = 100;
        int nWarmupTrials = 6;
        Random rand = new Random(1234);
        for (int n = 0; n < nTrials; n++) {
            int size = n < nTrials-1?warmupSize:testSize;
            if (n < nWarmupTrials)
                System.out.print("warmup... ");
            System.out.printf("%d/%d%n", n, nTrials);
            int[] numbers = new int[size];
            for (int i = 0; i < size; i++) {
                numbers[i] = rand.nextInt();
            }
            long startTime = System.nanoTime();
            Arrays.sort(numbers);
            long endTime = System.nanoTime();
            long elapsed = endTime - startTime;
            for (int i = 0; i < size; i++) {
                sum += (long) i *numbers[i];
            }
            //System.out.printf("  %f%n", (double) elapsed / 1e6);
            if (n < nTrials-1)
                continue;
            totalTime += elapsed;
            if (elapsed < best) best = elapsed;
        }
        int nEffectiveTrials = nTrials - nWarmupTrials;
        System.out.println("java.vm.version=" +
System.getProperty("java.vm.version"));
        System.out.println("java.vm.name=" +
System.getProperty("java.vm.name"));
        System.out.println("java.version=" +
System.getProperty("java.version"));
        System.out.printf("(java) vector sort time %f ms%n", (double)
best / 1e6);
        System.out.printf("nTrials=%d = %d (warmUp) + %d (effective)%n",
                nTrials, nWarmupTrials, nEffectiveTrials);
        System.out.printf("size=%d%n", testSize);
         out.println("\n\n" + sum);
    }
}

```
output
java.vm.version=24.0.1+9-30
java.vm.name=OpenJDK 64-Bit Server VM
java.version=24.0.1
(java) vector sort time 432.156342 ms
nTrials=50 = 6 (warmUp) + 44 (effective)
size=5000000

Thanks,

Reply via email to