https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #2)
> (In reply to Richard Biener from comment #1)
> > Huh, looks like this is the (patched by us) memory copying done in
> > spec_qsort?
>
> Yes
>
> > I wonder if you can re-measure with our patching undone but then with
> > -fno-strict-aliasing (though I think that only was required with LTO).
> >
>
> The difference indeed goes away :-/  The current code we're
> benchmarking (when not using LTO) is slower in both cases :-/

:/  What is the diff we are using?  IIRC spec_qsort contains special
casing for standard integer type sizes, and my original patch simply
removed all that premature optimization and instead always uses the
char copying loop (which seems to be vectorized then).

Maybe we can resort to applying -fno-strict-aliasing just to the qsort CU?
It wasn't intended to introduce big differences compared to official runs...

> > How large are the objects sorted in mcf?
>
> It's always pointers, 8 bytes.

OK, that would explain it then.
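
For readers without the sources at hand, here is a rough sketch of the two swap strategies being contrasted. It is not the SPEC spec_qsort code and not the patch discussed here; the names swap_bytes, swap_long and swap_elems are made up for illustration. The assumption is that the size-specialized path accesses elements through an integer type, which is only valid if the elements really have that type, while the char loop is valid for any element type.

```c
#include <assert.h>
#include <stddef.h>

/* Generic path: copy one byte at a time through unsigned char.
   Character types may alias any object, so this is valid for any
   element type, including arrays of pointers. */
void swap_bytes(void *a, void *b, size_t n)
{
    unsigned char *p = a, *q = b;
    while (n--) {
        unsigned char t = *p;
        *p++ = *q;
        *q++ = t;
    }
}

/* Size-specialized path (hypothetical): access the elements as long.
   Valid only when the elements really are long objects.  Calling this
   on 8-byte pointer elements would violate the aliasing rules unless
   the translation unit is built with -fno-strict-aliasing. */
void swap_long(void *a, void *b)
{
    long *p = a, *q = b;
    long t = *p;
    *p = *q;
    *q = t;
}

/* Hypothetical dispatcher: the caller states whether the elements are
   known to be long; everything else takes the byte loop. */
void swap_elems(void *a, void *b, size_t size, int elems_are_long)
{
    if (elems_are_long) {
        assert(size == sizeof(long));
        swap_long(a, b);
    } else {
        swap_bytes(a, b, size);
    }
}
```

With 8-byte pointer elements, as in mcf, only the byte loop is safe under strict aliasing, so the performance of that loop (and whether it gets vectorized) is what ends up being measured.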