[Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128

rguenth at gcc dot gnu.org Thu, 02 Apr 2020 00:30:29 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364


--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #2)
> (In reply to Richard Biener from comment #1)
> > Huh, looks like this is the (patched by us) memory copying done in
> > spec_qsort?
> 
> Yes
> 
> > I wonder if you can re-measure with our patching undone but then with
> > -fno-strict-aliasing (though I think that only was required with LTO).
> >
> 
> The difference indeed goes away :-/  The current code we're
> benchmarking (when not using LTO) is slower in both cases :-/

:/

What is the diff we are using?  IIRC spec_qsort contains special casing
for standard integer type sizes, and my original patch simply removed all
that premature optimization and instead always uses the char copying loop
(which then seems to get vectorized).  Maybe we can resort to applying
-fno-strict-aliasing just to the qsort CU?  The patch wasn't intended to
introduce big differences compared to official runs...
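
For illustration only, the kind of char copying loop meant here could look
roughly like the sketch below.  This is not the actual SPEC source nor the
patch we carry; the name swap_bytes and its signature are made up for the
example.  The point is that the elements are only accessed through
unsigned char, which the aliasing rules always allow, so no
-fno-strict-aliasing is needed, while the special-cased variant would
instead access them through an integer type such as long.

```c
#include <stddef.h>

/* Hypothetical helper (not the SPEC or patched code): swap two elements
   of n bytes each, one byte at a time.  All accesses go through
   unsigned char, so this is valid whatever the element type is.  */
void swap_bytes(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;

    for (size_t i = 0; i < n; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

With 8-byte elements, as in mcf, a call would be something like
swap_bytes(&p, &q, sizeof p) for two pointers p and q, so the loop runs
only eight iterations per swap.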

> > How large are the objects sorted in mcf?
> 
> It's always pointers, 8 bytes.

OK, that would explain it then.
