http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47167

--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> 2011-01-05 20:09:11 UTC ---
(In reply to comment #3)
> > this could be the reason for the slowdown.
> 
....
> 
> $ gcc -lm testcase2.s
> $ time ./a.out
> 
> real    0m4.239s
> user    0m4.234s
> sys    0m0.001s
> 
> The important change is %xmm10 -> %xmm0 in the mulpd instruction.
> The functionality of the test didn't change, because of the existing
> "movapd    %xmm0, %xmm10" at the top of the loop and the extra
> "movapd    %xmm10, %xmm0" added before the loop.
> 
> This all happens on SnB (Sandy Bridge); the code is generated with -O2 only.
> 
> H.J., any ideas?

The performance of some loops is very sensitive to code size.  This change

-    mulpd    %xmm10, %xmm2
+    mulpd    %xmm0, %xmm2

changes the loop size: encoding %xmm10 (one of %xmm8 through %xmm15) needs an
extra REX prefix byte, which %xmm0 does not.
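To make the one-byte difference concrete, here is a minimal sketch of the legacy SSE encoding of the register-register form of mulpd (opcode 66 0F 59 /r, AT&T operand order: source is ModRM.rm, destination is ModRM.reg). The helper name encode_mulpd_reg is hypothetical, and it deliberately ignores memory operands and the VEX (AVX) encoding.

```python
def encode_mulpd_reg(src: int, dst: int) -> bytes:
    """Encode AT&T `mulpd %xmm<src>, %xmm<dst>` (register-register, legacy SSE).

    Hypothetical helper for illustration: MULPD is 66 0F 59 /r, where
    ModRM.reg holds the destination and ModRM.rm holds the source.
    """
    if not (0 <= src <= 15 and 0 <= dst <= 15):
        raise ValueError("xmm register number must be 0..15")
    out = bytearray([0x66])            # mandatory 66 prefix for MULPD
    rex = 0x40
    if dst >= 8:
        rex |= 0x04                    # REX.R: extends ModRM.reg (%xmm8-%xmm15)
    if src >= 8:
        rex |= 0x01                    # REX.B: extends ModRM.rm (%xmm8-%xmm15)
    if rex != 0x40:
        out.append(rex)                # REX must sit directly before the opcode
    out += bytes([0x0F, 0x59])         # MULPD opcode
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)  # mod=11: both operands are registers
    out.append(modrm)
    return bytes(out)


before = encode_mulpd_reg(10, 2)   # mulpd %xmm10, %xmm2
after = encode_mulpd_reg(0, 2)     # mulpd %xmm0, %xmm2
print(before.hex(" "), len(before))  # 66 41 0f 59 d2 5
print(after.hex(" "), len(after))    # 66 0f 59 d0 4
```

Each use of %xmm8 through %xmm15 inside the loop body costs one REX byte, which can be enough to move the loop across an alignment or decoder boundary.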
