This is spinoff #1 of PR 17619: 
 
Take this simple piece of code: 
--------------------- 
float a[2],b[2];  
  
float foobar () {  
  return a[0] * b[0] 
    + a[1] * b[1];  
}  
--------------------- 
 
Compiled with  
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387 
we get this code: 
--------------------- 
        pushl   %ebp 
        movl    %esp, %ebp 
        flds    b 
        fmuls   a 
        flds    b+4 
        fmuls   a+4 
        faddp   %st, %st(1) 
        popl    %ebp 
        ret 
----------------------------- 
That's certainly optimal. 
 
On the other hand, if we let the compiler use sse registers as well (though 
we do not force it, we simply want the most efficient code), the code 
we get with flags 
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387,sse 
looks like this: 
----------------------------- 
        pushl   %ebp 
        movl    %esp, %ebp 
        subl    $4, %esp 
        flds    b 
        fmuls   a 
        movss   b+4, %xmm0 
        mulss   a+4, %xmm0 
        movss   %xmm0, -4(%ebp) 
        flds    -4(%ebp) 
        faddp   %st, %st(1) 
        leave 
        ret 
--------------------------- 
The code is almost equivalent, except that the second product is computed 
in %xmm0 and then has to be stored to the stack and reloaded with flds, 
because the addition is done on the x87 stack and the system ABI requires 
the return value to be passed in st(0). 
 
In essence, the compiler should just generate the first code sequence, 
even if given the flag -mfpmath=387,sse. 
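
To check that both flag sets at least compute the same value, a small 
harness around the testcase might look like the sketch below. The helpers 
set_inputs() and foobar_ok() and the input values are made up for 
illustration and are not part of the original testcase; the values are 
chosen so that the products and the sum are exactly representable in 
float, so x87 and SSE arithmetic should agree bit for bit. 

```c
/* Sketch of a harness around the testcase. set_inputs(), foobar_ok()
   and the chosen values are illustrative assumptions, not part of
   the original report. */
float a[2], b[2];

float foobar (void) {
  return a[0] * b[0]
    + a[1] * b[1];
}

/* Fill the globals with values whose products and sum are exactly
   representable in float, so the result does not depend on whether
   the intermediate arithmetic is done in x87 or SSE registers. */
void set_inputs (void) {
  a[0] = 1.5f;  b[0] = 4.0f;   /* 1.5 * 4.0  = 6.0 */
  a[1] = 2.0f;  b[1] = 0.25f;  /* 2.0 * 0.25 = 0.5 */
}

/* Returns 1 if foobar() yields the expected 6.5, 0 otherwise. */
int foobar_ok (void) {
  set_inputs ();
  return foobar () == 6.5f;
}
```

Calling foobar_ok() from a main() and compiling once with each of the two 
flag sets above (adding -S to inspect the assembly) should show identical 
results but the differing code sequences for foobar. 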
 
W.

-- 
           Summary: Inefficient code with -mfpmath=387,sse
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bangerth at dealii dot org
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766
