https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81479

            Bug ID: 81479
           Summary: By default, GCC emits a function call for complex
                    multiplication
           Product: gcc
           Version: 7.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: smcallis at gmail dot com
  Target Milestone: ---

I've seen this in gcc 4.4.7, 4.7.4, 4.8.4, 5.4.1, 6.3.0 and 7.1.0.

When compiling some simple complex arithmetic:

    template <typename cx>
    void __attribute__((noinline)) benchcore(const std::vector<cx> &aa,
                                             const std::vector<cx> &bb,
                                             const std::vector<cx> &cc,
                                             std::vector<cx> &dd,
                                             cx uu, cx vv, size_t nn) {
        for (ssize_t ii=0; ii < nn; ii++) {
            dd[ii] = (
                aa[ii]*uu +
                bb[ii]*vv +
                cc[ii]
            );
        }
    }

> g++ -I. test.cc -O3 -o test

The generated assembly is very unfriendly: it unconditionally calls the
__mulsc3 function for every complex multiplication.

   0x0000000000402a78 <+104>:   movss  0x4(%r12,%rbx,8),%xmm3
   0x0000000000402a7f <+111>:   movss  (%r12,%rbx,8),%xmm2
   0x0000000000402a85 <+117>:   movss  0x18(%rsp),%xmm0
   0x0000000000402a8b <+123>:   movss  0x1c(%rsp),%xmm1
   0x0000000000402a91 <+129>:   callq  0x400af0 <__mulsc3@plt>
   0x0000000000402a96 <+134>:   movq   %xmm0,0x28(%rsp)
   0x0000000000402a9c <+140>:   movss  0x14(%rsp),%xmm3
   0x0000000000402aa2 <+146>:   movss  0x28(%rsp),%xmm5
   0x0000000000402aa8 <+152>:   movss  0x2c(%rsp),%xmm4
   0x0000000000402aae <+158>:   movss  0x4(%rbp,%rbx,8),%xmm1
   0x0000000000402ab4 <+164>:   movss  0x0(%rbp,%rbx,8),%xmm0
   0x0000000000402aba <+170>:   movss  0x10(%rsp),%xmm2
   0x0000000000402ac0 <+176>:   movss  %xmm5,0xc(%rsp)
   0x0000000000402ac6 <+182>:   movss  %xmm4,0x8(%rsp)
   0x0000000000402acc <+188>:   callq  0x400af0 <__mulsc3@plt>
   0x0000000000402ad1 <+193>:   movq   %xmm0,0x20(%rsp)
   0x0000000000402ad7 <+199>:   movss  0x8(%rsp),%xmm4
   0x0000000000402add <+205>:   movss  0xc(%rsp),%xmm5
   0x0000000000402ae3 <+211>:   addss  0x24(%rsp),%xmm4
   0x0000000000402ae9 <+217>:   addss  0x20(%rsp),%xmm5
   0x0000000000402aef <+223>:   addss  0x4(%r13,%rbx,8),%xmm4
   0x0000000000402af6 <+230>:   addss  0x0(%r13,%rbx,8),%xmm5
   0x0000000000402afd <+237>:   movss  %xmm4,0x4(%r14,%rbx,8)
   0x0000000000402b04 <+244>:   movss  %xmm5,(%r14,%rbx,8)

__mulsc3 implements the complex multiplication semantics of Annex G of the
C99 standard.  Though this isn't a "bug" per se, a very nice enhancement would
be to recognize that one can compute the complex multiply inline first, then
check the results for NaN, and only _then_ call a function to correct the
results.  This would allow the main multiplication code to be inlined, at the
cost of two compares, an AND, and a jump.  The unlikely path of having to fix
the result will almost never be taken.  This would make complex multiplies
without -fcx-limited-range much better by default, if not ideal.
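
The idea can be sketched by hand in user code.  This is only an illustration
of the proposed code shape, not what GCC emits or how a fix would be
implemented in the compiler.  The names mul_inline_first and mul_fixup are
hypothetical, and the slow path here simply redoes the multiply with the
library operator (which, on GCC without -fcx-limited-range, ends up in
__mulsc3) rather than correcting the already computed result.

```cpp
#include <cmath>
#include <complex>

// Hypothetical slow path: defer to the standard operator*, which is assumed
// to implement the Annex G recovery (via __mulsc3 on GCC by default).
__attribute__((noinline, cold))
std::complex<float> mul_fixup(std::complex<float> a, std::complex<float> b) {
    return a * b;
}

// Hypothetical fast path: do the textbook multiply inline, then test both
// parts for NaN.  Only when both are NaN does Annex G require recovery.
inline std::complex<float> mul_inline_first(std::complex<float> a,
                                            std::complex<float> b) {
    const float ar = a.real(), ai = a.imag();
    const float br = b.real(), bi = b.imag();

    const float re = ar * br - ai * bi;
    const float im = ar * bi + ai * br;

    // Two compares, an AND, and a (rarely taken) jump.
    if (__builtin_expect(std::isnan(re) && std::isnan(im), 0)) {
        return mul_fixup(a, b);
    }
    return std::complex<float>(re, im);
}
```

Replacing aa[ii]*uu and bb[ii]*vv in benchcore with calls to
mul_inline_first would keep the common case free of function calls.  Note
that the NaN test is meaningless under -ffast-math, where the compiler may
assume NaNs never occur.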
