https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81479
Bug ID: 81479
Summary: By default, GCC emits a function call for complex multiplication
Product: gcc
Version: 7.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: smcallis at gmail dot com
Target Milestone: ---

I've seen this in gcc 4.4.7, 4.7.4, 4.8.4, 5.4.1, 6.3.0 and 7.1.0.

When compiling some simple complex arithmetic:

    #include <complex>
    #include <cstddef>
    #include <sys/types.h>  // ssize_t (POSIX)
    #include <vector>

    template <typename cx>
    void __attribute__((noinline))
    benchcore(const std::vector<cx> &aa, const std::vector<cx> &bb,
              const std::vector<cx> &cc, std::vector<cx> &dd,
              cx uu, cx vv, size_t nn) {
        for (ssize_t ii=0; ii < nn; ii++) {
            dd[ii] = ( aa[ii]*uu + bb[ii]*vv + cc[ii] );
        }
    }

with:

    > g++ -I. test.cc -O3 -o test

the generated assembly is very unfriendly: it unconditionally calls __mulsc3 for every complex multiplication, i.e. twice per loop iteration:

    0x0000000000402a78 <+104>: movss 0x4(%r12,%rbx,8),%xmm3
    0x0000000000402a7f <+111>: movss (%r12,%rbx,8),%xmm2
    0x0000000000402a85 <+117>: movss 0x18(%rsp),%xmm0
    0x0000000000402a8b <+123>: movss 0x1c(%rsp),%xmm1
    0x0000000000402a91 <+129>: callq 0x400af0 <__mulsc3@plt>
    0x0000000000402a96 <+134>: movq %xmm0,0x28(%rsp)
    0x0000000000402a9c <+140>: movss 0x14(%rsp),%xmm3
    0x0000000000402aa2 <+146>: movss 0x28(%rsp),%xmm5
    0x0000000000402aa8 <+152>: movss 0x2c(%rsp),%xmm4
    0x0000000000402aae <+158>: movss 0x4(%rbp,%rbx,8),%xmm1
    0x0000000000402ab4 <+164>: movss 0x0(%rbp,%rbx,8),%xmm0
    0x0000000000402aba <+170>: movss 0x10(%rsp),%xmm2
    0x0000000000402ac0 <+176>: movss %xmm5,0xc(%rsp)
    0x0000000000402ac6 <+182>: movss %xmm4,0x8(%rsp)
    0x0000000000402acc <+188>: callq 0x400af0 <__mulsc3@plt>
    0x0000000000402ad1 <+193>: movq %xmm0,0x20(%rsp)
    0x0000000000402ad7 <+199>: movss 0x8(%rsp),%xmm4
    0x0000000000402add <+205>: movss 0xc(%rsp),%xmm5
    0x0000000000402ae3 <+211>: addss 0x24(%rsp),%xmm4
    0x0000000000402ae9 <+217>: addss 0x20(%rsp),%xmm5
    0x0000000000402aef <+223>: addss 0x4(%r13,%rbx,8),%xmm4
    0x0000000000402af6 <+230>: addss 0x0(%r13,%rbx,8),%xmm5
    0x0000000000402afd <+237>: movss %xmm4,0x4(%r14,%rbx,8)
    0x0000000000402b04 <+244>: movss %xmm5,(%r14,%rbx,8)

__mulsc3 implements the rules in Annex G of the ISO C99 standard. Though this isn't a "bug" per se, a very nice enhancement would be to recognize that one can compute the complex product first, check both parts of the result for NaN, and only _then_ call a function to correct it. This would allow the main multiplication code to be inlined, at the cost of two compares, an AND and a jump. The unlikely path that has to fix up the result will almost never be taken. This would make complex multiplication without -fcx-limited-range much faster by default, if not ideal.