https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85283

            Bug ID: 85283
           Summary: Generates 20 lines of assembly while only one assembly
                    instruction is enough.
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mcccs at gmx dot com
  Target Milestone: ---

GCC version: trunk/20180407 (also older versions)
Target: x86_64-linux-gnu
Compile options: -Ofast -mavx2 -mfma -Wall -Wextra -Wpedantic

Build options: --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu --disable-bootstrap --enable-multiarch --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --enable-clocale=gnu
--enable-languages=c,c++,fortran --enable-ld=yes --enable-gold=yes
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-linker-build-id
--enable-lto --enable-plugins --enable-threads=posix
--with-pkgversion=GCC-Explorer-Build 

The exact code (no #includes):
typedef struct {
  float x, y;
} Vec2;

Vec2 vec2_add(Vec2 a, Vec2 b) {
  Vec2 out = {a.x + b.x, 
              a.y + b.y};
  return out;
}

Produced assembly with line numbers:

1 vec2_add:
2  vmovq rcx, xmm0
3  vmovq rsi, xmm1
...
21 vmovq xmm0, QWORD PTR [rsp-24]
22 ret

Expected assembly (as compiled by Clang 6.0 with -Ofast -mavx2 -mfma):

1 vec2_add: # @vec2_add
2   vaddps xmm0, xmm1, xmm0
3   ret

(Yes, only three lines)
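
For reference, here is a minimal sketch of why one packed add is enough. On x86_64 System V, an 8-byte struct of two floats is passed in the low 64 bits of a single SSE register, so both fields of each argument already sit side by side. The sketch makes that explicit with the GCC/Clang vector extension. The names v4sf and vec2_add_packed are hypothetical and only for illustration, and I have not checked what code any particular GCC version generates for it.

```c
#include <string.h>

typedef struct {
  float x, y;
} Vec2;

/* Four-float vector type (GCC/Clang vector extension; the name is illustrative). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: same result as vec2_add, written as one vector add. */
Vec2 vec2_add_packed(Vec2 a, Vec2 b) {
  v4sf va = {0}, vb = {0};
  memcpy(&va, &a, sizeof a);   /* a.x and a.y land in lanes 0 and 1 */
  memcpy(&vb, &b, sizeof b);   /* b.x and b.y land in lanes 0 and 1 */
  v4sf sum = va + vb;          /* a single packed add covers both fields */
  Vec2 out;
  memcpy(&out, &sum, sizeof out);  /* keep only the low two lanes */
  return out;
}
```

Lanes 2 and 3 are zeroed here only to keep the sketch well defined; in the Clang output above the upper lanes are simply ignored.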


(These can be experimented with here: https://godbolt.org/g/tTwusV)

See also (for other inefficiencies): https://godbolt.org/g/AtWNgf
