https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85283
Bug ID: 85283
Summary: Generates 20 lines of assembly while only one assembly instruction is enough.
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: mcccs at gmx dot com
Target Milestone: ---

GCC version: trunk/20180407 (also older versions)
Target: x86_64-linux-gnu
Compile options: -Ofast -mavx2 -mfma -Wall -Wextra -Wpedantic
Build options:
    --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
    --disable-bootstrap --enable-multiarch --with-abi=m64
    --with-multilib-list=m32,m64,mx32 --enable-multilib --enable-clocale=gnu
    --enable-languages=c,c++,fortran --enable-ld=yes --enable-gold=yes
    --enable-libstdcxx-debug --enable-libstdcxx-time=yes
    --enable-linker-build-id --enable-lto --enable-plugins
    --enable-threads=posix --with-pkgversion=GCC-Explorer-Build

The exact code (no #includes):

    typedef struct {
        float x, y;
    } Vec2;

    Vec2 vec2_add(Vec2 a, Vec2 b) {
        Vec2 out = {a.x + b.x, a.y + b.y};
        return out;
    }

Produced assembly with line numbers:

     1 vec2_add:
     2         vmovq   rcx, xmm0
     3         vmovq   rsi, xmm1
    ...
    21         vmovq   xmm0, QWORD PTR [rsp-24]
    22         ret

Expected assembly (as compiled by Clang 6.0 with -Ofast -mavx2 -mfma):

     1 vec2_add:                               # @vec2_add
     2         vaddps  xmm0, xmm1, xmm0
     3         ret

(Yes, only three lines) ^^^^^^

(These can be experimented with here: https://godbolt.org/g/tTwusV)

See also (for other inefficiencies): https://godbolt.org/g/AtWNgf
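
For anyone reproducing this, a self-contained sketch of the test case follows. It adds two things that are not part of the original report: compile-time checks of the layout that makes a single packed add sufficient, and a hypothetical helper, vec2_add_selfcheck, that confirms the function's result. The layout checks assume the x86_64-linux-gnu target named above, where float is 4 bytes; under the System V x86-64 calling convention a struct of two floats is passed in the low 64 bits of a single SSE register, so both lanes of a and b already sit side by side on entry.

```c
#include <stddef.h>

typedef struct {
    float x, y;
} Vec2;

/* Function from the report, unchanged in behavior. */
Vec2 vec2_add(Vec2 a, Vec2 b)
{
    Vec2 out = {a.x + b.x, a.y + b.y};
    return out;
}

/* Layout assumptions (not stated in the report, but true on the
 * x86_64-linux-gnu target): the two floats are contiguous with no padding,
 * so the whole struct fits in the low 8 bytes of one xmm register. */
_Static_assert(sizeof(Vec2) == 2 * sizeof(float), "Vec2 has no padding");
_Static_assert(offsetof(Vec2, y) == sizeof(float), "y directly follows x");

/* Hypothetical helper, added for illustration only: returns 1 if vec2_add
 * produces the expected lane-wise sums, 0 otherwise. The inputs are exactly
 * representable, so comparing with == is safe here. */
int vec2_add_selfcheck(void)
{
    Vec2 a = {1.0f, 2.0f};
    Vec2 b = {3.0f, 4.0f};
    Vec2 r = vec2_add(a, b);
    return r.x == 4.0f && r.y == 6.0f;
}
```

Compiling this file with the options listed above and inspecting the output for vec2_add (for example with -S -masm=intel) should show whether a given GCC build still emits the long sequence; the self-check only verifies the arithmetic, not the generated code.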