https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70094
Bug ID: 70094
Summary: Missed optimization when passing a constant struct argument by value
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: robryk at gmail dot com
Target Milestone: ---

Function baz in the code listing below gets compiled into something that
writes to the stack. This is unnecessary: one can just load the argument into
rdi with movabs and get rid of stack adjustments and memory accesses:

---snip---
[robryk@sharya-rana gccbug]$ cat > bug.cc
struct foo {
  int a;
  int b;
  int c;
};

void bar(foo);

void baz() {
  foo f;
  f.a = 1;
  f.b = 2;
  f.c = 3;
  bar(f);
}
[robryk@sharya-rana gccbug]$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/5.3.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc-multilib/src/gcc-5.3.0/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 5.3.0 (GCC)
[robryk@sharya-rana gccbug]$ g++ bug.cc -O2 -c -o bug.o
[robryk@sharya-rana gccbug]$ objdump -d bug.o

bug.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z3bazv>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   be 03 00 00 00          mov    $0x3,%esi
   9:   c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  10:   c7 44 24 04 02 00 00    movl   $0x2,0x4(%rsp)
  17:   00
  18:   48 8b 3c 24             mov    (%rsp),%rdi
  1c:   e8 00 00 00 00          callq  21 <_Z3bazv+0x21>
  21:   48 83 c4 18             add    $0x18,%rsp
  25:   c3                      retq
---snip---

I've verified that Clang performs the optimization I was talking about and
that, according to gcc.godbolt.org, a snapshot of gcc 6 misses this
optimization too.

For comparison, the code Clang produces (according to godbolt):

        movabsq $8589934593, %rdi       # imm = 0x200000001
        movl    $3, %esi
        jmp     bar(foo)                # TAILCALL
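
To make the immediate in the Clang output easier to check, here is a minimal sketch of where the constant 0x200000001 comes from. It assumes a little-endian x86-64 target with 4-byte int, where the first eight bytes of foo (fields a and b) are passed in rdi and field c in esi, as both listings above suggest. The helper names first_eightbyte, second_eightbyte and check_baz_argument are hypothetical and only serve the illustration; they are not part of the test case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct foo { int a; int b; int c; };

// Assumption of this sketch: 4-byte int, no padding between the fields.
static_assert(sizeof(foo) == 12, "sketch assumes 4-byte int and no padding");

// Hypothetical helper: bytes 0..7 of the struct (fields a and b), i.e. the
// value the caller has to place in %rdi under the assumed calling convention.
std::uint64_t first_eightbyte(const foo& f) {
    std::uint64_t v = 0;
    std::memcpy(&v, &f, sizeof v);
    return v;
}

// Hypothetical helper: bytes 8..11 of the struct (field c), passed in %esi.
std::uint32_t second_eightbyte(const foo& f) {
    std::uint32_t v = 0;
    std::memcpy(&v, &f.c, sizeof v);
    return v;
}

// Same values as in baz(): on a little-endian target, a occupies the low
// half and b the high half of the first eightbyte.
void check_baz_argument() {
    foo f;
    f.a = 1;
    f.b = 2;
    f.c = 3;
    assert(first_eightbyte(f) == 0x200000001ULL);  // 8589934593
    assert(second_eightbyte(f) == 3u);
}
```

Since both values are compile-time constants, the two stores to the stack followed by the 8-byte reload in the GCC output could in principle be folded into a single 64-bit immediate load, which is what the movabsq in the Clang output does.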