While looking at PR42722 I noticed that gcc generates awful code for a tail-call involving a trivial pass-through of a large struct parameter.

> cat bug1.c
struct s1 { int x[16]; };
extern void g1(struct s1);
void f1(struct s1 s1) { g1(s1); }
struct s2 { int x[17]; };
extern void g2(struct s2);
void f2(struct s2 s2) { g2(s2); }
> gcc -O2 -fomit-frame-pointer -S bug1.c
> cat bug1.s
        .file   "bug1.c"
        .text
        .p2align 4,,15
.globl f1
        .type   f1, @function
f1:
        subl    $12, %esp
        addl    $12, %esp
        jmp     g1
        .size   f1, .-f1
        .p2align 4,,15
.globl f2
        .type   f2, @function
f2:
        subl    $12, %esp
        movl    $17, %ecx
        movl    %edi, 8(%esp)
        leal    16(%esp), %edi
        movl    %esi, 4(%esp)
        movl    %edi, %esi
        rep movsl
        movl    4(%esp), %esi
        movl    8(%esp), %edi
        addl    $12, %esp
        jmp     g2
        .size   f2, .-f2
        .ident  "GCC: (GNU) 4.5.0 20100128 (experimental)"
        .section        .note.GNU-stack,"",@progbits

There are two problems with this code:

1. For the larger struct gcc generates a block copy with identical source and destination addresses, which amounts to a very slow NOP.
2. For the smaller struct gcc manages to eliminate the block copy, but it leaves pointless stack manipulation behind in the function (f1).

However, gcc-4.3 generates no pointless stack manipulation:

.globl f1
        .type   f1, @function
f1:
        jmp     g1
        .size   f1, .-f1
        .ident  "GCC: (GNU) 4.3.5 20100103 (prerelease)"

so there's a code size and performance regression in 4.5/4.4.

--
           Summary: inefficient code for trivial tail-call with large struct parameter
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: mikpe at it dot uu dot se
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42909