http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35926
--- Comment #5 from Tony Poppleton <tony.poppleton at gmail dot com> 2011-01-25 19:33:20 UTC ---
I can confirm this still exists on both GCC 4.5.1 and GCC 4.6.0 (20110115) when
compiling with -O3.

I did some basic investigation into the files produced by the -fdump-tree-all
flag, which shows:

add (struct toto_s * a, struct toto_s * b)
{
  int64_t tmp;
  int D.2686;
  struct toto_s * D.2685;
  long long int D.2684;
  int D.2683;
  int b.1;
  long long int D.2681;
  int a.0;

<bb 2>:
  a.0_2 = (int) a_1(D);
  D.2681_3 = (long long int) a.0_2;
  b.1_5 = (int) b_4(D);
  D.2683_6 = b.1_5 & -2;
  D.2684_7 = (long long int) D.2683_6;
  tmp_8 = D.2681_3 + D.2684_7;
  D.2686_9 = (int) tmp_8;
  D.2685_10 = (struct toto_s *) D.2686_9;
  return D.2685_10;
}

What I don't understand here is the excessive casting: the addition producing
tmp_8 is done as (long long int), yet both terms are ultimately derived from
(int) variables.

Is the cast to (long long int) necessary to deal with overflow on the addition?
If so, why does the final asm code not appear to be catering for overflow?

Alternatively, could the whole block be simplified down to (int) during this
phase of the compile, thereby fixing the subsequent unnecessary usage of BX
during the RTL phase (as per comment #3)?

As an aside (possibly another bug report?), it appears there is a regression in
4.6.0 which requires an additional movl compared to what is in the original bug
description (4.5.1 does not suffer from this):

        .file   "PR35926.c"
        .text
        .p2align 4,,15
        .globl  add
        .type   add, @function
add:
.LFB0:
        .cfi_startproc
        pushl   %ebx
        .cfi_def_cfa_offset 8
        .cfi_offset 3, -8
        movl    12(%esp), %eax
        movl    8(%esp), %ecx
        popl    %ebx
        .cfi_def_cfa_offset 4
        .cfi_restore 3
        andl    $-2, %eax
        addl    %eax, %ecx     <==== order of regs inverted
        movl    %ecx, %eax     <==== resulting in unnecessary movl
        ret
        .cfi_endproc
.LFE0:
        .size   add, .-add
        .ident  "GCC: (GNU) 4.6.0 20110115 (experimental)"
        .section        .note.GNU-stack,"",@progbits
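For reference, a C testcase along the following lines reproduces the GIMPLE
above. This is a sketch reconstructed from the dump, not the exact source in
the original report: struct toto_s is left incomplete, the cast spelling may
differ, and an ILP32 (32-bit x86) target is assumed to match the assembly
shown.

#include <stdint.h>

struct toto_s;                  /* opaque; only the pointer value is used */

struct toto_s *
add (struct toto_s *a, struct toto_s *b)
{
  /* Widen both pointer values to 64 bits, clear the low tag bit of b,
     add, then truncate back down to a 32-bit pointer.  */
  int64_t tmp = (int64_t) (int) a + (int64_t) ((int) b & -2);
  return (struct toto_s *) (int) tmp;
}

Compiling this with "gcc -m32 -O3 -S" should give assembly comparable to the
listing above; since tmp is immediately truncated back to (int), the upper 32
bits of the widened addition are dead, which is why the question is whether the
casts could be narrowed before RTL expansion.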