http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59137
Bug ID: 59137
Summary: Miscompilation at -O1 on mips/mipsel
Product: gcc
Version: 4.8.2
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: aurelien at aurel32 dot net

Created attachment 31222
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31222&action=edit
Testcase

The attached code is miscompiled on MIPS at -O1, but works correctly at -O0
and -O2:

  $ gcc -Wall -Wextra gcc-mips-miscompilation-testcase.c -O0
  $ ./a.out && echo $?
  1
  $ gcc -Wall -Wextra gcc-mips-miscompilation-testcase.c -O1
  $ ./a.out && echo $?
  0
  $ gcc -Wall -Wextra gcc-mips-miscompilation-testcase.c -O2
  $ ./a.out && echo $?
  1

This happens with GCC 4.7.3, GCC 4.8.2 and a trunk snapshot from 20131021,
but not with GCC 4.6 or older. It happens with the o32 and n32 ABIs, but not
with the n64 ABI.

Looking at the generated code:

004006f0 <fLlistp>:
  4006f0:  3c1c0042   lui    gp,0x42
  4006f4:  279c88a0   addiu  gp,gp,-30560
  4006f8:  8f828024   lw     v0,-32732(gp)
  4006fc:  1082000c   beq    a0,v0,400730 <fLlistp+0x40>
  400700:  8f828028   lw     v0,-32728(gp)
  400704:  0480000a   bltz   a0,400730 <fLlistp+0x40>
  400708:  00000000   nop
  40070c:  8c820000   lw     v0,0(a0)
  400710:  10400007   beqz   v0,400730 <fLlistp+0x40>
  400714:  8f828028   lw     v0,-32728(gp)
  400718:  8c820004   lw     v0,4(a0)
  40071c:  10400003   beqz   v0,40072c <fLlistp+0x3c>
  400720:  00000000   nop
  400724:  081001cc   j      400730 <fLlistp+0x40>
  400728:  8f828028   lw     v0,-32728(gp)
  40072c:  8f828024   lw     v0,-32732(gp)
  400730:  24040001   li     a0,1
  400734:  8f83802c   lw     v1,-32724(gp)
  400738:  03e00008   jr     ra
  40073c:  ac640000   sw     a0,0(v1)

At address 0x4006f8, the return register v0 is loaded with the address of
Cnil_body, so that the comparison (x0 == &Cnil_body) can be done. Then, at
address 0x400700, it is reloaded with the address of Ct_body. This load sits
in the delay slot of the beq, so it is executed whether the branch is taken
or not (in our case it is not).
Later, at address 0x400704, the branch is taken, but v0 no longer holds the
address of Cnil_body; it holds the address of Ct_body instead (from the delay
slot at 0x400700). Note that there is a nop in the delay slot at 0x400708
that could be used to load the correct value.
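
To make the sequence above easier to follow, here is a rough C model of the first five instructions of the listing (0x4006f8 to 0x400708). It is only a sketch, not the attached testcase: the function name v0_at_exit and the two constants are made up for illustration, the constants are placeholder values rather than the real addresses of Cnil_body and Ct_body, and the fall-through path starting at 0x40070c is not modeled.

```c
#include <stdint.h>

/* Placeholder values standing in for the two GOT entries; the real
   addresses of Cnil_body and Ct_body are not known from the listing. */
#define CNIL_BODY 0x00410a00u   /* loaded by lw v0,-32732(gp) */
#define CT_BODY   0x00410a10u   /* loaded by lw v0,-32728(gp) */

/* Models 0x4006f8..0x400708 and returns the value v0 holds on arrival
   at 0x400730. Returns 0 when neither branch is taken, because the
   fall-through code at 0x40070c is outside this sketch. */
static uint32_t v0_at_exit(uint32_t a0)
{
    uint32_t v0;
    int taken;

    v0 = CNIL_BODY;                      /* 4006f8: lw v0,-32732(gp) */
    taken = (a0 == v0);                  /* 4006fc: beq a0,v0,400730 */
    v0 = CT_BODY;                        /* 400700: delay slot, always runs */
    if (taken)
        return v0;

    taken = (a0 & 0x80000000u) != 0;     /* 400704: bltz a0,400730 */
                                         /* 400708: nop (delay slot) */
    if (taken)
        return v0;                       /* v0 is still CT_BODY here */

    return 0;                            /* continues at 0x40070c */
}
```

Calling v0_at_exit with any value that has the sign bit set, for example 0x80000004, yields CT_BODY, which corresponds to the behaviour described above: the bltz branch is taken with v0 still holding the value written by the earlier delay slot.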