https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64479
            Bug ID: 64479
           Summary: wrong optimization delayed-branch for SH
           Product: gcc
           Version: 4.8.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: oshima...@yagoto-urayama.jp

On NetBSD/landisk (sh3) 7.0_BETA, gcc 4.8.4 generates wrong code.

Test case:

    int foo(int);
    void bar(int *);

    void func(int args)
    {
        int key;
        int flag = (args==0);

        bar(&key);
        if (!args)
            foo(1);
        if (foo(args)) {
            if (flag && key == 0)
                foo(2);
            else if (flag)
                foo(3);
        }
    }

Code generated when compiling with -Os -S:

    func:
    .LFB0:
            .cfi_startproc
            mov.l   r8,@-r15
            .cfi_def_cfa_offset 4
            .cfi_offset 8, -4
            mov.l   r9,@-r15
            .cfi_def_cfa_offset 8
            .cfi_offset 9, -8
            mov.l   r10,@-r15
            .cfi_def_cfa_offset 12
            .cfi_offset 10, -12
            mov.l   r11,@-r15
            .cfi_def_cfa_offset 16
            .cfi_offset 11, -16
            sts.l   pr,@-r15
            .cfi_def_cfa_offset 20
            .cfi_offset 17, -20
            add     #-4,r15
            .cfi_def_cfa_offset 24
            mov.l   .L16,r1
            tst     r4,r4
            movt    r8              # T bit -> R8
            mov     r4,r9
            jsr     @r1             # subroutine call (bar)
            mov     r15,r4
            mov.l   .L18,r11
            bf      .L2             # what condition? (T bit == 0?)
            jsr     @r11
            mov     #1,r4
    .L2:

Code generated when compiling with -Os -fno-delayed-branch -S:

    func:
    .LFB0:
            .cfi_startproc
            mov.l   r8,@-r15
            .cfi_def_cfa_offset 4
            .cfi_offset 8, -4
            mov.l   r9,@-r15
            .cfi_def_cfa_offset 8
            .cfi_offset 9, -8
            mov.l   r10,@-r15
            .cfi_def_cfa_offset 12
            .cfi_offset 10, -12
            mov.l   r11,@-r15
            .cfi_def_cfa_offset 16
            .cfi_offset 11, -16
            sts.l   pr,@-r15
            .cfi_def_cfa_offset 20
            .cfi_offset 17, -20
            add     #-4,r15
            .cfi_def_cfa_offset 24
            mov.l   .L16,r1
            tst     r4,r4
            movt    r8              # T bit -> R8
            mov     r4,r9
            mov     r15,r4
            jsr     @r1             # subroutine call (bar)
            nop
            tst     r8,r8           # if (R8 == 0) then T bit = 1
            mov.l   .L18,r11
            bt      .L2             # if (T bit == 1) branch .L2
            mov     #1,r4
            jsr     @r11
            nop
    .L2:

With -Os (which includes -fdelayed-branch), the "tst r8,r8" instruction
after the call to bar is omitted and the branch condition is inverted
(bf instead of bt). The bf therefore appears to test the T bit as it is
left by the call to bar, rather than the value that was saved in r8
before the call.

The same problem occurs in the flow_loops_find() function of
gcc/cfgloop.c when building gcc 4.8.4 for NetBSD/sh3, so a native gcc
binary for sh3 is itself miscompiled.
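To check the behaviour at run time instead of reading the assembly, the test case can be wrapped in a small harness. The following is only a sketch and not part of the original report: the stub bodies of foo() and bar(), and the names call_log, key_value, run_case, check_all and HARNESS_MAIN are all hypothetical. It assumes foo() always returns nonzero so that the inner branch is reached. Because everything lives in one translation unit, the compiler may generate different code from the listing above, so the miscompilation is not guaranteed to reproduce in this form; moving foo() and bar() to a separate file stays closer to the original.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical run-time harness; only the body of func() is from the report. */

#define MAX_CALLS 8

static int call_log[MAX_CALLS];  /* arguments passed to foo(), in call order */
static int call_count;           /* number of foo() calls in the current case */
static int key_value;            /* value bar() stores through its pointer */

/* noinline is a GCC attribute; it keeps real calls inside func(). */
__attribute__((noinline)) int foo(int x)
{
    if (call_count < MAX_CALLS)
        call_log[call_count] = x;
    call_count++;
    return 1;  /* assumption: always nonzero so the inner branch is reached */
}

__attribute__((noinline)) void bar(int *p)
{
    *p = key_value;
}

/* Body as in the report. */
__attribute__((noinline)) void func(int args)
{
    int key;
    int flag = (args == 0);

    bar(&key);
    if (!args)
        foo(1);
    if (foo(args)) {
        if (flag && key == 0)
            foo(2);
        else if (flag)
            foo(3);
    }
}

/* Run func(args) with key preset to `key`.
   Returns 1 if foo() was called with exactly expect[0..n-1], else 0. */
static int run_case(int args, int key, const int *expect, int n)
{
    call_count = 0;
    key_value = key;
    func(args);
    if (call_count != n || n > MAX_CALLS)
        return 0;
    return memcmp(call_log, expect, (size_t)n * sizeof *expect) == 0;
}

/* Returns the number of cases whose foo() call sequence differs from
   what the C source prescribes. */
int check_all(void)
{
    static const int args0_key0[] = {1, 0, 2};  /* foo(1), foo(0), foo(2) */
    static const int args0_key5[] = {1, 0, 3};  /* foo(1), foo(0), foo(3) */
    static const int args7[]      = {7};        /* only foo(7) */
    int bad = 0;

    bad += !run_case(0, 0, args0_key0, 3);
    bad += !run_case(0, 5, args0_key5, 3);
    bad += !run_case(7, 0, args7, 1);
    return bad;
}

#ifdef HARNESS_MAIN
int main(void)
{
    int bad = check_all();
    printf("%d mismatching case(s)\n", bad);
    return bad != 0;
}
#endif
```

Building this with -DHARNESS_MAIN once with -Os and once with -Os -fno-delayed-branch, then comparing the exit status of the two binaries on the target, gives a quick indication of whether the wrong branch is taken.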