http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
--- Comment #64 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Laurent Aflonsi from comment #61)
> 
> The movt (L2) and the tst (L3) are both removed, and that's coherent for
> that run path, because it is preceded by the tst r2,r2.
> But that makes the first path incoherent, because L3 can be reached by the
> very first block.  I have written a first fix, but it is too restrictive
> ("pr25869-19.c scan-assembler-not movt" is failing):
> 
> --- ./gcc/gcc/config/sh/sh.md.orig
> +++ ./gcc/gcc/config/sh/sh.md
> @@ -8523,7 +8523,8 @@
>           T bit.  Notice that some T bit stores such as negc also modify
>           the T bit.  */
>        if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
> -          || modified_in_p (get_t_reg_rtx (), s1.insn))
> +          || modified_in_p (get_t_reg_rtx (), s1.insn)
> +          || !no_labels_between_p(s1.insn, testing_insn))
>          operands[2] = NULL_RTX;
> 
>        break;
> 
> The idea would be to check if "s1.insn block dominates testing_insn block",
> but I don't know how to write it at this stage.

The proper way would be to find all basic blocks that set the tested reg.
With the reduced test case, right before the split1 pass there are two basic
blocks that set reg 167, which is then tested for '== 0' before the
conditional branch:

(note 13 12 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
<...>
(insn 15 14 16 3 (set (reg:SI 147 t)
        (eq:SI (reg:SI 173 [ MEM[(int *)q_3(D) + 4B] ])
            (const_int 0 [0]))) sh_tmp.cpp:84 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 173 [ MEM[(int *)q_3(D) + 4B] ])
        (nil)))
(insn 16 15 17 3 (set (reg:SI 175)
        (const_int -1 [0xffffffffffffffff])) sh_tmp.cpp:84 250 {movsi_ie}
     (nil))
(note 17 16 18 3 NOTE_INSN_DELETED)
(insn 18 17 71 3 (parallel [
            (set (reg:SI 167 [ D.1424 ])
                (xor:SI (reg:SI 147 t)
                    (const_int 1 [0x1])))
            (set (reg:SI 147 t)
                (const_int 1 [0x1]))
            (use (reg:SI 175))
        ]) sh_tmp.cpp:84 394 {movrt_negc}
     (expr_list:REG_DEAD (reg:SI 175)
        (expr_list:REG_UNUSED (reg:SI 147 t)
            (nil))))
(jump_insn 71 18 72 3 (set (pc)
        (label_ref 27)) -1
     (nil)
 -> 27)
(barrier 72 71 21)
(code_label 21 72 22 4 2 "" [1 uses])
(note 22 21 23 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
<...>
(insn 24 23 26 4 (set (reg:SI 147 t)
        (eq:SI (reg:SI 177 [ *q_3(D) ])
            (const_int 0 [0]))) sh_tmp.cpp:85 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 177 [ *q_3(D) ])
        (nil)))
(insn 26 24 27 4 (set (reg:SI 167 [ D.1424 ])
        (reg:SI 147 t)) sh_tmp.cpp:85 392 {movt}
     (expr_list:REG_DEAD (reg:SI 147 t)
        (nil)))
(code_label 27 26 28 5 3 "" [1 uses])
(note 28 27 29 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 29 28 30 5 (set (reg:SI 147 t)
        (eq:SI (reg:SI 167 [ D.1424 ])
            (const_int 0 [0]))) sh_tmp.cpp:91 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 167 [ D.1424 ])
        (nil)))
(jump_insn 30 29 31 5 (set (pc)
        (if_then_else (ne (reg:SI 147 t)
                (const_int 0 [0]))
            (label_ref:SI 50)
            (pc))) sh_tmp.cpp:91 295 {*cbranch_t}
     (expr_list:REG_DEAD (reg:SI 147 t)
        (expr_list:REG_BR_PROB (const_int 400 [0x190])
            (nil)))
 -> 50)

Here it starts walking up the insns from insn 29 [bb 5] and finds insn 26
[bb 4], but it should also check [bb 3].

The question then is what to do with the collected basic blocks.  Ideally it
should look at all the T bit paths in every basic block and try to eliminate
redundant T bit flipping in each basic block, so that in this case [bb 5] can
start with the conditional branch.
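As for the dominance check mentioned in the quoted comment, the query itself
can be written with the dominator info that already exists in the compiler.
A rough sketch only, assuming dominance info has actually been computed
(e.g. via calculate_dominance_info (CDI_DOMINATORS)) at this point, which
split1 does not guarantee; the helper name is made up for illustration:

/* Sketch only: return true if the basic block of STORE_INSN dominates the
   basic block of TEST_INSN, i.e. every path to TEST_INSN passes through
   the block that stores the T bit.  Bails out conservatively if dominance
   info is not available.  */
static bool
t_store_bb_dominates_test_bb_p (rtx store_insn, rtx test_insn)
{
  basic_block store_bb = BLOCK_FOR_INSN (store_insn);
  basic_block test_bb = BLOCK_FOR_INSN (test_insn);

  if (store_bb == NULL || test_bb == NULL
      || !dom_info_available_p (CDI_DOMINATORS))
    return false;

  /* dominated_by_p (dir, bb1, bb2) is true if BB1 is dominated by BB2.  */
  return dominated_by_p (CDI_DOMINATORS, test_bb, store_bb);
}

That would only replace the too-restrictive no_labels_between_p check,
though.  The modified_between_p / modified_in_p conditions are still needed,
and as described above a complete solution has to consider all the basic
blocks that set the tested reg ([bb 3] and [bb 4] in the dump above).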
Then this ...

    mov.l   @(4,r4),r1
    tst     r1,r1       // T = @(4,r4) == 0
    mov     #-1,r1
    negc    r1,r1       // r1 = @(4,r4) != 0
.L3:
    tst     r1,r1       // T = @(4,r4) == 0
    bt/s    .L5
    mov     #1,r1
    cmp/hi  r1,r5
    bf/s    .L9
    mov     #0,r0
    rts
    nop
.L2:
    mov.l   @r4,r1
    tst     r1,r1       // T = @(r4) == 0
    bra     .L3
    movt    r1          // r1 = @(r4) == 0

would be simplified to this:

    mov.l   @(4,r4),r1
    tst     r1,r1       // T = @(4,r4) == 0
.L3:
    bt/s    .L5
    mov     #1,r1
    cmp/hi  r1,r5
    bf/s    .L9
    mov     #0,r0
    rts
    nop
.L2:
    mov.l   @r4,r1
    bra     .L3
    tst     r1,r1       // T = @(r4) == 0

Maybe if BImode was used for the T bit, combine could do better at folding
T bit flipping.  However, it would not do cross BB analysis, so I think it's
pointless to try out BImode.

I'm not sure whether there is already something in the compiler that could
do this kind of optimization.  According to my observations it should happen
after the combine pass and before register allocation to get useful results.

Until then I think the following should be applied to 4.9 and 4.8, even if
it causes some of the T bit test cases to fail.

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 201282)
+++ gcc/config/sh/sh.md	(working copy)
@@ -8489,15 +8489,30 @@
           continue;
         }
 
-      /* It's only safe to remove the testing insn if the T bit is not
-         modified between the testing insn and the insn that stores the
-         T bit.  Notice that some T bit stores such as negc also modify
-         the T bit.  */
-      if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
-          || modified_in_p (get_t_reg_rtx (), s1.insn))
-        operands[2] = NULL_RTX;
+      /* It's only safe to remove the testing insn if the T bit is not
+         modified between the testing insn and the insn that stores the
+         T bit.  Notice that some T bit stores such as negc also modify
+         the T bit.  */
+      if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
+          || modified_in_p (get_t_reg_rtx (), s1.insn)
+          || !no_labels_between_p (s1.insn, testing_insn))
+        operands[2] = NULL_RTX;
+      else
+        {
+          /* If the insn that sets the tested reg has a REG_DEAD note on
+             the T bit remove that note since we're extending the usage
+             of the T bit.  */
+          for (rtx n = REG_NOTES (s1.insn); n != NULL_RTX; )
+            {
+              rtx nn = XEXP (n, 1);
+              if (REG_NOTE_KIND (n) == REG_DEAD
+                  && t_reg_operand (XEXP (n, 0), VOIDmode))
+                remove_note (s1.insn, n);
+              n = nn;
+            }
+        }
 
-      break;
+      break;
     }
 
   if (operands[2] == NULL_RTX)
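By the way, the REG_NOTES loop in the patch could probably also be written
with find_regno_note.  Untested sketch only; it just restates the note
removal above, using the T bit's register number taken from get_t_reg_rtx:

      /* Untested alternative to the REG_NOTES walk above: drop a REG_DEAD
         note for the T bit on the insn that stores it, since we're
         extending the usage of the T bit.  */
      rtx note = find_regno_note (s1.insn, REG_DEAD,
                                  REGNO (get_t_reg_rtx ()));
      if (note != NULL_RTX)
        remove_note (s1.insn, note);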