https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-04-12
             Status|UNCONFIRMED                 |NEW
                 CC|                            |iant at google dot com,
                   |                            |law at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the good case (with + 1) combine succeeds:

Trying 9 -> 11:
    9: r55:HI=zero_extend(r54:QI)
      REG_DEAD r54:QI
   11: r52:HI=r55:HI*r56:HI
      REG_DEAD r56:HI
      REG_DEAD r55:HI
Successfully matched this instruction:
(set (reg:HI 52)
    (mult:HI (zero_extend:HI (reg:QI 54))
        (reg:HI 56 [ a ])))
allowing combination of insns 9 and 11
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 9.
modifying insn i3    11: r52:HI=zero_extend(r54:QI)*r56:HI
      REG_DEAD r54:QI
      REG_DEAD r56:HI
deferring rescan insn with uid = 11.

and then

Trying 10 -> 11:
   10: r56:HI=zero_extend(r61:QI)
      REG_DEAD r61:QI
   11: r52:HI=zero_extend(r54:QI)*r56:HI
      REG_DEAD r54:QI
      REG_DEAD r56:HI
Successfully matched this instruction:
(set (reg:HI 52)
    (mult:HI (zero_extend:HI (reg:QI 54))
        (zero_extend:HI (reg:QI 61))))
allowing combination of insns 10 and 11
original costs 4 + 20 = 24
replacement cost 12
deferring deletion of insn with uid = 10.
modifying insn i3    11: r52:HI=zero_extend(r54:QI)*zero_extend(r61:QI)
      REG_DEAD r61:QI
      REG_DEAD r54:QI
deferring rescan insn with uid = 11.

in the bad case instead

Trying 8 -> 9:
    8: r52:HI=zero_extend(r55:QI)
      REG_DEAD r55:QI
    9: r50:HI=r51:HI*r52:HI
      REG_DEAD r52:HI
      REG_DEAD r51:HI
Successfully matched this instruction:
(set (reg:HI 50)
    (mult:HI (zero_extend:HI (reg:QI 55))
        (reg:HI 51 [ b+1 ])))
allowing combination of insns 8 and 9
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 8.
modifying insn i3     9: r50:HI=zero_extend(r55:QI)*r51:HI
      REG_DEAD r55:QI
      REG_DEAD r51:HI
deferring rescan insn with uid = 9.

Trying 20 -> 9:
   20: r51:HI#1=0
    9: r50:HI=zero_extend(r55:QI)*r51:HI
      REG_DEAD r55:QI
      REG_DEAD r51:HI
Can't combine i2 into i3

that's because the RTL into combine in the bad case is

(insn 19 22 20 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (expr_list:REG_DEAD (reg:QI 54 [ b+1 ])
        (nil)))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 8 20 9 2 (set (reg:HI 52 [ a ])
        (zero_extend:HI (reg/v:QI 48 [ a ]))) "t.ii":4:49 635 {zero_extendqihi2}
     (expr_list:REG_DEAD (reg/v:QI 48 [ a ])
        (nil)))
(insn 9 8 14 2 (set (reg:HI 50)
        (mult:HI (reg:HI 51 [ b+1 ])
            (reg:HI 52 [ a ]))) "t.ii":4:47 328 {*mulhi3_enh_split}
     (expr_list:REG_DEAD (reg:HI 52 [ a ])
        (expr_list:REG_DEAD (reg:HI 51 [ b+1 ])
            (nil))))

so the 'b' operand of the multiplication is now not a zero_extend:HI
but instead a two instruction set.  The first subreg pass produces this
IL, turning

(insn 3 2 4 2 (set (reg/v:HI 49 [ b ])
        (reg:HI 22 r22 [ b ])) "t.ii":3:49 101 {*movhi_split}
     (nil))
(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into

(insn 17 2 18 2 (set (reg:QI 53 [ b ])
        (reg:QI 22 r22 [ b ])) "t.ii":3:49 86 {movqi_insn_split}
     (nil))
(insn 18 17 4 2 (set (reg:QI 54 [ b+1 ])
        (reg:QI 23 r23 [ b+1 ])) "t.ii":3:49 86 {movqi_insn_split}
     (nil))
(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

with -fno-split-wide-types the rotate doesn't get a zero_extend so the
multiplication pattern doesn't match either.

I think that possibly the lower subreg pass should more optimally handle
the situation, creating

(insn ... (set (zero_extend:HI (reg:QI 54 [ b + 1])))

here.  I'm quite sure combine/forwprop cannot combine the seemingly
unrelated subreg sets.

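For illustration only, here is a small C model of why the two subreg sets
(insns 19 and 20) compute the same value as a single zero_extend of the
high byte.  This is not GCC code: the function names are made up, and it
assumes subreg byte 0 is the low byte of the HI pseudo, as the dumps
above suggest.

```c
#include <stdint.h>

/* Model of insn 7 before lowering: (lshiftrt:HI b 8). */
static uint16_t shift_form(uint16_t b)
{
    return (uint16_t)(b >> 8);
}

/* Model of insns 19 and 20 after lowering: two independent byte writes
   into the 16-bit pseudo.  Assumption: byte 0 is the low byte. */
static uint16_t split_form(uint16_t b)
{
    uint8_t bytes[2];
    bytes[0] = (uint8_t)(b >> 8);   /* subreg byte 0 := high byte of b ("b+1") */
    bytes[1] = 0;                   /* subreg byte 1 := 0 */
    return (uint16_t)(bytes[0] | ((uint16_t)bytes[1] << 8));
}

/* Model of the suggested form: zero_extend:HI of the high byte. */
static uint16_t zext_form(uint16_t b)
{
    uint8_t hi = (uint8_t)(b >> 8);
    return (uint16_t)hi;            /* widening an unsigned byte zero-extends */
}

/* Exhaustively compare the three forms; returns 1 if they always agree. */
static int all_forms_agree(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t b = (uint16_t)v;
        if (shift_form(b) != split_form(b) || shift_form(b) != zext_form(b))
            return 0;
    }
    return 1;
}
```

The values are identical in all three forms; only the split form hides
the zero extension from a pass that looks at one set at a time.
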
resolve_shift_zext seems to be supposed to handle this and it receives

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (concatn/v:HI [
                    (reg:QI 53 [ b ])
                    (reg:QI 54 [ b+1 ])
                ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

maybe for GET_CODE (op) != ASHIFTRT && offset1 == 0 && shift_count <=
BITS_PER_WORD this can be directly emitted as zero_extend (if supported by
the target).
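
A rough standalone sketch of that check, to make the proposed condition
concrete.  This is not the actual lower-subreg code: the enum, the macro
and the function name are hypothetical stand-ins, the word size is
assumed to be 8 bits, and the "if supported by the target" part is left
out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the rtx codes involved. */
enum shift_code { SK_ASHIFT, SK_LSHIFTRT, SK_ASHIFTRT };

/* Assumption: 8-bit words, matching the QI/HI split in the dumps. */
#define MODEL_BITS_PER_WORD 8

/* Returns true when, under the condition proposed above, the shift
   could be emitted directly as a zero_extend of the source word. */
static bool can_emit_as_zero_extend(enum shift_code code, int offset1,
                                    int shift_count)
{
    return code != SK_ASHIFTRT          /* arithmetic shift fills with sign bits */
           && offset1 == 0              /* value lands in the word at offset 0 */
           && shift_count <= MODEL_BITS_PER_WORD;
}
```

For insn 7 above this would be can_emit_as_zero_extend (SK_LSHIFTRT, 0, 8),
assuming offset1 is 0 there (insn 19 writes subreg byte 0), which yields
true.
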