https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|target |rtl-optimization
Ever confirmed|0 |1
Last reconfirmed| |2023-04-12
Status|UNCONFIRMED |NEW
CC| |iant at google dot com,
| |law at gcc dot gnu.org,
| |segher at gcc dot gnu.org
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the good case (with + 1) combine succeeds:
Trying 9 -> 11:
9: r55:HI=zero_extend(r54:QI)
REG_DEAD r54:QI
11: r52:HI=r55:HI*r56:HI
REG_DEAD r56:HI
REG_DEAD r55:HI
Successfully matched this instruction:
(set (reg:HI 52)
(mult:HI (zero_extend:HI (reg:QI 54))
(reg:HI 56 [ a ])))
allowing combination of insns 9 and 11
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 9.
modifying insn i3 11: r52:HI=zero_extend(r54:QI)*r56:HI
REG_DEAD r54:QI
REG_DEAD r56:HI
deferring rescan insn with uid = 11.
and then
Trying 10 -> 11:
10: r56:HI=zero_extend(r61:QI)
REG_DEAD r61:QI
11: r52:HI=zero_extend(r54:QI)*r56:HI
REG_DEAD r54:QI
REG_DEAD r56:HI
Successfully matched this instruction:
(set (reg:HI 52)
(mult:HI (zero_extend:HI (reg:QI 54))
(zero_extend:HI (reg:QI 61))))
allowing combination of insns 10 and 11
original costs 4 + 20 = 24
replacement cost 12
deferring deletion of insn with uid = 10.
modifying insn i3 11: r52:HI=zero_extend(r54:QI)*zero_extend(r61:QI)
REG_DEAD r61:QI
REG_DEAD r54:QI
deferring rescan insn with uid = 11.
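The "original costs" and "replacement cost" lines in these dumps can be read with a simplified model like the one below. This is only a sketch, not combine's actual code: the function name is hypothetical, and it assumes a combination is rejected only when both costs are known (greater than zero) and the replacement is strictly more expensive than the insns it replaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not GCC code.  Assumption: a combination is
   rejected only if both costs are known (> 0) and the replacement
   costs strictly more than the two original insns together.  */
bool
combination_allowed (int i2_cost, int i3_cost, int replacement_cost)
{
  int original_cost = i2_cost + i3_cost;

  if (original_cost > 0 && replacement_cost > 0
      && replacement_cost > original_cost)
    return false;
  return true;
}

/* The two combinations from the dumps above: 4 + 28 = 32 replaced by 20,
   and 4 + 20 = 24 replaced by 12.  Both are cheaper, so both pass.  */
void
check_dump_costs (void)
{
  assert (combination_allowed (4, 28, 20));
  assert (combination_allowed (4, 20, 12));
}
```

Under this model both attempts in the good case are accepted on cost, so cost is not what separates the good case from the bad one.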
In the bad case, the first combination still succeeds but the second
one fails:
Trying 8 -> 9:
8: r52:HI=zero_extend(r55:QI)
REG_DEAD r55:QI
9: r50:HI=r51:HI*r52:HI
REG_DEAD r52:HI
REG_DEAD r51:HI
Successfully matched this instruction:
(set (reg:HI 50)
(mult:HI (zero_extend:HI (reg:QI 55))
(reg:HI 51 [ b+1 ])))
allowing combination of insns 8 and 9
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 8.
modifying insn i3 9: r50:HI=zero_extend(r55:QI)*r51:HI
REG_DEAD r55:QI
REG_DEAD r51:HI
deferring rescan insn with uid = 9.
Trying 20 -> 9:
20: r51:HI#1=0
9: r50:HI=zero_extend(r55:QI)*r51:HI
REG_DEAD r55:QI
REG_DEAD r51:HI
Can't combine i2 into i3
That is because the RTL going into combine in the bad case is
(insn 19 22 20 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 0)
(reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
(expr_list:REG_DEAD (reg:QI 54 [ b+1 ])
(nil)))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 1)
(const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
(nil))
(insn 8 20 9 2 (set (reg:HI 52 [ a ])
(zero_extend:HI (reg/v:QI 48 [ a ]))) "t.ii":4:49 635 {zero_extendqihi2}
(expr_list:REG_DEAD (reg/v:QI 48 [ a ])
(nil)))
(insn 9 8 14 2 (set (reg:HI 50)
(mult:HI (reg:HI 51 [ b+1 ])
(reg:HI 52 [ a ]))) "t.ii":4:47 328 {*mulhi3_enh_split}
(expr_list:REG_DEAD (reg:HI 52 [ a ])
(expr_list:REG_DEAD (reg:HI 51 [ b+1 ])
(nil))))
So the 'b' operand of the multiplication is no longer a zero_extend:HI
but a pair of QImode subreg sets (insns 19 and 20). The first lower
subreg pass produces this IL, turning
(insn 3 2 4 2 (set (reg/v:HI 49 [ b ])
(reg:HI 22 r22 [ b ])) "t.ii":3:49 101 {*movhi_split}
(nil))
(insn 7 4 8 2 (set (reg:HI 51)
(lshiftrt:HI (reg/v:HI 49 [ b ])
(const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
(nil))
into
(insn 17 2 18 2 (set (reg:QI 53 [ b ])
(reg:QI 22 r22 [ b ])) "t.ii":3:49 86 {movqi_insn_split}
(nil))
(insn 18 17 4 2 (set (reg:QI 54 [ b+1 ])
(reg:QI 23 r23 [ b+1 ])) "t.ii":3:49 86 {movqi_insn_split}
(nil))
(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
(reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
(nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
(const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
(nil))
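To see why the two subreg sets are equivalent to a single zero_extend, here is a minimal C sketch that models insns 19 and 20 as two byte writes into a 16-bit value and compares the result with a plain zero-extension of the high byte of b. The function names are made up for illustration, and byte 0 is assumed to be the low byte, as the subreg offsets in the dump suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Model of insns 19 and 20: the two QImode halves of the HImode pseudo
   are written separately.  Assumption: subreg byte 0 is the low byte.  */
uint16_t
via_subreg_sets (uint16_t b)
{
  uint8_t half[2];

  half[0] = (uint8_t) (b >> 8);   /* insn 19: byte 0 := b+1 (high byte) */
  half[1] = 0;                    /* insn 20: byte 1 := 0 */
  return (uint16_t) (half[0] | (half[1] << 8));
}

/* Model of the single insn that combine could match directly:
   zero_extend:HI applied to the high byte of b.  */
uint16_t
via_zero_extend (uint16_t b)
{
  uint8_t hi = (uint8_t) (b >> 8);

  return (uint16_t) hi;
}

/* Exhaustively check that both forms compute the same HImode value.  */
void
check_forms_agree (void)
{
  for (uint32_t v = 0; v <= 0xFFFF; v++)
    assert (via_subreg_sets ((uint16_t) v) == via_zero_extend ((uint16_t) v));
}
```

The values are identical, but the first form spreads the information over two insns that each write only part of the register, which is what combine then fails to merge into the multiplication.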
With -fno-split-wide-types the shift doesn't get turned into a
zero_extend, so the multiplication pattern doesn't match either.
I think the lower subreg pass should handle this situation better,
creating
(insn ... (set (reg:HI 51) (zero_extend:HI (reg:QI 54 [ b+1 ]))))
here. I'm quite sure combine/forwprop cannot combine the two seemingly
unrelated subreg sets. resolve_shift_zext appears to be the place that
is supposed to handle this, and it receives
(insn 7 4 8 2 (set (reg:HI 51)
(lshiftrt:HI (concatn/v:HI [
(reg:QI 53 [ b ])
(reg:QI 54 [ b+1 ])
])
(const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
(nil))
Maybe for GET_CODE (op) != ASHIFTRT && offset1 == 0 && shift_count <=
BITS_PER_WORD this can be emitted directly as a zero_extend (if supported
by the target).
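The reason for excluding ASHIFTRT in that condition can be illustrated with a small C sketch. It models a 16-bit value on a target with 8-bit words and a shift count of exactly 8. The function names are hypothetical, and the arithmetic shift is spelled out by hand so the result does not depend on how the host compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extend:HI of the high QImode word of a 16-bit value.  */
uint16_t
zext_high_word (uint16_t x)
{
  return (uint16_t) (uint8_t) (x >> 8);
}

/* lshiftrt:HI by 8: the vacated upper bits are filled with zeros.  */
uint16_t
lshiftrt8 (uint16_t x)
{
  return (uint16_t) (x >> 8);
}

/* ashiftrt:HI by 8: the vacated upper bits are copies of the sign bit.  */
uint16_t
ashiftrt8 (uint16_t x)
{
  uint16_t fill = (x & 0x8000) ? 0xFF00 : 0x0000;

  return (uint16_t) ((x >> 8) | fill);
}

/* The logical shift always equals the zero-extension; the arithmetic
   shift differs exactly when the sign bit of the input is set.  */
void
check_shift_vs_zext (void)
{
  for (uint32_t v = 0; v <= 0xFFFF; v++)
    {
      uint16_t x = (uint16_t) v;

      assert (lshiftrt8 (x) == zext_high_word (x));
      assert ((ashiftrt8 (x) == zext_high_word (x)) == ((x & 0x8000) == 0));
    }
}
```

So for a logical right shift by one word, the result is the zero-extended high word, while an arithmetic shift would need a sign_extend instead.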