Nitpick: a typo slipped into the comment — "regsiter" -> "register".
On Fri, 26 Jul 2024 at 16:18, Jeff Law <jeffreya...@gmail.com> wrote: > > pr116085 is a long standing (since late 2022) regression on the riscv > port. > > A patch introduced a pattern to avoid unnecessary extensions when doing > a min/max operation where one of the values is a 32 bit positive constant. > > > (define_insn_and_split "*minmax" > > [(set (match_operand:DI 0 "register_operand" "=r") > > (sign_extend:DI > > (subreg:SI > > (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 > > "register_operand" "r")) > > (match_operand:DI 2 > > "immediate_operand" "i")) > > 0))) > > (clobber (match_scratch:DI 3 "=&r")) > > (clobber (match_scratch:DI 4 "=&r"))] > > "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0" > > "#" > > "&& reload_completed" > > [(set (match_dup 3) (sign_extend:DI (match_dup 1))) > > (set (match_dup 4) (match_dup 2)) > > (set (match_dup 0) (<minmax_optab>:DI (match_dup 3) (match_dup 4)))] > > > Lots going on in here. The key is the nonconstant value is zero > extended from SI to DI in the original RTL and we know the constant > value is unchanged if we were to sign extend it from 32 to 64 bits. > > We change the extension of the nonconstant operand from zero to sign > extension. I'm pretty confident the goal there is take advantage of the > fact that SI values are kept sign extended and will often be optimized away. > > The problem occurs when the nonconstant operand has the SI sign bit set. > As an example: > > smax (0x8000000, 0x7) resulting in 0x80000000 > > The split RTL will generate > smax (sign_extend (0x80000000), 0x7)) > > smax (0xffffffff80000000, 0x7) resulting in 0x7 > > Opps. > > We really needed to change the opcode to umax for this transformation to > work. That's easy enough. But there's further improvements we can make. > > First the pattern is a define_and_split with a post-reload split > condition. It would be better implemented as a 4->3 define_split so > that the costing model just works. Second, if operands[1] is a suitably > promoted subreg, then we can elide the sign extension when we generate > the split code, so often it'll be a 4->2 split, again with the cost > model working with no adjustments needed. > > Tested on rv32 and rv64 in my tester. I'll wait for the pre-commit > tester to spin it as well. > > > > Jeff