Nitpick: a typo slipped into the comment — "regsiter" -> "register".


On Fri, 26 Jul 2024 at 16:18, Jeff Law <jeffreya...@gmail.com> wrote:
>
> pr116085 is a long standing (since late 2022) regression on the riscv
> port.
>
> A patch introduced a pattern to avoid unnecessary extensions when doing
> a min/max operation where one of the values is a 32 bit positive constant.
>
> > (define_insn_and_split "*minmax"
> >   [(set (match_operand:DI 0 "register_operand" "=r")
> >         (sign_extend:DI
> >           (subreg:SI
> >             (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 
> > "register_operand" "r"))
> >                                                 (match_operand:DI 2 
> > "immediate_operand" "i"))
> >            0)))
> >    (clobber (match_scratch:DI 3 "=&r"))
> >    (clobber (match_scratch:DI 4 "=&r"))]
> >   "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
> >   "#"
> >   "&& reload_completed"
> >   [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
> >    (set (match_dup 4) (match_dup 2))
> >    (set (match_dup 0) (<minmax_optab>:DI (match_dup 3) (match_dup 4)))]
>
>
> Lots going on in here.  The key is the nonconstant value is zero
> extended from SI to DI in the original RTL and we know the constant
> value is unchanged if we were to sign extend it from 32 to 64 bits.
>
> We change the extension of the nonconstant operand from zero to sign
> extension.  I'm pretty confident the goal there is take advantage of the
> fact that SI values are kept sign extended and will often be optimized away.
>
> The problem occurs when the nonconstant operand has the SI sign bit set.
>   As an example:
>
> smax (0x8000000, 0x7)  resulting in 0x80000000
>
> The split RTL will generate
> smax (sign_extend (0x80000000), 0x7))
>
> smax (0xffffffff80000000, 0x7) resulting in 0x7
>
> Opps.
>
> We really needed to change the opcode to umax for this transformation to
> work.  That's easy enough.  But there's further improvements we can make.
>
> First the pattern is a define_and_split with a post-reload split
> condition.  It would be better implemented as a 4->3 define_split so
> that the costing model just works.  Second, if operands[1] is a suitably
> promoted subreg, then we can elide the sign extension when we generate
> the split code, so often it'll be a 4->2 split, again with the cost
> model working with no adjustments needed.
>
> Tested on rv32 and rv64 in my tester.  I'll wait for the pre-commit
> tester to spin it as well.
>
>
>
> Jeff

Reply via email to