Using the same pseudo register creates one longer live range instead of two shorter ones, which hurts when the instruction scheduler tries to separate those two instructions. The register allocator also has more complete knowledge to decide which is better - reusing the same register or using a different one - so I would prefer to use another pseudo here if possible.
That's also what the AArch64/ARM/x86 ports do - use a new pseudo as the temporary if possible.

On Tue, Jun 7, 2022 at 6:50 PM Philipp Tomsich <philipp.toms...@vrull.eu> wrote:
>
> On Tue, 7 Jun 2022 at 12:24, Kito Cheng <kito.ch...@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
> > <philipp.toms...@vrull.eu> wrote:
> > >
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >     unsigned long f(unsigned int a, unsigned long b)
> > >     {
> > >       a = a << 1;
> > >       unsigned long c = (unsigned long) a;
> > >       c = b + (c << 4);
> > >       return c;
> > >     }
> > > which will present at combine-time as:
> > >     Trying 7, 8 -> 9:
> > >         7: r78:SI=r81:DI#0<<0x1
> > >           REG_DEAD r81:DI
> > >         8: r79:DI=zero_extend(r78:SI)
> > >           REG_DEAD r78:SI
> > >         9: r72:DI=r79:DI<<0x4
> > >           REG_DEAD r79:DI
> > >     Failed to match this instruction:
> > >     (set (reg:DI 72 [ _1 ])
> > >         (and:DI (ashift:DI (reg:DI 81)
> > >                     (const_int 5 [0x5]))
> > >             (const_int 68719476704 [0xfffffffe0])))
> > > and produce the following (optimized) assembly:
> > >     f:
> > >         slliw   a5,a0,1
> > >         slli    a5,a5,32
> > >         srli    a5,a5,28
> > >         add     a0,a5,a1
> > >         ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xfffffffful.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of trailing zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >     f:
> > >         slli    a0,a0,33
> > >         srli    a0,a0,28
> > >         add     a0,a0,a1
> > >         ret
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
> > >         with a generalized split that requires no clobber, runs
> > >         before reload and works for smaller masks.
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
> > > ---
> > >
> > >  gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
> > >  1 file changed, 20 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index b8ab0cf169a..cc10cd90a74 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -2119,23 +2119,26 @@ (define_split
> > >  ;; occur when unsigned int is used for array indexing.  Split this into two
> > >  ;; shifts.  Otherwise we can get 3 shifts.
> > >
> > > -(define_insn_and_split "zero_extendsidi2_shifted"
> > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > -       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > -                          (match_operand:QI 2 "immediate_operand" "I"))
> > > -               (match_operand 3 "immediate_operand" "")))
> > > -   (clobber (match_scratch:DI 4 "=&r"))]
> > > -  "TARGET_64BIT && !TARGET_ZBA
> > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > -  "#"
> > > -  "&& reload_completed"
> > > -  [(set (match_dup 4)
> > > -       (ashift:DI (match_dup 1) (const_int 32)))
> > > -   (set (match_dup 0)
> > > -       (lshiftrt:DI (match_dup 4) (match_dup 5)))]
> > > -  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
> > > -  [(set_attr "type" "shift")
> > > -   (set_attr "mode" "DI")])
> > > +(define_split
> > > +  [(set (match_operand:DI 0 "register_operand")
> > > +       (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> > > +                          (match_operand:QI 2 "immediate_operand"))
> > > +               (match_operand:DI 3 "consecutive_bits_operand")))]
> > > +  "TARGET_64BIT"
> > > +  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> > > +   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
> >
> > I would prefer to keep using another register if possible,
> > like this:
> >
> > +  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
> > +   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]
> >
> > if (can_create_pseudo_p ())
> >   operands[6] = gen_reg_rtx (DImode);
> > else
> >   operands[6] = operands[0];
>
> I don't see the benefit of this (unless you expect opportunities for
> CSE), as there will be a linear dependency chain anyway.  I'd like to
> understand your reasoning behind this a bit better, as our style
> currently generally tries to avoid introducing temporaries where
> possible.
>
> Thanks,
> Philipp.
>
> > >
> > > +{
> > > +  unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> > > +  int leading  = clz_hwi (mask);
> > > +  int trailing = ctz_hwi (mask);
> > > +
> > > +  /* The shift-amount must match the number of trailing bits.  */
> > > +  if (trailing != UINTVAL (operands[2]))
> > > +    FAIL;
> > > +
> > > +  operands[4] = GEN_INT (leading + trailing);
> > > +  operands[5] = GEN_INT (leading);
> > > +})
> > >
> > >  ;;
> > >  ;; ....................
> > > --
> > > 2.34.1
> > >
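For anyone following the arithmetic outside the .md file, the equivalence the new split relies on can be sketched in plain C. This is an illustrative sketch only (the function names are made up, and GCC's __builtin_clzll/__builtin_ctzll stand in for clz_hwi/ctz_hwi); it is not code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Pattern matched by the split: shift left by the inner amount, then
   apply a mask of consecutive ones.  Assumes mask != 0.  */
static uint64_t masked_shift(uint64_t x, unsigned shamt, uint64_t mask)
{
    return (x << shamt) & mask;
}

/* What the split emits instead: one left shift followed by one logical
   right shift.  Valid when ctz(mask) equals the inner shift amount.  */
static uint64_t two_shifts(uint64_t x, unsigned shamt, uint64_t mask)
{
    unsigned leading  = __builtin_clzll(mask);   /* stands in for clz_hwi */
    unsigned trailing = __builtin_ctzll(mask);   /* stands in for ctz_hwi */
    assert(trailing == shamt);                   /* mirrors the FAIL check */
    return (x << (leading + shamt)) >> leading;
}
```

For the mask from the combine dump above, 0xfffffffe0, leading is 28 and trailing is 5, so the split produces a left shift by 33 and a right shift by 28 - exactly the slli/srli pair in the improved assembly.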