On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
<[email protected]> wrote:
>
> The current method of treating shifts of extended values on RISC-V
> frequently causes sequences of 3 shifts, despite the presence of the
> 'zero_extendsidi2_shifted' pattern.
>
> Consider:
> unsigned long f(unsigned int a, unsigned long b)
> {
>   a = a << 1;
>   unsigned long c = (unsigned long) a;
>   c = b + (c<<4);
>   return c;
> }
> which will present at combine-time as:
> Trying 7, 8 -> 9:
>     7: r78:SI=r81:DI#0<<0x1
>       REG_DEAD r81:DI
>     8: r79:DI=zero_extend(r78:SI)
>       REG_DEAD r78:SI
>     9: r72:DI=r79:DI<<0x4
>       REG_DEAD r79:DI
> Failed to match this instruction:
> (set (reg:DI 72 [ _1 ])
>     (and:DI (ashift:DI (reg:DI 81)
>             (const_int 5 [0x5]))
>         (const_int 68719476704 [0xfffffffe0])))
> and produce the following (optimized) assembly:
> f:
>         slliw   a5,a0,1
>         slli    a5,a5,32
>         srli    a5,a5,28
>         add     a0,a5,a1
>         ret
>
> The current way of handling this (in 'zero_extendsidi2_shifted')
> doesn't apply for two reasons:
> - this is seen before reload, and
> - (more importantly) the constant mask is not 0xfffffffful.
>
> To address this, we introduce a generalized version of shifting
> zero-extended values that supports any mask of consecutive ones as
> long as the number of trailing zeros equals the inner shift amount.
>
> With this new split, we generate the following assembly for the
> aforementioned function:
> f:
>         slli    a0,a0,33
>         srli    a0,a0,28
>         add     a0,a0,a1
>         ret
>
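For reference, here's how the arithmetic plays out for the mask in the example.
This is just a quick standalone sketch I used to convince myself (not part of
the patch), with made-up variable names:

    #include <stdint.h>
    #include <stdio.h>

    int main (void)
    {
      uint64_t mask = 0xfffffffe0ULL;         /* operands[3] from insn 9 above */
      int inner_shift = 5;                    /* operands[2] */
      int leading  = __builtin_clzll (mask);  /* 28 */
      int trailing = __builtin_ctzll (mask);  /* 5, equal to inner_shift */

      /* The split rewrites (x << 5) & mask as (x << 33) >> 28, i.e. the
         slli/srli pair in the assembly above.  */
      printf ("slli %d, srli %d\n", leading + trailing, leading);

      /* Cheap equivalence check on an arbitrary value.  */
      uint64_t x = 0x123456789abcdef0ULL;
      uint64_t via_split = (x << (leading + trailing)) >> leading;
      uint64_t original  = (x << inner_shift) & mask;
      return via_split == original ? 0 : 1;   /* exits 0: both forms agree */
    }
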
> gcc/ChangeLog:
>
> * config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
> with a generalized split that requires no clobber, runs
> before reload and works for smaller masks.
>
> Signed-off-by: Philipp Tomsich <[email protected]>
> ---
>
> gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
> 1 file changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index b8ab0cf169a..cc10cd90a74 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2119,23 +2119,26 @@ (define_split
> ;; occur when unsigned int is used for array indexing. Split this into two
> ;; shifts. Otherwise we can get 3 shifts.
>
> -(define_insn_and_split "zero_extendsidi2_shifted"
> - [(set (match_operand:DI 0 "register_operand" "=r")
> - (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> - (match_operand:QI 2 "immediate_operand" "I"))
> - (match_operand 3 "immediate_operand" "")))
> - (clobber (match_scratch:DI 4 "=&r"))]
> - "TARGET_64BIT && !TARGET_ZBA
> - && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> - "#"
> - "&& reload_completed"
> - [(set (match_dup 4)
> - (ashift:DI (match_dup 1) (const_int 32)))
> - (set (match_dup 0)
> - (lshiftrt:DI (match_dup 4) (match_dup 5)))]
> - "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
> - [(set_attr "type" "shift")
> - (set_attr "mode" "DI")])
> +(define_split
> + [(set (match_operand:DI 0 "register_operand")
> + (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> + (match_operand:QI 2 "immediate_operand"))
> + (match_operand:DI 3 "consecutive_bits_operand")))]
> + "TARGET_64BIT"
> + [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> + (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
I would prefer to keep using another register here if possible, like this:

+  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]

and then, in the preparation statement:

  if (can_create_pseudo_p ())
    operands[6] = gen_reg_rtx (DImode);
  else
    operands[6] = operands[0];

That way the first shift lands in a fresh pseudo before reload, which should
give the register allocator and later passes a bit more freedom than reusing
operands[0] for the intermediate value; I've put a rough sketch of the
combined preparation statement below the quoted hunk.
> +{
> + unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> + int leading = clz_hwi (mask);
> + int trailing = ctz_hwi (mask);
> +
> +  /* The shift-amount must match the number of trailing zeros of the mask.  */
> + if (trailing != UINTVAL (operands[2]))
> + FAIL;
> +
> + operands[4] = GEN_INT (leading + trailing);
> + operands[5] = GEN_INT (leading);
> +})
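
To be concrete about the suggestion above, the whole preparation statement
would then read roughly as follows (untested, just to illustrate):

  {
    unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
    int leading = clz_hwi (mask);
    int trailing = ctz_hwi (mask);

    /* The inner shift amount must match the number of trailing zeros
       of the mask.  */
    if (trailing != UINTVAL (operands[2]))
      FAIL;

    operands[4] = GEN_INT (leading + trailing);
    operands[5] = GEN_INT (leading);

    /* Do the first shift into a fresh pseudo while we still can;
       after reload, fall back to reusing operands[0].  */
    if (can_create_pseudo_p ())
      operands[6] = gen_reg_rtx (DImode);
    else
      operands[6] = operands[0];
  }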
>
> ;;
> ;; ....................
> --
> 2.34.1
>