Richard Sandiford <[email protected]> writes:
> Richard Sandiford <[email protected]> writes:
>> Michael Collison <[email protected]> writes:
>>> Richard,
>>>
>>> The problem with this approach for Aarch64 is that
>>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is
>>> normally 0 as it based on the TARGET_SIMD flag.
>>
>> Maybe I'm wrong, but that seems like a missed optimisation in itself.
Sorry to follow up on myself yet again, but I'd forgotten this was
because we allow the SIMD unit to do scalar shifts. So I guess
we have no choice, even though it seems unfortunate.
> +(define_insn_and_split "*aarch64_reg_<optab>_minus<mode>3"
> + [(set (match_operand:GPI 0 "register_operand" "=&r")
> + (ASHIFT:GPI
> + (match_operand:GPI 1 "register_operand" "r")
> + (minus:QI (match_operand 2 "const_int_operand" "n")
> + (match_operand:QI 3 "register_operand" "r"))))]
> + "INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)"
> + "#"
> + "&& true"
> + [(const_int 0)]
> + {
> + /* Handle cases where operand 3 is a plain QI register, or
> + a subreg with either a SImode or DImode register. */
> +
> + rtx subreg_tmp = (REG_P (operands[3])
> + ? gen_lowpart_SUBREG (SImode, operands[3])
> + : SUBREG_REG (operands[3]));
> +
> + if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode)
> + subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp);
I think this all simplifies to:
rtx subreg_tmp = gen_lowpart (SImode, operands[3]);
(or it would be worth having a comment that explains why not).
As well as being shorter, it will properly simplify hard REGs
to new hard REGs.
> + rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
> + : operands[0]);
> +
> + if (<MODE>mode == DImode && !can_create_pseudo_p ())
> + tmp = gen_lowpart_SUBREG (SImode, operands[0]);
I think this too would be simpler with gen_lowpart:
rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
: gen_lowpart (SImode, operands[0]));
> +
> + emit_insn (gen_negsi2 (tmp, subreg_tmp));
> +
> + rtx and_op = gen_rtx_AND (SImode, tmp,
> + GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1));
> +
> + rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op);
> +
> + emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp2));
> + DONE;
> + }
> +)
The pattern should probably set the "length" attribute to 8.
Looks good to me with those changes FWIW.
Thanks,
Richard