Hi!

On Thu, May 08, 2025 at 08:07:04PM -0400, Michael Meissner wrote:
> Previously GCC would zero externd a DImode GPR value to TImode by first zero
> extending the DImode value into a GPR TImode value, and then do a MTVSRDD to
> move this value to a VSX register.
> 
> This patch does the move directly, since if the middle argument to MTVSRDD is 
> 0,
> it does the zero extend.

In the mtvsrdd insn, if the second input op is 0, it means constant 0,
instead of reg 0.  Just like in many other insns.

> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=r,wa,&wa")
> +     (zero_extend:TI
> +      (match_operand:DI 1 "gpc_reg_operand" "rwa,r,wa")))]

(I was talking with Surya about this patch earlier today, and she asked
where to find the "rwa" constraint definition.  So I explained it is
really two constraints, not one; but that makes me think why not write
it the other way around?  "war"?  Too hard to resist eh!)

> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
> +  "@
> +  #
> +  mtvsrdd %x0,0,%1
> +  #"
> +  "&& reload_completed

Nope.  Please do not use reload_completed if you do not have to, it
results in worse generated code, and nastier GCC source code.

Here, without reload_completed, you just need a separate define_insn to
recognise the special case of mtvsrdd with a zero second input.  Either
as part of the vsx_concat_di pattern (maybe not even a
separate define_insn!), or as an actually separate define_insn.

That way, you have less code, simpler code, and all the later RTL
optimisations actually can do anything, instead of being disabled.

> +   && (int_reg_operand (operands[0], TImode)
> +       || vsx_register_operand (operands[1], DImode))"
> +  [(set (match_dup 2)
> +     (match_dup 3))
> +   (set (match_dup 4)
> +     (match_dup 5))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +  int r = reg_or_subregno (op0);
> +
> +  if (int_reg_operand (op0, TImode))
> +    {
> +      int lo = BYTES_BIG_ENDIAN ? 1 : 0;

You then won't need to special case between big-endian and wrong-endian
anymore either, then.  Not a huge issue in itself, but a clear
indication you were doing things not the best way.

> +      int hi = 1 - lo;
> +
> +      operands[2] = gen_rtx_REG (DImode, r + lo);
> +      operands[3] = op1;
> +      operands[4] = gen_rtx_REG (DImode, r + hi);
> +      operands[5] = const0_rtx;
> +    }
> +  else
> +    {
> +      rtx op0_di = gen_rtx_REG (DImode, r);
> +      rtx op0_v2di = gen_rtx_REG (V2DImode, r);
> +      rtx lo = WORDS_BIG_ENDIAN ? op1 : op0_di;
> +      rtx hi = WORDS_BIG_ENDIAN ? op0_di : op1;
> +
> +      operands[2] = op0_v2di;
> +      operands[3] = CONST0_RTX (V2DImode);
> +      operands[4] = op0_v2di;
> +      operands[5] = gen_rtx_VEC_CONCAT (V2DImode, hi, lo);
> +    }
> +}
> +  [(set_attr "type" "*,mtvsr,vecperm")
> +   (set_attr "length" "8,*,8")])

In your new pattern here, the last two alternatives are actually
identical, except for these two attributes.  Which will automatically be
set to the correct values if you use simpler code, btw.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108958.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */

That's the default.  You can leave this out.

So please make this work in way more cases, simpler code with better
results!


Segher

Reply via email to