Hi! On Thu, May 08, 2025 at 08:07:04PM -0400, Michael Meissner wrote: > Previously GCC would zero externd a DImode GPR value to TImode by first zero > extending the DImode value into a GPR TImode value, and then do a MTVSRDD to > move this value to a VSX register. > > This patch does the move directly, since if the middle argument to MTVSRDD is > 0, > it does the zero extend.
In the mtvsrdd insn, if the second input op is 0, it means constant 0, instead of reg 0. Just like in many other insns. > +(define_insn_and_split "zero_extendditi2" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=r,wa,&wa") > + (zero_extend:TI > + (match_operand:DI 1 "gpc_reg_operand" "rwa,r,wa")))] (I was talking with Surya about this patch earlier today, and she asked where to find the "rwa" constraint definition. So I explained it is really two constraints, not one; but that makes me think why not write it the other way around? "war"? Too hard to resist eh!) > + "TARGET_P9_VECTOR && TARGET_POWERPC64" > + "@ > + # > + mtvsrdd %x0,0,%1 > + #" > + "&& reload_completed Nope. Please do not use reload_completed if you do not have to, it results in worse generated code, and nastier GCC source code. Here, without reload_completed, you just need a separate define_insn to recognise the special case of mtvsrdd with a zero second input. Either as part of the vsx_concat_di pattern (maybe not even a separate define_insn!), or as an actually separate define_insn. That way, you have less code, simpler code, and all the later RTL optimisations actually can do anything, instead of being disabled. > + && (int_reg_operand (operands[0], TImode) > + || vsx_register_operand (operands[1], DImode))" > + [(set (match_dup 2) > + (match_dup 3)) > + (set (match_dup 4) > + (match_dup 5))] > +{ > + rtx op0 = operands[0]; > + rtx op1 = operands[1]; > + int r = reg_or_subregno (op0); > + > + if (int_reg_operand (op0, TImode)) > + { > + int lo = BYTES_BIG_ENDIAN ? 1 : 0; You then won't need to special case between big-endian and wrong-endian anymore either, then. Not a huge issue in itself, but a clear indication you were doing things not the best way. > + int hi = 1 - lo; > + > + operands[2] = gen_rtx_REG (DImode, r + lo); > + operands[3] = op1; > + operands[4] = gen_rtx_REG (DImode, r + hi); > + operands[5] = const0_rtx; > + } > + else > + { > + rtx op0_di = gen_rtx_REG (DImode, r); > + rtx op0_v2di = gen_rtx_REG (V2DImode, r); > + rtx lo = WORDS_BIG_ENDIAN ? op1 : op0_di; > + rtx hi = WORDS_BIG_ENDIAN ? op0_di : op1; > + > + operands[2] = op0_v2di; > + operands[3] = CONST0_RTX (V2DImode); > + operands[4] = op0_v2di; > + operands[5] = gen_rtx_VEC_CONCAT (V2DImode, hi, lo); > + } > +} > + [(set_attr "type" "*,mtvsr,vecperm") > + (set_attr "length" "8,*,8")]) In your new pattern here, the last two alternatives are actually identical, except for these two attributes. Which will automatically be set to the correct values if you use simpler code, btw. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr108958.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ That's the default. You can leave this out. So please make this work in way more cases, simpler code with better results! Segher