Hi Pan,
> The pattern of this patch only works on DImode, aka below pattern.
> v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
> + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
>
> Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose
> this extend op after expand.
>
> For uint16_t => uint32_t we have:
> (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
>
> For uint32_t => uint64_t we have:
> (set (reg:DI 148 [ _6 ])
> (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
>
> We can see there is no zero_extend for uint16_t to uint32_t, and we
> cannot hit the pattern above. So the combine will try below pattern
> for uint16_t to uint32_t.
>
> v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
> + (vec_dup:RVVM1SImode (subreg:SImode (:DImode x2:SImode)))
>
> But it cannot match the vwaddu semantics, thus we need another handling
> for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
> uint16_t.
Where does the actual HI->SI extension happen then? No chance we see it
during combine/late-combine?
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 02f19bc6a42..fefd2dc63c3 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1868,6 +1868,50 @@ (define_insn_and_split "*mul_minus_vx_<mode>"
> }
> [(set_attr "type" "vimuladd")])
>
> +(define_insn_and_split "*widen_frist_<any_extend:su>_vx_<mode>"
first :)
> + [(set (match_operand:VWEXTI_D 0 "register_operand")
> + (vec_duplicate:VWEXTI_D
> + (any_extend:<VEL>
> + (match_operand:<VSUBEL> 1 "register_operand"))))]
> + "TARGET_VECTOR && can_create_pseudo_p ()"
> + "#"
> + "&& 1"
> + [(const_int 0)]
> + {
> + machine_mode d_trunc_mode = <V_DOUBLE_TRUNC>mode;
> + rtx vec_dup = gen_reg_rtx (d_trunc_mode);
> + insn_code icode = code_for_pred_broadcast (d_trunc_mode);
> + rtx vec_dup_ops[] = {vec_dup, operands[1]};
> + riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
> +
> + icode = code_for_pred_vf2 (<any_extend:CODE>, <MODE>mode);
> + rtx extend_ops[] = {operands[0], vec_dup};
> + riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);
Technically, shouldn't it be the other way around? Like first extend and then
broadcast?
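
To make the ordering question concrete, here is a minimal C sketch. It is not GCC code: it models a vector as a plain array with a hypothetical length VL, and all function names are made up for illustration. It only shows that, element-wise, broadcasting the narrow scalar and then zero-extending gives the same values as zero-extending the scalar and then broadcasting, so the question is about which order mirrors the RTL, not about the computed result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL 4 /* hypothetical element count, chosen only for illustration */

/* Order used by the split in the patch: broadcast the narrow scalar,
   then widen every element (modelling a vf2 zero extension).  */
void broadcast_then_extend(uint32_t x, uint64_t out[VL])
{
    uint32_t narrow[VL];
    for (size_t i = 0; i < VL; i++)
        narrow[i] = x;
    for (size_t i = 0; i < VL; i++)
        out[i] = (uint64_t)narrow[i];
}

/* Order written in the matched RTL: zero-extend the scalar first,
   then duplicate the wide value into every element.  */
void extend_then_broadcast(uint32_t x, uint64_t out[VL])
{
    uint64_t wide = (uint64_t)x;
    for (size_t i = 0; i < VL; i++)
        out[i] = wide;
}

/* Return 1 if both orders produce identical vectors for x, else 0.  */
int orders_agree(uint32_t x)
{
    uint64_t a[VL], b[VL];
    broadcast_then_extend(x, a);
    extend_then_broadcast(x, b);
    for (size_t i = 0; i < VL; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Spot-check a few values, including one with the top bit set.  */
void check_orders(void)
{
    assert(orders_agree(0u));
    assert(orders_agree(0x80000000u));
    assert(orders_agree(0xFFFFFFFFu));
}
```

In this model the extend-first order needs only a scalar extension plus one broadcast in the wide mode, whereas the broadcast-first order needs an extra vector operation. Whether that carries over to the actual insn sequence is an assumption I have not checked.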
--
Regards
Robin