Hi Pan,

> The pattern of this patch only works on DImode, i.e. the pattern below.
> v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
>   + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
>
> Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose
> this extend op after expand.
>
> For uint16_t => uint32_t we have:
> (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
>
> For uint32_t => uint64_t we have:
> (set (reg:DI 148 [ _6 ])
>      (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
>
> We can see there is no zero_extend for uint16_t to uint32_t, so we
> cannot hit the pattern above.  Combine will instead try the pattern
> below for uint16_t to uint32_t.
>
> v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
>   + (vec_dup:RVVM1SImode (subreg:SImode (:DImode x2:SImode)))
>
> But it cannot match the vwaddu semantics, thus we need separate handling
> for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
> uint16_t.

Where does the actual HI->SI extension happen then?  No chance we see it
during combine/late-combine?

> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 02f19bc6a42..fefd2dc63c3 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1868,6 +1868,50 @@ (define_insn_and_split "*mul_minus_vx_<mode>"
>    }
>    [(set_attr "type" "vimuladd")])
>  
> +(define_insn_and_split "*widen_frist_<any_extend:su>_vx_<mode>"

first :)

> + [(set (match_operand:VWEXTI_D   0 "register_operand")
> +       (vec_duplicate:VWEXTI_D
> +      (any_extend:<VEL>
> +      (match_operand:<VSUBEL> 1 "register_operand"))))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    machine_mode d_trunc_mode = <V_DOUBLE_TRUNC>mode;
> +    rtx vec_dup = gen_reg_rtx (d_trunc_mode);
> +    insn_code icode = code_for_pred_broadcast (d_trunc_mode);
> +    rtx vec_dup_ops[] = {vec_dup, operands[1]};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
> +
> +    icode = code_for_pred_vf2 (<any_extend:CODE>, <MODE>mode);
> +    rtx extend_ops[] = {operands[0], vec_dup};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);

Technically, shouldn't it be the other way around?  Like first extend and then 
broadcast?
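
To make the question concrete, here is a rough scalar C model of the two
orders.  Everything in it (VL, the function names, the uint16_t/uint32_t
pair) is made up for illustration and is not taken from the patch.  As far as
I can tell both orders give the same element values, so the question is only
which order mirrors the RTL (vec_duplicate of an any_extend), not whether the
result is correct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VL 4  /* Hypothetical vector length, for illustration only.  */

/* Order the split emits: broadcast the narrow scalar, then widen each
   element (modelled here as a zero extension).  */
static void broadcast_then_extend (uint16_t x, uint32_t out[VL])
{
  uint16_t narrow[VL];
  for (int i = 0; i < VL; i++)
    narrow[i] = x;
  for (int i = 0; i < VL; i++)
    out[i] = (uint32_t) narrow[i];
}

/* Order the RTL pattern spells out: widen the scalar, then broadcast.  */
static void extend_then_broadcast (uint16_t x, uint32_t out[VL])
{
  uint32_t wide = (uint32_t) x;
  for (int i = 0; i < VL; i++)
    out[i] = wide;
}

/* Returns 1 if both orders produce identical vectors for X.  */
static int orders_agree (uint16_t x)
{
  uint32_t a[VL], b[VL];
  broadcast_then_extend (x, a);
  extend_then_broadcast (x, b);
  return memcmp (a, b, sizeof a) == 0;
}
```

The same argument should hold for the sign-extending variant, since the
extension is applied element-wise to identical inputs.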

-- 
Regards
 Robin
