Jonathan Wright <jonathan.wri...@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 74890989cb3045798bf8d0241467eaaf72238297..c4558da7c0f9228ccc5fa2b9c87e8b4690860b47
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2092,14 +2092,14 @@
>  
>  (define_insn "aarch64_<su>mlal_hi_n<mode>_insn"
>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> -        (plus:<VWIDE>
> -          (mult:<VWIDE>
> -              (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> -                 (match_operand:VQ_HSI 2 "register_operand" "w")
> -                 (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> -              (ANY_EXTEND:<VWIDE> (vec_duplicate:<VCOND>
> -                    (match_operand:<VEL> 4 "register_operand" "<h_con>"))))
> -          (match_operand:<VWIDE> 1 "register_operand" "0")))]
> +     (plus:<VWIDE>
> +       (mult:<VWIDE>
> +           (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> +              (match_operand:VQ_HSI 2 "register_operand" "w")
> +              (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> +              (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S>
> +              (match_operand:<VEL> 4 "register_operand" "<h_con>"))))
> +       (match_operand:<VWIDE> 1 "register_operand" "0")))]
>    "TARGET_SIMD"
>    "<su>mlal2\t%0.<Vwtype>, %2.<Vtype>, %4.<Vetype>[0]"
>    [(set_attr "type" "neon_mla_<Vetype>_long")]
> @@ -2167,14 +2167,14 @@
>  
>  (define_insn "aarch64_<su>mlsl_hi_n<mode>_insn"
>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> -        (minus:<VWIDE>
> -          (match_operand:<VWIDE> 1 "register_operand" "0")
> -          (mult:<VWIDE>
> -            (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> -              (match_operand:VQ_HSI 2 "register_operand" "w")
> -              (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> -            (ANY_EXTEND:<VWIDE> (vec_duplicate:<VCOND>
> -                 (match_operand:<VEL> 4 "register_operand" "<h_con>"))))))]
> +     (minus:<VWIDE>
> +       (match_operand:<VWIDE> 1 "register_operand" "0")
> +       (mult:<VWIDE>
> +         (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> +           (match_operand:VQ_HSI 2 "register_operand" "w")
> +           (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> +         (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S>
> +           (match_operand:<VEL> 4 "register_operand" "<h_con>"))))))]
>    "TARGET_SIMD"
>    "<su>mlsl2\t%0.<Vwtype>, %2.<Vtype>, %4.<Vetype>[0]"
>    [(set_attr "type" "neon_mla_<Vetype>_long")]

Would be good for these patterns to use the one-operator-per-line style too.

> @@ -2560,7 +2568,7 @@
>           (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
>             (match_operand:VQ_HSI 2 "register_operand" "w")
>             (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> -         (ANY_EXTEND:<VWIDE> (vec_duplicate:<VHALF>
> +         (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S>
>             (vec_select:<VEL>
>               (match_operand:<VCOND> 4 "register_operand" "<vwx>")
>               (parallel [(match_operand:SI 5 "immediate_operand" "i")]))))

Same here.

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 
> 2d169d3f9f70c85d396adaed124b6c52aca98f07..fc3701698a616727b6ef22f648b283e7d3821b24
>  100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -1731,8 +1731,12 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
> code, machine_mode mode,
>  
>        The first might be done entirely in vector registers while the
>        second might need a move between register files.  */
> -      temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
> -                                    elt, GET_MODE_INNER (GET_MODE (op)));
> +      if (code == SIGN_EXTEND || code == ZERO_EXTEND)
> +     temp = simplify_gen_unary (code, GET_MODE_INNER (mode),
> +                                elt, GET_MODE_INNER (GET_MODE (op)));
> +      else
> +     temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
> +                                      elt, GET_MODE_INNER (GET_MODE (op)));
>        if (temp)
>       return gen_vec_duplicate (mode, temp);
>      }

I guess the comment:

      /* Try applying the operator to ELT and see if that simplifies.
         We can duplicate the result if so.

         The reason we don't use simplify_gen_unary is that it isn't
         necessarily a win to convert things like:

           (neg:V (vec_duplicate:V (reg:S R)))

         to:

           (vec_duplicate:V (neg:S (reg:S R)))

         The first might be done entirely in vector registers while the
         second might need a move between register files.  */

is a bit out-of-date.  How about adding this comment above the call to
simplify_gen_unary.

        /* Enforce a canonical order of VEC_DUPLICATE wrt other unary
           operations by promoting VEC_DUPLICATE to the root of the expression
           (as far as possible).  */

and using this modified form of the original comment above the call to
simplify_unary_operation:

        /* Try applying the operator to ELT and see if that simplifies.
           We can duplicate the result if so.

           The reason we traditionally haven't used simplify_gen_unary
           for these codes is that it didn't necessarily seem to be a
           win to convert things like:

             (neg:V (vec_duplicate:V (reg:S R)))

           to:

             (vec_duplicate:V (neg:S (reg:S R)))

           The first might be done entirely in vector registers while the
           second might need a move between register files.

           However, there also cases where promoting the vec_duplicate is
           more efficient, and there is definite value in having a canonical
           form when matching instruction patterns.  We should consider
           extending the simplify_gen_unary code above to more cases.  */

OK with those changes, thanks.

Richard

Reply via email to