Jonathan Wright <jonathan.wri...@arm.com> writes: > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 74890989cb3045798bf8d0241467eaaf72238297..c4558da7c0f9228ccc5fa2b9c87e8b4690860b47 > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2092,14 +2092,14 @@ > > (define_insn "aarch64_<su>mlal_hi_n<mode>_insn" > [(set (match_operand:<VWIDE> 0 "register_operand" "=w") > - (plus:<VWIDE> > - (mult:<VWIDE> > - (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > - (match_operand:VQ_HSI 2 "register_operand" "w") > - (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > - (ANY_EXTEND:<VWIDE> (vec_duplicate:<VCOND> > - (match_operand:<VEL> 4 "register_operand" "<h_con>")))) > - (match_operand:<VWIDE> 1 "register_operand" "0")))] > + (plus:<VWIDE> > + (mult:<VWIDE> > + (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > + (match_operand:VQ_HSI 2 "register_operand" "w") > + (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > + (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S> > + (match_operand:<VEL> 4 "register_operand" "<h_con>")))) > + (match_operand:<VWIDE> 1 "register_operand" "0")))] > "TARGET_SIMD" > "<su>mlal2\t%0.<Vwtype>, %2.<Vtype>, %4.<Vetype>[0]" > [(set_attr "type" "neon_mla_<Vetype>_long")] > @@ -2167,14 +2167,14 @@ > > (define_insn "aarch64_<su>mlsl_hi_n<mode>_insn" > [(set (match_operand:<VWIDE> 0 "register_operand" "=w") > - (minus:<VWIDE> > - (match_operand:<VWIDE> 1 "register_operand" "0") > - (mult:<VWIDE> > - (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > - (match_operand:VQ_HSI 2 "register_operand" "w") > - (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > - (ANY_EXTEND:<VWIDE> (vec_duplicate:<VCOND> > - (match_operand:<VEL> 4 "register_operand" "<h_con>"))))))] > + (minus:<VWIDE> > + (match_operand:<VWIDE> 1 "register_operand" "0") > + (mult:<VWIDE> > + (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > + (match_operand:VQ_HSI 2 "register_operand" "w") > + (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > + (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S> > + (match_operand:<VEL> 4 "register_operand" "<h_con>"))))))] > "TARGET_SIMD" > "<su>mlsl2\t%0.<Vwtype>, %2.<Vtype>, %4.<Vetype>[0]" > [(set_attr "type" "neon_mla_<Vetype>_long")]
Would be good for these patterns to use the one-operator-per-line style too. > @@ -2560,7 +2568,7 @@ > (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > (match_operand:VQ_HSI 2 "register_operand" "w") > (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > - (ANY_EXTEND:<VWIDE> (vec_duplicate:<VHALF> > + (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S> > (vec_select:<VEL> > (match_operand:<VCOND> 4 "register_operand" "<vwx>") > (parallel [(match_operand:SI 5 "immediate_operand" "i")])))) Same here. > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index > 2d169d3f9f70c85d396adaed124b6c52aca98f07..fc3701698a616727b6ef22f648b283e7d3821b24 > 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -1731,8 +1731,12 @@ simplify_context::simplify_unary_operation_1 (rtx_code > code, machine_mode mode, > > The first might be done entirely in vector registers while the > second might need a move between register files. */ > - temp = simplify_unary_operation (code, GET_MODE_INNER (mode), > - elt, GET_MODE_INNER (GET_MODE (op))); > + if (code == SIGN_EXTEND || code == ZERO_EXTEND) > + temp = simplify_gen_unary (code, GET_MODE_INNER (mode), > + elt, GET_MODE_INNER (GET_MODE (op))); > + else > + temp = simplify_unary_operation (code, GET_MODE_INNER (mode), > + elt, GET_MODE_INNER (GET_MODE (op))); > if (temp) > return gen_vec_duplicate (mode, temp); > } I guess the comment: /* Try applying the operator to ELT and see if that simplifies. We can duplicate the result if so. The reason we don't use simplify_gen_unary is that it isn't necessarily a win to convert things like: (neg:V (vec_duplicate:V (reg:S R))) to: (vec_duplicate:V (neg:S (reg:S R))) The first might be done entirely in vector registers while the second might need a move between register files. */ is a bit out-of-date. How about adding this comment above the call to simplify_gen_unary. /* Enforce a canonical order of VEC_DUPLICATE wrt other unary operations by promoting VEC_DUPLICATE to the root of the expression (as far as possible). */ and using this modified form of the original comment above the call to simplify_unary_operation: /* Try applying the operator to ELT and see if that simplifies. We can duplicate the result if so. The reason we traditionally haven't used simplify_gen_unary for these codes is that it didn't necessarily seem to be a win to convert things like: (neg:V (vec_duplicate:V (reg:S R))) to: (vec_duplicate:V (neg:S (reg:S R))) The first might be done entirely in vector registers while the second might need a move between register files. However, there also cases where promoting the vec_duplicate is more efficient, and there is definite value in having a canonical form when matching instruction patterns. We should consider extending the simplify_gen_unary code above to more cases. */ OK with those changes, thanks. Richard