Tamar Christina <tamar.christ...@arm.com> writes:
>> > The principle is that, say:
>> >
>> >   (vec_select:V2SI (reg:V2DI R) (parallel [(const_int 0) (const_int 1)]))
>> >
>> > is (for little-endian) equivalent to:
>> >
>> >   (subreg:V2SI (reg:V2DI R) 0)
>> 
>> Sigh, of course I meant V4SI rather than V2DI in the above :)
>> 
>> > and similarly for the equivalent big-endian pair.  Simplification rules
>> > are now supposed to ensure that only the second (simpler) form is generated
>> > by target-independent code.  We should fix any cases where that doesn't
>> > happen, since it would be a missed optimisation for any instructions
>> > that take (in this case) V2SI inputs.
>> >
>> > There's no equivalent simplification for _hi because it isn't possible
>> > to refer directly to the upper 64 bits of a 128-bit register using subregs.
>> >
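For anyone following along, the lowpart equivalence quoted above can be
modelled in plain C.  This is only a rough sketch under stated assumptions:
the v4si/v2si/v2di structs and the helper names are hypothetical and are
not GCC internals, and lane 0 is assumed to sit at byte offset 0, which
corresponds to the little-endian case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model types for illustration only (not GCC internals).  */
typedef struct { int32_t lane[4]; } v4si;   /* 128-bit vector, 4 x 32-bit */
typedef struct { int32_t lane[2]; } v2si;   /* 64-bit vector, 2 x 32-bit  */
typedef struct { int64_t lane[2]; } v2di;   /* 128-bit vector, 2 x 64-bit */

/* Models (vec_select:V2SI (reg:V4SI R) (parallel [(const_int 0) (const_int 1)])):
   pick lanes 0 and 1 explicitly.  */
static v2si select_lo(v4si r)
{
    v2si out = { { r.lane[0], r.lane[1] } };
    return out;
}

/* Models (subreg:V2SI (reg:V4SI R) 0): reinterpret the first 8 bytes.
   Assumes lane 0 lives at byte offset 0 (the little-endian layout).  */
static v2si subreg_lo(v4si r)
{
    v2si out;
    memcpy(&out, &r, sizeof out);
    return out;
}

/* Models (sign_extend:V2DI x): widen each 32-bit lane to 64 bits.  */
static v2di sign_extend_v2si(v2si x)
{
    v2di out = { { (int64_t) x.lane[0], (int64_t) x.lane[1] } };
    return out;
}
```

Under this model select_lo and subreg_lo return the same two lanes for any
input, so extending either one gives the same widened result.  No such
shortcut exists for the high half, since a byte-offset-0 view cannot name
the upper 64 bits.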
>
> This removes aarch64_simd_vec_unpack<su>_lo_.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md
>       (aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
>       (vec_unpack<su>_lo_<mode>): Simplify.
>       * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
>       comment.

OK, thanks.

Richard

> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index fd10039f9a27d0da51624d6d3a6d8b2a532f5625..bbeee221f37c4875056cdf52932a787f8ac1c2aa 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1904,17 +1904,6 @@ (define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
>  
>  ;; Widening operations.
>  
> -(define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> -  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> -        (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> -                            (match_operand:VQW 1 "register_operand" "w")
> -                            (match_operand:VQW 2 "vect_par_cnst_lo_half" "")
> -                         )))]
> -  "TARGET_SIMD"
> -  "<su>xtl\t%0.<Vwtype>, %1.<Vhalftype>"
> -  [(set_attr "type" "neon_shift_imm_long")]
> -)
> -
>  (define_insn_and_split "aarch64_simd_vec_unpack<su>_hi_<mode>"
>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
>          (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> @@ -1952,14 +1941,11 @@ (define_expand "vec_unpack<su>_hi_<mode>"
>  )
>  
>  (define_expand "vec_unpack<su>_lo_<mode>"
> -  [(match_operand:<VWIDE> 0 "register_operand")
> -   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))]
> +  [(set (match_operand:<VWIDE> 0 "register_operand")
> +     (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")))]
>    "TARGET_SIMD"
>    {
> -    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
> -    emit_insn (gen_aarch64_simd_vec_unpack<su>_lo_<mode> (operands[0],
> -                                                       operands[1], p));
> -    DONE;
> +    operands[1] = lowpart_subreg (<VHALF>mode, operands[1], <MODE>mode);
>    }
>  )
>  
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 6b106a72e49f11b8128485baceaaaddcbf139863..469eb938953a70bc6b0ce3d4aa16f773e40ee03e 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23188,7 +23188,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
>     to split without that restriction and instead recombine shared zeros
>     if they turn out not to be worthwhile.  This would allow splits in
>     single-block functions and would also cope more naturally with
> -   rematerialization.  */
> +   rematerialization.  The downside of not doing this is that we lose the
> +   optimizations for vector epilogues as well.  */
>  
>  bool
>  aarch64_split_simd_shift_p (rtx_insn *insn)
