Tamar Christina <tamar.christ...@arm.com> writes:
>> > The principle is that, say:
>> >
>> >   (vec_select:V2SI (reg:V2DI R) (parallel [(const_int 0) (const_int 1)]))
>> >
>> > is (for little-endian) equivalent to:
>> >
>> >   (subreg:V2SI (reg:V2DI R) 0)
>>
>> Sigh, of course I meant V4SI rather than V2DI in the above :)
>>
>> > and similarly for the equivalent big-endian pair. Simplification rules
>> > are now supposed to ensure that only the second (simpler) form is generated
>> > by target-independent code. We should fix any cases where that doesn't
>> > happen, since it would be a missed optimisation for any instructions
>> > that take (in this case) V2SI inputs.
>> >
>> > There's no equivalent simplification for _hi because it isn't possible
>> > to refer directly to the upper 64 bits of a 128-bit register using subregs.
>> >
>
> This removes aarch64_simd_vec_unpack<su>_lo_.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>     * config/aarch64/aarch64-simd.md
>     (aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
>     (vec_unpack<su>_lo_<mode>): Simplify.
>     * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
>     comment.

OK, thanks.

Richard

> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index fd10039f9a27d0da51624d6d3a6d8b2a532f5625..bbeee221f37c4875056cdf52932a787f8ac1c2aa 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1904,17 +1904,6 @@ (define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
>
>  ;; Widening operations.
>
> -(define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
> -  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> -        (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> -                               (match_operand:VQW 1 "register_operand" "w")
> -                               (match_operand:VQW 2 "vect_par_cnst_lo_half" "")
> -                            )))]
> -  "TARGET_SIMD"
> -  "<su>xtl\t%0.<Vwtype>, %1.<Vhalftype>"
> -  [(set_attr "type" "neon_shift_imm_long")]
> -)
> -
>  (define_insn_and_split "aarch64_simd_vec_unpack<su>_hi_<mode>"
>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
>          (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> @@ -1952,14 +1941,11 @@ (define_expand "vec_unpack<su>_hi_<mode>"
>  )
>
>  (define_expand "vec_unpack<su>_lo_<mode>"
> -  [(match_operand:<VWIDE> 0 "register_operand")
> -   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))]
> +  [(set (match_operand:<VWIDE> 0 "register_operand")
> +        (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")))]
>    "TARGET_SIMD"
>    {
> -    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
> -    emit_insn (gen_aarch64_simd_vec_unpack<su>_lo_<mode> (operands[0],
> -                                                             operands[1], p));
> -    DONE;
> +    operands[1] = lowpart_subreg (<VHALF>mode, operands[1], <MODE>mode);
>    }
>  )
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 6b106a72e49f11b8128485baceaaaddcbf139863..469eb938953a70bc6b0ce3d4aa16f773e40ee03e 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23188,7 +23188,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
>     to split without that restriction and instead recombine shared zeros
>     if they turn out not to be worthwhile. This would allow splits in
>     single-block functions and would also cope more naturally with
> -   rematerialization. */
> +   rematerialization. The downside of not doing this is that we lose the
> +   optimizations for vector epilogues as well. */
>
>  bool
>  aarch64_split_simd_shift_p (rtx_insn *insn)
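
For readers less familiar with the RTL involved, the little-endian
equivalence quoted at the top of the thread (selecting lanes 0 and 1 of a
V4SI versus taking a V2SI subreg at byte offset 0) can be sketched with a
toy model. This is only an illustration, not GCC code: the helper names
pack_lanes, vec_select and subreg below are hypothetical, and the register
is modelled as a plain Python integer with lane 0 in the least significant
bits.

```python
# Toy model, NOT GCC code: a 128-bit vector register is a Python int and
# lanes are numbered little-endian (lane 0 occupies the lowest bits).
# All helper names here are hypothetical and exist only for illustration.

def pack_lanes(lanes, lane_bits):
    """Pack a list of lane values into one integer, lane 0 lowest."""
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)
    return value

def vec_select(reg, lane_bits, indices):
    """Model of (vec_select ... (parallel [indices])): pick lanes by number."""
    mask = (1 << lane_bits) - 1
    picked = [(reg >> (i * lane_bits)) & mask for i in indices]
    return pack_lanes(picked, lane_bits)

def subreg(reg, byte_offset, size_bytes):
    """Model of a little-endian subreg: size_bytes bytes from byte_offset."""
    return (reg >> (8 * byte_offset)) & ((1 << (8 * size_bytes)) - 1)

# A V4SI value: four 32-bit lanes.
v4si = pack_lanes([0x11111111, 0x22222222, 0x33333333, 0x44444444], 32)

low_via_select = vec_select(v4si, 32, [0, 1])  # lanes 0 and 1 as a V2SI
low_via_subreg = subreg(v4si, 0, 8)            # low 64 bits, offset 0

# In this little-endian model both forms name the same 64 bits.
assert low_via_select == low_via_subreg
```

The model only covers the low half on little-endian. It says nothing
about the _hi case, which, as noted in the thread, cannot be referred to
directly with a subreg of a 128-bit register.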