> > The principle is that, say:
> >
> >   (vec_select:V2SI (reg:V2DI R) (parallel [(const_int 0) (const_int 1)]))
> >
> > is (for little-endian) equivalent to:
> >
> >   (subreg:V2SI (reg:V2DI R) 0)
>
> Sigh, of course I meant V4SI rather than V2DI in the above :)
>
> > and similarly for the equivalent big-endian pair.  Simplification rules
> > are now supposed to ensure that only the second (simpler) form is generated
> > by target-independent code.  We should fix any cases where that doesn't
> > happen, since it would be a missed optimisation for any instructions
> > that take (in this case) V2SI inputs.
> >
> > There's no equivalent simplification for _hi because it isn't possible
> > to refer directly to the upper 64 bits of a 128-bit register using subregs.
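For concreteness, a minimal sketch (not from the patch) of the equivalence
described above, using the existing lowpart_subreg helper from emit-rtl.cc;
the register number is invented for illustration:

  /* Take the low V2SI half of a V4SI value.  On little-endian this
     yields the canonical (subreg:V2SI (reg:V4SI 32) 0) directly,
     rather than an explicit
     (vec_select ... (parallel [(const_int 0) (const_int 1)])).  */
  rtx src = gen_rtx_REG (V4SImode, 32);
  rtx lo = lowpart_subreg (V2SImode, src, V4SImode);

As the quote notes, there is no subreg form for the _hi half, since a subreg
cannot name only the upper 64 bits of a 128-bit register.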
This removes the aarch64_simd_vec_unpack<su>_lo_<mode> pattern.
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
	* config/aarch64/aarch64-simd.md
	(aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
	(vec_unpack<su>_lo_<mode>): Simplify.
	* config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
	comment.
-- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index fd10039f9a27d0da51624d6d3a6d8b2a532f5625..bbeee221f37c4875056cdf52932a787f8ac1c2aa 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1904,17 +1904,6 @@ (define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
;; Widening operations.
-(define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
- [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
- (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
- (match_operand:VQW 1 "register_operand" "w")
- (match_operand:VQW 2 "vect_par_cnst_lo_half" "")
- )))]
- "TARGET_SIMD"
- "<su>xtl\t%0.<Vwtype>, %1.<Vhalftype>"
- [(set_attr "type" "neon_shift_imm_long")]
-)
-
(define_insn_and_split "aarch64_simd_vec_unpack<su>_hi_<mode>"
[(set (match_operand:<VWIDE> 0 "register_operand" "=w")
(ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
@@ -1952,14 +1941,11 @@ (define_expand "vec_unpack<su>_hi_<mode>"
)
(define_expand "vec_unpack<su>_lo_<mode>"
- [(match_operand:<VWIDE> 0 "register_operand")
- (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))]
+ [(set (match_operand:<VWIDE> 0 "register_operand")
+ (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
- emit_insn (gen_aarch64_simd_vec_unpack<su>_lo_<mode> (operands[0],
-						operands[1], p));
- DONE;
+ operands[1] = lowpart_subreg (<VHALF>mode, operands[1], <MODE>mode);
}
)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6b106a72e49f11b8128485baceaaaddcbf139863..469eb938953a70bc6b0ce3d4aa16f773e40ee03e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23188,7 +23188,8 @@ aarch64_gen_shareable_zero (machine_mode mode)
to split without that restriction and instead recombine shared zeros
if they turn out not to be worthwhile. This would allow splits in
single-block functions and would also cope more naturally with
- rematerialization. */
+ rematerialization. The downside of not doing this is that we lose the
+ optimizations for vector epilogues as well. */
bool
aarch64_split_simd_shift_p (rtx_insn *insn)
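For reference, a sketch (again not part of the patch) of what the simplified
vec_unpack<su>_lo_<mode> expander now effectively emits, taking V16QI -> V8HI
sign extension as an example; the pseudo-register numbers are invented:

  /* The preparation statement rewrites operands[1] to its low half
     via a subreg, and the expander's SET then extends that half.  */
  rtx src = gen_rtx_REG (V16QImode, 100);
  rtx lo  = lowpart_subreg (V8QImode, src, V16QImode);
  /* Sign-extend case of the ANY_EXTEND iterator.  */
  emit_insn (gen_rtx_SET (gen_rtx_REG (V8HImode, 101),
			  gen_rtx_SIGN_EXTEND (V8HImode, lo)));

The resulting (sign_extend:V8HI (subreg:V8QI ... 0)) is the canonical form
the quoted review asks for, and the remaining extend patterns match it
(emitting <su>xtl), so the dedicated vec_select-based _lo insn is no longer
needed.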