Tamar Christina <tamar.christ...@arm.com> writes: > In the example > > void f1 () > { > for (int i = 0; i < N; i++) > { > b[i] += a[i]; > if (a[i] > 0) > break; > } > } > > when compiled for SVE we generate: > > ld1w z28.s, p7/z, [x4, x0, lsl 2] > cmpgt p14.s, p7/z, z28.s, #0 > ptest p15, p14.b > b.none .L3 > > Where the ptest isn't needed since the branch only cares about the Z and NZ > flags. > > GCC Today supports eliding this through the pattern *cmp<cmp_op><mode>_ptest > however this pattern only supports the removal when the outermost context is a > CMP where the predicate is inside the condition itself. > > This typically only happens for an unpredicated CMP as a ptrue will be > generated > during expand. > > In the case about at the GIMPLE level we have > > mask_patt_14.15_57 = vect__2.11_52 > { 0, ... }; > vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57; > if (vec_mask_and_58 != { 0, ... }) > goto <bb 5>; [5.50%] > else > goto <bb 6>; [94.50%] > > where the loop mask is applied to the compare as an AND. > > The loop mask is moved into the compare by the pattern *cmp<cmp_op><mode>_and > which moves the mask inside if the current mask is a ptrue since > p && true -> p. > > However this happens after combine, and so we can't both move the predicate > inside AND eliminate the ptests. > > To fix this this patch adds a new pattern *cmp<cmp_op><mode>_and_ptest which > combines these two patterns together allowing us to both push the predicate > inside and eliminate the ptest. > > After this patch we generate > > ld1w z28.s, p7/z, [x4, x0, lsl 2] > cmpgt p14.s, p7/z, z28.s, #0 > b.none .L3 > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/118974 > * config/aarch64/aarch64-sve.md (*cmp<cmp_op><mode>_and_ptest): New. > > gcc/testsuite/ChangeLog: > > PR target/118974 > * gcc.target/aarch64/sve/pr119351.c: Update codegen. > * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise. > > --- > > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > index > bf7569f932b6d7392b9c4fb7b94efafb6fd184c2..fe7f52ee1ed400b4eda28e3f90edc0044a5aa7a9 > 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -8319,6 +8319,40 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest" > } > ) > > +;; Predicated integer comparisons, formed by combining a PTRUE-predicated > +;; comparison with an AND in which only the flags result is interesting. > +(define_insn_and_rewrite "*cmp<cmp_op><mode>_and_ptest" > + [(set (reg:CC_NZC CC_REGNUM) > + (unspec:CC_NZC > + [(match_operand:VNx16BI 1 "register_operand") > + (match_operand 4) > + (const_int SVE_KNOWN_PTRUE) > + (and:<VPRED> > + (unspec:<VPRED> > + [(match_operand 5) > + (const_int SVE_KNOWN_PTRUE) > + (SVE_INT_CMP:<VPRED> > + (match_operand:SVE_I 2 "register_operand") > + (match_operand:SVE_I 3 > "aarch64_sve_cmp_<sve_imm_con>_operand"))] > + UNSPEC_PRED_Z) > + (match_operand:<VPRED> 6 "register_operand"))] > + UNSPEC_PTEST)) > + (clobber (match_scratch:<VPRED> 0))] > + "TARGET_SVE" > + {@ [ cons: =0, 1 , 2 , 3 ; attrs: pred_clobber ] > + [ &Upa , Upl, w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, #%3 > + [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ > + [ Upa , Upl, w , <sve_imm_con>; no ] ^ > + [ &Upa , Upl, w , w ; yes ] > cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, %3.<Vetype> > + [ ?Upl , 0 , w , w ; yes ] ^ > + [ Upa , Upl, w , w ; no ] ^ > + } > + "&& !rtx_equal_p (operands[4], operands[5])" > + { > + operands[5] = copy_rtx (operands[4]); > + } > +) > +
This doesn't look like a legitimate fold, since the LAST flag from the original PTEST will apply to the last bit of the predicate, which for a .H, .S, or .D comparison will be zero. The fused compare will instead set the LAST flag based on the last full SVE_I element. For example, for a .D comparison on 128-bit SVE, we'd get (lsb first): a0000000 b0000000 where a and b are the comparison results. The ptest is testing this against a governing predicate of: 11111111 11111111 so LAST should always be zero. The fold above would instead set LAST to b. I think we need to differentiate users that care about LAST from those that don't, such as by using CC_Z for NE/EQ ANY/NONE comparisons. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151 for more details. Thanks, Richard > ;; Predicated integer comparisons, formed by combining a PTRUE-predicated > ;; comparison with an AND. Split the instruction into its preferred form > ;; at the earliest opportunity, in order to get rid of the redundant > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > index > 85aab355f95f83e1fa65d280f14fb8ade7f7e658..1ebc735a82f4a59d8eccff39346e46a449b4729a > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > @@ -14,7 +14,6 @@ int x[N] __attribute__((aligned(32))); > ** ... > ** ld1w z[0-9]+.s, p[0-9]+/z, \[x[0-9], x[0-9], lsl 2\] > ** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** ... > */ > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > index > d7cef1105410be04ed67d1d3b800746267f205a8..8bd6fafc4d4248cf0acf7dfa2f07cd005f13de35 > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c > @@ -8,7 +8,6 @@ int b[N] = {0}; > ** f1: > ** ... > ** cmpgt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** b.(any|none) \.L[0-9]+ > ** ... > */ > @@ -25,7 +24,6 @@ void f1 () > ** f2: > ** ... > ** cmpge p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** b.(any|none) \.L[0-9]+ > ** ... > */ > @@ -42,7 +40,6 @@ void f2 () > ** f3: > ** ... > ** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** b.(any|none) \.L[0-9]+ > ** ... > */ > @@ -59,7 +56,6 @@ void f3 () > ** f4: > ** ... > ** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** b.(any|none) \.L[0-9]+ > ** ... > */ > @@ -76,7 +72,6 @@ void f4 () > ** f5: > ** ... > ** cmplt p[0-9]+.s, p7/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** b.(any|none) .L[0-9]+ > ** ... > */ > @@ -93,7 +88,6 @@ void f5 () > ** f6: > ** ... > ** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 > -** ptest p[0-9]+, p[0-9]+.b > ** b.(any|none) \.L[0-9]+ > ** ... > */