Richard Sandiford <richard.sandif...@arm.com> writes:
> Tamar Christina <tamar.christ...@arm.com> writes:
>> In the example
>>
>> void f1 ()
>> {
>>   for (int i = 0; i < N; i++)
>>     {
>>       b[i] += a[i];
>>       if (a[i] > 0)
>>      break;
>>     }
>> }
>>
>> when compiled for SVE we generate:
>>
>>         ld1w    z28.s, p7/z, [x4, x0, lsl 2]
>>         cmpgt   p14.s, p7/z, z28.s, #0
>>         ptest   p15, p14.b
>>         b.none  .L3
>>
>> Where the ptest isn't needed since the branch only cares about the Z and NZ
>> flags.
>>
>> GCC Today supports eliding this through the pattern *cmp<cmp_op><mode>_ptest
>> however this pattern only supports the removal when the outermost context is 
>> a
>> CMP where the predicate is inside the condition itself.
>>
>> This typically only happens for an unpredicated CMP as a ptrue will be 
>> generated
>> during expand.
>>
>> In the case about at the GIMPLE level we have
>>
>>   mask_patt_14.15_57 = vect__2.11_52 > { 0, ... };
>>   vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57;
>>   if (vec_mask_and_58 != { 0, ... })
>>     goto <bb 5>; [5.50%]
>>   else
>>     goto <bb 6>; [94.50%]
>>
>> where the loop mask is applied to the compare as an AND.
>>
>> The loop mask is moved into the compare by the pattern *cmp<cmp_op><mode>_and
>> which moves the mask inside if the current mask is a ptrue since
>> p && true -> p.
>>
>> However this happens after combine, and so we can't both move the predicate
>> inside AND eliminate the ptests.
>>
>> To fix this this patch adds a new pattern *cmp<cmp_op><mode>_and_ptest which
>> combines these two patterns together allowing us to both push the predicate
>> inside and eliminate the ptest.
>>
>> After this patch we generate
>>
>>         ld1w    z28.s, p7/z, [x4, x0, lsl 2]
>>         cmpgt   p14.s, p7/z, z28.s, #0
>>         b.none  .L3
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>      PR target/118974
>>      * config/aarch64/aarch64-sve.md (*cmp<cmp_op><mode>_and_ptest): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      PR target/118974
>>      * gcc.target/aarch64/sve/pr119351.c: Update codegen.
>>      * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise.
>>
>> ---
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve.md 
>> b/gcc/config/aarch64/aarch64-sve.md
>> index 
>> bf7569f932b6d7392b9c4fb7b94efafb6fd184c2..fe7f52ee1ed400b4eda28e3f90edc0044a5aa7a9
>>  100644
>> --- a/gcc/config/aarch64/aarch64-sve.md
>> +++ b/gcc/config/aarch64/aarch64-sve.md
>> @@ -8319,6 +8319,40 @@ (define_insn_and_rewrite "*cmp<cmp_op><mode>_ptest"
>>    }
>>  )
>>  
>> +;; Predicated integer comparisons, formed by combining a PTRUE-predicated
>> +;; comparison with an AND in which only the flags result is interesting.
>> +(define_insn_and_rewrite "*cmp<cmp_op><mode>_and_ptest"
>> +  [(set (reg:CC_NZC CC_REGNUM)
>> +    (unspec:CC_NZC
>> +      [(match_operand:VNx16BI 1 "register_operand")
>> +       (match_operand 4)
>> +       (const_int SVE_KNOWN_PTRUE)
>> +       (and:<VPRED>
>> +         (unspec:<VPRED>
>> +           [(match_operand 5)
>> +            (const_int SVE_KNOWN_PTRUE)
>> +            (SVE_INT_CMP:<VPRED>
>> +              (match_operand:SVE_I 2 "register_operand")
>> +              (match_operand:SVE_I 3 
>> "aarch64_sve_cmp_<sve_imm_con>_operand"))]
>> +           UNSPEC_PRED_Z)
>> +         (match_operand:<VPRED> 6 "register_operand"))]
>> +      UNSPEC_PTEST))
>> +   (clobber (match_scratch:<VPRED> 0))]
>> +  "TARGET_SVE"
>> +  {@ [ cons: =0, 1    , 2 , 3            ; attrs: pred_clobber ]
>> +     [ &Upa    ,  Upl, w , <sve_imm_con>; yes                 ] 
>> cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, #%3
>> +     [ ?Upl    ,  0  , w , <sve_imm_con>; yes                 ] ^
>> +     [ Upa     ,  Upl, w , <sve_imm_con>; no                  ] ^
>> +     [ &Upa    ,  Upl, w , w            ; yes                 ] 
>> cmp<cmp_op>\t%0.<Vetype>, %6/z, %2.<Vetype>, %3.<Vetype>
>> +     [ ?Upl    ,  0  , w , w            ; yes                 ] ^
>> +     [ Upa     ,  Upl, w , w            ; no                  ] ^
>> +  }
>> +  "&& !rtx_equal_p (operands[4], operands[5])"
>> +  {
>> +    operands[5] = copy_rtx (operands[4]);
>> +  }
>> +)
>> +
>
> This doesn't look like a legitimate fold, since the LAST flag from the
> original PTEST will apply to the last bit of the predicate, which for a
> .H, .S, or .D comparison will be zero.  The fused compare will instead
> set the LAST flag based on the last full SVE_I element.
>
> For example, for a .D comparison on 128-bit SVE, we'd get (lsb first):
>
>    a0000000 b0000000
>
> where a and b are the comparison results.  The ptest is testing this
> against a governing predicate of:
>
>    11111111 11111111
>
> so LAST should always be zero.  The fold above would instead set LAST to b.

For the record, Tamar pointed out off-list that this example is bogus.
The SVE_KNOWN_PTRUE in the UNSPEC_PTEST guarantees that operand 4 is
a strict VPRED predicate (upper bits zero).  So the real issue is
instead the intervening AND, which could remove initial and trailing
elements.  The matched compare would ignore those ANDed off elements
(because the compare would treat them as inactive), whereas the
original PTEST would test the full VPRED predicate.

So it is still invalid, just not for the reason that I said.

Sorry for the confusion!

Richard

>
> I think we need to differentiate users that care about LAST from
> those that don't, such as by using CC_Z for NE/EQ ANY/NONE comparisons.
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151 for more details.
>
> Thanks,
> Richard
>
>>  ;; Predicated integer comparisons, formed by combining a PTRUE-predicated
>>  ;; comparison with an AND.  Split the instruction into its preferred form
>>  ;; at the earliest opportunity, in order to get rid of the redundant
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
>> index 
>> 85aab355f95f83e1fa65d280f14fb8ade7f7e658..1ebc735a82f4a59d8eccff39346e46a449b4729a
>>  100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c
>> @@ -14,7 +14,6 @@ int x[N] __attribute__((aligned(32)));
>>  **  ...
>>  **  ld1w    z[0-9]+.s, p[0-9]+/z, \[x[0-9], x[0-9], lsl 2\]
>>  **  cmple   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  ...
>  >  */
>>  
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
>> index 
>> d7cef1105410be04ed67d1d3b800746267f205a8..8bd6fafc4d4248cf0acf7dfa2f07cd005f13de35
>>  100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch.c
>> @@ -8,7 +8,6 @@ int b[N] = {0};
>>  ** f1:
>>  **  ...
>>  **  cmpgt   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  b.(any|none)    \.L[0-9]+
>>  **  ...
>>  */
>> @@ -25,7 +24,6 @@ void f1 ()
>>  ** f2:
>>  **  ...
>>  **  cmpge   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  b.(any|none)    \.L[0-9]+
>>  **  ...
>>  */
>> @@ -42,7 +40,6 @@ void f2 ()
>>  ** f3:
>>  **  ...
>>  **  cmpeq   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  b.(any|none)    \.L[0-9]+
>>  **  ...
>>  */
>> @@ -59,7 +56,6 @@ void f3 ()
>>  ** f4:
>>  **  ...
>>  **  cmpne   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  b.(any|none)    \.L[0-9]+
>>  **  ...
>>  */
>> @@ -76,7 +72,6 @@ void f4 ()
>>  ** f5:
>>  **  ...
>>  **  cmplt   p[0-9]+.s, p7/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  b.(any|none)    .L[0-9]+
>>  **  ...
>>  */
>> @@ -93,7 +88,6 @@ void f5 ()
>>  ** f6:
>>  **  ...
>>  **  cmple   p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
>> -**  ptest   p[0-9]+, p[0-9]+.b
>>  **  b.(any|none)    \.L[0-9]+
>>  **  ...
>>  */

Reply via email to