Jennifer Schmitz <[email protected]> writes:
>> On 30 Sep 2025, at 11:54, Richard Sandiford
>> <[email protected]> wrote:
>> 
>> Kyrylo Tkachov <[email protected]> writes:
>>>> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
>>>> index a3cbbce8b31..d82dc41df3e 100644
>>>> --- a/gcc/config/aarch64/aarch64-sve2.md
>>>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>>>> @@ -1124,6 +1124,42 @@
>>>>  [(set_attr "movprfx" "yes")]
>>>> )
>>>> 
>>>> +;; Predicated binary operations with no reverse form, merging with
>>>> +;; the second input.
>>>> +(define_insn_and_rewrite "*cond_<sve_int_op><mode>_3"
>>>> +  [(set (match_operand:SVE_FULL_I 0 "register_operand")
>>>> + (unspec:SVE_FULL_I
>>>> +   [(match_operand:<VPRED> 1 "register_operand")
>>>> +    (unspec:SVE_FULL_I
>>>> +      [(match_operand 5)
>>>> +       (unspec:SVE_FULL_I
>>>> + [(match_operand:SVE_FULL_I 2 "register_operand")
>>>> +  (match_operand:SVE_FULL_I 3 "register_operand")]
>>>> + SVE2_COND_INT_BINARY_NOREV)]
>>>> +      UNSPEC_PRED_X)
>>>> +    (match_operand:SVE_FULL_I 4 "register_operand")]
>>>> +   UNSPEC_SEL))]
>>>> +  "TARGET_SVE2
>>> 
>>> 
>>> The define_insn-related constructs should really have constraints for
>>> the operands.
>>> I know they will likely be split before reload when it matters, but
>>> it’s not impossible that they can reach RA, and so the pattern should
>>> have information for which constraints to use.
>> 
>> Also, this is a define_insn_and_rewrite, so any "split" is from this
>> define_insn to itself.  Once the define_insn is matched, it will be used
>> by the RA in some form.
>> 
>> (For avoidance of doubt, operand 5 can stay as-is.  Only the
>> register_operands need constraints.)
>> 
> Hi Kyrill and Richard,
> Thanks for the review. Below is the updated patch with constraints for 
> operands 0-4. I repeated bootstrap and testing, no regression. I will push it 
> to trunk tomorrow if there are no objections.
> As pointed out by Alex Coplan in the bugzilla ticket, this ICE goes back to 
> GCC 10. Shall we backport? If so, to which release(s)?
> Thanks,
> Jennifer
>
>
> When op2 in SVE2 saturating add intrinsics (svuqadd, svsqadd) is a zero
> vector and predication is _z, an ICE in vregs occurs, e.g. for
>
> svuint8_t foo (svbool_t pg, svuint8_t op1)
> {
>     return svsqadd_u8_z (pg, op1, svdup_s8 (0));
> }
>
> The insn failed to match the pattern (aarch64-sve2.md):
>
> ;; Predicated binary operations with no reverse form, merging with zero.
> ;; At present we don't generate these patterns via a cond_* optab,
> ;; so there's no correctness requirement to handle merging with an
> ;; independent value.
> (define_insn_and_rewrite "*cond_<sve_int_op><mode>_z"
>   [(set (match_operand:SVE_FULL_I 0 "register_operand")
>       (unspec:SVE_FULL_I
>         [(match_operand:<VPRED> 1 "register_operand")
>          (unspec:SVE_FULL_I
>            [(match_operand 5)
>             (unspec:SVE_FULL_I
>               [(match_operand:SVE_FULL_I 2 "register_operand")
>                (match_operand:SVE_FULL_I 3 "register_operand")]
>               SVE2_COND_INT_BINARY_NOREV)]
>            UNSPEC_PRED_X)
>          (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")]
>         UNSPEC_SEL))]
>   "TARGET_SVE2"
>   {@ [ cons: =0 , 1   , 2 , 3  ]
>      [ &w       , Upl , 0 , w  ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
>      [ &w       , Upl , w , w  ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
>   }
>   "&& !CONSTANT_P (operands[5])"
>   {
>     operands[5] = CONSTM1_RTX (<VPRED>mode);
>   }
>   [(set_attr "movprfx" "yes")]
> )
>
> because operands[3] and operands[4] were both expanded into the same register
> operand containing a zero vector by define_expand "@cond_<sve_int_op><mode>".
>
> This patch fixes the ICE by adding the pattern
>   ;; Predicated binary operations with no reverse form, merging with
>   ;; the second input.
>   (define_insn_and_rewrite "*cond_<sve_int_op><mode>_3"
> for SVE2_COND_INT_BINARY_NOREV that allows operands[4] to be a register operand
> (equal to operands[3]). It adds a vcond_mask expression such that we end
> up with an insn matched by
>   ;; Predicated binary arithmetic, merging with the first input.
>   (define_insn_and_rewrite "*cond_<sve_int_op><mode>_2"
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for trunk?
>
> Alex Coplan pointed out in the bugzilla ticket that this ICE goes back
> to GCC 10. Shall we backport?
>
> Signed-off-by: Jennifer Schmitz <[email protected]>
>
> gcc/
>       PR target/121599
>       * config/aarch64/aarch64-sve2.md (*cond_<sve_int_op><mode>_3):
>       New pattern for SVE2_COND_INT_BINARY_NOREV.
>
> gcc/testsuite/
>       PR target/121599
>       * gcc.target/aarch64/sve2/pr121599.c: New test.
> ---
>  gcc/config/aarch64/aarch64-sve2.md            | 36 +++++++++++++++++++
>  .../gcc.target/aarch64/sve2/pr121599.c        | 31 ++++++++++++++++
>  2 files changed, 67 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 69a376706fa..2ed9bee335b 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -1164,6 +1164,42 @@
>     (set_attr "sve_type" "sve_int_general")]
>  )
>  
> +;; Predicated binary operations with no reverse form, merging with
> +;; the second input.
> +(define_insn_and_rewrite "*cond_<sve_int_op><mode>_3"
> +  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=&w")
> +     (unspec:SVE_FULL_I
> +       [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +        (unspec:SVE_FULL_I
> +          [(match_operand 5)
> +           (unspec:SVE_FULL_I
> +             [(match_operand:SVE_FULL_I 2 "register_operand" "w")
> +              (match_operand:SVE_FULL_I 3 "register_operand" "w")]
> +             SVE2_COND_INT_BINARY_NOREV)]
> +          UNSPEC_PRED_X)
> +        (match_operand:SVE_FULL_I 4 "register_operand" "w")]
> +       UNSPEC_SEL))]
> +  "TARGET_SVE2
> +   && rtx_equal_p (operands[3], operands[4])"

Sorry for not noticing last time (I was just concentrating on the
constraints thing) but: it's better to use (match_dup 3) instead
of using rtx_equal_p in the C++ condition.  I don't think there's
a requirement for all instances of a pseudo to be replaced by the
same hard register.  Something that starts off rtx_equal_p might
not stay that way after RA.
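For example, a sketch of the idea (untested) would be to replace operand 4
with a match_dup and drop the extra C++ condition:

	   UNSPEC_PRED_X)
	 (match_dup 3)]
	UNSPEC_SEL))]
  "TARGET_SVE2"

so that the RTL structure itself enforces the equality, rather than
relying on rtx_equal_p holding after register allocation.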

But the more fundamental question is: why is addition of zero special?
As the comment above *cond_<sve_int_op><mode>_z says:

;; Predicated binary operations with no reverse form, merging with zero.
;; At present we don't generate these patterns via a cond_* optab,
;; so there's no correctness requirement to handle merging with an
;; independent value.
(define_insn_and_rewrite "*cond_<sve_int_op><mode>_z"

So why does the testcase end up with a register "else" value instead of a zero
else value, and why does it only happen for zero inputs and not (for example)
for an input of 1?

Debugging, it appears to be because of:

      /* First try reusing the result of an earlier legitimization.
         This avoids duplicate rtl and ensures that tied operands
         remain tied.

         This search is linear, but NOPS is bounded at compile time
         to a small number (current a single digit).  */
      unsigned int j = 0;
      for (; j < i; ++j)
        if (can_reuse_operands_p (icode, opno + j, opno + i, &ops[j], &ops[i])
            && rtx_equal_p (orig_values[j], orig_values[i])
            && ops[j].value
            && insn_operand_matches (icode, opno + i, ops[j].value))
          {
            ops[i].value = copy_rtx (ops[j].value);
            break;
          }

in optabs.cc.  Perhaps we should restrict that to cases where the
original operand doesn't match the predicate.  (Although, as the comment
says, that could have negative effects for tied operands and so wouldn't
be risk-free.)
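
If we did go that way, the restriction might look something like this
(untested sketch; the extra insn_operand_matches call on the original
value is my guess at how it would be expressed):

      for (; j < i; ++j)
        if (can_reuse_operands_p (icode, opno + j, opno + i, &ops[j], &ops[i])
            && rtx_equal_p (orig_values[j], orig_values[i])
            /* Only reuse an earlier legitimization if the original
               operand would not have matched as-is.  */
            && !insn_operand_matches (icode, opno + i, orig_values[i])
            && ops[j].value
            && insn_operand_matches (icode, opno + i, ops[j].value))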

A backportable fix would be:

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 909d9e524e1..c18865d9348 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -4351,7 +4351,10 @@ function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
   add_input_operand (icode, pred);
   for (unsigned int i = 0; i < nops; ++i)
     add_input_operand (icode, args[opno + i]);
-  add_input_operand (icode, fallback_arg);
+  if (fallback_arg == CONST0_RTX (mode))
+    add_fixed_operand (fallback_arg);
+  else
+    add_input_operand (icode, fallback_arg);
   return generate_insn (icode);
 }
 
I think this is better than adding a new pattern.  It encourages the use
of MOVPRFX /Z in other cases too.

I haven't tested that.  If it turns out that zero isn't always allowed,
we could add an insn_operand_matches test to the fallback_arg test.
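That is, something like (again untested; the fallback operand number
below is illustrative rather than taken from the actual code):

  if (fallback_arg == CONST0_RTX (mode)
      && insn_operand_matches (icode, nops + 2, fallback_arg))
    add_fixed_operand (fallback_arg);
  else
    add_input_operand (icode, fallback_arg);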

Thanks,
Richard
