Spencer Abson <spencer.ab...@arm.com> writes:
> @@ -8165,20 +8169,25 @@
>  ;;
>  ;; For unpacked vectors, it doesn't really matter whether SEL uses the
>  ;; the container size or the element size.  If SEL used the container size,
> -;; it would ignore undefined bits of the predicate but would copy the
> -;; upper (undefined) bits of each container along with the defined bits.
> -;; If SEL used the element size, it would use undefined bits of the predicate
> -;; to select between undefined elements in each input vector.  Thus the only
> -;; difference is whether the undefined bits in a container always come from
> -;; the same input as the defined bits, or whether the choice can vary
> +;; it would ignore bits of the predicate that can be undefined, but would 
> copy
> +;; the upper (undefined) bits of each container along with the defined bits.
> +;; If SEL used the element size, it would use bits from the predicate that 
> can
> +;; be undefined to select between undefined elements in each input vector.
> +;; Thus the only difference is whether the undefined bits in a container 
> always
> +;; come from the same input as the defined bits, or whether the choice can 
> vary

It looks like the main change here is to replace "undefined bits of the
predicate" with "bits of the predicate that can be undefined".  Could
you go into more detail about the distinction?  It seems to be saying
that the upper bits in each predicate are sometimes defined and
sometimes not.

As I see it, the "level of undefinedness" is really the same for the
predicates and data vectors.  Within each container element, the bits
that hold the element value are defined/significant and the other bits
are undefined/insignificant/have arbitrary values.  The same thing
goes for the upper bits in each predicate element: the low bit is
defined/significant and the other bits are undefined/insignificant/have
arbitrary values.  They might by chance be zeros when zeros are
convenient or ones when ones are convenient, but the semantics don't
guarantee anything either way.

>  ;; independently of the defined bits.
>  ;;
>  ;; For the other instructions, using the element size is more natural,
>  ;; so we do that for SEL as well.
> +;;
> +;; The use of 'aarch64_predicate_operand' here is only to support the FP 
> arithmetic/
> +;; UNSPEC_SEL combiner patterns.  As with those operations, any predicate 
> bits that
> +;; are insignificant to the data mode will have no effect on the operation's 
> result.

Sorry for the formatting nit, but: long lines.  The comment itself looks good
though.

> +;;
>  (define_insn "*vcond_mask_<mode><vpred>"
>    [(set (match_operand:SVE_ALL 0 "register_operand")
>       (unspec:SVE_ALL
> -       [(match_operand:<VPRED> 3 "register_operand")
> +       [(match_operand:<VPRED> 3 "aarch64_predicate_operand")
>          (match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm")
>          (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero")]
>         UNSPEC_SEL))]
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 287de0f5ae4..d38b108c5f4 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3893,6 +3893,33 @@ aarch64_sve_fp_pred (machine_mode data_mode, rtx 
> *strictness)
>     return aarch64_ptrue_reg (aarch64_sve_pred_mode (data_mode));
>  }
>  
> +/* If DATA_MODE is a partial vector mode, emit a sequence of insns to
> +   zero-out the predicate bits of an existing natural GP, PRED, associated
> +   with the undefined elements in each container

This makes it sound unconditional, whereas it's really conditional on
flag_trapping_math.

Also, I'd see this more as converting a container predicate to an
element predicate.  So maybe:

/* PRED is a predicate that governs an operation on DATA_MODE.  If DATA_MODE
   is a partial vector mode, and if exceptions must be suppressed for its
   undefined elements, convert PRED from a container-level predicate to
   an element-level predicate and ensure that the undefined elements
   are inactive.

But again, please suggest something else if you don't think that's
very clear.

> +
> +   Return NULL_RTX if no insns were emitted.  That is, if DATA_MODE is not
> +   a partial vector mode, or if we don't need to prevent the operation from
> +   interpreting undefined elements.  Otherwise, return the new predicate.  */
> +rtx
> +aarch64_sve_emit_masked_fp_pred (machine_mode data_mode, rtx pred)
> +{
> +  unsigned int vec_flags = aarch64_classify_vector_mode (data_mode);
> +  if (flag_trapping_math && (vec_flags & VEC_PARTIAL))
> +    {
> +      /* Control the data as if the vector was fully packed.  */
> +      rtx mask = aarch64_sve_packed_pred (data_mode);
> +      machine_mode pmode = GET_MODE (mask);
> +
> +      /* Mask the existing predicate.  */
> +      rtx dst = gen_reg_rtx (pmode);
> +      emit_insn (gen_and3 (pmode, dst, mask,
> +                        gen_lowpart (pmode, pred)));
> +      return dst;
> +    }
> +
> +  return NULL_RTX;
> +}

How about instead returning PRED in the fallthrough case, rather than
null?  It seems like that would be more convenient for the callers
in this patch and in patch 14, and would also be more consistent
with aarch64_sve_fp_pred.

As with the earlier patches, it would be good to have a token test
that SELs aren't dropped when they shouldn't be.

BTW, since the series started with a lot of patches that were OK as-is,
or with minor changes, would you be ok with committing those and only
reposting the part of the series that might need a v2?

Thanks,
Richard

Reply via email to