Spencer Abson <spencer.ab...@arm.com> writes: > @@ -8165,20 +8169,25 @@ > ;; > ;; For unpacked vectors, it doesn't really matter whether SEL uses the > ;; the container size or the element size. If SEL used the container size, > -;; it would ignore undefined bits of the predicate but would copy the > -;; upper (undefined) bits of each container along with the defined bits. > -;; If SEL used the element size, it would use undefined bits of the predicate > -;; to select between undefined elements in each input vector. Thus the only > -;; difference is whether the undefined bits in a container always come from > -;; the same input as the defined bits, or whether the choice can vary > +;; it would ignore bits of the predicate that can be undefined, but would > copy > +;; the upper (undefined) bits of each container along with the defined bits. > +;; If SEL used the element size, it would use bits from the predicate that > can > +;; be undefined to select between undefined elements in each input vector. > +;; Thus the only difference is whether the undefined bits in a container > always > +;; come from the same input as the defined bits, or whether the choice can > vary
It looks like the main change here is to replace "undefined bits of the predicate" with "bits of the predicate that can be undefined". Could you go into more detail about the distinction? It seems to be saying that the upper bits in each predicate are sometimes defined and sometimes not. As I see it, the "level of undefinedness" is really the same for the predicates and data vectors. Within each container element, the bits that hold the element value are defined/significant and the other bits are undefined/insignificant/have arbitrary values. The same thing goes for the upper bits in each predicate element: the low bit is defined/significant and the other bits are undefined/insignificant/have arbitrary values. They might by chance be zeros when zeros are convenient or ones when ones are convenient, but the semantics don't guarantee anything either way. > ;; independently of the defined bits. > ;; > ;; For the other instructions, using the element size is more natural, > ;; so we do that for SEL as well. > +;; > +;; The use of 'aarch64_predicate_operand' here is only to support the FP > arithmetic/ > +;; UNSPEC_SEL combiner patterns. As with those operations, any predicate > bits that > +;; are insignificant to the data mode will have no effect on the operation's > result. Sorry for the formatting nit, but: long lines. The comment itself looks good though. > +;; > (define_insn "*vcond_mask_<mode><vpred>" > [(set (match_operand:SVE_ALL 0 "register_operand") > (unspec:SVE_ALL > - [(match_operand:<VPRED> 3 "register_operand") > + [(match_operand:<VPRED> 3 "aarch64_predicate_operand") > (match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm") > (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero")] > UNSPEC_SEL))] > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 287de0f5ae4..d38b108c5f4 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -3893,6 +3893,33 @@ aarch64_sve_fp_pred (machine_mode data_mode, rtx > *strictness) > return aarch64_ptrue_reg (aarch64_sve_pred_mode (data_mode)); > } > > +/* If DATA_MODE is a partial vector mode, emit a sequence of insns to > + zero-out the predicate bits of an existing natural GP, PRED, associated > + with the undefined elements in each container This makes it sound unconditional, whereas it's really conditional on flag_trapping_math. Also, I'd see this more as converting a container predicate to an element predicate. So maybe: /* PRED is a predicate that governs an operation on DATA_MODE. If DATA_MODE is a partial vector mode, and if exceptions must be suppressed for its undefined elements, convert PRED from a container-level predicate to an element-level predicate and ensure that the undefined elements are inactive. But again, please suggest something else if you don't think that's very clear. > + > + Return NULL_RTX if no insns were emitted. That is, if DATA_MODE is not > + a partial vector mode, or if we don't need to prevent the operation from > + interpreting undefined elements. Otherwise, return the new predicate. */ > +rtx > +aarch64_sve_emit_masked_fp_pred (machine_mode data_mode, rtx pred) > +{ > + unsigned int vec_flags = aarch64_classify_vector_mode (data_mode); > + if (flag_trapping_math && (vec_flags & VEC_PARTIAL)) > + { > + /* Control the data as if the vector was fully packed. */ > + rtx mask = aarch64_sve_packed_pred (data_mode); > + machine_mode pmode = GET_MODE (mask); > + > + /* Mask the existing predicate. */ > + rtx dst = gen_reg_rtx (pmode); > + emit_insn (gen_and3 (pmode, dst, mask, > + gen_lowpart (pmode, pred))); > + return dst; > + } > + > + return NULL_RTX; > +} How about instead returning PRED in the fallthrough case, rather than null? It seems like that would be more convenient for the callers in this patch and in patch 14, and would also be more consistent with aarch64_sve_fp_pred. As with the earlier patches, it would be good to have a token test that SELs aren't dropped when they shouldn't be. BTW, since the series started with a lot of patches that were OK as-is, or with minor changes, would you be ok with committing those and only reposting the part of the series that might need a v2? Thanks, Richard