https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121118
--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:f702b593e7268ab161053bafd097f1b09933b783

commit r16-2731-gf702b593e7268ab161053bafd097f1b09933b783
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Aug 4 11:45:28 2025 +0100

    aarch64: Use VNx16BI for more SVE WHILE* results [PR121118]

    PR121118 is about a case where we try to construct a predicate
    constant using a permutation of a PFALSE and a WHILELO.  The WHILELO
    is a .H operation and its result has mode VNx8BI.  However, the
    permute instruction expects both inputs to be VNx16BI, leading to an
    unrecognisable insn ICE.

    VNx8BI is effectively a form of VNx16BI in which every odd-indexed
    bit is insignificant.  In the PR's testcase that's OK, since those
    bits will be dropped by the permutation.  But if the WHILELO had been
    a VNx4BI, so that only every fourth bit is significant, the input to
    the permutation would have had undefined bits.  The testcase in the
    patch has an example of this.

    This feeds into a related ACLE problem that I'd been meaning to fix
    for a long time: every bit of an svbool_t result is significant, and
    so every ACLE intrinsic that returns an svbool_t should return a
    VNx16BI.  That doesn't currently happen for ACLE svwhile*
    intrinsics.

    This patch fixes both issues together.  We still need to keep the
    current WHILE* patterns for autovectorisation, where the result mode
    should match the element width.  The patch therefore adds a new set
    of patterns that are defined to return VNx16BI instead.  For want of
    a better scheme, it uses an "_acle" suffix to distinguish these new
    patterns from the "normal" ones.

    The formulation used is:

      (and:VNx16BI (subreg:VNx16BI normal-pattern 0) C)

    where C has mode VNx16BI and is a canonical ptrue for
    normal-pattern's element width (so that the low bit of each element
    is set and the upper bits are clear).

    This is a bit clunky, and leads to some repetition.  But it has two
    advantages:

    * After g:965564eafb721f8000013a3112f1bba8d8fae32b, converting the
      above expression back to normal-pattern's mode will reduce to
      normal-pattern, so that the pattern for testing the result using a
      PTEST doesn't change.

    * It gives RTL optimisers a bit more information, as the new tests
      demonstrate.

    In the expression above, C is matched using a new "special" predicate,
    aarch64_ptrue_all_operand, where "special" means that the mode on the
    predicate is not necessarily the mode of the expression.  In this
    case, C always has mode VNx16BI, but the mode on the predicate
    indicates which kind of canonical PTRUE is needed.

    gcc/
        PR testsuite/121118
        * config/aarch64/iterators.md (VNx16BI_ONLY): New mode iterator.
        * config/aarch64/predicates.md (aarch64_ptrue_all_operand): New
        predicate.
        * config/aarch64/aarch64-sve.md
        (@aarch64_sve_while_<while_optab_cmp><GPI:mode><VNx16BI_ONLY:mode>_acle)
        (@aarch64_sve_while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle)
        (*aarch64_sve_while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle)
        (*while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle_cc): New
        patterns.
        * config/aarch64/aarch64-sve-builtins-functions.h
        (while_comparison::expand): Use the new _acle patterns that always
        return a VNx16BI.
        * config/aarch64/aarch64-sve-builtins-sve2.cc
        (svwhilerw_svwhilewr_impl::expand): Likewise.
        * config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
        Likewise.

    gcc/testsuite/
        PR testsuite/121118
        * gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
        * gcc.target/aarch64/sve/acle/general/whilele_13.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/whilelt_6.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilege_1.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilegt_1.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilerw_5.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilewr_5.c: Likewise.
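
To make the svbool_t point above concrete: the ACLE treats every bit of an
svbool_t as significant, so a .S WHILELO result must set the low bit of each
element's 4-bit predicate slot for active elements and clear the upper three
bits.  The sketch below (illustrative only, not the PR's testcase or one of
the new tests) reads such a result at .B granularity, which observes every
predicate bit:

  #include <arm_sve.h>

  /* svwhilelt_b32 is a .S operation: it produces one significant bit
     per 32-bit element.  Counting the result at .B granularity reads
     every predicate bit, so the three upper bits of each element's
     slot must be well-defined zeros.  */
  uint64_t
  count_active (int32_t i, int32_t n)
  {
    svbool_t pg = svwhilelt_b32_s32 (i, n);  /* .S WHILELO */
    return svcntp_b8 (svptrue_b8 (), pg);    /* inspects all bits */
  }

If the compiler modelled pg only as a VNx4BI and left the other bits
undefined, the count could be anything; having the intrinsic expand to a
VNx16BI result makes the full layout explicit.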
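
A scalar model of the (and:VNx16BI (subreg:VNx16BI normal-pattern 0) C)
formulation may also help.  Treating a predicate as one bit per byte packed
into an integer, ANDing with the canonical ptrue C keeps only the low bit of
each element slot, turning a value with undefined upper bits into a fully
defined one.  Everything below is a hand-rolled illustration, not code from
the patch:

  #include <stdint.h>
  #include <stdio.h>

  /* Canonical "ptrue" for a given element size in bytes: the low bit
     of each element slot set, the upper bits clear (the role played by
     aarch64_ptrue_all_operand's operand in the patch).  */
  static uint64_t
  ptrue_all (unsigned elt_bytes)
  {
    uint64_t c = 0;
    for (unsigned i = 0; i < 64; i += elt_bytes)
      c |= 1ULL << i;
    return c;
  }

  int
  main (void)
  {
    /* A .S WHILE result whose insignificant bits are junk (here, ones).  */
    uint64_t raw = 0x00000000ffffffffULL;

    /* Mask with the canonical .S ptrue, as the new _acle patterns do,
       to get a value that is defined in every bit position.  */
    uint64_t defined = raw & ptrue_all (4);

    printf ("%016llx\n", (unsigned long long) defined);  /* 0000000011111111 */
    return 0;
  }

In this model, converting back to the narrower mode and dropping the AND (the
first advantage listed in the commit message) corresponds to noting that the
masked-off bits were insignificant in that mode anyway.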