https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151
            Bug ID: 118151
           Summary: Relax the SVE PTEST matching conditions for any/none (ne/eq)
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: aarch64-sve, missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
                CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

All our current PTEST combiner patterns are for the general CC_NZC case, where the eventual condition could be first/not-first/last/not-last/any/none. For this general case, it's only usually possible to fold a PTEST with a previous (potential) flag-setting instruction if both instructions have the same governing predicate.

However, for the simple any/none (ne/eq) case, it's enough for the PTEST gp to be a superset of the other instruction's gp. In particular, we can always fold if the PTEST is predicated on a PTRUE for the same element width or narrower.

The failure to handle this case is causing us to miss many folds, both in ACLE code and in early-break tests. I think it could be handled by using CC_Z for ne/eq and relaxing aarch64_sve_same_pred_for_ptest_p for that case. It might even be a relatively simple change.

For example:

  #include <arm_sve.h>

  int foo (svbool_t pg, svint32_t x, svint32_t y)
  {
    return svptest_any(svptrue_b8(), svcmpeq(pg, x, y));
  }

currently generates:

        ptrue   p3.b, all
        cmpeq   p0.s, p0/z, z0.s, z1.s
        ptest   p3, p0.b
        cset    w0, any
        ret

where the ptest and ptrue are redundant. The same is true with svptrue_b8 replaced by svptrue_b16 or svptrue_b32, but not with svptrue_b64. (LLVM optimises the svptrue_b32 case, but not the others.)

We should try to make it so that two tests of the same result, such as svptest_last and svptest_any, both still use the same PTEST, even if they initially use different CC modes.
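
To illustrate the superset argument above, here is a minimal scalar sketch in plain C. It is only a model, not GCC code and not SVE code: it assumes a 256-bit vector (so 32 predicate bits, one per byte lane), and every name in it (pred_t, model_ptrue, model_cmp_s, model_ptest_any, model_ptest_first, any_fold_is_safe, first_fold_is_safe) is hypothetical. It checks, for every governing predicate and every match pattern of a zeroing .s compare, whether a PTEST with some other gp gives the same answer as the flags the compare would set under its own gp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 256-bit vector has 32 predicate bits,
   one per byte lane.  */
#define PRED_BITS 32u

typedef uint32_t pred_t;

/* Model of svptrue_b8/b16/b32/b64: set the lowest predicate bit of
   each element of ELT_BYTES bytes (1, 2, 4 or 8).  */
pred_t model_ptrue(unsigned elt_bytes)
{
    pred_t p = 0;
    for (unsigned i = 0; i < PRED_BITS; i += elt_bytes)
        p |= (pred_t)1 << i;
    return p;
}

/* Model of a zeroing .s compare: a result bit can only be set in a
   .s lane that is active in PG.  */
pred_t model_cmp_s(pred_t pg, pred_t raw_matches)
{
    return raw_matches & pg & model_ptrue(4);
}

/* "any" condition of PTEST GP, PN at byte granularity.  */
bool model_ptest_any(pred_t gp, pred_t pn)
{
    return (gp & pn) != 0;
}

/* "first" condition: is the first active element of GP set in PN?  */
bool model_ptest_first(pred_t gp, pred_t pn)
{
    if (gp == 0)
        return false;
    pred_t lowest = gp & (~gp + 1u);   /* isolate lowest set bit */
    return (pn & lowest) != 0;
}

/* Spread 8 per-lane bits onto the 8 .s lanes (bits 0, 4, ..., 28).  */
pred_t spread_s(unsigned lanes8)
{
    pred_t p = 0;
    for (unsigned i = 0; i < 8; i++)
        if (lanes8 & (1u << i))
            p |= (pred_t)1 << (4 * i);
    return p;
}

/* True if, for every compare gp and every match pattern, testing the
   compare result under PTEST_GP gives the same any/none answer as
   testing it under the compare's own gp.  */
bool any_fold_is_safe(pred_t ptest_gp)
{
    for (unsigned g = 0; g < 256; g++)
        for (unsigned m = 0; m < 256; m++) {
            pred_t pg = spread_s(g);
            pred_t res = model_cmp_s(pg, spread_s(m));
            if (model_ptest_any(ptest_gp, res) != model_ptest_any(pg, res))
                return false;
        }
    return true;
}

/* Same check for the "first" condition, which depends on which
   element of the gp comes first.  */
bool first_fold_is_safe(pred_t ptest_gp)
{
    for (unsigned g = 0; g < 256; g++)
        for (unsigned m = 0; m < 256; m++) {
            pred_t pg = spread_s(g);
            pred_t res = model_cmp_s(pg, spread_s(m));
            if (model_ptest_first(ptest_gp, res) != model_ptest_first(pg, res))
                return false;
        }
    return true;
}
```

Under this model, any_fold_is_safe returns true for model_ptrue(1), model_ptrue(2) and model_ptrue(4), since each of those covers every .s lane, and false for model_ptrue(8), which skips every other .s lane. first_fold_is_safe(model_ptrue(1)) returns false, which is the reason the general CC_NZC patterns need matching governing predicates.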