https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121294
            Bug ID: 121294
           Summary: Incorrect optimisation of b16/32/64 forms of SVE permute
                    intrinsics
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: aarch64-sve, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

#include <arm_sve.h>

svbool_t foo () { return svtrn1_b16 (svptrue_b8 (), svptrue_b16 ()); }

compiled with -O2 -march=armv8.2-a+sve gives:

foo:
        ptrue   p0.b, all
        trn1    p0.h, p0.h, p0.h
        ret

which is equivalent to:

foo:
        ptrue   p0.b, all
        ret

The svptrue_b16() has effectively been replaced by svptrue_b8().  This happens
because the input and output of the underlying define_insn have VNx8BImode,
meaning that every odd-indexed bit of the predicate is insignificant.  That's
ok/correct when permuting predicates created during autovectorisation, but it
isn't correct for ACLE code, where every bit of an svbool_t is significant.
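
To make the difference concrete, here is a rough plain-C model of the predicate bits involved. It is only an illustrative sketch, not GCC or ACLE code: it assumes a 128-bit vector length (so a predicate has 16 bits, one per vector byte), and the names pred128, model_ptrue_b8, model_ptrue_b16, model_trn1_h and folding_changes_result are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* Model of an SVE predicate for an assumed 128-bit vector length:
   one bit per vector byte, 16 bits in total.  Hypothetical type.  */
typedef uint16_t pred128;

/* Models svptrue_b8 (): every byte lane is active, so every bit is set.  */
pred128 model_ptrue_b8 (void) { return 0xffff; }

/* Models svptrue_b16 (): only the low bit of each 2-bit .h element is set,
   so every odd-indexed bit is zero.  */
pred128 model_ptrue_b16 (void) { return 0x5555; }

/* Models TRN1 Pd.H, Pn.H, Pm.H: result element 2*i is element 2*i of N and
   result element 2*i + 1 is element 2*i of M.  Each .h predicate element
   is 2 bits wide and is copied whole, including its odd-indexed bit.  */
pred128 model_trn1_h (pred128 n, pred128 m)
{
  pred128 result = 0;
  for (int pair = 0; pair < 4; pair++)
    {
      int even = 2 * pair;  /* index of the even .h element in this pair */
      pred128 from_n = (n >> (2 * even)) & 3;
      pred128 from_m = (m >> (2 * even)) & 3;
      result |= from_n << (2 * even);
      result |= from_m << (2 * (even + 1));
    }
  return result;
}

/* Returns nonzero if replacing the second operand with svptrue_b8 ()
   changes the modelled result of the testcase in this report.  */
int folding_changes_result (void)
{
  pred128 expected = model_trn1_h (model_ptrue_b8 (), model_ptrue_b16 ());
  pred128 folded = model_trn1_h (model_ptrue_b8 (), model_ptrue_b8 ());
  return expected != folded;
}
```

Under this model, the testcase should produce 0x7777 (the odd-numbered .h elements keep the zero in their upper bit), whereas the generated code corresponds to 0xffff. The two values differ only in odd-indexed bits, which is why the substitution is invisible when the predicate is viewed as VNx8BImode but visible through svbool_t.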