https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121294

            Bug ID: 121294
           Summary: Incorrect optimisation of b16/32/64 forms of SVE
                    permute intrinsics
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: aarch64-sve, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

#include <arm_sve.h>

svbool_t
foo ()
{
  return svtrn1_b16 (svptrue_b8 (), svptrue_b16 ());
}

compiled with -O2 -march=armv8.2-a+sve gives:

foo:
        ptrue   p0.b, all
        trn1    p0.h, p0.h, p0.h
        ret

which is equivalent to:

foo:
        ptrue   p0.b, all
        ret

The svptrue_b16() has effectively been replaced by svptrue_b8().

This happens because the input and output of the underlying define_insn have
mode VNx8BImode, in which every odd-indexed bit of the predicate is
insignificant.  That is correct when permuting predicates created during
autovectorisation, but it is not correct for ACLE code, where every bit of an
svbool_t is significant.
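
To make the difference concrete, here is a rough bit-level model of the two
sequences.  It is only a sketch under stated assumptions: a 128-bit vector
length (so a predicate has 16 bits, one per vector byte), and the helper names
model_ptrue, model_trn1, expected_foo and miscompiled_foo are hypothetical,
not part of GCC or the ACLE.  model_trn1 follows my reading of the
architectural TRN1 (predicates) operation, which copies whole predicate
elements, including the upper bits of each element.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128-bit vectors, so a predicate has 16 bits, one per
   vector byte.  Predicate bit i is stored in bit i of a uint16_t.  */
#define PRED_BITS 16

/* Model of svptrue_b8/b16/b32/b64 (ELT_BYTES = 1/2/4/8): set the lowest
   bit of each predicate element and clear the others.  */
uint16_t
model_ptrue (unsigned int elt_bytes)
{
  uint16_t p = 0;
  for (unsigned int i = 0; i < PRED_BITS; i += elt_bytes)
    p |= (uint16_t) (1u << i);
  return p;
}

/* Model of TRN1 on predicates with ELT_BYTES-bit predicate elements:
   result element 2k comes from element 2k of A, and result element
   2k+1 comes from element 2k of B.  Whole elements are copied.  */
uint16_t
model_trn1 (uint16_t a, uint16_t b, unsigned int elt_bytes)
{
  assert (elt_bytes > 0 && PRED_BITS % (2 * elt_bytes) == 0);
  uint16_t elt_mask = (uint16_t) ((1u << elt_bytes) - 1);
  uint16_t result = 0;
  for (unsigned int i = 0; i < PRED_BITS; i += 2 * elt_bytes)
    {
      uint16_t even_a = (a >> i) & elt_mask;
      uint16_t even_b = (b >> i) & elt_mask;
      result |= (uint16_t) (even_a << i);
      result |= (uint16_t) (even_b << (i + elt_bytes));
    }
  return result;
}

/* What the source asks for:
   svtrn1_b16 (svptrue_b8 (), svptrue_b16 ()).  */
uint16_t
expected_foo (void)
{
  return model_trn1 (model_ptrue (1), model_ptrue (2), 2);
}

/* What the generated code does: ptrue p0.b; trn1 p0.h, p0.h, p0.h.  */
uint16_t
miscompiled_foo (void)
{
  uint16_t p0 = model_ptrue (1);
  return model_trn1 (p0, p0, 2);
}
```

Under this model, expected_foo () returns 0x7777 whereas miscompiled_foo ()
returns 0xffff, the same value as model_ptrue (1).  The two differ in bits
3, 7, 11 and 15, which are exactly the odd-indexed bits taken from the
svptrue_b16 () operand.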
