https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96528

            Bug ID: 96528
           Summary: [11 Regression] vector comparisons on ARM
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: arm-none-linux-gnueabihf

(see the discussion after
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551468.html )

I am using a compiler configured with --target=arm-none-linux-gnueabihf
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
    return a==5 | b==7;
}

Compiling with -O yields very long scalar code. Adding -fno-tree-forwprop gets
back the nice, vector code. (at higher optimization levels, one may also need
to disable vrp)

This is due to the fact that while the ARM target handles VEC_COND_EXPR<v == w,
-1, 0> just fine, it does not handle a plain v == w that is not fed directly to
a VEC_COND_EXPR. I was surprised to notice that "grep vec_cmp" gives a number
of lines in the aarch64/ directory, but none in arm/, while AFAIK those neon
instructions are the same. Would it be possible to implement this on ARM as
well? Other middle-end options are also possible, but the difference with
aarch64 makes it tempting to handle it in the target.

Reply via email to