https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96528
Bug ID: 96528
Summary: [11 Regression] vector comparisons on ARM
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: arm-none-linux-gnueabihf
(see the discussion after
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551468.html )
I am using a compiler configured with --target=arm-none-linux-gnueabihf
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16
typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
return a==5 | b==7;
}
Compiling with -O yields very long scalar code. Adding -fno-tree-forwprop
restores the nice vector code. (At higher optimization levels, one may also
need to disable VRP.)
This happens because, while the ARM target handles VEC_COND_EXPR<v == w,
-1, 0> just fine, it does not handle a plain v == w that is not fed directly to
a VEC_COND_EXPR. I was surprised to notice that "grep vec_cmp" gives a number
of lines in the aarch64/ directory but none in arm/, while AFAIK those NEON
instructions are the same. Would it be possible to implement this on ARM as
well? Other options, in the middle-end, are also possible, but the difference
with aarch64 makes it tempting to handle it in the target.
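For reference, here is a minimal sketch of what the plain comparison is expected to compute, independent of how the target expands it: each lane of v == w is -1 (all bits set) when the lanes compare equal and 0 otherwise, which is the same value as VEC_COND_EXPR<v == w, -1, 0>. The names f_ref and matching_lanes are hypothetical helpers added for illustration only; they are not part of the report or of GCC.

```c
typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));

/* Testcase from the report, with parentheses added for readability. */
vi f(vec a, vec b) {
    return (a == 5) | (b == 7);
}

/* Hypothetical scalar reference (not from the report): a lane is -1
   when either comparison holds for that lane, and 0 otherwise. */
vi f_ref(vec a, vec b) {
    vi r;
    for (int i = 0; i < 4; i++)
        r[i] = (a[i] == 5 || b[i] == 7) ? -1 : 0;
    return r;
}

/* Hypothetical helper: counts the lanes on which f and f_ref agree. */
int matching_lanes(vec a, vec b) {
    vi got = f(a, b);
    vi want = f_ref(a, b);
    int n = 0;
    for (int i = 0; i < 4; i++)
        n += (got[i] == want[i]);
    return n;
}
```

Calling matching_lanes on any pair of inputs should return 4 whether the compiler emits vector or scalar code, since the issue reported here concerns code quality only, not correctness.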