https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96528
Bug ID: 96528
Summary: [11 Regression] vector comparisons on ARM
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: arm-none-linux-gnueabihf

(see the discussion after https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551468.html )

I am using a compiler configured with --target=arm-none-linux-gnueabihf --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
  return a==5 | b==7;
}

Compiling with -O yields very long scalar code. Adding -fno-tree-forwprop gets back the nice, vector code. (At higher optimization levels, one may also need to disable vrp.)

This is due to the fact that while the ARM target handles VEC_COND_EXPR<v == w, -1, 0> just fine, it does not handle a plain v == w that is not fed directly to a VEC_COND_EXPR.

I was surprised to notice that "grep vec_cmp" gives a number of lines in the aarch64/ directory, but none in arm/, while AFAIK those neon instructions are the same. Would it be possible to implement this on ARM as well? Other middle-end options are also possible, but the difference with aarch64 makes it tempting to handle it in the target.
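For reference, the semantics that the generated code has to preserve can be sketched as follows. This is only an illustration, assuming a compiler that accepts the GNU C vector extensions: a vector comparison yields, per lane, -1 (all bits set) when true and 0 when false, which is why a plain v == w is interchangeable with VEC_COND_EXPR<v == w, -1, 0>. The names f_scalar, same_lanes and check_example are hypothetical helpers introduced here; they are not part of the original test case.

```c
/* Sketch only: needs a compiler with GNU C vector extensions (e.g. GCC). */
#include <assert.h>

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));

/* The function from the report, with explicit parentheses. */
vi f(vec a, vec b) {
    return (a == 5) | (b == 7);
}

/* Hypothetical lane-by-lane reference: each lane is -1 if the
   condition holds for that lane, 0 otherwise. */
vi f_scalar(vec a, vec b) {
    vi r;
    for (int i = 0; i < 4; i++)
        r[i] = (a[i] == 5 || b[i] == 7) ? -1 : 0;
    return r;
}

/* Hypothetical helper: returns 1 if x and y agree in every lane. */
int same_lanes(vi x, vi y) {
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}

/* Hypothetical self-check comparing the vector and scalar forms. */
void check_example(void) {
    vec a = {5, 1, 2, 5};
    vec b = {0, 7, 3, 7};
    assert(same_lanes(f(a, b), f_scalar(a, b)));
}
```

To see the difference described above, one could compile f alone with -O -S and then with -O -fno-tree-forwprop -S on the ARM configuration given earlier, and compare the two assembly outputs.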