https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108378
Bug ID: 108378
Summary: gcc generates fpu traps unsafe code for armv8-a+sve
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: evatux at gmail dot com
Target Milestone: ---
Created attachment 54250
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54250&action=edit
original reproducer
Consider the following simple function, which computes the fp32 reciprocal of
each array element in place:
```
void foo(float *x, int len) {
    for (int i = 0; i < len; ++i) x[i] = 1.f / x[i];
}
```
GCC with “-O3 -march=armv8-a+sve” produces code that raises the FE_DIVBYZERO
floating-point exception. The generated assembly for the function is:
```
whilelo p0.s, wzr, w0
fmov z1.s, #1.0e+0
ptrue p1.b, all
.L3:
ld1w z0.s, p0/z, [x1, x2, lsl 2]
fdivr z0.s, p1/m, z0.s, z1.s
st1w z0.s, p0, [x1, x2, lsl 2]
add x2, x2, x3
whilelo p0.s, w2, w0
b.any .L3
```
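Note that the p0 predicate register governs the load and the store, while the
division uses p1, which is all-true. Because the load is zeroing (p0/z), the
inactive lanes of z0 hold 0.0, so fdivr computes 1.0 / 0.0 in those lanes.
This raises a divide-by-zero FP exception whenever len is not a multiple of
the number of 32-bit lanes in an SVE vector.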
The issue reproduces with gcc-10, gcc-11, and gcc-12.
gcc-trunk also appears affected, judging by the assembly output on godbolt.
Link: https://gcc.godbolt.org/z/Yz7chEcfT.
The reproducer is attached:
```
$ gcc-12 repro.c -O3 -march=armv8-a+sve -lm && ./a.out
fegetexceptflag(FE_DIVBYZERO) was:0 now:2
Test FAILED
```
Building without SVE or passing -DARRAY_LEN=64 makes the test pass.
Adding options such as `-fno-unsafe-math-optimizations` and `-ftrapping-math`
does not help.
The issue is especially noticeable when mixing Fortran and C code, as the
former has FPU exception checks enabled by default.