https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108378
Bug ID: 108378
Summary: gcc generates fpu traps unsafe code for armv8-a+sve
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: evatux at gmail dot com
Target Milestone: ---

Created attachment 54250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54250&action=edit
original reproducer

Consider the following simple function that computes the fp32 inverse:

```
void foo(float *x, int len) {
    for (int i = 0; i < len; ++i)
        x[i] = 1.f / x[i];
}
```

GCC with `-O3 -march=armv8-a+sve` produces code that raises the FE_DIVBYZERO FPU exception. The assembly for the function is the following:

```
        whilelo p0.s, wzr, w0
        fmov    z1.s, #1.0e+0
        ptrue   p1.b, all
.L3:
        ld1w    z0.s, p0/z, [x1, x2, lsl 2]
        fdivr   z0.s, p1/m, z0.s, z1.s
        st1w    z0.s, p0, [x1, x2, lsl 2]
        add     x2, x2, x3
        whilelo p0.s, w2, w0
        b.any   .L3
```

Note that the p0 predicate register is used for the load and the store. The division, however, uses p1, which is all true. The zeroing load (`p0/z`) leaves 0.0 in the inactive lanes, so when len is not a multiple of the SVE vector length (in fp32 elements), the fdivr instruction divides 1.0 by zero in those lanes and raises a division-by-zero FPE.

The issue reproduces with gcc-10, gcc-11, and gcc-12. I also checked gcc-trunk by inspecting the assembler output in godbolt. Link: https://gcc.godbolt.org/z/Yz7chEcfT.

The reproducer is attached:

```
$ gcc-12 repro.c -O3 -march=armv8-a+sve -lm && ./a.out
fegetexceptflag(FE_DIVBYZERO) was:0 now:2
Test FAILED
```

Build without SVE or pass -DARRAY_LEN=64, and the test passes. Adding options like `-fno-unsafe-math-optimizations` and `-ftrapping-math` does not help.

The issue is especially noticeable when mixing Fortran and C code, as the former has FPU exception checks enabled by default.
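To make the predicate mismatch concrete, here is a small scalar model of the loop in the assembly above. It is only an illustrative sketch, not the attached reproducer: the function name `count_inactive_zero_divides` and the fixed lane count `MODEL_LANES` are hypothetical, and a real SVE implementation may have more lanes. The model does not perform the division; it only counts the lanes in which an all-true `fdivr` would see a zero divisor left behind by the zeroing load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector length in 32-bit lanes (4 lanes = 128-bit SVE).
   Real hardware may implement a longer vector. */
#define MODEL_LANES 4

/* Returns the number of inactive lanes in which the modelled all-true
   fdivr would compute 1.0f / 0.0f, with the zero coming from the
   zeroing load rather than from x[]. */
int count_inactive_zero_divides(const float *x, int len)
{
    assert(x != NULL && len >= 0);

    int hits = 0;
    for (int base = 0; base < len; base += MODEL_LANES) {
        float z0[MODEL_LANES];
        int p0[MODEL_LANES];

        /* whilelo p0.s: a lane is active while base + lane < len.
           ld1w z0.s, p0/z: inactive lanes are set to zero. */
        for (int lane = 0; lane < MODEL_LANES; ++lane) {
            p0[lane] = (base + lane) < len;
            z0[lane] = p0[lane] ? x[base + lane] : 0.0f;
        }

        /* fdivr under p1 (all true): every lane is divided, including
           the lanes that p0 masked out of the load and the store. */
        for (int lane = 0; lane < MODEL_LANES; ++lane) {
            if (!p0[lane] && z0[lane] == 0.0f)
                ++hits;
        }
    }
    return hits;
}
```

With a 64-element array of nonzero values and the assumed 4 lanes, the model returns 0 for `len = 64` and 1 for `len = 63`, which matches the observation that the test passes with -DARRAY_LEN=64.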