https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108378

            Bug ID: 108378
           Summary: gcc generates fpu traps unsafe code for armv8-a+sve
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: evatux at gmail dot com
  Target Milestone: ---

Created attachment 54250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54250&action=edit
original reproducer

Consider the following simple function, which computes the fp32 reciprocal of each element:
```
    void foo(float *x, int len) {
        for (int i = 0; i < len; ++i) x[i] = 1.f / x[i];
    }
```

With `-O3 -march=armv8-a+sve`, GCC produces code that raises the FE_DIVBYZERO
floating-point exception. The relevant assembly for the function is:

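```
        whilelo p0.s, wzr, w0              // p0: lanes with i < len are active
        fmov    z1.s, #1.0e+0              // z1 = 1.0 in every lane
        ptrue   p1.b, all                  // p1: all lanes active
.L3:
        ld1w    z0.s, p0/z, [x1, x2, lsl 2] // load x[i..]; inactive lanes are zeroed
        fdivr   z0.s, p1/m, z0.s, z1.s     // z0 = z1 / z0 on ALL lanes (p1)
        st1w    z0.s, p0, [x1, x2, lsl 2]  // store active lanes only (p0)
        add     x2, x2, x3                 // advance i by x3
        whilelo p0.s, w2, w0               // recompute p0 for the next chunk
        b.any   .L3                        // loop while any lane is active
```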

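Note that the load uses predicate p0, which is inactive for lanes at or beyond
len; because of `/z` (zeroing predication), those lanes of z0 are set to 0.0.
The division, however, uses p1, which is all-true, so fdivr also computes
1.0 / 0.0 in the inactive lanes. When len is not a multiple of the SVE vector
length, the last iteration therefore raises a divide-by-zero floating-point
exception, even though those results are never stored.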

The issue reproduces with gcc-10, gcc-11, and gcc-12.
Inspecting the assembler output on godbolt shows that gcc trunk is affected as well.
Link: https://gcc.godbolt.org/z/Yz7chEcfT.

Running the attached reproducer:
```
$ gcc-12 repro.c -O3 -march=armv8-a+sve -lm && ./a.out
fegetexceptflag(FE_DIVBYZERO) was:0 now:2
Test FAILED
```

The test passes when built without SVE, or when built with `-DARRAY_LEN=64`.
Adding options such as `-fno-unsafe-math-optimizations` or `-ftrapping-math`
does not help.

The issue is especially noticeable when mixing Fortran and C code, as
Fortran has floating-point exception checks enabled by default.
