https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118949

            Bug ID: 118949
           Summary: RISC-V: Extra FRM writes since GCC-14.2
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

GCC 15 is emitting more FRM writes than GCC 14, and as far as I can tell
the extra ones are unnecessary.  For example, compiling the following with
`-march=rv64gcv -mabi=lp64d -O3 -ffast-math`:

#include <cmath>

void func(const float *a, const float *b, float *c)
{
    for (long i = 0; i < 1024; ++i) {
        float a_round = std::lround(a[i]);
        float b_round = std::lround(b[i]);
        c[i] = a_round + b_round;
    }
}

the main loop I get from GCC 15 is

.L4:
        fsrmi   4
        vsetvli a5,a4,e32,mf2,ta,ma
        vle32.v v2,0(a0)
        vle32.v v3,0(a1)
        slli    a3,a5,2
        sub     a4,a4,a5
        add     a0,a0,a3
        add     a1,a1,a3
        vfwcvt.x.f.v    v1,v2
        fsrm    a6
        vfncvt.f.x.w    v1,v1
        fsrmi   4
        vfwcvt.x.f.v    v2,v3
        fsrm    a6
        vfncvt.f.x.w    v2,v2
        vfadd.vv        v1,v1,v2
        vse32.v v1,0(a2)
        add     a2,a2,a3
        bne     a4,zero,.L4

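which has those odd back-to-back FRM writes: FRM is set to 4 and restored
around each `vfwcvt.x.f.v` separately, for four writes per iteration.  They're
not in GCC 14, which covers both conversions with a single `fsrmi`/`fsrm` pair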

.L4:
        fsrmi   4
        vsetvli a5,a4,e32,mf2,ta,ma
        vle32.v v2,0(a0)
        vle32.v v3,0(a1)
        slli    a3,a5,2
        sub     a4,a4,a5
        add     a0,a0,a3
        add     a1,a1,a3
        vfwcvt.x.f.v    v1,v2
        vfwcvt.x.f.v    v2,v3
        fsrm    a6
        vfncvt.f.x.w    v1,v1
        vfncvt.f.x.w    v2,v2
        vfadd.vv        v1,v1,v2
        vse32.v v1,0(a2)
        add     a2,a2,a3
        bne     a4,zero,.L4
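
For anyone comparing the two listings, here is a rough sketch of a tally of
FRM writes per iteration.  `count_frm_writes`, `gcc15_loop` and `gcc14_loop`
are hypothetical names introduced just for this illustration, the strings are
trimmed excerpts of the loops above, and the helper only recognises the two
mnemonics that appear in this report (`fsrm` and `fsrmi`), so it is not a
general detector of FRM accesses.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// FRM-relevant excerpts of the two loops above (loads, stores and
// address arithmetic trimmed for brevity).
const std::string gcc15_loop =
    "fsrmi   4\n"
    "vfwcvt.x.f.v    v1,v2\n"
    "fsrm    a6\n"
    "vfncvt.f.x.w    v1,v1\n"
    "fsrmi   4\n"
    "vfwcvt.x.f.v    v2,v3\n"
    "fsrm    a6\n"
    "vfncvt.f.x.w    v2,v2\n";

const std::string gcc14_loop =
    "fsrmi   4\n"
    "vfwcvt.x.f.v    v1,v2\n"
    "vfwcvt.x.f.v    v2,v3\n"
    "fsrm    a6\n"
    "vfncvt.f.x.w    v1,v1\n"
    "vfncvt.f.x.w    v2,v2\n";

// Count the lines whose first token is fsrm or fsrmi.
// Assumption: one instruction per line, mnemonic first (labels such as
// ".L4:" on their own line are simply not matched).
std::size_t count_frm_writes(const std::string &listing)
{
    std::istringstream in(listing);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string mnemonic;
        if (fields >> mnemonic && (mnemonic == "fsrm" || mnemonic == "fsrmi"))
            ++count;
    }
    return count;
}
```

On these excerpts `count_frm_writes(gcc15_loop)` is 4 and
`count_frm_writes(gcc14_loop)` is 2.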

LLVM isn't vectorizing this for me; I'm not sure if that's a flag issue or if
its cost model treats even a single pair of FRM writes as too expensive.
