https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120231

Joe Ramsay <joe.ramsay at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joe.ramsay at arm dot com

--- Comment #5 from Joe Ramsay <joe.ramsay at arm dot com> ---
Hello! I have run into a similar issue and am commenting in case another data
point is useful. The following:

#include <math.h>

_Float16 norm(_Float16 re, _Float16 im) {
    return sqrtf(re * re + im * im);
}

compiled with -march=armv8-a+fp16 -O3 gives

norm:
        fmul    h1, h1, h1
        fmadd   h1, h0, h0, h1
        fcvt    s0, h1
        fcmp    s0, #0.0
        bpl     .L4
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     s1, [sp, 28]
        bl      sqrtf
        ldr     s1, [sp, 28]
        ldp     x29, x30, [sp], 32
        fsqrt   h0, h1
        ret
.L4:
        fsqrt   h0, h1
        ret

and with -ffast-math added gives:

norm:
        fmul    h1, h1, h1
        fmadd   h0, h0, h0, h1
        fsqrt   h0, h0
        ret

I think GCC is being overly cautious here - emitting only the FSQRT should be
fine without -ffast-math, since the optimal sequence is already emitted without
-ffast-math for single- and double-precision floats.
