https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120231
Joe Ramsay <joe.ramsay at arm dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joe.ramsay at arm dot com
--- Comment #5 from Joe Ramsay <joe.ramsay at arm dot com> ---
Hello! I have found a similar issue and am commenting in case another data
point is useful. The following:
#include <math.h>
_Float16 norm(_Float16 re, _Float16 im) {
  return sqrtf(re * re + im * im);
}
compiled with -march=armv8-a+fp16 -O3 gives:
norm:
        fmul    h1, h1, h1
        fmadd   h1, h0, h0, h1
        fcvt    s0, h1
        fcmp    s0, #0.0
        bpl     .L4
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     s1, [sp, 28]
        bl      sqrtf
        ldr     s1, [sp, 28]
        ldp     x29, x30, [sp], 32
        fsqrt   h0, h1
        ret
.L4:
        fsqrt   h0, h1
        ret
and with -ffast-math added gives:
norm:
        fmul    h1, h1, h1
        fmadd   h0, h0, h0, h1
        fsqrt   h0, h0
        ret
I think GCC is being overly cautious here: the direct FSQRT sequence should be
fine without -ffast-math, as the optimal sequence is already emitted without
-ffast-math for single- and double-precision floats.