https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120231

Joe Ramsay <joe.ramsay at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joe.ramsay at arm dot com

--- Comment #5 from Joe Ramsay <joe.ramsay at arm dot com> ---
Hello! I have discovered a similar issue, commenting in case another data point
is useful. The following:

#include <math.h>

_Float16 norm(_Float16 re, _Float16 im) {
  return sqrtf(re * re + im * im);
}

compiled with -march=armv8-a+fp16 -O3 gives

norm:
        fmul    h1, h1, h1
        fmadd   h1, h0, h0, h1
        fcvt    s0, h1
        fcmp    s0, #0.0
        bpl     .L4
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     s1, [sp, 28]
        bl      sqrtf
        ldr     s1, [sp, 28]
        ldp     x29, x30, [sp], 32
        fsqrt   h0, h1
        ret
.L4:
        fsqrt   h0, h1
        ret

and with -ffast-math added gives:

norm:
        fmul    h1, h1, h1
        fmadd   h0, h0, h0, h1
        fsqrt   h0, h0
        ret

I think GCC is being overly cautious here - the fast FSQRT case is fine without
fast-math (the optimal sequence is emitted without -ffast-math for single- and
double-precision floats).