https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #3 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
This is deliberate, since __fp16 is only a “storage type”: all __fp16
arithmetic happens on float, a bit like all short arithmetic happens in int.

It works if you use _Float16 instead:

_Float16 mul(_Float16 x, _Float16 y, _Float16 z) { return x * y + z; }

        vfma.f16        s2, s0, s1
        vmov    s0, s2  @ __fp16
        bx      lr
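
The short/int analogy can be checked directly in portable C11. The following is only a sketch, not part of the original comment: the helper names (IS_INT, IS_FLOAT, short_product, and so on) are made up for illustration, and it assumes int is wider than 16 bits. The __fp16 part is compiled only where the ACLE macros __ARM_FP16_FORMAT_IEEE or __ARM_FP16_FORMAT_ALTERNATIVE indicate that the type is available.

```c
#include <assert.h>

/* Illustrative helpers (not from the bug report): evaluate to 1 if the
   expression has the named type, 0 otherwise. Uses C11 _Generic. */
#define IS_INT(e)   _Generic((e), int: 1, default: 0)
#define IS_FLOAT(e) _Generic((e), float: 1, default: 0)

/* Both short operands are promoted to int before the multiply,
   so the type of the product is int, not short. */
int short_product_is_int(void)
{
    short a = 300, b = 300;
    return IS_INT(a * b);
}

/* Because the multiply happens in int, 300 * 300 does not wrap at
   16 bits (assuming int is wider than 16 bits). */
int short_product(void)
{
    short a = 300, b = 300;
    return a * b;
}

#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
/* On Arm targets with __fp16 enabled: the operands are promoted to
   float, so the product has type float rather than __fp16. */
int fp16_product_is_float(void)
{
    __fp16 x = 1.5f, y = 2.0f;
    return IS_FLOAT(x * y);
}
#endif

/* Runs the checks above; aborts via assert if a promotion is missing. */
void check_promotions(void)
{
    assert(short_product_is_int() == 1);
    assert(short_product() == 90000);
#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    assert(fp16_product_is_float() == 1);
#endif
}
```

Calling check_promotions() from main should pass silently. With _Float16 no such promotion to float is expected, which is why the multiply-add in the example above can be emitted as a single vfma.f16.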