https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
float
_mm_cvtsbh_ss (__bf16 __A)
{
  union{ float sf; __bf16 bf[2];} __tmp;
  __tmp.sf = 0.0f;
  __tmp.bf[1] = __A;
  return __tmp.sf;
}

Looks like gcc can optimize it to

_mm_cvtsbh_ss(bool _Accum):
        movd    %xmm0, %eax
        sall    $16, %eax
        movd    %eax, %xmm0
        ret
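
For reference, the union above amounts to placing the 16 bfloat16 bits in the upper half of a 32-bit float, which is what the `sall $16` does. A minimal sketch of the same bit manipulation in portable C follows. It assumes a little-endian target (so `bf[1]` is the high half) and takes the raw bits as a `uint16_t` instead of `__bf16`, since `__bf16` support depends on the compiler version. The helper name `bf16_bits_to_float` is hypothetical and not part of any header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen raw bfloat16 bits to a float.
   bfloat16 shares float's sign and exponent layout, so the
   conversion is a 16-bit left shift of the bit pattern. */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;  /* low 16 mantissa bits become 0 */
  float out;
  memcpy (&out, &wide, sizeof out);       /* type-pun without aliasing issues */
  return out;
}
```

For example, the bit pattern 0x3F80 widens to 0x3F800000, which is 1.0f.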