https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
Since BFmode values most likely live in xmm registers, I'm going to use vector shift instructions: pslld $16, %xmm0 for extendbfsf2 and psrld $16, %xmm0 for truncsfbf2. This doesn't require any GPR, and, because of the flag_unsafe_math_optimizations restriction, there is no need to use the avx512bf16 instruction.
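The following is a minimal C sketch of what the two shifts compute on one 32-bit lane. It is not the GCC expander. It assumes bf16 values are handled as raw uint16_t bit patterns, and the bf16_bits typedef and both function names are hypothetical, chosen for illustration. On SSE2 targets the sketch uses the _mm_slli_epi32/_mm_srli_epi32 intrinsics, which correspond to pslld/psrld. Elsewhere it falls back to plain integer shifts. Note that the right shift truncates: the low 16 mantissa bits are discarded rather than rounded to nearest even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical type: a bf16 value kept as its raw 16-bit pattern. */
typedef uint16_t bf16_bits;

/* extendbfsf2 analogue: move the bf16 bits into the upper half of a
   32-bit lane; the result is exactly the float with the same sign,
   exponent and (zero-extended) mantissa. */
static float bf16_to_float(bf16_bits b)
{
    uint32_t u;
#if defined(__SSE2__)
    __m128i v = _mm_cvtsi32_si128((int)b);  /* bf16 bits in the low lane */
    v = _mm_slli_epi32(v, 16);              /* pslld $16 */
    u = (uint32_t)_mm_cvtsi128_si32(v);
#else
    u = (uint32_t)b << 16;
#endif
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* truncsfbf2 analogue: keep the upper 16 bits of the float.
   The low 16 mantissa bits are dropped (truncation, no rounding). */
static bf16_bits float_to_bf16_trunc(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
#if defined(__SSE2__)
    __m128i v = _mm_cvtsi32_si128((int)u);
    v = _mm_srli_epi32(v, 16);              /* psrld $16 (logical shift) */
    u = (uint32_t)_mm_cvtsi128_si32(v);
#else
    u >>= 16;
#endif
    return (bf16_bits)u;
}
```

For example, the float with bit pattern 0x3F80FFFF truncates to the bf16 pattern 0x3F80, whereas round-to-nearest-even would give 0x3F81. Widening with bf16_to_float is always exact, because every bf16 value is representable as a float.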