https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Hongtao.liu from comment #0)
> > #include<immintrin.h>
> > __m128h
> > foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> > {
> >   return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> > }
> >
> >
> > _Z3fooDv8_DF16_S_S_h:
> >         kmovd   k1, edi
> >         vfcmaddcsh      xmm2{k1}, xmm0, xmm1, {rn-sae}
> >         vmovaps xmm0{k1}, xmm2
> >         ret
> >
> > k1 must & 1 before vmovaps xmm0{k1}, xmm2.
>
> Or just vmovaps xmm0, xmm2 since vfcmaddcsh will copy upper [32:128] from
> src1(xmm0) here.

No, the intrinsic guide describes the result as using writemask k (elements
are copied from a when mask bit 0 is not set), so the move cannot simply be
unmasked.
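
To make the point about mask bit 0 concrete, here is a rough scalar model in plain C. It is only a sketch under assumptions, not GCC code and not the hardware definition: the names (model_fcmadd, model_intrinsic, model_insn, model_masked_move, self_check) are hypothetical, floats stand in for FP16, the arithmetic c + a * conj(b) is a placeholder, and model_insn assumes ordinary merge masking (the low lane of the destination is left alone when bit 0 of k is clear, the upper lanes come from src1, as comment #2 says).

```c
#include <assert.h>

/* One 32-bit lane holds a complex FP16 pair; plain floats stand in for FP16. */
typedef struct { float re, im; } cpx;
/* A 128-bit register viewed as four such lanes. */
typedef struct { cpx lane[4]; } vec;

/* Placeholder arithmetic (assumption): c + a * conj(b). */
static cpx model_fcmadd(cpx a, cpx b, cpx c)
{
    cpx r;
    r.re = c.re + a.re * b.re + a.im * b.im;
    r.im = c.im + a.im * b.re - a.re * b.im;
    return r;
}

/* Result as the intrinsic guide words it: lane 0 is the computed value
   if mask bit 0 is set, else copied from a; lanes 1..3 are copied from a. */
static vec model_intrinsic(vec a, unsigned m, vec b, vec c)
{
    vec dst = a;
    if (m & 1u)
        dst.lane[0] = model_fcmadd(a.lane[0], b.lane[0], c.lane[0]);
    return dst;
}

/* Assumed model of "vfcmaddcsh dest{k}, src1, src2" with merge masking:
   lane 0 of dest is updated only if bit 0 of k is set; lanes 1..3 come
   from src1. */
static vec model_insn(vec dest, unsigned k, vec src1, vec src2)
{
    vec out = src1;
    out.lane[0] = (k & 1u)
        ? model_fcmadd(src1.lane[0], src2.lane[0], dest.lane[0])
        : dest.lane[0];
    return out;
}

/* Masked move on 32-bit lanes: dst lane i takes src lane i when bit i of k is set. */
static vec model_masked_move(vec dst, unsigned k, vec src)
{
    for (int i = 0; i < 4; i++)
        if (k & (1u << i))
            dst.lane[i] = src.lane[i];
    return dst;
}

/* Lane-by-lane equality of two modeled registers. */
static int same(vec x, vec y)
{
    for (int i = 0; i < 4; i++)
        if (x.lane[i].re != y.lane[i].re || x.lane[i].im != y.lane[i].im)
            return 0;
    return 1;
}

/* Walk all 4-bit mask values and compare against the documented result. */
static void self_check(void)
{
    vec a = {{{1, 2}, {3, 4}, {5, 6}, {7, 8}}};
    vec b = {{{2, 1}, {0, 0}, {0, 0}, {0, 0}}};
    vec c = {{{9, 9}, {9, 9}, {9, 9}, {9, 9}}};

    for (unsigned m = 0; m < 16; m++) {
        vec want = model_intrinsic(a, m, b, c);
        vec t = model_insn(c, m, a, b);   /* models xmm2 after vfcmaddcsh */
        /* A move masked with (m & 1) reproduces the documented result. */
        assert(same(model_masked_move(a, m & 1u, t), want));
        /* An unmasked move would expose c's low lane when bit 0 is clear. */
        if (!(m & 1u))
            assert(!same(t, want));
    }
}
```

In this model, when bit 0 of m is clear the instruction leaves c's low element in xmm2, so copying xmm2 to xmm0 without a mask would return c's low element where the intrinsic guide asks for a's.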