https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Hongtao.liu from comment #0)
> > #include<immintrin.h>
> > __m128h
> > foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> > { 
> >     return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> > }
> > 
> > 
> > _Z3fooDv8_DF16_S_S_h:
> >         kmovd   k1, edi
> >         vfcmaddcsh      xmm2{k1}, xmm0, xmm1, {rn-sae}
> >         vmovaps xmm0{k1}, xmm2
> >         ret
> > 
> > k1 must & 1 before vmovaps xmm0{k1}, xmm2.
> 
> Or just vmovaps xmm0, xmm2 since vfcmaddcsh will copy upper [32:128] from
> src1(xmm0) here.

No, the intrinsic guide defines it as using writemask k (elements are copied
from a when mask bit 0 is not set).
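
To make the difference concrete, here is a rough scalar model in plain C. It is only a sketch: it does not use the real intrinsics, the type vec128 and all function names are made up for illustration, and `computed` is a placeholder for the actual complex FP16 result in lane 0. It assumes vfcmaddcsh merge-masks lane 0 on bit 0 of k1 and copies the upper lanes from src1, as described in comment #2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: a 128-bit register as four 32-bit lanes.
   Lane 0 holds the complex FP16 scalar the instruction works on. */
typedef struct { uint32_t lane[4]; } vec128;

static int vec_eq(vec128 x, vec128 y)
{
    for (int i = 0; i < 4; i++)
        if (x.lane[i] != y.lane[i])
            return 0;
    return 1;
}

/* Result expected from the intrinsic guide wording: lane 0 is the
   computed value only when mask bit 0 is set, otherwise it comes from a;
   the upper lanes come from a. */
static vec128 expected(vec128 a, uint8_t m, uint32_t computed)
{
    vec128 r = a;
    if (m & 1)
        r.lane[0] = computed;
    return r;
}

/* Assumed model of "vfcmaddcsh xmm2{k1}, xmm0, xmm1": the destination
   starts as c, lane 0 is merge-masked on bit 0, and the upper lanes are
   copied from src1 (a). */
static vec128 model_fcmadd(vec128 a, vec128 c, uint8_t m, uint32_t computed)
{
    vec128 r;
    r.lane[0] = (m & 1) ? computed : c.lane[0];
    for (int i = 1; i < 4; i++)
        r.lane[i] = a.lane[i];
    return r;
}

/* Comment #2 proposal: plain "vmovaps xmm0, xmm2" with no mask. */
static vec128 unmasked_move(vec128 a, vec128 c, uint8_t m, uint32_t computed)
{
    return model_fcmadd(a, c, m, computed);
}

/* Comment #0 proposal: "vmovaps xmm0{k}, xmm2" with k = k1 & 1. */
static vec128 masked_move_bit0(vec128 a, vec128 c, uint8_t m, uint32_t computed)
{
    vec128 t = model_fcmadd(a, c, m, computed);
    vec128 r = a;
    uint8_t k = m & 1;
    for (int i = 0; i < 4; i++)
        if ((k >> i) & 1)
            r.lane[i] = t.lane[i];
    return r;
}
```

In this model, with mask bit 0 clear the unmasked move leaves c in lane 0 where a is expected, while the move masked with k1 & 1 matches the expected result for any mask value.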
