https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #12 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #10)
> Looks like this was fixed in GCC 15:
> ```
> foo:
> .LFB7284:
>         .cfi_startproc
>         vmovd   %edi, %xmm2
>         vmovdqa32       %zmm1, %zmm4
>         kmovw   m(%rip), %k1
>         vpsrad  %xmm2, %zmm0, %zmm4{%k1}
>         vmovdqa32       %zmm4, %zmm0
>         ret
>
>
> ```
>
> Though for comment #5 we get:
> ```
> foo:
> .LFB7470:
>         .cfi_startproc
>         vmovdqa64       %zmm0, %zmm3
>         vmovd   %edi, %xmm2
>         vmovdqa32       %zmm1, %zmm0
>         kmovw   m(%rip), %k1
>         vmovdqa32       %zmm1, %zmm4
>         vpslld  %xmm2, %zmm3, %zmm0{%k1}
>         kmovw   m(%rip), %k2
>         vpsrad  %xmm2, %zmm3, %zmm4{%k2}
>         vmovdqa32       %zmm0, zzz(%rip)
>         vmovdqa32       %zmm4, %zmm0
>         ret
> ```
>
>
> Note the extra kmovw.

The extra kmovw is gone if you add -mavx512bw.