https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #12 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #10)
> Looks like this was fixed in GCC 15:
> ```
> foo:
> .LFB7284:
>         .cfi_startproc
>         vmovd   %edi, %xmm2
>         vmovdqa32       %zmm1, %zmm4
>         kmovw   m(%rip), %k1
>         vpsrad  %xmm2, %zmm0, %zmm4{%k1}
>         vmovdqa32       %zmm4, %zmm0
>         ret
> 
> 
> ```
> 
> Though for comment #5 we get:
> ```
> foo:
> .LFB7470:
>         .cfi_startproc
>         vmovdqa64       %zmm0, %zmm3
>         vmovd   %edi, %xmm2
>         vmovdqa32       %zmm1, %zmm0
>         kmovw   m(%rip), %k1
>         vmovdqa32       %zmm1, %zmm4
>         vpslld  %xmm2, %zmm3, %zmm0{%k1}
>         kmovw   m(%rip), %k2
>         vpsrad  %xmm2, %zmm3, %zmm4{%k2}
>         vmovdqa32       %zmm0, zzz(%rip)
>         vmovdqa32       %zmm4, %zmm0
>         ret
> ```
> 
> 
> Note the extra kmovw.

The extra kmovw is gone if you add -mavx512bw.
