https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52031|0                           |1
        is obsolete|                            |

--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
Created attachment 52032
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52032&action=edit
update patch

Updated patch. Now GCC can generate optimal code for #c0:

.L4:
        vmovdqu         (%rdi), %ymm1
        vmovdqu16       32(%rdi), %ymm2
        vpcmpuw         $0, %ymm0, %ymm1, %k1
        vpcmpuw         $0, %ymm0, %ymm2, %k0
        kortestw        %k0, %k1
        je              .L10
        kortestw        %k1, %k1
        je              .L5
        kmovd           %k1, %eax

For #c6:

.L4:
        vmovdqu         (%rdi), %ymm2
        vmovdqu         32(%rdi), %ymm1
        vpcmpuw         $0, %ymm0, %ymm2, %k3
        vpcmpuw         $0, %ymm0, %ymm1, %k0
        kortestd        %k0, %k3
        je              .L10
        kortestw        %k3, %k3
        je              .L5
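The source of the #c0 test case is not reproduced in this comment, so the following is only a hypothetical portable C model of what the #c0 loop appears to do, inferred from the instructions: two 16-lane `vpcmpuw $0` (equality) compares of 16-bit elements produce masks `%k1` and `%k0`, `kortestw` ORs them to skip blocks with no match, and a second `kortestw` decides which half holds the first match. The names `cmpeq_mask16` and `find_u16` are made up for illustration. The meaning of the branch targets `.L10` (assumed: advance to the next 32-element block) and `.L5` (assumed: the match is in the second vector) is a guess, since those blocks are not shown, and the bit scan after `kmovd` is likewise assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for vpcmpuw $0 on 16 lanes of uint16_t:
   bit i of the result is set when v[i] == ch. */
static uint16_t cmpeq_mask16(const uint16_t *v, uint16_t ch) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        if (v[i] == ch)
            m |= (uint16_t)(1u << i);
    return m;
}

/* Hypothetical search loop: index of the first ch in s[0..n), or n if absent.
   For simplicity n must be a multiple of 32 (two 16-lane vectors per step). */
static size_t find_u16(const uint16_t *s, size_t n, uint16_t ch) {
    assert(n % 32 == 0);
    for (size_t i = 0; i < n; i += 32) {
        uint16_t k1 = cmpeq_mask16(s + i, ch);      /* first vector  -> %k1 */
        uint16_t k0 = cmpeq_mask16(s + i + 16, ch); /* second vector -> %k0 */
        if ((k0 | k1) == 0)  /* kortestw %k0, %k1; je .L10 (assumed: next block) */
            continue;
        if (k1 == 0)         /* kortestw %k1, %k1; je .L5 (assumed: match in %k0) */
            return i + 16 + (size_t)__builtin_ctz(k0);
        /* kmovd %k1, %eax; a bit scan presumably follows (assumption) */
        return i + (size_t)__builtin_ctz(k1);
    }
    return n;
}
```

The point of the single `kortestw` on two masks is that the common "no match in either vector" case costs one flag-setting instruction and one branch per 64 bytes, with the per-mask test only on the rare exit path.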