https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> change "*k, CBC" to "?k, CBC", in *mov{qi,hi,si,di}_internal.
> then RA works good to choose kxnor for setting constm1_rtx to mask register,
> and i got below with your attached patch(change #if 0 to #if 1), seems
> better than original patch.
>
> foo:
> .LFB0:
>         .cfi_startproc
>         testl   %edi, %edi
>         jle     .L9
>         kxnorb  %k1, %k1, %k1
>         cmpl    $4, %edi
>         jl      .L11
> .L3:
>         vbroadcastsd    .LC2(%rip), %ymm3
>         vmovdqa .LC0(%rip), %xmm2
>         xorl    %eax, %eax
>         xorl    %ecx, %ecx
>         .p2align 4,,10
>         .p2align 3
> .L7:
>         vmovapd b(%rax), %ymm0{%k1}
>         addl    $4, %ecx
>         movl    %edi, %edx
>         vmulpd  %ymm3, %ymm0, %ymm1
>         subl    %ecx, %edx
>         cmpl    $4, %edx
>         vmovapd %ymm1, a(%rax){%k1}
>         vpbroadcastd    %edx, %xmm1
>         movl    $-1, %edx
>         vpcmpd  $1, %xmm1, %xmm2, %k1
>         kmovb   %k1, %esi
>         cmovge  %edx, %esi

Not sure if the round-trip to GPRs for the sake of a cmovge is worth it; I
guess a branch would be better.
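
For illustration only, here is a rough scalar C model of the mask update at the end of the quoted loop body, written once as a select (what kmovb + cmovge does) and once as a branch. This is a sketch, not what GCC emits: the function names are hypothetical, and it assumes .LC0 holds the lane indices {0, 1, 2, 3}, which the listing does not show.

```c
#include <stdint.h>

/* Hypothetical model of "vpcmpd $1, %xmm1, %xmm2, %k1":
   lane i is active iff i < remaining.
   ASSUMPTION: .LC0 contains the lane indices {0, 1, 2, 3}. */
static uint8_t lane_mask(int remaining)
{
    uint8_t m = 0;
    for (int i = 0; i < 4; i++)
        if (i < remaining)
            m |= (uint8_t)(1u << i);
    return m;
}

/* Select form: always compute the compare mask, move it to a GPR
   (kmovb), then overwrite it with all-ones when at least 4 elements
   remain (cmovge with -1). */
static uint8_t next_mask_select(int remaining)
{
    uint8_t m = lane_mask(remaining);
    return remaining >= 4 ? (uint8_t)0xFF : m;
}

/* Branch form: take the all-ones mask directly when at least 4
   elements remain, and only compute the compare mask for the tail. */
static uint8_t next_mask_branch(int remaining)
{
    if (remaining >= 4)
        return (uint8_t)0xFF;
    return lane_mask(remaining);
}
```

Both forms return the same mask. For a 4-lane ymm operation on doubles only the low four mask bits matter, so the compare result alone already gives the same active lanes whenever at least 4 elements remain.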