https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538
--- Comment #3 from Matthias Kretz <kretz at kde dot org> ---
Some more observations:

1. The instruction sequence:

     kmovq  %k1,-0x8(%rsp)
     vmovq  -0x8(%rsp),%xmm1
     vmovq  %xmm1,%rax
     kmovq  %rax,%k0

   should be a simple `kmovq %k1,%k0` instead.

2. Adding `asm("");` before the compare intrinsic makes the problem go away.

3. Using inline asm in place of the kortest intrinsic shows the same
   preference for using the k0 register. Test case:

     void bad(__m512i x, __m512i y) {
       auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
       asm("kmovq %0,%%rax" :: "k"(k));
     }

4. The following test case still unnecessarily prefers k0, but does so
   with a nicer `kmovq %k1,%0`:

     auto almost_good(__m512i x, __m512i y) {
       auto k = _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
       asm("kmovq %0, %0" : "+k"(k));
       return k;
     }

(cf. https://godbolt.org/g/hZTga4 for 2, 3 and 4)
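
For reference, observation 2 can be sketched roughly as below. This is only an illustration of where the empty asm statement sits relative to the compare intrinsic, not the reproducer from the bug description. The names `cmp_with_barrier` and `demo` are hypothetical, and the `target("avx512bw")` attribute is an assumption standing in for building with `-mavx512bw`. Whether the barrier changes the generated code depends on the GCC version.

```cpp
#include <immintrin.h>

// Hypothetical helper: the empty asm statement from observation 2 is placed
// directly before the compare intrinsic.
__attribute__((target("avx512bw")))
__mmask64 cmp_with_barrier(__m512i x, __m512i y) {
    asm("");  // empty asm statement before the compare
    return _mm512_cmp_epi8_mask(x, y, _MM_CMPINT_EQ);
}

// Hypothetical driver: compares a vector with itself, so all 64 byte lanes
// compare equal and every mask bit is set.
__attribute__((target("avx512bw")))
unsigned long long demo() {
    __m512i a = _mm512_set1_epi8(1);
    return cmp_with_barrier(a, a);
}
```

To see the effect on register allocation, compare the assembly (e.g. `g++ -O2 -S`) with and without the `asm("");` line. Calling `demo()` requires a CPU with AVX-512BW.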