https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
I have looked briefly at this.

The compiler actually generates the following:

        vpmovmskb       %ymm0, %edx     # 16    avx2_pmovmskb   [length = 4]
        testl   %edx, %edx      # 18    *cmpsi_ccno_1/1 [length = 2]
        je      .L5     # 19    *jcc_1  [length = 2]
        movslq  %edx, %rdx      # 21    *extendsidi2_rex64/2    [length = 3]
        tzcntq  %rdx, %rdx      # 52    *ctzdi2_falsedep        [length = 5]

from:

  int _14;
  long unsigned int v.1_15;
  int _16;
...
  _14 = __builtin_ia32_pmovmskb256 (_13);
  if (_14 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;

  <bb 5>:
  v.1_15 = (long unsigned int) _14;
  _16 = __builtin_ctzl (v.1_15);
  _17 = (long int) _16;

The intrinsic returns "int", and as the tree dump above shows, the compiler
won't even consider combining the sign-extension with vpmovmskb.

So, why not:

  unsigned int v;

  v = (unsigned int) _mm256_movemask_epi8( ... );
  if (v != 0)
    return (long) __builtin_ctz( v );
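
To illustrate the suggestion without requiring AVX2 hardware, here is a minimal sketch that uses a plain "int" mask as a stand-in for the value returned by _mm256_movemask_epi8. The function names first_set_widened, first_set_narrow and check_same are hypothetical and do not come from the original test case; the -1 return for an empty mask is likewise an assumption made only so the sketch is total. It shows that widening the mask to "unsigned long" (which sign-extends the int, hence the movslq) and counting with __builtin_ctzl gives the same answer as staying at 32 bits with __builtin_ctz, since a nonzero mask has its lowest set bit within the low 32 bits either way.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the pattern in the tree dump: the int mask is converted
   to unsigned long, which sign-extends it, and the trailing zeros are
   then counted on the widened value. */
static long first_set_widened(int mask)
{
    unsigned long v = (unsigned long) mask;   /* int -> unsigned long sign-extends */
    if (v != 0)
        return (long) __builtin_ctzl(v);
    return -1;   /* assumption: -1 signals "no bit set" */
}

/* Suggested pattern: keep the mask at 32 bits and use __builtin_ctz,
   so no widening of the mask is needed before the count. */
static long first_set_narrow(int mask)
{
    unsigned int v = (unsigned int) mask;
    if (v != 0)
        return (long) __builtin_ctz(v);
    return -1;   /* assumption: -1 signals "no bit set" */
}

/* Hypothetical helper: both variants must agree for any mask. */
static void check_same(int mask)
{
    assert(first_set_widened(mask) == first_set_narrow(mask));
}
```

The interesting case is a mask with bit 31 set (for example INT_MIN), which is negative as an "int": both variants still report bit 31, so the narrow version loses nothing while avoiding the extension.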