https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670
--- Comment #6 from Jörn Engel <joern at purestorage dot com> ---
True for one, but not the other.

    return mask ? __builtin_ctz(mask) : 32;

    1099:  83 f6 ff       xor    $0xffffffff,%esi
    109c:  74 47          je     10e5 <main+0x85>
    109e:  f3 0f bc f6    tzcnt  %esi,%esi

I used:

    gcc-8 -std=gnu11 -Wall -Wextra -g -march=core-avx2 -mbmi -fPIC -O3 %

_tzcnt_u32() works as you said it should. Nicer than inline asm and allows type checking. Thank you for that hint!
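
For reference, the two forms discussed above can be put side by side in a minimal sketch. The function names ctz32_builtin and ctz32_tzcnt are hypothetical and chosen only for illustration; _tzcnt_u32() is the BMI intrinsic from <immintrin.h>, which GCC only makes available when BMI is enabled (for example with -mbmi). The __BMI__ guard and the fallback are assumptions added so the sketch also builds without that flag.

```c
#include <stdint.h>
#ifdef __BMI__
#include <immintrin.h>   /* declares _tzcnt_u32() when BMI is enabled */
#endif

/* Form from the comment: explicit zero check around __builtin_ctz(),
 * whose result is undefined for a zero argument. */
static inline uint32_t ctz32_builtin(uint32_t mask)
{
    return mask ? (uint32_t)__builtin_ctz(mask) : 32;
}

/* Intrinsic form: TZCNT is defined to return the operand size (32)
 * for a zero input, so no separate zero check is written here.
 * Without BMI (assumption: __BMI__ undefined) fall back to the form above. */
static inline uint32_t ctz32_tzcnt(uint32_t mask)
{
#ifdef __BMI__
    return _tzcnt_u32(mask);
#else
    return ctz32_builtin(mask);
#endif
}
```

Both helpers should agree for every input, including mask == 0. Whether the compiler still emits the je branch for the first form depends on the GCC version and flags, as the disassembly above shows for gcc-8.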