https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670
--- Comment #4 from Jörn Engel <joern at purestorage dot com> ---
Fair enough. That means the only way to get tzcnt without a conditional is by using inline asm. Annoying, but something I can work with.

Annoying because for CPUs with BMI1, tzcnt is well-defined, and I explicitly tell the compiler to generate code for BMI1. So while __builtin_ctz() is in general undefined for a zero argument, it is actually well-defined for the case I care about.

But I need to support older compilers anyway, so inline asm it is. Thank you!
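
For readers following along, the inline asm workaround mentioned above might look roughly like the sketch below. The helper name `tzcnt32` is hypothetical and does not come from this report; the code assumes GCC-style extended asm and an assembler that knows the `tzcnt` mnemonic. On CPUs without BMI1 the same encoding executes as `bsf`, so the zero-input result is only guaranteed (32) when BMI1 is actually present.

```c
#include <stdint.h>

/* Hypothetical helper (not from the bug report): count trailing zeros
 * without a zero check, by emitting tzcnt directly. */
static inline uint32_t tzcnt32(uint32_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t r;
    /* Assumption: the target CPU has BMI1. Without BMI1 this encoding
     * runs as BSF, whose result is undefined for x == 0. */
    __asm__("tzcntl %1, %0" : "=r"(r) : "rm"(x) : "cc");
    return r;
#else
    /* Portable fallback for non-x86 targets: __builtin_ctz(0) is
     * undefined, so the zero case is handled explicitly. */
    return x ? (uint32_t)__builtin_ctz(x) : 32u;
#endif
}
```

Because the asm statement is opaque to the optimizer, the compiler cannot constant-fold or combine it with neighbouring bit operations, which is the usual cost of this approach compared with the builtin.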