https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95529
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #5)
> > We can do a peephole that would convert REP BSF + TEST to BSF. However, on
> > BMI capable targets, REP BSF decodes as TZCNT, so the question is if one BSF
> > is faster than TZCNT + TEST?
>
> I would expect so, yes.

Not universally. While Intel is agnostic to either insn, on Ryzen:

TZCNT: latency 2, reciprocal throughput 0.5
BSF:   latency 3, reciprocal throughput 3

> With -mbmi and TZCNT we could also use the carry flag to elide the test.

We would have to change the mode of the flags reg on a follow-up flags user
(CMOVE, *movsicc_noc) from reg:CCZ to reg:CCC. This can't be done during
combine.

> > (Please note that the conversion to CMOVE comes a bit late in the pass
> > sequence, so we can't convert TZCNT + TEST + CMOVE to TZCNT + CMOVC.)
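
For reference, a minimal sketch of the kind of source that can end up as the TZCNT + TEST + CMOVE sequence discussed above. This is not the testcase from the PR; the function name ctz_or and its fallback parameter are made up for illustration, and the exact code GCC emits depends on the version and on -O2 / -mbmi.

```c
/* Hypothetical example, not the PR's testcase.
 *
 * Returns the index of the lowest set bit of x, or `fallback` when x == 0.
 * __builtin_ctz is undefined for a zero argument, hence the explicit guard.
 *
 * Architectural flag behaviour the discussion relies on:
 *   BSF   sets ZF when the source operand is zero (destination undefined).
 *   TZCNT sets CF when the source operand is zero, and ZF when the result
 *         is zero.
 * So the zero check on x is already available in ZF after BSF and in CF
 * after TZCNT, which is what would make a separate TEST redundant.
 */
int ctz_or(unsigned int x, int fallback)
{
    return x ? __builtin_ctz(x) : fallback;
}
```

Comparing the output of gcc -O2 -S with and without -mbmi on a function like this should show whether the separate TEST is still emitted.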