https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95529

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #5)

> > We can do a peephole that would convert REP BSF + TEST to BSF. However, on
> > BMI capable targets, REP BSF decodes as TZCNT, so the question is if one BSF
> > is faster than TZCNT + TEST?
> I would expect so, yes.

Not universally. On Intel the two insns perform about the same, but on Ryzen:

TZCNT: latency 2 cycles, reciprocal throughput 0.5 cycles
BSF:   latency 3 cycles, reciprocal throughput 3 cycles
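The following is a minimal C model of the proposed peephole, for illustration only. It assumes a hypothetical source pattern of the form `x ? ctz(x) : fallback` that compiles to REP BSF + TEST + CMOVE. This is not the testcase from this PR. The model shows why the TEST can be dropped when switching to plain BSF. BSF sets ZF exactly when the source is zero, which is the same condition TEST computes. The undefined BSF destination for a zero source is then overwritten by the CMOVE. The function names are made up, `__builtin_ctz` is the GCC builtin, and flags other than ZF are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trailing-zero count of a known-nonzero value; __builtin_ctz is
   undefined for 0, hence the assert. */
static uint32_t ctz_nonzero(uint32_t x) {
    assert(x != 0);
    return (uint32_t)__builtin_ctz(x);
}

/* Hypothetical model of: rep bsf dst, x ; test x, x ; cmove dst, fallback
   On a BMI target REP BSF executes as TZCNT, so dst = 32 when x == 0. */
uint32_t seq_repbsf_test_cmove(uint32_t x, uint32_t fallback) {
    uint32_t dst = x ? ctz_nonzero(x) : 32u; /* TZCNT result */
    bool zf = (x == 0);                      /* ZF from TEST x, x */
    return zf ? fallback : dst;              /* CMOVE reads ZF */
}

/* Hypothetical model of: bsf dst, x ; cmove dst, fallback
   BSF itself sets ZF = (x == 0). Its destination is undefined for a zero
   source, modelled here by an arbitrary caller-supplied 'garbage' value. */
uint32_t seq_bsf_cmove(uint32_t x, uint32_t fallback, uint32_t garbage) {
    uint32_t dst = x ? ctz_nonzero(x) : garbage;
    bool zf = (x == 0);                      /* ZF set by BSF */
    return zf ? fallback : dst;              /* CMOVE reads ZF */
}
```

Both functions agree for every input and for every `garbage` value, which is the property the peephole relies on. Whether the replacement pays off is the latency/throughput question above.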

> With -mbmi and TZCNT we could also use the carry flag to elide the test.

We would have to change the mode of the flags register in the follow-up flags
user (CMOVE, *movsicc_noc) from reg:CCZ to reg:CCC, and this can't be done
during combine.
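Here is a similar sketch for the carry-flag variant. It uses hypothetical names and the same hypothetical `x ? ctz(x) : fallback` pattern. TZCNT sets CF when its source is zero, and it sets ZF when its result is zero. After the TEST is elided, the consumer therefore has to read CF (CMOVC, i.e. CCC mode) instead of ZF (CMOVE, CCZ mode). Keeping CMOVE would read TZCNT's ZF, which means "bit 0 of the source was set", and would give wrong results. The model only covers CF and ZF, and `__builtin_ctz` is the GCC builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of TZCNT's result and the two flags we care about. */
typedef struct {
    uint32_t dst; /* trailing-zero count, 32 for a zero source */
    bool cf;      /* CF = (src == 0) */
    bool zf;      /* ZF = (dst == 0), i.e. bit 0 of src was set */
} tzcnt_out;

tzcnt_out model_tzcnt(uint32_t src) {
    tzcnt_out r;
    r.dst = src ? (uint32_t)__builtin_ctz(src) : 32u;
    assert(r.dst <= 32u);
    r.cf = (src == 0);
    r.zf = (r.dst == 0);
    return r;
}

/* tzcnt ; test src, src ; cmove   (consumer reads ZF from TEST: CCZ) */
uint32_t seq_test_cmove(uint32_t src, uint32_t fallback) {
    tzcnt_out t = model_tzcnt(src);
    bool zf_from_test = (src == 0);
    return zf_from_test ? fallback : t.dst;
}

/* tzcnt ; cmovc   (consumer reads CF from TZCNT: CCC) */
uint32_t seq_cmovc(uint32_t src, uint32_t fallback) {
    tzcnt_out t = model_tzcnt(src);
    return t.cf ? fallback : t.dst;
}

/* Deliberately wrong: TEST dropped but CMOVE kept, so it reads TZCNT's ZF. */
uint32_t seq_cmove_without_test(uint32_t src, uint32_t fallback) {
    tzcnt_out t = model_tzcnt(src);
    return t.zf ? fallback : t.dst;
}
```

The first two sequences agree for every input. The third returns `fallback` for `src == 1` and 32 for `src == 0`, which is why the flags user's mode has to change.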

> > (Please note that the conversion to CMOVE comes a bit late in the pass
> > sequence, so we can't convert TZCNT + TEST + CMOVE to TZCNT + CMOVC.)
