https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095
--- Comment #2 from Craig Topper <craig.topper at gmail dot com> ---
The branch+mv macrofusion should execute together. The visible latency to other instructions is 1 cycle.

The hardware can predicate most ALU instructions, not just mv. So even better would be putting the xor after the branch instead of a mv.
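To make the two shapes concrete, here is a rough C sketch of the source-level patterns being compared. The function names are hypothetical, and the assembly in the comments is only an illustration of the intended RISC-V sequences. Whether GCC actually emits either form depends on the target tuning and optimization level, which this sketch does not assume.

```c
#include <stdint.h>

/* Form 1: the xor is computed unconditionally and a mv sits in the
 * shadow of the branch. Illustrative sequence (not verified output):
 *     xor  t0, a1, a2
 *     beqz a0, 1f
 *     mv   a1, t0
 *   1:
 */
uint64_t xor_then_cond_mv(uint64_t cond, uint64_t a, uint64_t b)
{
    uint64_t t = a ^ b;
    if (cond != 0)
        a = t;
    return a;
}

/* Form 2: the xor itself sits in the shadow of the branch, so no
 * separate mv is needed. Illustrative sequence (not verified output):
 *     beqz a0, 1f
 *     xor  a1, a1, a2
 *   1:
 */
uint64_t cond_xor(uint64_t cond, uint64_t a, uint64_t b)
{
    if (cond != 0)
        a ^= b;
    return a;
}
```

Both functions return the same value for every input. The difference is only in which instruction the short forward branch skips, and so which one the hardware could fuse with it.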