https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037
--- Comment #10 from ncm at cantrip dot org ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to ncm from comment #8)
> > It seems worth mentioning that the round trip through
> > L1 cache is just a workaround for the optimizer refusing
> > to ever emit two CMOV instructions in a basic block.
> >
> > Recognizing and replacing the construct with CMOVs
> > explicitly would speed up a great many algorithms.
>
> Not universally. See PR56309.

I am aware of that report. Transforming this rendition of swap_if as suggested would not create any _new_ dependencies, so may be done without fear of introducing regressions. Actually using this version of swap_if in algorithms requires careful consideration of whether it may build such dependency chains, but its use in partitioning, specifically, has been proven safe.
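
For readers without the earlier comments at hand, here is a minimal sketch of the kind of construct under discussion. It is an illustration only, not necessarily the exact code from this report: the body of swap_if below is an assumed array-indexed rendition (both values are stored to a small stack array and reloaded by index, which is the round trip through memory), and partition_branchless is a hypothetical Lomuto-style loop added to show the partitioning use. Nothing here demonstrates what any particular GCC version emits for it.

```cpp
// Assumed rendition of swap_if: store both values to a two-element
// array, then reload them by index, so no branch on c is needed.
inline bool swap_if(bool c, int& a, int& b) {
    int v[2] = { a, b };
    a = v[c];       // c == true  -> a takes b's old value
    b = v[1 - c];   // c == true  -> b takes a's old value
    return c;
}

// Hypothetical Lomuto-style partition built on swap_if.
// Returns a pointer to the first element that is not less than pivot.
inline int* partition_branchless(int* first, int* last, int pivot) {
    int* store = first;
    for (int* it = first; it != last; ++it) {
        // Each comparison reads *it, which no earlier swap has written,
        // so the condition does not depend on a previous swap's result.
        store += swap_if(*it < pivot, *store, *it);
    }
    return store;
}
```

In this loop the only value carried from one iteration to the next is the store pointer, which is the sort of dependency question the reply above says has to be considered case by case.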