https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037
--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to ncm from comment #8)
> It seems worth mentioning that the round trip through
> L1 cache is just a workaround for the optimizer refusing
> to ever emit two CMOV instructions in a basic block.
>
> Recognizing and replacing the construct with CMOVs
> explicitly would speed up a great many algorithms.

Not universally. See PR56309.
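The two code shapes being contrasted in the quoted comment can be sketched in C. This is a minimal illustration, assuming the construct is a pair of dependent selects such as ordering two values. The helper names `minmax_select` and `minmax_indexed` are hypothetical and are not taken from this PR. Whether GCC turns the ternaries into CMOV, or keeps the array `v[]` in memory, depends on the compiler version, target, and flags. Check the generated assembly (e.g. `gcc -O2 -S`) rather than assuming either outcome.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: order two ints with two conditional selects.
   These are candidates for two CMOVs in one basic block, but the
   compiler may emit a branch instead. */
static void minmax_select(int *lo, int *hi)
{
    int a = *lo, b = *hi;
    bool swap = b < a;
    *lo = swap ? b : a;
    *hi = swap ? a : b;
}

/* Hypothetical helper: the same operation via a two-element array
   indexed by the comparison result. With a variable index the array
   typically lives on the stack, so the values make a store/reload
   round trip (normally hitting L1) instead of using CMOV. */
static void minmax_indexed(int *lo, int *hi)
{
    int v[2] = { *lo, *hi };
    size_t swap = v[1] < v[0];   /* 1 if the pair is out of order */
    *lo = v[swap];
    *hi = v[1 - swap];
}
```

Both helpers compute the same result. They differ only in how the selection is likely to be lowered. A CMOV makes the result depend on the comparison's inputs, while a well-predicted branch does not, which is why replacing such constructs with CMOVs is not a win in every case.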