https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037

--- Comment #8 from ncm at cantrip dot org ---
It seems worth mentioning that the round trip through 
L1 cache is just a workaround for the optimizer refusing 
to ever emit two CMOV instructions in a basic block.
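For readers who haven't seen the idiom, here is a minimal sketch of such a round trip for int. It assumes the construct in question is a conditional swap. Both values are stored to a two-element array and read back using the comparison result as the index, so the code needs neither a branch nor a CMOV. The name swap_if_mem is hypothetical and is not taken from the bug's test case.

```cpp
// Hypothetical helper: conditional swap through a small array in memory.
// If c is true, a and b are exchanged; otherwise both keep their values.
inline bool swap_if_mem(bool c, int& a, int& b) {
    int v[2] = {a, b};  // store both values (a trip through L1)
    a = v[c];           // c == false -> old a; c == true -> old b
    b = v[!c];          // c == false -> old b; c == true -> old a
    return c;
}
```

Usage: with `int x = 5, y = 3;`, calling `swap_if_mem(y < x, x, y)` leaves x == 3 and y == 5.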

Recognizing and replacing the construct with CMOVs 
explicitly would speed up a great many algorithms.
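Below is a sketch of the same conditional swap written as two selects, which is the shape an explicit CMOV lowering would target. The helper name is hypothetical. Whether GCC actually emits two CMOVs for it, rather than a branch, depends on compiler version, flags, and target. Compiling with `g++ -O2 -S` and reading the assembly is the way to check.

```cpp
// Hypothetical helper: conditional swap expressed as two selects.
// Each ternary is a candidate for one CMOV; nothing here forces it.
inline bool swap_if_select(bool c, int& a, int& b) {
    int lo = c ? b : a;  // candidate CMOV #1
    int hi = c ? a : b;  // candidate CMOV #2
    a = lo;
    b = hi;
    return c;
}
```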

That said, the L1 excursion remains necessary for the
general case of user-defined types.

It also seems worth mentioning that, in partitioning,
there is no worry over dependency chains: once the
values are swapped, they are not looked at again
until the next pass.
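Here is a minimal sketch of a branchless Lomuto-style partition built on a generic, array-based swap_if. It assumes T is copyable, which covers the user-defined-type case mentioned above. Both function names are hypothetical. The element xs[i] is never written by an earlier iteration, because earlier writes go only to indices below i. Each comparison is therefore independent of the swaps before it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical generic helper: conditional swap via a two-element array,
// usable for any copyable T (the "L1 excursion").
template <typename T>
bool swap_if(bool c, T& a, T& b) {
    T v[2] = {a, b};
    a = v[c];
    b = v[!c];
    return c;
}

// Hypothetical branchless partition: moves elements < pivot to the front
// and returns how many there are. pivot is taken by value so that swaps
// inside xs cannot change it.
template <typename T>
std::size_t branchless_partition(std::vector<T>& xs, T pivot) {
    std::size_t store = 0;  // xs[0, store) < pivot; xs[store, i) >= pivot
    for (std::size_t i = 0; i < xs.size(); ++i) {
        bool is_less = xs[i] < pivot;  // xs[i] untouched by earlier swaps
        swap_if(is_less, xs[store], xs[i]);
        store += is_less;              // advance without a branch
    }
    return store;
}
```

Usage: for `std::vector<int> v{5, 1, 4, 2, 3};`, calling `branchless_partition(v, 3)` returns 2 and leaves v as {1, 2, 4, 5, 3}.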
