------- Comment #17 from potswa at mac dot com 2009-09-15 14:33 -------

Hmm, on my Core2 my impl still beats the present one in many cases of shift by 1, but the margin is narrower.
Shift by 1 is the only case where the temporary can really help, and I eliminated the temporary completely. I suppose I should special-case it back in for k = ±1. Also, before a commit I'd like to see about installing this algo for the forward and bidirectional cases. If n is not given, the algo can compute it as a side effect of a run through the first loop. Once n is found, the second, backwards-iterating loop can be used with a bidirectional iterator and the first, forwards loop can be used with a forward iterator. This would carry the same optimal one-pass memory behavior and (n - gcd(n,k)) swap complexity to all the overloads.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41351
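
For reference, a minimal sketch of where an (n - gcd(n,k)) swap count can come from. This is not the implementation discussed in this PR; it is the textbook swap-based rotate for forward iterators, instrumented to count swaps. The names rotate_counting_swaps, gcd_int and check_rotation are made up for illustration.

```cpp
#include <algorithm>  // std::iter_swap
#include <cassert>
#include <cstddef>
#include <numeric>    // std::iota
#include <vector>

// Hypothetical helper, NOT the patch from this PR: the textbook swap-based
// rotate for forward iterators, instrumented to return the number of swaps.
template <class ForwardIt>
std::size_t rotate_counting_swaps(ForwardIt first, ForwardIt middle, ForwardIt last)
{
    std::size_t swaps = 0;
    if (first == middle || middle == last)
        return swaps;                  // empty left or right block: nothing to do

    ForwardIt next = middle;
    while (first != next) {
        std::iter_swap(first, next);   // *first now holds its final value
        ++swaps;
        ++first;
        ++next;
        if (next == last)
            next = middle;             // wrap to the start of the displaced elements
        else if (first == middle)
            middle = next;             // left block consumed; displaced run starts here
    }
    return swaps;
}

// Plain Euclid, written out so the sketch does not depend on C++17 std::gcd.
inline int gcd_int(int a, int b)
{
    while (b != 0) {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Hypothetical self-check: rotate 0..n-1 left by k, verify the result, and
// compare the observed swap count with n - gcd(n, k).
inline bool check_rotation(int n, int k)
{
    assert(0 < k && k < n);
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    const std::size_t swaps =
        rotate_counting_swaps(v.begin(), v.begin() + k, v.end());
    for (int i = 0; i < n; ++i)
        if (v[i] != (i + k) % n)
            return false;
    return swaps == static_cast<std::size_t>(n - gcd_int(n, k));
}
```

With this sketch, check_rotation(6, 4) rotates {0,1,2,3,4,5} to {4,5,0,1,2,3} in 4 swaps, i.e. 6 - gcd(6,4). It only needs operator++ and comparison on the iterators, so the same loop works for a forward iterator; it says nothing about the one-pass memory behavior claimed above.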