On Fri, Nov 16, 2012 at 04:09:17AM +0400, Solar Designer wrote:
> I've tried compiling in two ways:
>
> 1. -march=native -O2 -fomit-frame-pointer
> 2. -march=native -O2 -fomit-frame-pointer -funroll-loops -finline-functions
>
> The 5% to 10% speedup on Intel is for #1. With #2, I've just measured a
> speedup of 4% on the same E5649.

With Salsa20 rounds count reduced from 8 to 2, I am getting a speedup of
10% to 15% (varies between invocations) on the E5649 for both #1 and #2.

Alexander
