On Thu, Nov 15, 2012 at 06:47:16AM -0800, Colin Percival wrote:
> On 11/15/12 02:52, Solar Designer wrote:
> > The attached patch for crypto_scrypt-sse.c (against its revision in
> > scrypt-1.1.6.tgz) speeds it up by 30% on AMD Bulldozer (tested on
> > FX-8120) in -march=native or -mxop builds, and by 5% to 10% on Intel
> > CPUs (tested on Xeon E5649), when run on the official test vectors.
>
> Interesting!  What compiler is this?

gcc 4.6.x

> I'm surprised that the "inline"
> does anything, and it seems odd that the loop unrolling wouldn't happen
> automatically too.

I've tried compiling in two ways:

1. -march=native -O2 -fomit-frame-pointer
2. -march=native -O2 -fomit-frame-pointer -funroll-loops -finline-functions

The 5% to 10% speedup on Intel is for #1.  With #2, I've just measured a
speedup of 4% on the same E5649.

I think the compiler does some inlining and unrolling when that is
requested with optimization flags, but doing it explicitly helps anyway.
My use of the "inline" keyword is a bit selective, and in the unrolling
I rely on knowledge that the block size is a multiple of 64.  The
compiler would only be able to make use of such knowledge along with
inlining or by examining all calls to a non-inlined static function,
which is trickier, and it'd need to use the fact that "128 * r" is a
multiple of 64.  Perhaps modern compilers are capable of all that, but I
do see some performance difference even with -funroll-loops
-finline-functions somehow.

The 30% speedup on AMD Bulldozer is primarily due to the use of XOP bit
rotate intrinsics, indeed.  With only this one change and no other
changes, the speedup was about 25%.

> > Please let me know if I should add a copyright statement, although maybe
> > these changes are too minor to be subject to copyright.
>
> Up to you -- either declare your changes to be public domain (and don't add
> a copyright line in your name) or declare them to be 2-clause BSD licensed
> (and add a copyright line).  Either is fine with me, but I need you to pick
> one. :-)

Let's do the latter.  Until a few years ago, I had been placing some of
my works in the public domain, but some people expressed concern that it
might not be legally possible in my jurisdiction - so I had to add
fallbacks to a permissive license to my public domain statements anyway.

So can you add this copyright line if/when you merge the changes?

- * Copyright 2012 Solar Designer

or you may use my real name (Alexander Peslyak), or both.

Thanks,

Alexander
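
For illustration only, the XOP rotate change described in the message above might look roughly like the sketch below. This is not the attached patch: the names ROTL32x4, rotl32 and rotl7_x4 are hypothetical, and the rotate count of 7 is just an example value. The assumption is that an XOP build (for example with -mxop) can use the _mm_roti_epi32 intrinsic for a 32-bit rotate, while a plain SSE2 build has to combine two shifts and an OR; a scalar fallback is included so the sketch also compiles on other targets.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__XOP__)
#include <x86intrin.h>
#define HAVE_SIMD_PATH 1
/* XOP build: a single rotate instruction per vector of four words. */
#define ROTL32x4(x, n) _mm_roti_epi32((x), (n))
#elif defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_SIMD_PATH 1
/* SSE2 build: emulate the rotate with two shifts and an OR. */
#define ROTL32x4(x, n) \
    _mm_or_si128(_mm_slli_epi32((x), (n)), _mm_srli_epi32((x), 32 - (n)))
#endif

/* Scalar reference rotate; valid for 0 < n < 32. */
static inline uint32_t rotl32(uint32_t x, int n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Rotate four 32-bit words left by 7 (hypothetical helper). */
static inline void rotl7_x4(const uint32_t in[4], uint32_t out[4])
{
#ifdef HAVE_SIMD_PATH
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = ROTL32x4(v, 7);
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Non-x86 fallback: same result, one word at a time. */
    for (int i = 0; i < 4; i++)
        out[i] = rotl32(in[i], 7);
#endif
}
```

Calling rotl7_x4 on four input words should give the same result as applying rotl32(word, 7) to each of them, whichever path the compiler flags select. Whether the XOP path is actually faster on a given CPU would have to be measured.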
