http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
--- Comment #7 from Alexander Peslyak <solar-gcc at openwall dot com> 2012-01-04 23:00:24 UTC ---

(I ran the tests below and wrote this comment before seeing Jakub's. Then I thought I'd post it anyway.)

Here are some numbers for gcc releases:

4.0.0 - 383K c/s, 71879 bytes (this old version of gcc generates function calls for SSE2 intrinsics)
4.1.0 - 2959K c/s, 28182 bytes
4.1.2 - 2964K c/s, 28365 bytes
4.2.0 - 2968K c/s, 28363 bytes
4.2.4 - 2971K c/s, 28382 bytes
4.3.0 - 2971K c/s, 28229 bytes
4.3.6 - 2959K c/s, 28229 bytes
4.4.0 - 2625K c/s, 29770 bytes
4.4.6 - 2695K c/s, 29316 bytes
4.5.0 - 2729K c/s, 29203 bytes
4.5.3 - 2716K c/s, 29203 bytes
4.6.0 - 2111K c/s, 29624 bytes
4.6.2 - 2123K c/s, 29624 bytes

So things were really good for versions 4.1.0 through 4.3.6, but started to get worse afterwards and got really bad with 4.6 (roughly 29% slower than 4.3.0).

To be fair, things are very different for some other hash/cipher types supported by JtR - e.g., for Blowfish-based hashing we went from 560 c/s for 4.1.0 to 700 c/s for 4.6.2.

<plug>JtR 1.7.9 and 1.7.9-jumbo include a benchmark comparison tool called relbench, which calculates the geometric mean, median, and some other metrics for multiple individual outputs from a pair of JtR benchmark invocations (e.g., built with different versions of gcc). In 1.7.9-jumbo-5, there are over 160 individual benchmark outputs (for different hashes/ciphers), and it may be built in a variety of ways (with/without explicit assembly code, with/without intrinsics, etc.). relbench combines those 160+ outputs into a nice summary showing overall speedup/slowdown and more. It might be useful for testing future gcc versions for potential performance regressions like this.</plug>
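
To illustrate the kind of summary described above, here is a minimal Python sketch that combines per-benchmark speed ratios into a geometric mean and a median. It is not relbench itself and does not parse JtR benchmark output. The function name summarize and the dictionary keys are made up for this example. The only data used are the two 4.1.0 vs. 4.6.2 pairs quoted in the comment, and the label "sse2_intrinsics" is a placeholder for the first benchmark, since the comment does not name that hash type.

```python
import math
import statistics


def summarize(before, after):
    """Summarize speed ratios (after / before) for benchmarks present in both runs.

    `before` and `after` map a benchmark label to a speed in c/s.
    A ratio above 1.0 means the second build is faster.
    """
    common = sorted(set(before) & set(after))
    if not common:
        raise ValueError("the two runs have no benchmarks in common")
    ratios = [after[name] / before[name] for name in common]
    # Geometric mean computed via logarithms to avoid overflow on long lists.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return {
        "count": len(ratios),
        "geomean": geomean,
        "median": statistics.median(ratios),
        "min": min(ratios),
        "max": max(ratios),
    }


# Speeds in c/s taken from the comment; the labels are placeholders.
gcc_4_1_0 = {"sse2_intrinsics": 2959_000, "blowfish": 560}
gcc_4_6_2 = {"sse2_intrinsics": 2123_000, "blowfish": 700}

summary = summarize(gcc_4_1_0, gcc_4_6_2)
for key, value in summary.items():
    print(key, value)
```

With only two data points the summary is not very meaningful; the point of a tool like relbench is to do this over the 160+ outputs, where a single regression or improvement cannot dominate the geometric mean.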