http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
--- Comment #5 from Alexander Peslyak <solar-gcc at openwall dot com> 2012-01-04 19:39:26 UTC ---

I wrote and ran some scripts to test many versions/snapshots of gcc. It turns out that 4.6-20100703 (oldest 4.6 snapshot available for FTP) was already affected by this regression, whereas 4.5-20111229 and 4.4-20120103 are not affected (as expected).

Also, it turns out that there was a smaller regression at this same benchmark between 4.3 and 4.4. That is, 4.3 produces the fastest code of all gcc versions I tested.

Here are some numbers:

4.3.5 20100502 - 2950K c/s, 28229 bytes
4.3.6 20110626 - 2950K c/s, 28229 bytes
4.4.5 20100504 - 2697K c/s, 29764 bytes
4.4.7 20120103 - 2691K c/s, 29316 bytes
4.5.1 20100603 - 2729K c/s, 29203 bytes
4.5.4 20111229 - 2710K c/s, 29203 bytes
4.6.0 20100703 - 2133K c/s, 29911 bytes
4.6.0 20100807 - 2119K c/s, 29940 bytes
4.6.0 20100904 - 2142K c/s, 29848 bytes
4.6.0 20101106 - 2124K c/s, 29848 bytes
4.6.0 20101204 - 2114K c/s, 29624 bytes
4.6.3 20111230 - 2116K c/s, 29624 bytes
4.7.0 20111231 - 2147K c/s, 29692 bytes

These are for JtR 1.7.9 with DES_BS_ASM set to 0 on line 157 of x86-64.h (to disable this version's workaround for this GCC 4.6 regression), built with "make linux-x86-64" and run on one core in a Xeon E5420 2.5 GHz (the system is otherwise idle). The code sizes given are for .text of DES_bs_b.o (which contains three similar functions, of which one is in use by this benchmark - that is, the code size in the loop is about 10 KB).

As you can see, 4.3 generated code that was both significantly faster and a bit smaller than all other versions'. In 4.4, the speed decreased by 8.5% and code size increased by 4.4%. 4.5 corrected this to a very limited extent - still 8% slower and 3.5% larger than 4.3's. 4.6 brought a huge performance drop and a slight code size increase. 4.7.0 20111231's code is still 27% slower than 4.3's.
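
For anyone who wants to recompute the relative differences from the figures listed above, here is a minimal Python sketch. It is not one of the scripts mentioned in the comment; the names RESULTS, BASELINE, relative_change and compare are made up for illustration, and taking 4.3.6 20110626 as the baseline is an assumption. The comment does not say which snapshot of each branch was used for the quoted percentages, so results for a given snapshot may differ somewhat from the figures in the text.

```python
# Figures copied from the comment above:
# (gcc version, snapshot date) -> (speed in K c/s, .text size of DES_bs_b.o in bytes)
RESULTS = {
    ("4.3.5", "20100502"): (2950, 28229),
    ("4.3.6", "20110626"): (2950, 28229),
    ("4.4.5", "20100504"): (2697, 29764),
    ("4.4.7", "20120103"): (2691, 29316),
    ("4.5.1", "20100603"): (2729, 29203),
    ("4.5.4", "20111229"): (2710, 29203),
    ("4.6.0", "20100703"): (2133, 29911),
    ("4.6.0", "20100807"): (2119, 29940),
    ("4.6.0", "20100904"): (2142, 29848),
    ("4.6.0", "20101106"): (2124, 29848),
    ("4.6.0", "20101204"): (2114, 29624),
    ("4.6.3", "20111230"): (2116, 29624),
    ("4.7.0", "20111231"): (2147, 29692),
}

# Assumption: use the newer 4.3 snapshot as the reference point.
BASELINE = ("4.3.6", "20110626")


def relative_change(value, baseline):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def compare(results, baseline_key):
    """Return (version, snapshot, speed change %, size change %) per entry."""
    base_speed, base_size = results[baseline_key]
    return [
        (version, snapshot,
         relative_change(speed, base_speed),
         relative_change(size, base_size))
        for (version, snapshot), (speed, size) in results.items()
    ]


if __name__ == "__main__":
    for version, snapshot, d_speed, d_size in compare(RESULTS, BASELINE):
        print(f"{version} {snapshot}: speed {d_speed:+.1f}%, size {d_size:+.1f}%")
```

A negative speed change means slower code than the 4.3 baseline; a positive size change means a larger .text section. For 4.7.0 20111231 this gives a speed change of about -27%, in line with the figure quoted above.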