http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
Bug #: 51017
Summary: GCC 4.6 performance regression (vs. 4.4/4.5)
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: solar-...@openwall.com

GCC 4.6 happens to produce approx. 25% slower code on at least x86_64 than 4.4 and 4.5 did for John the Ripper 1.7.8's bitslice DES implementation.

To reproduce, download http://download.openwall.net/pub/projects/john/1.7.8/john-1.7.8.tar.bz2 and build it with "make linux-x86-64" (will use SSE2 intrinsics), "make linux-x86-64-avx" (will use AVX instead), or "make generic" (won't use any intrinsics). Then run "../run/john -te=1".

With GCC 4.4 and 4.5, the "Traditional DES" benchmark reports a speed of around 2500K c/s for the "linux-x86-64" (SSE2) build on a 2.33 GHz Core 2 (this is using one core). With 4.6, this drops to about 1850K c/s. Similar slowdown was observed for AVX on Core i7-2600K when going from GCC 4.5.x to 4.6.x. And it is reproducible for the without-intrinsics code as well, although that's of less practical importance (the intrinsics are so much faster).

Similar slowdown with GCC 4.6 was reported by a Mac OS X user. It was also spotted by Phoronix in their recently published C compiler benchmarks, but misinterpreted as a GCC vs. clang difference.

Adding "-Os" to OPT_INLINE in the Makefile partially corrects the performance (to something like 2000K c/s - still 20% slower than GCC 4.4/4.5's).

Applying the OpenMP patch from http://download.openwall.net/pub/projects/john/1.7.8/john-1.7.8-omp-des-4.diff.gz and then running with OMP_NUM_THREADS=1 (for a fair comparison) corrects the performance almost fully. Keeping the patch applied, but removing -fopenmp still keeps the performance at a good level. So it's some change made to the source code by this patch that mitigates the GCC regression.
Similar behavior is seen with the current CVS version of John the Ripper, even though its OpenMP support for DES has been heavily revised and integrated into the tree.
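
For reference, the percentages quoted above follow directly from the reported c/s figures. The snippet below is only a sanity check of that arithmetic, not part of John the Ripper or of the original report; the helper name `slowdown_percent` and the constant names are made up for illustration, and the inputs are the approximate numbers given for the SSE2 build on the 2.33 GHz Core 2.

```python
# Approximate "Traditional DES" speeds from the report, in thousands of c/s
# (linux-x86-64 SSE2 build, one core of a 2.33 GHz Core 2).
BASELINE_KCPS = 2500   # GCC 4.4 / 4.5
GCC46_KCPS = 1850      # GCC 4.6
GCC46_OS_KCPS = 2000   # GCC 4.6 with "-Os" added to OPT_INLINE


def slowdown_percent(baseline, measured):
    """Return the drop in throughput relative to baseline, in percent."""
    return 100.0 * (baseline - measured) / baseline


if __name__ == "__main__":
    print("GCC 4.6:      %.1f%% slower" % slowdown_percent(BASELINE_KCPS, GCC46_KCPS))
    print("GCC 4.6 -Os:  %.1f%% slower" % slowdown_percent(BASELINE_KCPS, GCC46_OS_KCPS))
```

With these inputs the drop is 26% for plain GCC 4.6 (in line with the "approx. 25%" above) and 20% with "-Os". The same helper can be fed figures from other builds (AVX, generic) once they are measured.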