http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> 2010-12-18 15:36:30 UTC ---
(In reply to comment #3)
> Compiled like so:
> $ gcc-4.4.2 -S -O2 sha256_4way.i -o sha256_4way-44.s
> $ gcc-4.5.0 -S -O2 sha256_4way.i -o sha256_4way-45.s
>
> $ grep -c call *.s
> sha256_4way-44.s:0
> sha256_4way-45.s:484
> $ grep call *.s|head
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> sha256_4way-45.s:    call    ROTR
> $
>
> ROTR should have been inlined:
>
> static inline __m128i ROTR(__m128i x, const int n) {
>     return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
> }
>
> This probably explains the slowdown.

This is caused by revision 151511:

http://gcc.gnu.org/ml/gcc-cvs/2009-09/msg00257.html
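
For readers following along, here is a minimal sketch of what the quoted ROTR computes on each 32-bit lane, written as a portable scalar helper so it builds without SSE2. The names rotr32 and rotr32_selfcheck are hypothetical and do not come from sha256_4way. The always_inline attribute is a GCC extension shown only as one possible way to request inlining regardless of the inliner's heuristics; nobody in this thread has proposed or tested it as a fix for this regression.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar stand-in for one 32-bit lane of the quoted ROTR.
 * always_inline (GCC extension) asks the compiler to inline the call
 * even when its heuristics would not; this is an assumption about a
 * possible workaround, not the fix for the regression itself.
 */
static inline __attribute__((always_inline))
uint32_t rotr32(uint32_t x, const int n)
{
    /* Valid for 0 < n < 32; n == 0 would shift left by 32 (undefined). */
    return (x >> n) | (x << (32 - n));
}

/* Small sanity check of the rotate-right behaviour. */
void rotr32_selfcheck(void)
{
    assert(rotr32(0x00000001u, 1) == 0x80000000u);
    assert(rotr32(0x12345678u, 8) == 0x78123456u);
}
```

Running the same grep -c call check on the generated assembly for the real file would show whether the calls to ROTR are still emitted.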