http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> 2010-12-18 15:40:38 UTC ---
(In reply to comment #5)
> (In reply to comment #3)
> > Compiled like so:
> > $ gcc-4.4.2 -S -O2 sha256_4way.i -o sha256_4way-44.s
> > $ gcc-4.5.0 -S -O2 sha256_4way.i -o sha256_4way-45.s
> >
> > $ grep -c call *.s
> > sha256_4way-44.s:0
> > sha256_4way-45.s:484
> > $ grep call *.s|head
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > $
> >
> > ROTR should have been inlined:
> >
> > static inline __m128i ROTR(__m128i x, const int n) {
> > return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
> > }
> >
> > This probably explains the slowdown.
>
> This is caused by revision 151511:
>
> http://gcc.gnu.org/ml/gcc-cvs/2009-09/msg00257.html
It is fixed by revision 166517:
http://gcc.gnu.org/ml/gcc-cvs/2010-11/msg00405.html
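
For reference, here is a self-contained sketch of the quoted ROTR helper. It assumes an x86 target with SSE2 (`emmintrin.h`). The `always_inline` attribute, the `rotr4` wrapper and the scalar `rotr32` reference are illustrative additions that are not part of the original test case or of revision 166517. `always_inline` is one way to keep a `call ROTR` from surviving when the inliner declines to inline the helper.

```c
#include <assert.h>    /* assert, used by the usage check */
#include <emmintrin.h> /* SSE2: _mm_srli_epi32, _mm_slli_epi32, _mm_or_si128 */
#include <stdint.h>

/* Same body as the quoted ROTR; always_inline is an added assumption that
   forces GCC to inline the helper regardless of its inlining heuristics. */
static inline __attribute__((always_inline)) __m128i ROTR(__m128i x, const int n)
{
    /* Rotate each 32-bit lane right by n: (x >> n) | (x << (32 - n)). */
    return _mm_or_si128(_mm_srli_epi32(x, n), _mm_slli_epi32(x, 32 - n));
}

/* Hypothetical scalar reference used to check each lane. */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    n &= 31;
    return (v >> n) | (v << ((32 - n) & 31));
}

/* Hypothetical wrapper: rotate four 32-bit values held in one __m128i. */
static void rotr4(const uint32_t in[4], int n, uint32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, ROTR(x, n));
}
```

Compiling this with `-S -O2` and running `grep -c call` on the output, as in comment #3, is a quick way to check whether any out-of-line ROTR calls remain.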