https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89993
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #1)
> (In reply to H.J. Lu from comment #0)
> > It looks like the default incoming stack isn't a constant:
>
> And where is the bug?

The bug is that -mstackrealign handles tail calls differently, depending on
whether -mincoming-stack-boundary=4 or __m128 x; is used. I expect the
same tail call optimization with -mstackrealign -S -O2 for

int tst2Foo(int*, int*, int);

int tst1Foo(int* pSrc, int* pDst, int len)
{
  return tst2Foo(pSrc, pDst, len);
}

and

#include <xmmintrin.h>

int tst2Foo(int*, int*, int, __m128*);

int tst1Foo(int* pSrc, int* pDst, int len)
{
  __m128 x;
  return tst2Foo(pSrc, pDst, len, &x);
}

with and without -mincoming-stack-boundary=4.