https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66697
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
IMO, if "outside" code misaligns stack, then these applications need support
for -mincoming-stack-boundary=3, which is currently limited on x86_64 to 4.

Documentation says:

'-mincoming-stack-boundary=NUM'
     Assume the incoming stack is aligned to a 2 raised to NUM byte
     boundary.  If '-mincoming-stack-boundary' is not specified, the one
     specified by '-mpreferred-stack-boundary' is used.

[...]

Following patch:

--cut here--
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 228460)
+++ config/i386/i386.c  (working copy)
@@ -5102,8 +5102,7 @@ ix86_option_override_internal (bool main_args_p,
     ix86_incoming_stack_boundary = ix86_default_incoming_stack_boundary;
   if (opts_set->x_ix86_incoming_stack_boundary_arg)
     {
-      int min = (TARGET_64BIT_P (opts->x_ix86_isa_flags)
-                ? (TARGET_SSE_P (opts->x_ix86_isa_flags) ? 4 : 3) : 2);
+      int min = TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 3 : 2;

       if (opts->x_ix86_incoming_stack_boundary_arg < min
          || opts->x_ix86_incoming_stack_boundary_arg > 12)
--cut here--

will compile following test:

--cut here--
typedef float v4sf __attribute__((vector_size(16)));

v4sf test (v4sf a, v4sf b)
{
  volatile v4sf z = a + b;

  return z;
}
--cut here--

using "-O2 -mincoming-stack-boundary=3" to:

0000000000000000 <test>:
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   0f 58 c8                addps  %xmm0,%xmm1
   8:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   c:   41 ff 72 f8             pushq  -0x8(%r10)
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   41 52                   push   %r10
  16:   0f 29 4d e0             movaps %xmm1,-0x20(%rbp)
  1a:   0f 28 45 e0             movaps -0x20(%rbp),%xmm0
  1e:   41 5a                   pop    %r10
  20:   5d                      pop    %rbp
  21:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  25:   c3                      retq

The stack realignment code ensures ABI-compliant stack alignment in all
functions - and is kind of punishment for rogue application.
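
For readers who want the arithmetic spelled out, here is a minimal sketch in plain C of the two ideas in the comment above: the range check as it would read after the proposed patch, and the rounding that the "and $0xfffffffffffffff0,%rsp" instruction performs. The function names (boundary_bytes, incoming_boundary_ok, realign_down) are hypothetical and exist only for this illustration; they are not GCC internals, and the sketch only models the logic, it does not touch the real stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: -mincoming-stack-boundary=NUM means the incoming
   stack is assumed to be aligned to 2^NUM bytes.  */
static uint64_t boundary_bytes(int num)
{
    assert(num >= 0 && num < 63);
    return (uint64_t)1 << num;
}

/* Hypothetical mirror of the range check after the proposed patch:
   the minimum is 3 for 64-bit targets and 2 for 32-bit targets, the
   maximum is 12.  (Before the patch, 64-bit with SSE required 4.)  */
static bool incoming_boundary_ok(bool is_64bit, int num)
{
    int min = is_64bit ? 3 : 2;
    return num >= min && num <= 12;
}

/* Round a stack address down to a power-of-two alignment, which is what
   "and $-16, %rsp" does for align == 16.  */
static uint64_t realign_down(uint64_t sp, uint64_t align)
{
    return sp & ~(align - 1);
}
```

With these definitions, boundary_bytes(3) is 8, so an incoming stack pointer such as 0x7ffc1238 is a legal input under -mincoming-stack-boundary=3, and realign_down(0x7ffc1238, 16) yields 0x7ffc1230, a 16-byte aligned address suitable for the movaps accesses in the listing.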