https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
I updated the hjl/interrupt/calls branch to update the stack boundary when
SSE registers are used.  Now I get:

[hjl@gnu-tools-1 interrupt-1]$ cat x.i
void
__attribute__((no_caller_saved_registers))
fn (void)
{
  asm ("#" : : : "xmm3");
}
[hjl@gnu-tools-1 interrupt-1]$ make x.s
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 -m32 -mincoming-stack-boundary=2 -maccumulate-outgoing-args -S x.i
[hjl@gnu-tools-1 interrupt-1]$ cat x.s
	.file	"x.i"
	.text
	.p2align 4,,15
	.globl	fn
	.type	fn, @function
fn:
.LFB0:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register 5
	subl	$136, %esp
	andl	$-16, %esp
	movups	%xmm3, -136(%ebp)
	.cfi_offset 24, -144
#APP
# 5 "x.i" 1
	#
# 0 "" 2
#NO_APP
	movups	-136(%ebp), %xmm3
	leave
	.cfi_restore 5
	.cfi_restore 24
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
.LFE0:
	.size	fn, .-fn
	.ident	"GCC: (GNU) 6.0.0 20151002 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-tools-1 interrupt-1]$

But this is incompatible with how registers are saved in the prologue,
which uses the original stack, before realignment, to save registers.
Using aligned loads/stores to save vector registers in the prologue
would require significant changes.
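
To make the alignment problem concrete, here is a small sketch of the address arithmetic in the listing above.  It is not GCC code; the helper names (is_aligned, xmm_slot, realigned_esp) are hypothetical, and it assumes only what -mincoming-stack-boundary=2 implies, namely that %ebp is guaranteed to be 4-byte aligned and nothing more.  The save slot is addressed as -136(%ebp), i.e. relative to the stack before "andl $-16, %esp", so it is 16-byte aligned only for one of the four possible values of %ebp modulo 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of align,
   where align must be a power of two.  */
bool is_aligned(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Address of the %xmm3 save slot in the listing: -136(%ebp).
   It is computed from %ebp, i.e. from the unrealigned stack.  */
uint32_t xmm_slot(uint32_t ebp)
{
    return ebp - 136u;
}

/* Value of %esp after "subl $136, %esp; andl $-16, %esp",
   given that %esp == %ebp just before the subl.  */
uint32_t realigned_esp(uint32_t ebp)
{
    return (ebp - 136u) & ~UINT32_C(15);
}
```

With a 4-byte-aligned %ebp such as 0x1004, realigned_esp() is always a multiple of 16, but xmm_slot() is not (136 is 8 modulo 16, so the slot is aligned only when %ebp is 8 modulo 16).  That is why the prologue has to use movups here rather than an aligned move.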