https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114843
--- Comment #13 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #11)
> I have a fix for aarch64, able to produce now:
> ```
> f:
> .LFB0:
>         .cfi_startproc
>         stp     x0, x1, [sp, -32]!
>         .cfi_def_cfa_offset 32
>         .cfi_offset 0, -32
>         .cfi_offset 1, -24
>         stp     x2, x3, [sp, 16]
>         .cfi_offset 2, -16
>         .cfi_offset 3, -8
>         ldr     w0, [x0]
>         cmp     w0, 5
>         bne     .L8
>         add     sp, sp, 32
>         .cfi_remember_state
>         .cfi_def_cfa_offset 0
>         ret
> .L8:
>         .cfi_restore_state
>         mov     x5, x1
>         ldp     x2, x3, [sp, 16]
>         ldp     x0, x1, [sp], 32
>         .cfi_restore 1
>         .cfi_restore 0
>         .cfi_restore 2
>         .cfi_restore 3
>         .cfi_def_cfa_offset 0
>         add     sp, sp, x5
>         ret
>         .cfi_endproc
> ```
>
> Which is exactly what we should produce I think.
> The patch is a bit more complex than I expected but that is due to how
> aarch64 has some of the most complex epilogues.

I'm not convinced that is an easy solution. Try various cases with large stack
sizes, alloca and other scalar and FP callee-saves. Getting all cases right and
writing good tests for them is a lot of work.