https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112470
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The increase is 2 instructions, one in the prologue and one in the epilogue:
sub sp, sp, #128
...
stp x29, x30, [sp, 112]
vs:
stp x29, x30, [sp, -128]!
and
ldp x29, x30, [sp, 112]
...
add sp, sp, 128
vs:
ldp x29, x30, [sp], 128
Depending on the core, the performance might be the same.
Without a full performance testcase which shows the difference, it is hard to
tell whether this is an issue overall or just something that shows up in one
small testcase.
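
As a starting point for such a testcase, here is a minimal timing sketch in C. It is not taken from this bug report: the function names (consume, frame_user, bench_ns_per_call) and the 96-byte buffer are hypothetical, chosen only so that a non-leaf function with a stack frame is called in a loop. Whether the compiler emits the split sub/stp form or the pre-indexed stp form for frame_user depends on the GCC version, target and options, so the generated assembly should be checked before drawing any conclusion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical callee. noinline keeps the call in place, so the caller
   stays non-leaf and has to save its link register. */
__attribute__((noinline)) uint64_t consume(const unsigned char *buf, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical non-leaf function with a local buffer, so it needs a stack
   frame. The buffer size is an assumption and is not meant to reproduce
   the exact 128-byte frame from the comment above. */
__attribute__((noinline)) uint64_t frame_user(unsigned char seed)
{
    unsigned char buf[96];
    memset(buf, seed, sizeof buf);
    return consume(buf, sizeof buf);
}

/* Calls frame_user() iters times and returns the average wall-clock time
   per call in nanoseconds. The accumulated result is written to *sink so
   the calls cannot be discarded as dead code. */
double bench_ns_per_call(unsigned long iters, uint64_t *sink)
{
    struct timespec t0, t1;
    uint64_t acc = 0;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++)
        acc += frame_user((unsigned char)i);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = acc;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Calling bench_ns_per_call() from main with a large iteration count, once per compiler build being compared, gives a rough per-call figure. A loop this small mostly measures call overhead on one core, so a real workload would still be needed to show an overall effect.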