https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #16 from Diego Russo <Diego.Russo at arm dot com> ---
Right, I had a couple of problems running the benchmarks: a few failures, and I used the wrong environment variable to select the compiler binary.

Anyway, I re-ran the benchmarks, and the binary without preserve_none is actually 6% slower than the build without the tail-calling interpreter. If we introduce the preserve_none attribute, the 6% is regained and it is 0% faster. Hence preserve_none is needed; otherwise we will have a regression.

BTW, Brandt (a CPython core developer) pointed me at this GitHub pull request:
https://github.com/llvm/llvm-project/pull/88333
It tries to use non-volatile registers for preserve_none parameters. With that change we notice a significant speed-up whilst executing benchmarks. LLVM uses the normally non-volatile registers (x19-x28) first, then the normally volatile registers (x0-x15).

I tried compiling that small example and what I get is:

$ objdump -d boring

boring:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <entry>:
   0:   a9bf7bfd    stp   x29, x30, [sp, #-16]!
   4:   910003fd    mov   x29, sp
   8:   aa0003f3    mov   x19, x0
   c:   aa0103f4    mov   x20, x1
  10:   aa0203f5    mov   x21, x2
  14:   aa0303f6    mov   x22, x3
  18:   94000000    bl    0 <boring>
  1c:   aa1603e3    mov   x3, x22
  20:   aa1503e2    mov   x2, x21
  24:   aa1403e1    mov   x1, x20
  28:   aa1303e0    mov   x0, x19
  2c:   a8c17bfd    ldp   x29, x30, [sp], #16
  30:   14000000    b     0 <continuation>

which differs from the second block of text on that PR.

Can we have the same implementation/interface as LLVM?

Thanks
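
For reference, here is a minimal C sketch of the kind of source that yields an entry function shaped like the one in the disassembly above: four pointer arguments kept live across a call to boring, then handed to continuation as the last action. It is reconstructed from the disassembly and is not the exact example from the LLVM PR. The PRESERVE_NONE macro, the boring_calls counter and the seen array are hypothetical names added for illustration, and the bodies of boring and continuation are placeholders (the object file above leaves both unresolved).

```c
#include <assert.h>
#include <stddef.h>

/* Use preserve_none only where the compiler reports support for it;
   otherwise the macro expands to nothing and the default calling
   convention applies. PRESERVE_NONE is a hypothetical helper name. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif

/* Hypothetical bookkeeping so the call sequence can be checked at run time. */
int boring_calls = 0;
void *seen[4] = { NULL, NULL, NULL, NULL };

/* Placeholder body; noinline keeps the call visible in the disassembly. */
PRESERVE_NONE __attribute__((noinline)) void boring(void)
{
    boring_calls++;
}

/* Placeholder body that records the arguments it received. */
PRESERVE_NONE __attribute__((noinline))
void continuation(void *a, void *b, void *c, void *d)
{
    assert(boring_calls > 0); /* boring() must already have run */
    seen[0] = a;
    seen[1] = b;
    seen[2] = c;
    seen[3] = d;
}

/* a..d must survive the call to boring(), so the compiler has to park
   them somewhere boring() will not clobber. The final call is in tail
   position; an optimizing build typically emits it as a plain branch. */
PRESERVE_NONE void entry(void *a, void *b, void *c, void *d)
{
    boring();
    continuation(a, b, c, d);
}
```

Compiling this to an object file with optimization enabled and running objdump -d on it shows which registers the compiler picks for the four arguments and whether it needs the save/restore moves around the call to boring. The exact output depends on the compiler, its version and the target.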