https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328

--- Comment #16 from Diego Russo <Diego.Russo at arm dot com> ---
Right, I had a couple of problems running the benchmarks: a few failures, and
the wrong environment variable for selecting the compiler binary.

Anyway, I re-ran the benchmarks: the binary built without preserve_none is
actually 6% slower than the build without the tail-calling interpreter. If we
introduce the preserve_none attribute, the 6% is regained and the result is on
par (0% faster). Hence preserve_none is needed; otherwise we will have a
regression.

BTW, Brandt (a CPython core developer) pointed me to this GitHub pull request,
https://github.com/llvm/llvm-project/pull/88333, which makes LLVM use
non-volatile registers for preserve_none parameters. With that change we see a
significant speed-up when running the benchmarks.

For preserve_none parameters, LLVM uses the normally non-volatile registers
(x19-x28) first, then the normally volatile registers (x0-x15).
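As a rough illustration of that ordering, here is a minimal C sketch mapping the n-th integer/pointer argument of a preserve_none function to an AArch64 register number. It assumes both ranges are used contiguously, exactly as stated above (x19-x28, then x0-x15); the helper name `preserve_none_arg_reg` is hypothetical, and LLVM's real assignment table may reserve or skip some registers.

```c
#include <assert.h>

/* Register ranges as stated above (simplified model, not LLVM's table). */
enum { NV_FIRST = 19, NV_LAST = 28, V_FIRST = 0, V_LAST = 15 };

/* Hypothetical helper: returns N such that the n-th (0-based) integer or
 * pointer argument lands in xN, or -1 if it would be passed on the stack.
 * Assumes normally non-volatile x19-x28 are used first, then normally
 * volatile x0-x15, each range used contiguously. */
int preserve_none_arg_reg(int n)
{
    const int n_nv = NV_LAST - NV_FIRST + 1; /* x19..x28: 10 registers */
    const int n_v = V_LAST - V_FIRST + 1;    /* x0..x15: 16 registers */

    assert(n >= 0);
    if (n < n_nv)
        return NV_FIRST + n;          /* non-volatile range first */
    if (n < n_nv + n_v)
        return V_FIRST + (n - n_nv);  /* then the volatile range */
    return -1;                        /* out of registers: stack */
}
```

Under this model a four-argument function receives its arguments in x19-x22. A callee using the default calling convention must preserve those registers, so the arguments survive an intervening ordinary call without being copied anywhere.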

I tried compiling the small example from that PR, and this is what I get:

$ objdump -d boring

boring:     file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <entry>:
   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
   4:   910003fd        mov     x29, sp
   8:   aa0003f3        mov     x19, x0
   c:   aa0103f4        mov     x20, x1
  10:   aa0203f5        mov     x21, x2
  14:   aa0303f6        mov     x22, x3
  18:   94000000        bl      0 <boring>
  1c:   aa1603e3        mov     x3, x22
  20:   aa1503e2        mov     x2, x21
  24:   aa1403e1        mov     x1, x20
  28:   aa1303e0        mov     x0, x19
  2c:   a8c17bfd        ldp     x29, x30, [sp], #16
  30:   14000000        b       0 <continuation>

which differs from the second listing in that PR.
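The C source for that example is not quoted here. The sketch below is a hypothetical reconstruction inferred only from the symbols in the disassembly (`entry`, `boring`, `continuation`) and from the four arguments (x0-x3) that `entry` parks in x19-x22 around the call to `boring`. The `long` argument and return types and the `PRESERVE_NONE`/`MUSTTAIL` macros are assumptions, not the PR's actual test case. The attributes are applied only when `__has_attribute` reports them, so the file still builds as plain C on compilers or targets without them.

```c
#include <assert.h> /* used by the checks in the usage example */

/* Apply the attributes only when this compiler/target supports them. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Side effect so the call to boring() can be observed. */
int boring_calls;

/* Ordinary function with the default calling convention; under AAPCS64 it
 * must preserve x19-x28 for its caller. noinline keeps the call visible. */
__attribute__((noinline)) void boring(void)
{
    boring_calls++;
}

PRESERVE_NONE long continuation(long a, long b, long c, long d)
{
    return a + b + c + d;
}

/* entry must keep a..d alive across boring() and then tail-call
 * continuation() with the same arguments (a guaranteed tail call when
 * musttail is available, an ordinary return otherwise). */
PRESERVE_NONE long entry(long a, long b, long c, long d)
{
    boring();
    MUSTTAIL return continuation(a, b, c, d);
}
```

Compiling this with optimization and running `objdump -d` on the object is one way to compare where each compiler keeps `a`-`d` across the call to `boring`. Called from ordinary code, `entry(1, 2, 3, 4)` returns 10 after incrementing `boring_calls` once.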

Can we have the same implementation/interface as LLVM?

Thanks
