[Bug target/119628] Need better mechanisms to manage register saves in callee for tail calls (inc. preserve_none for x86_64?)

kenjin4096 at gmail dot com via Gcc-bugs Tue, 15 Apr 2025 00:38:53 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628


--- Comment #9 from Ken Jin <kenjin4096 at gmail dot com> ---
I tried this out on CPython's tail-calling interpreter, with the patch at
https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/saved/master?ref_type=heads
applied.

I get a roughly 12% speedup on the pystones benchmark:

Without preserve_none
This machine benchmarks at 912722 pystones/second

With preserve_none
This machine benchmarks at 1.02601e+06 pystones/second

(Higher is better).

I noticed it's still about 10% slower than clang-20, though. Compared to Clang,
GCC shuffles registers a lot around calls to external functions. Please see
https://github.com/llvm/llvm-project/pull/88333. With the patch applied, GCC
generates this:

a.out:  file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <entry>:
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: 48 89 fb                      movq    %rdi, %rbx
       7: 49 89 f4                      movq    %rsi, %r12
       a: 49 89 d5                      movq    %rdx, %r13
       d: 49 89 ce                      movq    %rcx, %r14
      10: e8 00 00 00 00                callq   0x15 <entry+0x15>
      15: 4c 89 f1                      movq    %r14, %rcx
      18: 4c 89 ea                      movq    %r13, %rdx
      1b: 4c 89 e6                      movq    %r12, %rsi
      1e: 48 89 df                      movq    %rbx, %rdi
      21: 5d                            popq    %rbp
      22: e9 00 00 00 00                jmp     0x27 <entry+0x27>
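
For readers who want to reproduce the pattern, here is a minimal sketch of the
kind of source that leads to the code above: a preserve_none function that
calls an ordinary function and then tail-calls the next handler while four
values stay live. The original source of `entry` is not shown in this comment,
so `State`, `external_helper`, `next_handler`, the argument types, and the
`PRESERVE_NONE` macro are all hypothetical names chosen for illustration. The
helper is defined in the same file only so the sketch links on its own; in the
real case it would live in another translation unit.

```c
#include <assert.h>

/* Use preserve_none only where the compiler accepts it (for example a GCC
   carrying the patch discussed here); otherwise fall back to the default
   calling convention so the sketch still builds. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif

/* Hypothetical interpreter state, for illustration only. */
typedef struct {
    long calls;   /* number of times the helper has run */
} State;

/* Stand-in for an external function with the default calling convention. */
__attribute__((noinline)) void external_helper(State *s)
{
    assert(s != 0);   /* sanity check on the stand-in's argument */
    s->calls++;
}

/* Stand-in for the next handler in the tail-call chain. */
PRESERVE_NONE __attribute__((noinline))
long next_handler(State *s, long a, long b, long c)
{
    return s->calls + a + b + c;
}

/* Four values are live across the call to external_helper and are then
   forwarded unchanged, which matches the save/restore moves around the
   callq in the disassembly. */
PRESERVE_NONE
long entry(State *s, long a, long b, long c)
{
    external_helper(s);
    return next_handler(s, a, b, c);   /* sibling call when optimized */
}
```

Compiling this with optimization enabled (for example `-O2 -c`) and
disassembling the object file should make it possible to compare how each
compiler keeps the four values alive across the call, though the exact
registers chosen will depend on the compiler and on how it implements
preserve_none.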
