https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #9 from Ken Jin <kenjin4096 at gmail dot com> ---
I tried this out with CPython's interpreter that uses tail calls with the patch
at https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/saved/master?ref_type=heads
applied. I get a roughly 10% speedup on the pystones benchmark:

Without preserve_none
This machine benchmarks at 912722 pystones/second

With preserve_none
This machine benchmarks at 1.02601e+06 pystones/second

(Higher is better).

I noticed it's still about 10% slower than clang-20 though. It's shuffling
registers a lot at calls to external functions compared to Clang. Please see
https://github.com/llvm/llvm-project/pull/88333.

On GCC I get this with the patch applied:

a.out:  file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <entry>:
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: 48 89 fb                      movq    %rdi, %rbx
       7: 49 89 f4                      movq    %rsi, %r12
       a: 49 89 d5                      movq    %rdx, %r13
       d: 49 89 ce                      movq    %rcx, %r14
      10: e8 00 00 00 00                callq   0x15 <entry+0x15>
      15: 4c 89 f1                      movq    %r14, %rcx
      18: 4c 89 ea                      movq    %r13, %rdx
      1b: 4c 89 e6                      movq    %r12, %rsi
      1e: 48 89 df                      movq    %rbx, %rdi
      21: 5d                            popq    %rbp
      22: e9 00 00 00 00                jmp     0x27 <entry+0x27>
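
For readers without the test case at hand, the listing above has the shape one would expect from a preserve_none function that calls an ordinary external function and then tail-calls another preserve_none function with its four arguments unchanged: the incoming %rdi, %rsi, %rdx and %rcx are copied to %rbx, %r12, %r13 and %r14 before the call and copied back afterwards. The following is only a hypothetical sketch of such a function, not the source that was actually compiled for this comment. The names external and cont, the long parameter types, and the PRESERVE_NONE and MUSTTAIL fallback macros are all assumptions made for illustration.

```c
/* Hypothetical reproducer sketch; not the original test case. */

/* Fall back to no-ops on compilers that lack these attributes, so the
   sketch still builds there (without the calling-convention change). */
#ifdef __has_attribute
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Counts calls so the behaviour can be checked at run time. */
static int external_calls = 0;

/* Stand-in for an external function using the default calling convention.
   In a real reproducer it would live in another translation unit. */
__attribute__((noinline)) void external(void)
{
    external_calls++;
}

/* Tail-call target; same signature and calling convention as entry(). */
PRESERVE_NONE __attribute__((noinline))
long cont(long a, long b, long c, long d)
{
    return a + b + c + d;
}

/* The four arguments must stay live across the call to external(),
   which is where the register copies in the listing come from. */
PRESERVE_NONE long entry(long a, long b, long c, long d)
{
    external();
    MUSTTAIL return cont(a, b, c, d);
}
```

Compiling something like this with optimization and disassembling the object file with objdump -d should make it possible to compare how many register moves each compiler emits around the call to external.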