[Bug target/118328] Implement preserve_none for AArch64

kenjin4096 at gmail dot com via Gcc-bugs Sat, 05 Apr 2025 16:17:37 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328


--- Comment #21 from Ken Jin <kenjin4096 at gmail dot com> ---
I sincerely apologize for my previous performance figures. The baseline was
slower than it should have been due to a Clang-19 bug
(https://github.com/llvm/llvm-project/issues/106846), so the numbers were
inaccurate.

On Clang-20, on the pystones (Dhrystone variant) benchmark, I get a roughly 3%
speedup with the tail-calling interpreter versus computed goto.

I have some numbers to report for CPython compilation time as well. These are
with dynamic frequency scaling off:

CC=clang-20 ./configure --with-lto=thin && make clean && time make -j18

+ Tail call:
real    1m8.183s

- Tail call:
real    1m11.004s

CC=clang-20 ./configure --with-lto=full && make clean && time make -j18

+ Tail call:
real    3m49.285s

- Tail call:
real    3m59.679s

CC=/home/ken/GCC-15.0-trunk/bin/gcc ./configure --with-lto=full && make clean &&
time make -j18

+ Tail call:
real    10m5.521s

- Tail call:
real    10m14.972s

So on Clang 20 we save roughly 4-5% compilation time by switching the
interpreter from an over-10000-line switch case of computed gotos to smaller
per-bytecode tail-call handlers. The savings on GCC 15 are lower (around 1.5%).
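
As a quick sanity check, the percentages can be recomputed from the "real"
times listed above. This is only a minimal sketch: the helper names
(to_seconds, saving_percent) and the dictionary labels are illustrative and do
not come from CPython or either compiler.

```python
def to_seconds(minutes, seconds):
    """Convert a `time` reading such as 1m8.183s into seconds."""
    return minutes * 60 + seconds


# (with tail calls, without tail calls): wall-clock "real" times from above.
timings = {
    "clang-20, thin LTO": (to_seconds(1, 8.183), to_seconds(1, 11.004)),
    "clang-20, full LTO": (to_seconds(3, 49.285), to_seconds(3, 59.679)),
    "GCC 15 trunk, full LTO": (to_seconds(10, 5.521), to_seconds(10, 14.972)),
}


def saving_percent(with_tail, without_tail):
    """Build time saved by the tail-call build, relative to the other build."""
    return (without_tail - with_tail) / without_tail * 100


savings = {name: saving_percent(*pair) for name, pair in timings.items()}

for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% less build time with tail calls")
```

With the timings above this works out to about 4.0% (thin LTO) and 4.3% (full
LTO) on clang-20, and about 1.5% on the GCC 15 trunk build.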

I have no clue how this 4-5% translates to GCC 15, as the comparison between
clang and gcc here is not apples-to-apples. The clang-20 on my system is a
release distribution, while my GCC 15 is built from source with just a plain
configure and make.

Anyways, I don't mean to push for anything here. Just updating the record and
providing new numbers. Thanks again GCC devs for all your work on GCC!
