https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #21 from Ken Jin <kenjin4096 at gmail dot com> ---

I sincerely apologize for my previous performance figures. The baseline was slower because of a Clang-19 bug, https://github.com/llvm/llvm-project/issues/106846, so the numbers were inaccurate. On Clang-20, on the pystones (Dhrystone variant) benchmark, I get a roughly 3% speedup with the tail-calling interpreter versus computed goto.

I have some numbers to report for CPython compilation time as well. These are with dynamic frequency scaling off:

CC=clang-20 ./configure --with-lto=thin && make clean && time make -j18
+ Tail call: real 1m8.183s
- Tail call: real 1m11.004s

CC=clang-20 ./configure --with-lto=full && make clean && time make -j18
+ Tail call: real 3m49.285s
- Tail call: real 3m59.679s

CC=/home/ken/GCC-15.0-trunk/bin/gcc ./configure --with-lto=full && make clean && time make -j18
+ Tail call: real 10m5.521s
- Tail call: real 10m14.972s

So on Clang 20 we save roughly 4% compilation time by switching the interpreter from an over-10000-line switch-case of computed gotos to smaller per-bytecode tail-call handlers. The savings on GCC 15 are lower (about 1.5%).

I have no clue how this 4% translates to GCC 15, as the comparison between clang and gcc here is not apples-to-apples. The clang-20 on my system is a release distribution, while my GCC 15 is built from source with just configure and make.

Anyways, I don't mean to push for anything here. Just updating the record and providing new numbers.

Thanks again GCC devs for all your work on GCC!
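
For anyone who wants to re-derive the percentages from the "real" times listed above, here is a minimal Python sketch. It assumes that "+ Tail call" means the build with the tail-calling interpreter and "- Tail call" the build without it. The names `timings` and `percent_saved` are only illustrative.

```python
# Wall-clock "real" times from the builds above, converted to seconds.
# Each entry is (with tail calls, without tail calls); reading "+" as
# "with" and "-" as "without" is an assumption about the notation.
timings = {
    "clang-20, thin LTO": (68.183, 71.004),        # 1m8.183s vs 1m11.004s
    "clang-20, full LTO": (229.285, 239.679),      # 3m49.285s vs 3m59.679s
    "GCC 15 trunk, full LTO": (605.521, 614.972),  # 10m5.521s vs 10m14.972s
}


def percent_saved(with_tail, without_tail):
    """Build time saved by the tail-call build, as a percentage of the
    build without tail calls."""
    return 100.0 * (without_tail - with_tail) / without_tail


savings = {name: percent_saved(*pair) for name, pair in timings.items()}

for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% less build time with tail calls")
```

With these inputs it prints about 4.0% for Clang 20 with thin LTO, 4.3% for Clang 20 with full LTO, and 1.5% for GCC 15 with full LTO. These are single runs, so small differences may be within noise.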