https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #15 from Ken Jin <kenjin4096 at gmail dot com> ---
I tested again, this time with taskset, turbo boost off, on a quiet system, and with PGO. These are the results. They're quite good:

# Indirect goto + LTO + PGO
This machine benchmarks at 576728 pystones/second

# Tail calls, no preserve_none + LTO + PGO*
This machine benchmarks at 539522 pystones/second

# Tail calls, preserve_none + LTO + PGO*
This machine benchmarks at 572234 pystones/second

So preserve_none gives roughly a 6% gain on the pystones benchmark over tail calls without preserve_none (572234 vs. 539522 pystones/second). Thanks again H.J. for the patch.

*PGO is disabled for the tail-calling functions in the bytecode interpreter, but enabled for everything else, because PGO seems to slow those functions down. I used the attributes `no_instrument_function,no_profile_instrument_function` to turn it off for the bytecode functions. Something strange is going on with PGO for tail calls on my system, but I can't figure it out right now.

Everything is benchmarked on this branch:
https://github.com/Fidget-Spinner/cpython/pull/new/Fidget-Spinner:cpython:tail-call-gcc-3
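
For readers unfamiliar with the setup, here is a minimal sketch of a tail-calling bytecode dispatcher whose handlers carry the attributes mentioned above. It is not CPython's code: the opcodes, the `SKETCH_*` macros, the handler names, and `sketch_run` are all hypothetical and made up for illustration. Every attribute is guarded with `__has_attribute`, and `preserve_none` is opt-in (build with `-DSKETCH_USE_PRESERVE_NONE`) since compiler and target support for it varies.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

/* Hypothetical macro: opt-in preserve_none calling convention. */
#if defined(SKETCH_USE_PRESERVE_NONE) && __has_attribute(preserve_none)
#  define SKETCH_CC __attribute__((preserve_none))
#else
#  define SKETCH_CC
#endif

/* Hypothetical macro: keep instrumentation (including PGO counters,
   where the compiler supports the attribute) out of the handlers. */
#if __has_attribute(no_profile_instrument_function)
#  define SKETCH_NO_PGO \
    __attribute__((no_instrument_function, no_profile_instrument_function))
#else
#  define SKETCH_NO_PGO __attribute__((no_instrument_function))
#endif

/* Hypothetical macro: guaranteed tail call where musttail is available,
   otherwise a plain return (the call may then consume stack). */
#if __has_attribute(musttail)
#  define SKETCH_TAIL __attribute__((musttail)) return
#else
#  define SKETCH_TAIL return
#endif

/* Toy opcodes, invented for this sketch. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* All handlers share one signature and one calling convention. */
typedef long (SKETCH_CC *sketch_handler)(const unsigned char *pc, long acc);

#define SKETCH_HANDLER(name) \
    static SKETCH_NO_PGO SKETCH_CC long name(const unsigned char *pc, long acc)

SKETCH_HANDLER(op_inc);
SKETCH_HANDLER(op_double);
SKETCH_HANDLER(op_halt);

/* Dispatch table indexed by opcode. */
static const sketch_handler sketch_dispatch[] = { op_inc, op_double, op_halt };

/* Each handler does its work, advances pc, and tail-calls the next handler. */
SKETCH_HANDLER(op_inc)
{
    acc += 1;
    pc++;
    SKETCH_TAIL sketch_dispatch[*pc](pc, acc);
}

SKETCH_HANDLER(op_double)
{
    acc *= 2;
    pc++;
    SKETCH_TAIL sketch_dispatch[*pc](pc, acc);
}

SKETCH_HANDLER(op_halt)
{
    (void)pc;
    return acc; /* end of program: unwind with the accumulator */
}

/* Entry point: an ordinary (non-tail) call into the first handler.
   The program must end with OP_HALT. */
long sketch_run(const unsigned char *program)
{
    assert(program != NULL);
    return sketch_dispatch[program[0]](program, 0);
}
```

Calling `sketch_run` on the program `{OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT}` walks the accumulator through 1, 2, 4, 5. The point of the calling-convention attribute in this layout is that a handler which only ever tail-calls its successor has no need to save and restore callee-saved registers for its caller, which is the overhead the comparison above is measuring.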