https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628

--- Comment #15 from Ken Jin <kenjin4096 at gmail dot com> ---
I tested again, this time with taskset, turbo boost off, on a quiet system, and
with PGO. These are the results. They're quite good:

# Indirect goto + LTO + PGO
This machine benchmarks at 576728 pystones/second

# Tail calls, no preserve_none + LTO + PGO*
This machine benchmarks at 539522 pystones/second

# Tail calls, preserve_none + LTO + PGO*
This machine benchmarks at 572234 pystones/second

So roughly a 6% gain from preserve_none on the pystones benchmark over no
preserve_none (572234 vs. 539522 pystones/second). Thanks again H.J. for the
patch.
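
For anyone who wants to re-derive the relative difference from the raw numbers
above, here is a minimal sketch. The helper name `percent_gain` is made up for
this illustration and is not part of any benchmark tooling.

```c
/* Relative change of `candidate` over `baseline`, in percent.
 * Hypothetical helper, used only to re-check the figures quoted above. */
double percent_gain(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}
```

Calling `percent_gain(539522.0, 572234.0)` compares tail calls with
preserve_none against tail calls without it, and
`percent_gain(576728.0, 572234.0)` compares tail calls with preserve_none
against indirect goto.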

*PGO is disabled for tail calling functions in the bytecode interpreter, but
enabled for everything else, as it seems PGO slows down those functions. I used
the attributes `no_instrument_function,no_profile_instrument_function` to turn
it off for the bytecode functions.
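
As a rough illustration of what that looks like, here is a minimal sketch of a
toy tail-call style dispatcher with both attributes applied to each handler.
This is not the CPython code: the `NO_PGO` macro, the opcodes, and all function
names are hypothetical, and the sketch leaves out preserve_none and any
guarantee that the compiler really emits a tail call.

```c
/* Hypothetical macro name. The two attributes are the ones named above. */
#define NO_PGO \
    __attribute__((no_instrument_function, no_profile_instrument_function))

/* Toy opcodes, invented for this sketch. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

typedef long (*handler_fn)(const unsigned char *pc, long acc);

/* Forward declarations so the dispatch table can reference the handlers. */
NO_PGO static long op_inc(const unsigned char *pc, long acc);
NO_PGO static long op_double(const unsigned char *pc, long acc);
NO_PGO static long op_halt(const unsigned char *pc, long acc);

static const handler_fn dispatch[] = { op_inc, op_double, op_halt };

/* Each handler does its work, then calls the handler for the next opcode
 * in tail position. */
NO_PGO static long op_inc(const unsigned char *pc, long acc)
{
    acc += 1;
    return dispatch[pc[1]](pc + 1, acc);
}

NO_PGO static long op_double(const unsigned char *pc, long acc)
{
    acc *= 2;
    return dispatch[pc[1]](pc + 1, acc);
}

NO_PGO static long op_halt(const unsigned char *pc, long acc)
{
    (void)pc;   /* nothing left to dispatch */
    return acc;
}

/* Entry point: not annotated, so it is still instrumented under PGO. */
long run_program(const unsigned char *program)
{
    return dispatch[program[0]](program, 0);
}
```

Only the handlers carry the attributes, which mirrors the setup described
above: profiling stays enabled for the rest of the program.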

Something strange is going on with PGO for tail calls on my system. However, I
can't figure it out right now.

Everything is benchmarked on this branch:
https://github.com/Fidget-Spinner/cpython/pull/new/Fidget-Spinner:cpython:tail-call-gcc-3
