https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #15 from Ken Jin ---
I tested again, this time with taskset, turbo boost off, on a quiet system, with
PGO. These are the results. They're quite good:
# Indirect goto + LTO + PGO
This machine benchmarks at 576728 pystones/second
# Ta
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #14 from Ken Jin ---
No speedup (within noise) with latest patch over previous patch. So Andrew
might be right there on the register shuffling.
However, note that pystones is just one benchmark in Python and not the
full benchmark su
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #9 from Ken Jin ---
I tried this out with CPython's interpreter that uses tail calls with the patch
at https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/saved/master?ref_type=heads
applied.
I get a roughly 10% speedup on the pystones
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #21 from Ken Jin ---
I sincerely apologize for my previous performance figures. The baseline was
worse due to a Clang-19 bug https://github.com/llvm/llvm-project/issues/106846.
So the numbers were inaccurate.
On Clang-20, on the pys
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118442
--- Comment #10 from Ken Jin ---
Wow, I tried a patched version of CPython and it now builds with musttail and
PGO. Massive thanks to all the GCC contributors that worked towards this! I'm
always in awe at how complex software like GCC works.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #20 from Ken Jin ---
(In reply to Andrew Pinski from comment #17)
> I am not sure if I understand this correctly.
> Can you make a simple table:
>
> w/o tail-call - 1
> with tail-call but not preserve_none -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #7 from Ken Jin ---
The files are too big to upload here, so I've uploaded them to
https://github.com/Fidget-Spinner/debugging-dump. They correspond to the main
interpreter loop of CPython
https://github.com/python/cpython/blob/e1988
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #5 from Ken Jin ---
However, it seems to me that there are still extraneous pushes and pops for
function prologue/epilogue that could be removed with preserve_none. GCC's
regalloc is definitely a lot better than Clang when both don't hav
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118442
Bug ID: 118442
Summary: -fprofile-generate wrongly adds instrumentation after
musttail call
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
--- Comment #4 from Ken Jin ---
I can confirm that in the case of tail calls, GCC does produce
better/equivalent register spilling code than clang 19.1.0, by manual
inspection of call sites.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118431
Ken Jin changed:
What|Removed |Added
Resolution|--- |INVALID
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118431
Bug ID: 118431
Summary: [Feature request]: warn about escaped local variables
in musttail instead of error-ing
Product: gcc
Version: 15.0
Status: UNCONFIRMED
S
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430
--- Comment #8 from Ken Jin ---
Thanks a lot for your help on this! I think I've narrowed down what's happening
with CPython. It seems what's happening in CPython is not a bug, but should be
a feature request. I will file a separate feature requ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430
--- Comment #2 from Ken Jin ---
Thanks for the quick response.
Is the second case in the initial post on the return slot error expected?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430
Bug ID: 118430
Summary: musttail false positive on how locals are used
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328
Ken Jin changed:
What|Removed |Added
CC||kenjin4096 at gmail dot com
--- Comment #3 fr
16 matches