[Bug target/119628] Need better mechanisms to manage register saves in callee for tail calls (inc. preserve_none for x86_64?)

2025-04-16 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628 --- Comment #15 from Ken Jin --- I tested again this time with taskset, turbo boost off, on a quiet system, with PGO. These are the results. They're quite good: # Indirect goto + LTO + PGO This machine benchmarks at 576728 pystones/second # Ta

[Bug target/119628] Need better mechanisms to manage register saves in callee for tail calls (inc. preserve_none for x86_64?)

2025-04-15 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628 --- Comment #14 from Ken Jin --- No speedup (within noise) with latest patch over previous patch. So Andrew might be right there on the register shuffling. However, note that pystones is just one benchmark in Python and not the full benchmark su

[Bug target/119628] Need better mechanisms to manage register saves in callee for tail calls (inc. preserve_none for x86_64?)

2025-04-15 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628 --- Comment #9 from Ken Jin --- I tried this out with CPython's interpreter that uses tail calls with the patch at https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/saved/master?ref_type=heads applied. I get a roughly 10% speedup on the pystones

[Bug target/118328] Implement preserve_none for AArch64

2025-04-05 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328 --- Comment #21 from Ken Jin --- I sincerely apologize for my previous performance figures. The baseline was worse due to a Clang-19 bug https://github.com/llvm/llvm-project/issues/106846. So the numbers were inaccurate. On Clang-20, on the pys

[Bug gcov-profile/118442] -fprofile-generate wrongly adds instrumentation after musttail call

2025-04-05 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118442 --- Comment #10 from Ken Jin --- Wow, I tried a patched version of CPython and it now builds with musttail and PGO. Massive thanks to all the GCC contributors that worked towards this! I'm always in awe at how complex software like GCC works.

[Bug target/118328] Implement preserve_none for AArch64

2025-02-07 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328 --- Comment #20 from Ken Jin --- (In reply to Andrew Pinski from comment #17) > I am not sure if I understand this correctly. > Can you make a simple table: > > w/o tail-call - 1 > with tail-call but not preserve_none -

[Bug target/118328] Implement preserve_none for AArch64

2025-01-13 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328 --- Comment #7 from Ken Jin --- The files are too big to upload here, so I've uploaded them to https://github.com/Fidget-Spinner/debugging-dump. They correspond to the main interpreter loop of CPython https://github.com/python/cpython/blob/e1988

[Bug target/118328] Implement preserve_none for AArch64

2025-01-13 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328 --- Comment #5 from Ken Jin --- However, it seems to me that there are still extraneous pushes and pops in the function prologue/epilogue that could be removed with preserve_none. GCC's regalloc is definitely a lot better than Clang when both don't hav

[Bug tree-optimization/118442] New: -fprofile-generate wrongly adds instrumentation after musttail call

2025-01-12 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118442 Bug ID: 118442 Summary: -fprofile-generate wrongly adds instrumentation after musttail call Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal

[Bug target/118328] Implement preserve_none for AArch64

2025-01-12 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328 --- Comment #4 from Ken Jin --- I can confirm that in the case of tail calls, GCC does produce better/equivalent register spilling code than clang 19.1.0, by manual inspection of call sites.

[Bug tree-optimization/118431] [Feature request]: warn about escaped local variables in musttail instead of error-ing

2025-01-12 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118431 Ken Jin changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED

[Bug c/118431] New: [Feature request]: warn about escaped local variables in musttail instead of error-ing

2025-01-12 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118431 Bug ID: 118431 Summary: [Feature request]: warn about escaped local variables in musttail instead of error-ing Product: gcc Version: 15.0 Status: UNCONFIRMED S

[Bug tree-optimization/118430] [14/15 Regression] tail call vs IPA-VRP return value range with constant value

2025-01-12 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430 --- Comment #8 from Ken Jin --- Thanks a lot for your help on this! I think I've narrowed down what's happening with CPython. It seems what's happening in CPython is not a bug, but should be a feature request. I will file a separate feature requ

[Bug middle-end/118430] musttail false positive on how locals are used

2025-01-11 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430 --- Comment #2 from Ken Jin --- Thanks for the quick response. Is the second case in the initial post on the return slot error expected?

[Bug c/118430] New: musttail false positive on how locals are used

2025-01-11 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430 Bug ID: 118430 Summary: musttail false positive on how locals are used Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c

[Bug target/118328] Implement preserve_none for AArch64

2025-01-08 Thread kenjin4096 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118328 Ken Jin changed: What|Removed |Added CC||kenjin4096 at gmail dot com --- Comment #3 fr