[Bug gcov-profile/121123] [16 regression] some gcc.misc-tests/gcov-*.c fail starting with r16-2197-g385d9937f0e23c

2025-07-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121123 Jan Hubicka changed: Assignee: unassigned at gcc dot gnu.org → hubicka at gcc dot gnu.org

[Bug bootstrap/121038] autoprofiledbootstrap is broken in a few ways

2025-07-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121038 --- Comment #2 from Jan Hubicka --- I experimented with a smaller sampling period, and indeed create_gcov then runs out of memory. On my setup create_gcov was simply segfaulting and produced just a partial profile. Since the Makefile does not fail on cr

[Bug debug/121093] New: Missed location of inlined function

2025-07-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- In this testcase static int p1(int a) { return a+1; } static int p2(int a) { return a+2; } int p3 (int a) { return p1(p2(a)); } We optimize the two additions
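The testcase in this report is self-contained, so it can be checked directly. A minimal sketch, assuming only standard C: it keeps the three functions as quoted and adds a hypothetical folded_p3 standing for the presumed optimized form, to show that after inlining the two additions collapse into a single a + 3 that belongs to neither p1 nor p2 alone.

```c
/* The three functions from the testcase, unchanged apart from spacing. */
static int p1(int a) { return a + 1; }
static int p2(int a) { return a + 2; }
int p3(int a) { return p1(p2(a)); }

/* Hypothetical hand-written form of p3 after both calls are inlined and
   the two constant additions are combined: the single "+ 3" merges work
   that came from p1 and from p2. */
int folded_p3(int a) { return a + 3; }
```

Both functions compute the same value for any argument that does not overflow; the interesting part for debug info is only which inline location the merged addition should carry.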

[Bug gcov-profile/121074] [16 Regression] ICE: in gcov_open, at gcov-io.cc:128 with -ftest-coverage -fauto-profile

2025-07-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org → hubicka at gcc dot gnu.org; Last reconfirmed: 2025-07-15 --- Comment #1 from Jan Hubicka --- This is caused by auto-profile never closing its input file. It also does not check for any read failures
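The two defects named here (a file that is never closed, and reads that are never checked) can be illustrated with plain stdio. This is a minimal sketch, assuming a simple word-oriented profile file; read_profile_words is a hypothetical helper and is not GCC's gcov-io API (gcov_open and friends work differently).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reader, NOT GCC's gcov-io code: reads up to max_words
   32-bit words from path into buf. Returns the number of words read,
   or -1 if the file cannot be opened, a read error occurs, or closing
   fails. The file is closed on every path after a successful open. */
long read_profile_words(const char *path, uint32_t *buf, size_t max_words)
{
  FILE *f = fopen(path, "rb");
  if (!f)
    return -1;

  size_t n = fread(buf, sizeof *buf, max_words, f);

  /* A short read at end of file is fine; a stream error is not. */
  int failed = ferror(f) != 0;

  /* Always close, and treat a failing close as an error too. */
  if (fclose(f) != 0)
    failed = 1;

  return failed ? -1 : (long) n;
}
```

Checking ferror after the read distinguishes a truncated file (fewer words, no error) from a genuine I/O failure.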

[Bug bootstrap/121038] New: autoprofiledbootstrap is broken in a few ways

2025-07-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Let's track the problems here. Currently: 1) autoprofiledbootstrap fails for me on a 256-core machine since perf runs out of memory. The workaround is: diff --git a/Makefile.tpl

[Bug tree-optimization/119876] suboptimal code for avx512 conditional move

2025-07-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 --- Comment #6 from Jan Hubicka --- Aha, I was looking into scalar-to-vector improvements promoting scalar integer + 1 to vector on AMD CPUs.

[Bug tree-optimization/119876] suboptimal code for avx512 conditional move

2025-07-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 --- Comment #5 from Jan Hubicka --- I think I made the testcase while working on something else that I forgot, sorry :)

[Bug gcov-profile/120229] [GCOV] AutoFDO cannot distinguish privatized functions within an LTO partition

2025-07-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229 --- Comment #2 from Jan Hubicka --- See thread https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689018.html

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #9 from Jan Hubicka --- Created attachment 61818 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61818&action=edit create_gcov patch

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #8 from Jan Hubicka --- Patching create_gcov to account for all debug statements associated with a given address, instead of just the last one, gets me: test total:4350509 head:8642 1: 4484 // { 2: 4484 // for ( 3: 4484
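The change described in this comment is about how samples for one address are distributed over source lines. A minimal sketch of the two policies, assuming a hypothetical addr_record that lists the lines whose debug statements map to one sampled address; the struct and both functions are illustrative and are not create_gcov's actual data structures.

```c
#include <stddef.h>

#define MAX_LINES 16

/* Hypothetical model: one sampled instruction address, the source lines
   whose debug statements map to it (in line-table order), and the number
   of samples that hit it. */
struct addr_record {
  unsigned long count;
  size_t nlines;
  unsigned lines[MAX_LINES];
};

/* Previous policy as described: only the last associated line gets the
   samples. line_counts must be large enough to index every line number. */
void account_last_only(const struct addr_record *r, unsigned long *line_counts)
{
  if (r->nlines > 0)
    line_counts[r->lines[r->nlines - 1]] += r->count;
}

/* Patched policy as described: every line associated with the address
   gets the samples, so lines sharing one instruction all show as executed. */
void account_all(const struct addr_record *r, unsigned long *line_counts)
{
  for (size_t i = 0; i < r->nlines; i++)
    line_counts[r->lines[i]] += r->count;
}
```

With the first policy, lines whose only instruction is shared with a later line appear never executed, which matches the kind of missing count this bug is about.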

[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965 --- Comment #3 from Jan Hubicka --- There is also a 3% performance regression that got lost in the transition to the new PR: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.387.0

[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965 --- Comment #2 from Jan Hubicka --- This is likely an ipa-cp heuristics issue: it now decides to clone, but in the end the benefits are not really visible.

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #7 from Jan Hubicka --- LLVM also gets execution counts wrong, just in a different (and less harmful) way: test:270773509:9780 1: 9116 2: 51984 for ( 4: 51984 iThis Inner Loop Header: Depth=1 .loc0 10 15

[Bug testsuite/120859] FAIL: gcc.dg/tree-prof/afdo-crossmodule-1b.c compilation

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org --- Comment

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Bug 120867 depends on bug 104457, which changed state. Bug 104457 Summary: ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457

[Bug ipa/104457] ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Resolution: --- → FIXED; CC: added hubicka at gcc dot gnu.org --- Comment #3 from Jan Hubicka --- I believe update_specialized_profile should now be safe with respect to ICEs on contradicting profiles. I can build SPEC on x86 reliably (and we now run daily testing at LNT

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #10 from Jan Hubicka --- https://github.com/google/autofdo/issues/248

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Bug 120867 depends on bug 120938, which changed state. Bug 120938 Summary: discriminators are not useful in statements doing multiple calls https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 Jan Hubicka changed: Status: UNCONFIRMED → RESOLVED; Resolution: --- →

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #8 from Jan Hubicka --- The problem goes away with: diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc index d1a55dbcbcb..52ca189531e 100644 --- a/gcc/dwarf2out.cc +++ b/gcc/dwarf2out.cc @@ -25012,9 +25012,8 @@ add_call_src_coords_attribute

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #7 from Jan Hubicka --- Looking at the diff, there seem to be few changes: - # d.C:16:2 - .loc 1 16 2 is_stmt 1 view .LVU16 + # d.C:15:8 + .loc 1 15 8 is_stmt 1 discriminator 1 view .LVU16 This is a line table

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #6 from Jan Hubicka --- Created attachment 61795 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61795&action=edit Diff

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #5 from Jan Hubicka --- Created attachment 61794 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61794&action=edit bad assembly

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #4 from Jan Hubicka --- Created attachment 61793 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61793&action=edit good assembly

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #3 from Jan Hubicka --- An even smaller set of examples. Bad profile: #include volatile int variablev; static void inc() { variablev++; } static int zero = 0; int main () { for (int i = 0; i < 1; i++)

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #2 from Jan Hubicka --- This is an even smaller testcase: #include volatile int variablev; static void inc(int a) { variablev++; } inline int inline_me (int l) { for (int i = 0; i < 1; i++) {inc(1);inc(

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #1 from Jan Hubicka --- Removing the parameter of inc makes the problem go away. So does removing the recursion: #include volatile int variablev; static int dead () { return 0; } static void inc() { variablev++; }

[Bug debug/120938] New: discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jh@shroud:~> cat d.C #include volatile int a; static int dead () { return 0; } static void inc(int b) {

[Bug tree-optimization/120916] debug info for IV increment is lost

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #3 from Jan Hubicka --- Well, PR32445 is about us not being able to vartrack the value of I. I think that may have been fixed since then by adding the corresponding debug binds. However, here we are missing info about the statement being executed...

[Bug driver/120916] debug info for IV increment is lost

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #1 from Jan Hubicka --- Here is a variant for the gcov tool: jh@shroud:/tmp> cat tt.c int s = 1023; int a[1024]; __attribute__ ((weak)) void test() { for ( int i = 0; /* Line 7, relative 3 */ i < s;

[Bug driver/120916] New: debug info for IV increment is lost

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jan@padlo:/tmp> cat t.c int s = 1023; int a[1024]; __attribute__ ((noipa)) int test() { for ( int i = 0; /* Line 7 */ i < s; /*

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #15 from Jan Hubicka --- https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=68430&plot.0=1370.377.0&plot.1=1288.377.0 compares AFDO to no profile feedback

[Bug lto/66229] LTO fails with -fauto-profile on mcf

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Resolution: --- → FIXED; CC: added hubicka at gcc dot gnu.org --- Comment #5 from Jan Hubicka --- Let's say it is fixed. mcf builds for me now.

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684 --- Comment #11 from Jan Hubicka --- *** Bug 86404 has been marked as a duplicate of this bug. ***

[Bug testsuite/86404] UNRESOLVED/UNSUPPORTED gcov test results due to Permission error mapping pages

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86404 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
CC: added hubicka at gcc dot gnu.org; Status: NEW → WAITING --- Comment #10 from Jan Hubicka --- For me, parallel check works quite well. I get 3 failures in peeling testcases that should probably be disabled for AutoFDO. Referenced Bugs: https://gcc.gnu.org

[Bug gcov-profile/120229] [GCOV] AutoFDO cannot distinguish privatized functions within an LTO partition

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Last reconfirmed: 2025-06-29; Status: UNCONFIRMED → NEW; CC: added hubicka at gcc dot gnu.org --- Comment #1 from Jan Hubicka --- Confirmed. Referenced Bugs: https://gcc.gnu.org

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 Jan Hubicka changed: Last reconfirmed: 2025-06-29; Blocks:

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Jan Hubicka changed: Status: UNCONFIRMED → NEW; Ever confirmed: 0 →

[Bug tree-optimization/120867] New: [metabug] AutoFDO issues

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: ---

[Bug tree-optimization/120752] 5% slowdown of 525.x264_r since r16-1346-gb0d50cbb42ab2c

2025-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120752 --- Comment #4 from Jan Hubicka --- Hmm, there seem to be no big differences in IPA decisions between the runs, so further investigation is necessary :( The patch attempts to preserve more of the profile, and here the profile is a bit counter-productive

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #9 from Jan Hubicka --- I am happy it helps. I wonder if you can share details of your SPEC config, i.e. how you call perf (do you specify a count, etc.) and how you handle merging of profiles. We now have a regular tester (on AMD hardwa

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #6 from Jan Hubicka --- Also, BTW, I think it is useful to do the dumps with -details-blocks, since that also dumps BB count inconsistencies caused by AutoFDO that are otherwise hard to spot. In the ipa-cp dump it should be visible if cons

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #5 from Jan Hubicka --- Note that on x86-64 I get OK scores on x264. This compares no-FDO -Ofast -flto -march=native to AutoFDO. I hacked the scripts to use the ref run for training, so it is longer: 500.perlbench_r 1158

[Bug target/119298] [15/16 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-05-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 Jan Hubicka changed: Status: NEW → RESOLVED; Resolution: --- →

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-05-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 119298, which changed state. Bug 119298 Summary: [15/16 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f https://gcc.g

[Bug target/120218] [16 Regression] 8% slowdown of 507.cactuBSSN_r on Intel

2025-05-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120218 --- Comment #2 from Jan Hubicka --- I guess for costing changes, too. Since this is a weekly tester, bisecting would help.

[Bug tree-optimization/120219] [16 Regression] ~11% slowdown of 548.exchange2_r on x86_64 (maybe also on aarch64?) since r16-448-g8335fd561fa823

2025-05-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219 Jan Hubicka changed: Depends on: added 119902 --- Comment #5 from Jan Hubicka -

[Bug target/120226] New: 8% regression of exchange2 with -O2 between g:d0571638a6bad932 and g:9b13bea07706a7ca

2025-05-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This is visible on both Zen and Intel testers https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.407.0

[Bug ipa/120099] [16 regression] gfortran.dg/specifics_1.f90 FAILs since r16-372-g064cac730f88dc

2025-05-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120099 --- Comment #4 from Jan Hubicka --- This patch enables more inlining, so I guess it is a previously latent problem triggered by the inliner...

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 --- Comment #9 from Jan Hubicka --- Forgot to say: -fno-optimize-sibling-calls re-enables the cloning and inlining.

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 --- Comment #8 from Jan Hubicka --- The difference is that the tailr1 pass now turns recursion into a loop. GCC 15 does: Basic block 11 has extra exit edges Basic block 33 has extra exit edges Basic block 28 has extra exit edges Basic block 23 has ex
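For readers unfamiliar with what "turning recursion into a loop" means, a minimal hand-written sketch of tail recursion elimination follows. The functions sum_to_rec and sum_to_loop are hypothetical and unrelated to the code in this report; the point is only the shape change, since once the self-call is gone there is no recursive call edge left for ipa-cp or the inliner to work with (consistent with the -fno-optimize-sibling-calls observation in comment #9 of this bug).

```c
/* Hypothetical accumulator-style function whose self-call is in tail
   position: nothing happens after the recursive call returns. */
unsigned long sum_to_rec(unsigned n, unsigned long acc)
{
  if (n == 0)
    return acc;
  return sum_to_rec(n - 1, acc + n);   /* tail call */
}

/* Roughly the shape after tail recursion elimination: the self-call
   becomes a jump back to the start with updated arguments, i.e. a loop. */
unsigned long sum_to_loop(unsigned n, unsigned long acc)
{
  while (n != 0)
    {
      acc += n;
      n -= 1;
    }
  return acc;
}
```

Both versions return the same result; the loop form simply no longer contains a call.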

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 Jan Hubicka changed: Last reconfirmed: 2025-05-06; Ever confirmed: 0 →

[Bug tree-optimization/120069] [16 Regression] Yet another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a2878fa43a76df126

2025-05-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069 Jan Hubicka changed: Last reconfirmed: 2025-05-03; Ever confirmed: 0 →

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e (interaction of rpad and late-combine)

2025-05-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #6 from Jan Hubicka --- Sadly this did not fix the whole regression. The problem is that after my change to enable ipa-cp to clone over cold edges we clone GetVirtualPixelsFromNexus twice (as constprop.0 and constprop.1). This func

[Bug target/120069] New: Yet another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a2878fa43a76df1266153b438

2025-05-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
:55b01e17c793688a2878fa43a76df1266153b438 Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone

[Bug tree-optimization/120065] [14/15/16 Regression] profile info corrupted by dom2

2025-05-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120065 --- Comment #3 from Jan Hubicka --- while (n > 0 && a) ; This is an odd loop which iterates either 0 times or infinitely many times. We do not pattern-match that at profile-estimate time (since such code is kind of useless) and we guess i
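Why the loop has only two possible trip counts can be shown without running it. A minimal sketch, assuming n and a are ordinary (non-volatile) values that nothing else modifies while the loop runs; the enum and classify_odd_loop are hypothetical names used only for illustration.

```c
/* The loop from the report is:  while (n > 0 && a) ;
   The body is empty, so nothing changes n or a between iterations and the
   exit condition has the same value every time it is tested. */
enum trip_count { TRIPS_ZERO, TRIPS_INFINITE };

/* Hypothetical helper: classify the loop from its inputs alone. */
enum trip_count classify_odd_loop(int n, int a)
{
  /* Condition false on entry: the loop exits immediately.
     Condition true on entry: it stays true forever. */
  return (n > 0 && a) ? TRIPS_INFINITE : TRIPS_ZERO;
}
```

There is no finite nonzero trip count, which is why a generic per-iteration exit probability guessed by the estimator does not describe this loop well.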

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e (interaction of rpad and late-combine)

2025-04-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 Jan Hubicka changed: CC: added rsandifo at gcc dot gnu.org; S

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

2025-04-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #3 from Jan Hubicka --- Reverting the change of size_costs solves the regression, so it is about differences in optimization of cold code. I will try to track down what causes that.

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

2025-04-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #2 from Jan Hubicka --- Aha, I mistakenly added the analysis to PR105275. One problem I noticed was the wrong costing of FP scalar min/max, which is fixed now but does not affect imagick. Interestingly, we now vectorize the same loops and BBs

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734 --- Comment #5 from Jan Hubicka --- This is MorphologyApply MagickExport Image *MorphologyApply(const Image *image, const ChannelType channel,const MorphologyMethod method, const ssize_t iterations, const KernelInfo *kernel, const Com

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734 --- Comment #4 from Jan Hubicka --- With -fprofile-use we get Evaluating opportunities for MorphologyApply/3266. - considering value 134217719 for param #1 const ChannelType (caller_count: 3) good_cloning_opportunity_p (time: 1, size: 427

[Bug target/105275] [12/13/14/15/16 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #9 from Jan Hubicka --- The only vectorization difference is: +imagick_r.ltrans8.ltrans.189t.slp1:magick/distort.c:1911:18: optimized: basic block part vectorized using 16 byte vectors +imagick_r.ltrans8.ltrans.189t.slp1:magick/dist

[Bug target/105275] [12/13/14/15/16 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 Jan Hubicka changed: Status: UNCONFIRMED → NEW; Ever confirmed: 0 →

[Bug tree-optimization/119924] [16 Regression] ICE when building 531.deepsjeng_r during ipa-cp since r16-101-g132d01d96ea9d6

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119924 Jan Hubicka changed: Status: UNCONFIRMED → NEW; Ever confirmed: 0 →

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #6 from Jan Hubicka --- The exchange2 regression is solved, and tonto seems to be noise (performance is back today without a change in the checksum of the text segment). Still, we account one extra setcc and misaccount scatter, so let's keep this t

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 Jan Hubicka changed: Depends on: added 119902 --- Comment #3 from Jan Hubicka -

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Status: ASSIGNED; Last reconfirmed: 2025-04-24; Assignee: unassigned at gcc dot gnu.org → hubicka at gcc dot gnu.org --- Comment #2 from Jan Hubicka --- This is with -O2 only. The difference is: +++ bbb 2025-04-24 16:21:25.029155295 +0200 @@ -108,10 +108,7

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march=native)

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #5 from Jan Hubicka --- As g:132d01d96ea9d617aaffdd5dfba3284a8958e529, I have committed the patch that enables ipa-cp to clone over edges which are !maybe_hot_p(). This improves x264 with FDO by 7.8% and exchange by 3.3%. It causes qu
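For context, a hand-written sketch of what an ipa-cp clone (the constprop.N functions mentioned in these reports) amounts to. Everything here is hypothetical: strided_sum, its clone and caller are illustrative names, and GCC creates such clones internally on the IL rather than in source. The committed change only affects when such a clone may be made: also when the call edge carrying the constant is not considered hot.

```c
/* Hypothetical original function: stride is a runtime parameter
   (assumed > 0), so the loop cannot be specialized for it. */
long strided_sum(const int *v, int len, int stride)
{
  long s = 0;
  for (int i = 0; i < len; i += stride)
    s += v[i];
  return s;
}

/* Hand-written analogue of a clone such as "strided_sum.constprop.0",
   made for call sites that always pass stride == 1: the constant is
   propagated into the body, which may enable further optimization
   (for example, vectorizing a unit-stride loop). */
static long strided_sum_constprop_0(const int *v, int len)
{
  long s = 0;
  for (int i = 0; i < len; i++)
    s += v[i];
  return s;
}

/* A call site known to pass stride == 1 is redirected to the clone. */
long caller(const int *v, int len)
{
  return strided_sum_constprop_0(v, len);
}
```

The cost of such cloning is code size, which is why heuristics decide per call edge whether the specialization is worth it.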

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #1 from Jan Hubicka --- There also seems to be a 4% tonto regression on Intel in the same range: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.230.0

[Bug target/119919] New: 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This reproduces on both Zen and Intel: https

[Bug tree-optimization/119902] New: open-coded scatter/gather should not account vec_to_scalar cost

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681555.html in loop > void foo (int n, int *

[Bug target/119900] New: regression of imagick with -Ofast -march=native -fprofile-use between g:b986ed16c2546674 and g:e1098c7b08d9e601

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This seems to reproduce on Intel (119%): https://lnt.opensuse.org

[Bug target/119879] [16 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c since r16-39

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #2 from Jan Hubicka --- Created attachment 61166 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61166&action=edit Fix I am testing The fix I am testing. When VEC_PACK_TRUNC_EXPR is used, add_hook is called with vec_promote_dem

[Bug target/119879] [r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #1 from Jan Hubicka --- The problem is in: /* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end up doing two conversions and packing them. */ if (!scalar_p && inner_size > outer_size) { i
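The comment quoted from the cost code says that a narrowing conversion through VEC_PACK_TRUNC_EXPR costs two conversions plus a pack. A scalar sketch of those semantics, assuming 4-lane double inputs and an 8-lane float result (the lane count, the double-to-float choice and the name pack_trunc are illustrative; the failing test is about half-precision, and lane order is shown for the little-endian case only).

```c
#define LANES 4

/* Scalar model of VEC_PACK_TRUNC_EXPR <lo, hi>: each input vector of wide
   elements is converted to the narrower element type, and the two halves
   are concatenated into one result vector with twice as many lanes. */
void pack_trunc(const double lo[LANES], const double hi[LANES],
                float out[2 * LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = (float) lo[i];           /* conversion #1: low half */
  for (int i = 0; i < LANES; i++)
    out[LANES + i] = (float) hi[i];   /* conversion #2: high half */
  /* Writing the two halves into adjacent parts of out[] stands in for
     the final pack step. */
}
```

Counting one conversion per input vector plus the combining step is what makes the "inner size greater than outer size" case more expensive than a same-width conversion.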

[Bug target/119876] New: suboptimal code for avx512 conditional move

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- int a[1000]; int b[1000]; int c[1000]; int d[1000]; void test() { for (int i = 0; i < 1000; i++) a[i] = b[i] > 0 ? c[i] + 1 : c[i] + 2;
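The loop statement in this testcase selects between two additions to the same c[i]. A minimal sketch, assuming plain C: the quoted loop is wrapped in a hypothetical test_original_form (the rest of the original testcase is cut off in this snippet), next to a hypothetical hand-rewritten test_select_addend that selects the constant first and adds once. It is only an equivalent formulation for comparison, not what GCC emits.

```c
int a[1000];
int b[1000];
int c[1000];

/* The loop statement quoted in the report, wrapped in a function. */
void test_original_form(void)
{
  for (int i = 0; i < 1000; i++)
    a[i] = b[i] > 0 ? c[i] + 1 : c[i] + 2;
}

/* Hypothetical equivalent: compare, select between the constants 1 and 2,
   then do a single add, instead of computing two sums and selecting. */
void test_select_addend(int *out)
{
  for (int i = 0; i < 1000; i++)
    out[i] = c[i] + (b[i] > 0 ? 1 : 2);
}
```

Both forms compute the same values; they differ in whether the select happens on the constants or on the two vector sums.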

[Bug tree-optimization/119875] New: loop with floating point conditional move not vectorized without -ffast-math

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- double a[1000]; double b[1000]; double c[1000]; double d[1000]; void test() { for (int i = 0; i

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 Jan Hubicka changed: Resolution: --- → FIXED; Status: NEW →

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #47 from Jan Hubicka --- Created attachment 61134 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61134&action=edit patch w/o forgotten debug output

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #46 from Jan Hubicka --- Created attachment 61133 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61133&action=edit updated patch The problem in the previous patch was that ipa-prop streams a 0 at the end of the block of the summary section

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #44 from Jan Hubicka --- Summaries are duplicated when a clone is created. Let me debug why it gets lost here.

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #37 from Jan Hubicka --- Created attachment 61128 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61128&action=edit updated patch (regtests and bootstraps) Updated patch. Streaming summaries seems to work and fixes the testcase

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #36 from Jan Hubicka --- Created attachment 61127 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61127&action=edit patch (untested)

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #34 from Jan Hubicka --- If the only problem is that the ipa_return_value_sum summary does not survive from compile time to WPA, then we only need to add streaming code for it. This should be straightforward and there is no need to add

[Bug target/105275] [12/13/14/15 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #6 from Jan Hubicka --- As discussed in PR111551, the SPEC train run does not exercise the hottest loop of imagick (the one that is hot in the ref run), so we optimize it for size (in particular, we disable vectorization) and get poor performance

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #7 from Jan Hubicka --- Details are in PR111551

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #6 from Jan Hubicka --- The problem is that the internal loop in the hottest function changes between the train and ref runs (the train run uses a different variant of the loop). This disables vectorization of the loop believed to be cold, causing -

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #15 from Jan Hubicka --- I made a silly stand-alone test: long test[4]; __attribute__ ((noipa)) void foo (unsigned long a, unsigned long b, unsigned long c, unsigned long d) { test[0]=a; test[1]=b; test[2]=c;

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #14 from Jan Hubicka --- > > I am OK with using addss cost of 3 for trunk&release branches and make this > > more precise next stage1. > > That's what we use now? But I still don't understand why exactly > 538.imagick_r regresses

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #12 from Jan Hubicka --- > Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP > difference. Yep, I know. With that patch I mostly wanted to limit redundancy of the tables. The int/FP difference was mostly based

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #7 from Jan Hubicka --- Hmm, the sequence does not use + at all, but I think I know what is going on. While the field is called addss, it is used as a kitchen sink for all other simple operations. /* pmuludq under sse2, pmuld

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #4 from Jan Hubicka --- Re-benchmarked current trunk -flto -Ofast -march=native (base) and -flto -Ofast -march=native + PGO (peak) on znver3 Estimated Estimated Base

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #3 from Jan Hubicka --- With speculation_useful_p we are now able to constant propagate stride into mc_chroma with PGO, but it does not help runtime. https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680055.html solves the costi

[Bug libstdc++/119606] [15 regression] Commit 'Optimize string constructor' causes regression in Snappy workload for -mcpu=neoverse-v2 with LTO

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119606 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug target/119565] New: 13-17% regression of botan CAS128 and DES on zen4

2025-04-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This is visible on: https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=553.676.1 https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=553.675.1 https

[Bug target/119368] immintrin code running slower with gcc than clang

2025-03-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 --- Comment #5 from Jan Hubicka --- Thinking of it more, I think enabling memory alternatives in (define_insn "sse4_1_v4hiv4si2" [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v") (any_extend:V4SI (vec_select:V4HI (m

[Bug target/119368] immintrin code running slower with gcc than clang

2025-03-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 --- Comment #2 from Jan Hubicka --- On this combiner fails to match: Failed to match this instruction: (set (subreg:V4SI (reg:V2DI 101 [ ]) 0) (sign_extend:V4SI (vec_select:V4HI (mem:V8HI (reg:DI 106) [0 *x_3(D)+0 S16 A128]) (p

[Bug target/119368] New: immintrin code running slower with gcc than clang

2025-03-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- as mentioned in https://www.root.cz/clanky/instrukcni-sady-simd-a-automaticke-vektorizace-provadene-prekladacem-gcc/nazory/#newIndex1 the following code runs faster

[Bug ipa/119312] Constant array not allocated in read-only segment

2025-03-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119312 --- Comment #13 from Jan Hubicka --- And I forgot to write: in the case of strcmp, I think we can use the fnspec info we already have at the time of constructing the callgraph to represent it as a read rather than as taking its address. This would make things go a bit sm
