https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121123
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121038
--- Comment #2 from Jan Hubicka ---
I experimented with a smaller sampling period and indeed create_gcov then runs
out of memory. On my setup create_gcov was simply segfaulting and produced just
a partial profile. Since Makefile does not fail on cr
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
In this testcase
static int p1(int a)
{
return a+1;
}
static int p2(int a)
{
return a+2;
}
int p3 (int a)
{
return p1(p2(a));
}
We optimize the two additions
|1
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
Last reconfirmed||2025-07-15
--- Comment #1 from Jan Hubicka ---
This is auto-profile never closing its input file. It also does not check for
any read failures
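A minimal sketch of the pattern this asks for, checking every read and closing the file on every path. This is not GCC's auto-profile reader; the helper name read_words and its interface are hypothetical and only illustrate the idea with plain stdio:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not GCC's auto-profile code: read COUNT 32-bit
   words from PATH into OUT.  Returns 0 on success and -1 on any open,
   read or close failure.  The file is closed on every path.  */
int
read_words (const char *path, uint32_t *out, size_t count)
{
  assert (path != NULL && out != NULL);

  FILE *f = fopen (path, "rb");
  if (f == NULL)
    return -1;

  int status = 0;
  /* fread returns the number of complete items read; a short count
     means a truncated file or an I/O error.  */
  if (fread (out, sizeof (uint32_t), count, f) != count)
    status = -1;

  /* Close unconditionally so the descriptor is not leaked.  */
  if (fclose (f) != 0)
    status = -1;

  return status;
}
```

A caller would treat a -1 result as a corrupted or missing profile rather than continuing with partial data.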
: bootstrap
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Let's track the problems here. Currently
1) autoprofiledbootstrap fails for me on a 256-core machine since perf runs out
of memory
Workaround is:
diff --git a/Makefile.tpl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
--- Comment #6 from Jan Hubicka ---
Aha, I was looking into scalar-to-vector improvements promoting scalar integer
+ 1 to vector on AMD CPUs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
--- Comment #5 from Jan Hubicka ---
I think I made the testcase while working on something else that I forgot,
sorry :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229
--- Comment #2 from Jan Hubicka ---
See thread
https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689018.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #9 from Jan Hubicka ---
Created attachment 61818
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61818&action=edit
create_gcov path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #8 from Jan Hubicka ---
Patching create_gcov to account for all of the debug statements associated with
a given address instead of just the last one gets me:
test total:4350509 head:8642
1: 4484 // {
2: 4484 // for (
3: 4484
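The difference between the two accounting policies can be sketched as follows. This is only an illustration: the struct line_entry layout and the account_samples function are hypothetical and do not reflect how create_gcov actually stores its line table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line-table entry: one source line attached to an address.
   Several entries may share the same address.  */
struct line_entry
{
  unsigned long addr;
  int line;
};

/* Add COUNT samples taken at ADDR to the per-line counters in LINE_COUNT
   (indexed by line number, NLINES entries).  With ALL_ENTRIES nonzero
   every entry matching ADDR is credited; otherwise only the last one.  */
void
account_samples (const struct line_entry *table, size_t n,
                 unsigned long addr, unsigned long count,
                 unsigned long *line_count, size_t nlines, int all_entries)
{
  assert (line_count != NULL && (table != NULL || n == 0));

  const struct line_entry *last = NULL;
  for (size_t i = 0; i < n; i++)
    {
      if (table[i].addr != addr)
        continue;
      last = &table[i];
      if (all_entries && table[i].line >= 0
          && (size_t) table[i].line < nlines)
        line_count[table[i].line] += count;
    }

  /* Old behaviour: only the last entry seen for ADDR gets the samples.  */
  if (!all_entries && last != NULL
      && last->line >= 0 && (size_t) last->line < nlines)
    line_count[last->line] += count;
}
```

With three entries sharing one address, the "all entries" policy gives each of the three lines the same count, while the "last entry" policy leaves the first two lines at zero.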
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965
--- Comment #3 from Jan Hubicka ---
There is also a 3% performance regression that got lost in the transition to the new PR
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.387.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965
--- Comment #2 from Jan Hubicka ---
This is likely ipa-cp heuristics issue which decides to clone now but after all
the benefits are not really visible.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #7 from Jan Hubicka ---
LLVM also gets execution counts wrong, just in a different (and less harmful)
way:
test:270773509:9780
1: 9116
2: 51984 for (
4: 51984 iThis Inner Loop Header: Depth=1
.loc0 10 15
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Bug 120867 depends on bug 104457, which changed state.
Bug 104457 Summary: ipa-cp with autofdo: internal compiler error in
update_specialized_profile, at ipa-cp.c:4422
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457
What|Remo
|--- |FIXED
CC||hubicka at gcc dot gnu.org
--- Comment #3 from Jan Hubicka ---
I believe update_specialized_profile should now be safe WRT ICE on
contradicting profiles. I can build SPEC on x86 reliably (and we now run daily
testing at LNT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #10 from Jan Hubicka ---
https://github.com/google/autofdo/issues/248
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Bug 120867 depends on bug 120938, which changed state.
Bug 120938 Summary: discriminators are not useful in statements doing multiple
calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
What|Removed |Add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #8 from Jan Hubicka ---
Problem goes away with
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d1a55dbcbcb..52ca189531e 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -25012,9 +25012,8 @@ add_call_src_coords_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #7 from Jan Hubicka ---
Looking at the diff there seem to be a few changes:
- # d.C:16:2
- .loc 1 16 2 is_stmt 1 view .LVU16
+ # d.C:15:8
+ .loc 1 15 8 is_stmt 1 discriminator 1 view .LVU16
This is a line table
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #6 from Jan Hubicka ---
Created attachment 61795
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61795&action=edit
Diff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #5 from Jan Hubicka ---
Created attachment 61794
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61794&action=edit
bad assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #4 from Jan Hubicka ---
Created attachment 61793
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61793&action=edit
good assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #3 from Jan Hubicka ---
An even smaller example. Bad profile:
#include
volatile int variablev;
static void inc()
{
variablev++;
}
static int zero = 0;
int main ()
{
for (int i = 0; i < 1; i++)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #2 from Jan Hubicka ---
This is an even smaller testcase
#include
volatile int variablev;
static void inc(int a)
{
variablev++;
}
inline int
inline_me (int l)
{
for (int i = 0; i < 1; i++)
{inc(1);inc(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #1 from Jan Hubicka ---
Removing the parameter of inc makes the problem go away. So does removing
the recursion
#include
volatile int variablev;
static int dead ()
{
return 0;
}
static void inc()
{
variablev++;
}
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jh@shroud:~> cat d.C
#include
volatile int a;
static int dead ()
{
return 0;
}
static void inc(int b)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #3 from Jan Hubicka ---
Well, PR32445 is about us not being able to vartrack value of I. I think that
may be fixed since then by adding corresponding debug binds.
However here we are missing info about statement being executed...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #1 from Jan Hubicka ---
Here is a variant for the gcov tool:
jh@shroud:/tmp> cat tt.c
int s = 1023;
int a[1024];
__attribute__ ((weak))
void test()
{
for (
int i = 0; /* Line 7, relative 3 */
i < s;
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jan@padlo:/tmp> cat t.c
int s = 1023;
int a[1024];
__attribute__ ((noipa))
int test()
{
for (
int i = 0; /* Line 7 */
i < s; /*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #15 from Jan Hubicka ---
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=68430&plot.0=1370.377.0&plot.1=1288.377.0
compares AFDO to no profile feedback
|--- |FIXED
CC||hubicka at gcc dot gnu.org
--- Comment #5 from Jan Hubicka ---
Let's say it is fixed. Mcf builds for me now.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684
--- Comment #11 from Jan Hubicka ---
*** Bug 86404 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86404
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
||hubicka at gcc dot gnu.org
Status|NEW |WAITING
--- Comment #10 from Jan Hubicka ---
For me parallel check is quite good. I get 3 failures in peeling testcases
that probably should be disabled for AutoFDO
Referenced Bugs:
https://gcc.gnu.org
|1
Last reconfirmed||2025-06-29
Status|UNCONFIRMED |NEW
CC||hubicka at gcc dot gnu.org
--- Comment #1 from Jan Hubicka ---
confirmed
Referenced Bugs:
https://gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2025-06-29
Blocks|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120752
--- Comment #4 from Jan Hubicka ---
Hmm,
there seem to be no big differences in IPA decisions between the runs, so
further investigation is necessary :(
The patch attempts to preserve more of the profile and here the profile is a bit
counter-productive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551
--- Comment #9 from Jan Hubicka ---
I am happy it helps. I wonder if you can share details of your SPEC config.
I.e. how you call perf (do you specify count etc) and how you handle merging of
profiles.
We now have regular tester (on AMD hardwa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #6 from Jan Hubicka ---
Also BTW, I think it is useful to do the dumps with -details-blocks since that
also dumps BB count inconsistencies caused by AutoFDO that are otherwise hard
to spot.
In ipa-cp dump it should be visible if cons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #5 from Jan Hubicka ---
Note that on x86-64 I get OK scores on x264. This compares no-FDO -Ofast -flto
-march=native to autoFDO. I hacked the scripts to use ref run for training so
it is longer:
500.perlbench_r 1158
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 119298, which changed state.
Bug 119298 Summary: [15/16 Regression] 538.imagick_r is faster when compiled
with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since
r15-3441-g4292297a0f938f
https://gcc.g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120218
--- Comment #2 from Jan Hubicka ---
I guess for costing changes, too. Since this is a weekly tester, bisecting
would help.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219
Jan Hubicka changed:
What|Removed |Added
Depends on||119902
--- Comment #5 from Jan Hubicka -
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
This is visible on both Zen and Intel testers
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.407.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120099
--- Comment #4 from Jan Hubicka ---
This patch enables more inlining, so I guess it is a previously latent problem
triggered by the inliner...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
--- Comment #9 from Jan Hubicka ---
Forgot to say, -fno-optimize-sibling-calls re-enables the cloning & inline.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
--- Comment #8 from Jan Hubicka ---
The difference is that the tailr1 pass now turns recursion into a loop.
GCC15 does:
Basic block 11 has extra exit edges
Basic block 33 has extra exit edges
Basic block 28 has extra exit edges
Basic block 23 has ex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2025-05-06
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2025-05-03
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
--- Comment #6 from Jan Hubicka ---
Sadly this did not fix the whole regression. The problem is that after my
change to enable ipa-cp to clone over cold edges we clone
GetVirtualPixelsFromNexus twice (as constprop.0 and constprop.1). This
func
:55b01e17c793688a2878fa43a76df1266153b438
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120065
--- Comment #3 from Jan Hubicka ---
while (n > 0 && a)
;
This is an odd loop which iterates 0 times or infinitely many times.
We do not pattern match that at profile-estimate time (since such code is kind
of useless) and we guess i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
Jan Hubicka changed:
What|Removed |Added
CC||rsandifo at gcc dot gnu.org
S
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
--- Comment #3 from Jan Hubicka ---
Reverting the change of size_costs solves the regression, so it is about
differences in optimization of cold code. I will try to track down what causes
that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
--- Comment #2 from Jan Hubicka ---
Aha, I mistakenly added the analysis to PR105275. One problem I noticed was wrong
costing of FP scalar min/max which is fixed now but does not affect imagick.
Interesting is that we now vectorized same loops and BBs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #5 from Jan Hubicka ---
This is MorphologyApply
MagickExport Image *MorphologyApply(const Image *image, const ChannelType
channel,const MorphologyMethod method, const ssize_t iterations,
const KernelInfo *kernel, const Com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #4 from Jan Hubicka ---
With -fprofile-use we get
Evaluating opportunities for MorphologyApply/3266.
- considering value 134217719 for param #1 const ChannelType (caller_count: 3)
good_cloning_opportunity_p (time: 1, size: 427
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275
--- Comment #9 from Jan Hubicka ---
The only vectorization difference is:
+imagick_r.ltrans8.ltrans.189t.slp1:magick/distort.c:1911:18: optimized: basic
block part vectorized using 16 byte vectors
+imagick_r.ltrans8.ltrans.189t.slp1:magick/dist
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119924
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
--- Comment #6 from Jan Hubicka ---
Exchange2 regression is solved and tonto seems to be noise (performance is back
today w/o change of a checksum of the text segment).
Still, we account one extra setcc and misaccount scatter, so let's keep this t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
Jan Hubicka changed:
What|Removed |Added
Depends on||119902
--- Comment #3 from Jan Hubicka -
|ASSIGNED
Last reconfirmed||2025-04-24
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
--- Comment #2 from Jan Hubicka ---
This is with -O2 only. Difference is
+++ bbb 2025-04-24 16:21:25.029155295 +0200
@@ -108,10 +108,7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147
--- Comment #5 from Jan Hubicka ---
as g:132d01d96ea9d617aaffdd5dfba3284a8958e529 I have committed the patch that
enables ipa-cp to clone over edges which are !maybe_hot_p().
This improves x264 with FDO by 7.8% and exchange by 3.3%
It causes qu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
--- Comment #1 from Jan Hubicka ---
There also seems to be a 4% tonto regression on Intel in the same range
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.230.0
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
this reproduces both on Zen and Intel:
https
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681555.html
in loop
> void foo (int n, int *
: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
this seems to reproduce on Intel (119%)
https://lnt.opensuse.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879
--- Comment #2 from Jan Hubicka ---
Created attachment 61166
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61166&action=edit
Fix I am testing
The fix I am testing. When VEC_PACK_TRUNC_EXPR is used, add_hook is called with
vec_promote_dem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879
--- Comment #1 from Jan Hubicka ---
The problem is in:
/* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end
up doing two conversions and packing them. */
if (!scalar_p && inner_size > outer_size)
{
i
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
int a[1000];
int b[1000];
int c[1000];
int d[1000];
void test()
{
for (int i = 0; i < 1000; i++)
a[i] = b[i] > 0 ? c[i] + 1 : c[i] + 2;
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
double a[1000];
double b[1000];
double c[1000];
double d[1000];
void test()
{
for (int i = 0; i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
Jan Hubicka changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #47 from Jan Hubicka ---
Created attachment 61134
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61134&action=edit
patch w/o forgotten debug output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #46 from Jan Hubicka ---
Created attachment 61133
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61133&action=edit
updated patch
The problem in previous patch was that ipa-prop streams 0 to the end of block
of summary section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #44 from Jan Hubicka ---
Summaries are duplicated when clone is created. Let me debug why it gets lost
here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #37 from Jan Hubicka ---
Created attachment 61128
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61128&action=edit
updated patch (regtests and bootstraps)
Updated patch. Streaming summaries seems to work and fixes the testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #36 from Jan Hubicka ---
Created attachment 61127
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61127&action=edit
patch (untested)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #34 from Jan Hubicka ---
If the only problem is that ipa_return_value_sum does not survive
from compile time to WPA then we only need to add streaming code for it. This
should be straightforward and there is no need to add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275
--- Comment #6 from Jan Hubicka ---
As discussed in PR111551, the SPEC train run does not include the hottest loop of
imagick (in ref loop), so we optimize it for size (in particular disable
vectorization) and get poor performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646
--- Comment #7 from Jan Hubicka ---
Details are in PR111551
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646
--- Comment #6 from Jan Hubicka ---
The problem is that the internal loop in the hottest function changes between train
and ref run (train run uses a different variant of the loop). This disables
vectorization of the loop believed to be cold causing -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #15 from Jan Hubicka ---
I made a silly stand-alone test:
long test[4];
__attribute__ ((noipa))
void
foo (unsigned long a, unsigned long b, unsigned long c, unsigned long d)
{
test[0]=a;
test[1]=b;
test[2]=c;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #14 from Jan Hubicka ---
> > I am OK with using addss cost of 3 for trunk&release branches and make this
> > more precise next stage1.
>
> That's what we use now? But I still don't understand why exactly
> 538.imagick_r regresses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #12 from Jan Hubicka ---
> Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP
> difference.
Yep, I know. With that patch I mostly wanted to limit redundancy of the
tables. The int/Fp difference was mostly based
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #7 from Jan Hubicka ---
Hmm, the sequence does not use + at all, but I think I know what is going on.
While the field is called addss it is used as a kitchen sink for all other
simple operations.
/* pmuludq under sse2, pmuld
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147
--- Comment #4 from Jan Hubicka ---
Re-benchmarked current trunk -flto -Ofast -march=native (base) and -flto
-Ofast -march=native + PGO (peak) on znver3
Estimated Estimated
Base
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147
--- Comment #3 from Jan Hubicka ---
With speculation_useful_p we now are able to constant propagate stride into
mc_chroma with PGO, but it does not help runtime.
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680055.html
solves the costi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119606
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
This is visible on:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=553.676.1
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=553.675.1
https
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368
--- Comment #5 from Jan Hubicka ---
Thinking of it more, I think enabling memory alternatives in
(define_insn "sse4_1_v4hiv4si2"
[(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
(any_extend:V4SI
(vec_select:V4HI
(m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368
--- Comment #2 from Jan Hubicka ---
On this combiner fails to match:
Failed to match this instruction:
(set (subreg:V4SI (reg:V2DI 101 [ ]) 0)
(sign_extend:V4SI (vec_select:V4HI (mem:V8HI (reg:DI 106) [0 *x_3(D)+0 S16
A128])
(p
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
as mentioned in
https://www.root.cz/clanky/instrukcni-sady-simd-a-automaticke-vektorizace-provadene-prekladacem-gcc/nazory/#newIndex1
the following code runs faster
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119312
--- Comment #13 from Jan Hubicka ---
And forgot to write. In case of strcmp I think we can use fnspec info we
already have at the time constructing callgraph to represent it as a read
rather than taking address. This would make things go bit sm