https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102720
--- Comment #7 from Jan Hubicka ---
simplified testcase is:
typedef unsigned char uint8_t;
typedef __SIZE_TYPE__ size_t;
extern void* malloc (size_t);
extern void* memset (void*, int, size_t);
#define test(T, U)\
__attribute__((noinline
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102557
Jan Hubicka changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102720
--- Comment #8 from Jan Hubicka ---
so it is really pt_solutions_intersect in ref_maybe_used_by_call returning
false.
We get:
(gdb) p *pt1
$6 = {anything = 0, nonlocal = 1, escaped = 1, ipa_escaped = 0, null = 0,
vars_contains_nonlocal = 0, vars
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102720
--- Comment #9 from Jan Hubicka ---
OK, with -alias dump we have:
int main ()
{
uint8_t * q;
void * p;
long unsigned int _2;
:
# PT = null { D.2008 }
# ALIGN = 8, MISALIGN = 0
# USE = anything
# CLB = anything
p_5 = malloc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102720
--- Comment #10 from Jan Hubicka ---
I copied the ealias dump rather than the alias dump in the previous comment.
alias dump is
int main ()
{
void * p;
long unsigned int _1;
[local count: 1073741824]:
# PT = null { D.2014 }
# ALIGN = 8, MISALIGN =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102720
--- Comment #11 from Jan Hubicka ---
Aha, the problem is in the way I updated computing use/clobber sets. I
accidentally disabled code that copies the solution from solver local
representation into the final form. As a result we failed to updat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102947
Bug ID: 102947
Summary: SPEC2006 compiler time regression (-Ofast
-march=native -flto) between 1932e1169a236849 and
9cfb95f9b92326e8
Product: gcc
Version: 12.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102947
--- Comment #1 from Jan Hubicka ---
It seems enough to look at the WRP benchmark build time.
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=322.270.8&plot.1=307.270.8&plot.2=343.270.8&plot.3=266.270.8&plot.4=395.270.8&plot.5=412.270.8&p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102948
Bug ID: 102948
Summary: 60% build time regression on gamess in range
2fc2e3917f9c8fd94f5d101477971d16c483ef88...c16f21c7cf9
7ce48967e42d3b5d22ea169a9c2c8
Product: gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
Jan Hubicka changed:
What|Removed |Added
Summary|cray regression with -O2|[12 regression] cray
|-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #6 from Jan Hubicka ---
zen
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=198.639.0&plot.1=180.639.0&plot.2=201.639.0&plot.3=150.639.0&plot.4=246.639.0&plot.5=256.639.0&plot.6=176.639.0&;
kabylake
https://lnt.opensuse.org/d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102997
Bug ID: 102997
Summary: 45% calculix regression with LTO+PGO -march=native
-Ofast between
ce4d1f632ff3f680550d3b186b60176022f41190 and
6fca1761a16c68740f875fc487b9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102058
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
Sum
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #7 from Jan Hubicka ---
this is compile time plot
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.270.8
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=289.270.8
(-O2 and -Ofast with lto)
Things have improved but
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715
Bug ID: 107715
Summary: TSVC s161 for double runs at zen4 30 times slower when
vectorization is enabled
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408
--- Comment #2 from Jan Hubicka ---
This also reproduces with zen4 and double.
jh@alberti:~/tsvc/bin> cat tt.c
typedef double real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718
Bug ID: 107718
Summary: clang optimizes TSVC s317 a lot better
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-en
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411
--- Comment #7 from Jan Hubicka ---
With znver4 current trunk and clang15 I still see this problem (clang code is
about 60% faster) for s311, s312 and s3111.
Curiously, s3 and s3110 no longer show a regression.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715
Jan Hubicka changed:
What|Removed |Added
Summary|TSVC s161 for double runs |TSVC s161 and s277 for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107719
Bug ID: 107719
Summary: 14% regression on TSVC s3113 on znve4 compared to GCC
7.5
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Pri
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107769
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107597
--- Comment #7 from Jan Hubicka ---
So I guess it is asan being confused by our optimization. We intentionally
duplicate the symbol in order to reduce cost of dynamic linking in situations
where we know it does not change semantics, but asan loo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107467
Jan Hubicka changed:
What|Removed |Added
Status|NEW |ASSIGNED
Assignee|unassigned at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105727
--- Comment #6 from Jan Hubicka ---
I don't know what clang does, but GCC keeps builtin_constant_p till late
optimization and resolves it then. So here we cross module inline (or constant
propagate) and then it becomes constant.
Outcome of __bu
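A minimal sketch of why late resolution matters, assuming a hypothetical helper named scale (not taken from the bug report). Whether __builtin_constant_p evaluates to 1 can depend on inlining and constant propagation, so code guarded by it is safest when both paths compute the same result:

```c
/* Hypothetical helper: the builtin may resolve to 1 only after the
   compiler has inlined or constant-propagated the argument, so both
   branches are written to return the same value. */
static inline int scale(int x)
{
    if (__builtin_constant_p(x))
        return x * 2;   /* path taken when x is proven constant */
    return x + x;       /* general path, same result */
}

/* Caller passing a literal; which branch is used is up to the compiler. */
int scale_literal(void)
{
    return scale(21);
}
```

Either branch yields 42 for scale_literal, regardless of optimization level.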
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105739
Jan Hubicka changed:
What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105739
--- Comment #8 from Jan Hubicka ---
After inlining I see:
IPA function summary for rcu_tasks_trace_pertask/5350 inlinable
global time: 13.535950
self size: 11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105739
--- Comment #9 from Jan Hubicka ---
Indeed, volatile checks seem to be missing across the ipa-prop code. Here is a
smaller testcase:
__attribute__((noinline))
static int
test2(int a)
{
if (__builtin_constant_p (a))
__builtin_a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105739
--- Comment #10 from Jan Hubicka ---
I am testing
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index afd9222b5a2..c037668e7d8 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -1112,6 +1112,10 @@ ipa_load_from_parm_agg (struct ipa_func_bod
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105917
Bug ID: 105917
Summary: Missed passthru jump function
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105917
Jan Hubicka changed:
What|Removed |Added
Summary|Missed passthru jump|[10/11/12/13 regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106057
Bug ID: 106057
Summary: Missed stmt_can_throw_external check in
stmt_kills_ref_p
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Prio
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106057
--- Comment #1 from Jan Hubicka ---
C only testcase (also misoptimized in clang)
#include <setjmp.h>
int b;
jmp_buf buf;
__attribute__((noinline))
int maybethrow()
{
if (!b)
longjmp (buf,1);
return 2;
}
void
test(int *a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106057
--- Comment #2 from Jan Hubicka ---
The second testcase (with longjmp) is invalid, since longjmp can clobber an
automatic variable. Making the variable static breaks the testcase, since we
believe that longjmp reads global memory state (it doesn't).
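For reference, a small sketch of the C rule behind "longjmp can clobber automatic variable": a non-volatile automatic object modified between setjmp and longjmp has an indeterminate value after the jump, so a valid testcase has to declare it volatile. The function names below are illustrative only and are not part of the PR testcase:

```c
#include <setjmp.h>

static jmp_buf env;

/* Jumps back to the setjmp call in count_with_volatile. */
static void jump_back(void)
{
    longjmp(env, 1);
}

/* Modifies a local between setjmp and longjmp.  Because the local is
   volatile, its value after the jump is well defined. */
int count_with_volatile(void)
{
    volatile int counter = 0;

    if (setjmp(env) == 0) {
        counter = 41;
        counter++;
        jump_back();    /* does not return */
    }
    /* Reached after longjmp; counter holds the value stored before it. */
    return counter;
}
```

Dropping the volatile qualifier would make the returned value indeterminate according to the C standard.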
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106075
Bug ID: 106075
Summary: Wrong DSE with -fnon-call-exceptions
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106077
Bug ID: 106077
Summary: Invalid IPA-SRA with non-call exceptions
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106075
--- Comment #4 from Jan Hubicka ---
PR106077 demonstrates a related problem where ipa-sra concludes it is safe to
move a dereference earlier in the code. It uses a dominator test for that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106075
--- Comment #5 from Jan Hubicka ---
Also note that the longjmp testcase will not get misoptimized since we consider
longjmp as using all global memory.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106078
Bug ID: 106078
Summary: Invalid loop invariant motion with non-call-exceptions
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106078
--- Comment #1 from Jan Hubicka ---
This is a version that does not need -fnon-call-exceptions.
If called as test (NULL, 0), it should keep increasing val indefinitely rather
than segfaulting. It seems clang gets this one right.
int array[1];
volatile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106077
--- Comment #1 from Jan Hubicka ---
Also note that the dominance check is written the wrong way, so it only passes
for the first BB in the function
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 96b020fb2dd..6b2df2f3ff0 100644
--- a/gcc/ipa-sra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
Bug ID: 106081
Summary: missed vectorization
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assign
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
--- Comment #1 from Jan Hubicka ---
This is an attempt to vectorize by hand, but it seems we do not generate
vpmovsxwd for the vector short->double conversion
struct pixels
{
short a __attribute__ ((vector_size(4*2)));
} *pixels;
struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
--- Comment #4 from Jan Hubicka ---
Thanks! It seems that imagemagick has quite a few loops that involve consuming
shorts and producing doubles. Also it would be nice to do something about
__builtin_convertvector so it does not produce stupid code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105682
--- Comment #6 from Jan Hubicka ---
gcc-12.1.0 (bogus warning: `caller()` has no right to be const; it calls a pure
function, and that function even contains inline assembly):
I think the conclusion here is correct. callee has pure attribute a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105676
--- Comment #3 from Jan Hubicka ---
Such code is not that obviously safe. It is possible that getval will get
inlined to some calls and not other within single function. In that case the
calling function will read and modify cache variable and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101257
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101270
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92740
Jan Hubicka changed:
What|Removed |Added
Summary|induct2 (from polyhedron) |induct2 (from polyhedron)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
Bug ID: 101908
Summary: cray regression with -O2 -ftree-slp-vectorize compared
to -O2
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101909
Bug ID: 101909
Summary: 73% regression on tfft benchmark for -O2
-ftree-loop-vectorize compared to -O2 on zen hardware
Product: gcc
Version: 12.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101910
Bug ID: 101910
Summary: tsvc regressions for -O2 -ftree-loop-vectorize at zen
hardware
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296
--- Comment #7 from Jan Hubicka ---
"every access" means that we no longer track individual bases+offsets+sizes and
everything matching the base/ref alias set will be considered conflicting.
I planned to implement smarter merging of accesses so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101257
Jan Hubicka changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97836
Jan Hubicka changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97565
--- Comment #6 from Jan Hubicka ---
has_gimple_body_p really cares about the WPA unit (we should probably note that
in the comment). Here you seem to have function that is in the WPA translation
unit but lands in different partition and in that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787
--- Comment #13 from Jan Hubicka ---
So my understanding is that ivopts does something like
offset = &base2 - &base1
and then translate
val = base2[i]
to
val = *((base1+i)+offset)
Where (base1+i) is then an iv variable.
I wonder if we con
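A hedged sketch of the rewrite described above, written by hand. The arrays and function names are made up for illustration. The arithmetic is done on integer addresses because subtracting pointers into two different arrays is undefined in C; ivopts works on its own internal representation, so this only mirrors the shape of the transformation:

```c
#include <stdint.h>

int base1[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
int base2[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };

/* Straightforward form: val = base2[i]. */
int load_direct(int i)
{
    return base2[i];
}

/* Rewritten form: val = *((base1 + i) + offset), with
   offset = &base2 - &base1 computed once as an address difference. */
int load_via_offset(int i)
{
    uintptr_t offset = (uintptr_t)base2 - (uintptr_t)base1;
    uintptr_t iv = (uintptr_t)(base1 + i);   /* the "iv variable" */
    return *(int *)(iv + offset);
}
```

Both functions read the same element; the second one reaches base2 only through an address derived from base1.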
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #4 from Jan Hubicka ---
There is a cap in want_inline_self_recursive_call_p which gives up on inlining
after reaching max recursive inlining depth of 8. Problem is that the tree here
is too wide. After early inlining f0 contains 4 ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #5 from Jan Hubicka ---
There is a cap in want_inline_self_recursive_call_p which gives up on inlining
after reaching max recursive inlining depth of 8. Problem is that the tree here
is too wide. After early inlining f0 contains 4 ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111054
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #6 from Jan Hubicka ---
Created attachment 57427
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57427&action=edit
patch
The patch makes compilation finish in reasonable time.
I ended up needing to drop DISREGARD_INLINE_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #31 from Jan Hubicka ---
Having a testcase is great. I was just playing with crafting one.
I am still concerned about value ranges in ipa-prop's jump functions.
Let me see if I can modify the testcase to also trigger problem with val
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #39 from Jan Hubicka ---
This testcase
#include <stdio.h>
int data[100];
__attribute__((noinline))
int bar (int d, unsigned int d2)
{
if (d2 > 10)
printf ("Bingo\n");
return d + d2;
}
int
test2 (unsigned int i)
{
if (i > 10)
_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
Jan Hubicka changed:
What|Removed |Added
Summary|[14 regression] ICU |[12/13/14 regression] ICU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111960
--- Comment #5 from Jan Hubicka ---
hmm. cfg.cc:815 for me is:
fputs (", maybe hot", outf);
which seems quite safe.
The problem does not seem to reproduce for me:
jh@ryzen3:~/gcc/build/gcc> ./xgcc -B ./ tt.c -O
--param=max-inline-r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802
--- Comment #5 from Jan Hubicka ---
I don't think we can reasonably expect every caller of lambda function to be
early inlined, so we need to extend ipa-prop to understand the obfuscated code.
I discussed that with Martin some time ago - I thi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114052
--- Comment #5 from Jan Hubicka ---
So if I understand it right, you want to determine the property that if the
loop header is executed then BB containing undefined behavior at that iteration
will be executed, too.
modref tracks if function wil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85432
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114207
Jan Hubicka changed:
What|Removed |Added
Status|NEW |ASSIGNED
Assignee|unassigned at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92387
--- Comment #5 from Jan Hubicka ---
The revision changes inlining decisions, so it would probably be possible
to reproduce the problem without that change with the right always_inline and
noinline attributes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114241
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106716
--- Comment #6 from Jan Hubicka ---
The reason why GIMPLE_PREDICT is ignored is that it is never used after ipa-icf
and gets removed at the very beginning of late optimizations.
GIMPLE_PREDICT is consumed by profile_generate pass which is run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #55 from Jan Hubicka ---
> Anyway, can we in the spot my patch changed just walk all
> source->node->callees > cgraph_edges, for each of them find the corresponding
> cgraph_edge in the alias > and for each walk all the jump_functi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #58 from Jan Hubicka ---
Created attachment 57702
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57702&action=edit
Compare value ranges in jump functions
This patch implements the jump function compare, however it is not good
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #59 from Jan Hubicka ---
Just to explain what happens in the testcase: there are test and testb. They
are almost the same:
int
testb(void)
{
struct bar *fp;
test2 ((void *)&fp);
fp = NULL;
(*ptr)++;
test3 ((void *)&fp);
}
the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596
--- Comment #4 from Jan Hubicka ---
The change makes loop iteration estimates more realistic, but does not
introduce any new code that actually changes the IL, so it seems this makes an
existing problem more visible. I will try to debug what happ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596
--- Comment #6 from Jan Hubicka ---
On this testcase trunk gets the same dump as gcc13 for the pass just before ch2.
With ch2 we get:
@@ -192,9 +236,8 @@
# DEBUG BEGIN_STMT
goto ; [100.00%]
- [local count: 954449105]:
+ [local count: 9544
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596
Jan Hubicka changed:
What|Removed |Added
Status|NEW |ASSIGNED
Assignee|unassigned at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765
--- Comment #6 from Jan Hubicka ---
Running auto-fdo without guessing branch probabilities is somewhat odd idea in
general. I suppose we can indeed just avoid setting full_profile flag. Though
the optimization passes are not that much tested to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109817
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110757
Bug ID: 110757
Summary: 7% parest regression on zen3 -Ofast -march=native
-flto between g:4dbb3af1efe55174 (2023-07-14 00:54)
and g:a5088dc3f5ef73c8 (2023-07-17 03:24)
Prod
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110758
Bug ID: 110758
Summary: 8% hmmer regression on zen1 and zen3 with -Ofast
-march=native -flto between g:8377cf1bf41a0a9d
(2023-07-05 01:46) and g:3a61ca1b9256535e (2023-07-06
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110757
--- Comment #4 from Jan Hubicka ---
Most of the profile based regression is gone between
g:1c6231c05bdccab3 (2023-07-21 03:06)
and
g:f33fdf9e7c038639 (2023-07-23 00:17)
This should be:
commit a31ef26b056d0c4f0a9f08b6eb81456ea257298e
Author: Ja
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110758
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2023-07-26
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832
Bug ID: 110832
Summary: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76
(2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27
03:44) on zen3 and core
Product: gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832
--- Comment #1 from Jan Hubicka ---
This time it seems that there is only one profile change:
commit 645c67f80c6258c1f54ec567f604008adbdb8a04
Author: Jan Hubicka
Date: Wed Jul 26 08:59:23 2023 +0200
Fix profile_count::to_sreal_scale
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
--- Comment #15 from Jan Hubicka ---
if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
/* If result of comparsion is unknown, prefer EARLY_BB.
Thus use !(...>=..) rather than (...<...) */
- && !(best_bb->count * 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110833
Bug ID: 110833
Summary: gamess regression on Ice Lake with -Ofast
-march=native between g:1c6231c05bdccab3 (2023-07-21
03:06) and g:bbc1a102735c72e3 (2023-07-23 04:55)
Prod
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
--- Comment #16 from Jan Hubicka ---
It is really hard to make loop splitting do something.
It does not like canonicalized invariant variables, since the loop exit
condition should not be NE_EXPR, and it does not like when VRP turns LT/GT into NE.
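For context, a rough hand-written sketch of the kind of loop that loop splitting targets: an induction variable compared against a loop-invariant bound inside the body, with an LT exit test. The function names are hypothetical and the split form is what one would write manually, not the pass's actual output:

```c
/* Original form: the condition i < m is re-evaluated every iteration. */
void fill(int *out, int n, int m)
{
    for (int i = 0; i < n; i++) {
        if (i < m)
            out[i] = 1;
        else
            out[i] = 2;
    }
}

/* Hand-split form: one loop for the range where i < m holds, a second
   loop for the remainder, so neither body needs the inner test. */
void fill_split(int *out, int n, int m)
{
    int i = 0;
    for (; i < n && i < m; i++)
        out[i] = 1;
    for (; i < n; i++)
        out[i] = 2;
}
```

If the exit test were written as i != n instead of i < n, the iteration range would be harder to reason about, which matches the complaint above about NE_EXPR exit conditions.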
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110831
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2023-07-28
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77689
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #15
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77689
--- Comment #16 from Jan Hubicka ---
I am testing the following that makes loop splitting understand when first
iteration is special.
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index 70cd0aaefa7..1fd3ee1d1e5 100644
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77689
--- Comment #17 from Jan Hubicka ---
I posted the patch. With it we split the loop, but we don't get really big
improvements from that
h@ryzen3:~/gcc/build3/gcc> ./xgcc -B ./ -Ofast c.ii -S -fopt-info 2>&1 | grep
split ; perf stat ./a.out
c.C:15
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110758
Jan Hubicka changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 110758, which changed state.
Bug 110758 Summary: [14 Regression] 8% hmmer regression on zen1/3 with -Ofast
-march=native -flto between g:8377cf1bf41a0a9d (2023-07-05 01:46) and
g:3a61ca1b9256535e (2023-07-06 16:56); g:d7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
--- Comment #21 from Jan Hubicka ---
Fixing loop distribution and vectorizer profile update seems to do the trick
with profile feedback. Without we are still worse than in July last year on
zen2 tester (zen3 and ice lake seems to behave differen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110857
--- Comment #1 from Jan Hubicka ---
The sanity check fires since the profile counts involved are not compatible,
which should never happen within a single function. Would it be possible to dump
them? From debugger one should be able to call
p this->
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
--- Comment #23 from Jan Hubicka ---
Thanks,
I think I will need to work out the remaining vectorizer problems. One issue
seems to be interaction with loop distribution. Loop distribution seems to
introduce alias checks that are later removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852
--- Comment #1 from Jan Hubicka ---
This is likely g:eab57b825bcc350e9ff44eb2fa739a80199d9bb1, which fixed
prediction order and uncovered a latent bug in combining predictions with known
probabilities. I will take a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
--- Comment #24 from Jan Hubicka ---
g:2e93b92c1ec5fbbbe10765c6e059c3c90d564245 fixes the profile update after
cancelled distribution. However it does not help hmmer since we actually
vectorize that loop iterating 0 times. We need to figure out