Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---
GCC fails to compile SPEC2017 523.xalancbmk_r. The command line and
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
Consider the following code
=== begin code ===
#define LENGTH 512
#define STRIDE 32
char src[LENGTH];
char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91374
--- Comment #2 from Hao Liu ---
(In reply to Richard Biener from comment #1)
> So you ask for main to be converted to
>
> if (idx == 0)
>   foo_32_16 ();
> else /* idx == 1 */
>   foo_16_8 ();
>
> correct? It sounds like an interesting id
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
The following code cannot be vectorized (compiling with gcc -O3):
=== begin code ===
char src[512];
char dst[512];
#define WIDTH 8
void foo(int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573
--- Comment #5 from Hao Liu ---
Great. It really seems to be an SLP issue.
I've learnt a lot about vectorization, dump info and -march. Thanks for your
help.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492
Hao Liu changed:
What|Removed |Added
CC||hliu at amperecomputing dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318
Hao Liu changed:
What|Removed |Added
CC||hliu at amperecomputing dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318
--- Comment #5 from Hao Liu ---
Hi Nathan,
We can still reproduce this problem on CentOS 7 (X86) and CentOS 8.2 (AArch64).
We used yesterday's latest GCC revision: 108beb75da
The configure and build commands are (Bash is used):
$ ../gcc/configure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318
--- Comment #7 from Hao Liu ---
I found that:
1. "make -j1" passes, but "make -j8" always fails. Something seems to be wrong
with the parallel build.
2. After "make -j8" fails, running "make -j8" again passes.
> What happens if you cd into
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98318
--- Comment #8 from Hao Liu ---
Hi Nathan,
The problem is related to using another make binary, version 4.2.0, which we
built ourselves. Maybe there is a strange bug.
Anyway, after using the system installed make (which is 4.2.1 and under
/usr/bin/
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
As we know, dependent loads are not cache-friendly, especially in
nested loops, where dependent loads such as pa->pb-
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
Compile the following case with: gcc simp.c -Ofast -mcpu=neoverse-n1 -S
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #11 from Hao Liu ---
Hi Richard,
That's great! Glad to hear the status. Waiting for the patches to be ready and
upstreamed to trunk.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #15 from Hao Liu ---
Ah, I see.
I've sent out a quick fix patch for code review. I'll investigate more about
this and find out the root cause.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #17 from Hao Liu ---
> Thanks! I can reduce a testcase for you if you want :)
That will be very helpful. Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #19 from Hao Liu ---
> Hi, here's the reduced case
Hi Tamar, thanks for the case. I've modified it to reproduce the ICE without
LTO and have updated the patch.
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
This case was extracted from another benchmark and is simpler than the case in
PR101450, as it has the additional
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
This is inspired by clang. Compile the following case with "-mcpu=neover
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449
--- Comment #2 from Hao Liu ---
That looks better than the currently generated code (it saves one "MOV"
instruction). Yes, it has the loop-carried dependency advantage. But it still
uses one more register for "8*step" (There may be a register pr
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
Hi, I'm trying to tune loop unrolling during vectorization (see more:
tree
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
This seems to be an obvious bug in tree-vect-loop.cc:
(1) This var is declared (but not initialized) and used in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531
--- Comment #2 from Hao Liu ---
> Is the warning from some static analyzer?
No. I just noticed that it might be a bug while reading the code.
> slp should be true always (always do analyze slp), it doesn't care what's in
> slp_done_for_suggested_uf.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531
--- Comment #4 from Hao Liu ---
> IMHO, the initialization with false is unnecessary and very likely it isn't
> able to get optimized, it seems worse from this point of view.
Sorry. I don't think so. See more at
https://www.oreilly.com/library
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531
--- Comment #5 from Hao Liu ---
BTW, there is probably no warning because the original code is too
complicated and not inlined.
Compile this simple case with "g++ -O3 -S -Wall hello.c":
int foo(bool a) {
  bool b;
  if (a || b)
    return 1;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531
--- Comment #7 from Hao Liu ---
> int foo() {
> bool a = true;
> bool b;
> if (a || b)
> return 1;
> b = true;
> return 0;
> }
>
> still has the warning, it looks like something can be improved (guess we prefer
> not to emit warning).
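For readers following the thread, here is a minimal sketch of what initializing the flag looks like on the small test case. It is not the actual tree-vect-loop.cc code. The names foo_initialized and check_foo_initialized are hypothetical, and the initial value false is an assumption taken from the "initialization with false" wording in comment #4.

```cpp
#include <cassert>

// Hypothetical variant of the quoted test case: `b` gets an explicit
// initial value, so no path through `a || b` can read an indeterminate
// value, regardless of inlining or of what the optimizer can prove.
int foo_initialized(bool a) {
  bool b = false;  // assumption: false is the intended default
  if (a || b)
    return 1;
  b = true;        // kept only to mirror the shape of the quoted example
  return 0;
}

// Small self-check covering both values of the argument.
void check_foo_initialized() {
  assert(foo_initialized(true) == 1);   // short-circuits on `a`
  assert(foo_initialized(false) == 0);  // reads the initialized `b`
}
```

With the initializer in place, the `a == false` path is well-defined. Whether the extra store is optimized away in the real code is a separate question, and this sketch does not settle it.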
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531
--- Comment #10 from Hao Liu ---
> foo is just an example for not getting inlined, the point here is extra cost
> paid.
My point is that the case is different from the original case in
tree-vect-loop.cc. For example, change the case as follow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110531
Hao Liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474
Hao Liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
This problem causes a performance regression in SPEC2017 538.imagick. For the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #2 from Hao Liu ---
To my understanding, "reduction latency" is the minimum number of cycles needed
to do the reduction calculation for one iteration of the loop. It is calculated by
the extra instruction issue-info of the new cost models i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649
--- Comment #2 from Hao Liu ---
Hi, I bisected the following 3 commits (sequential):
[v3] 3a61ca1b925 - Improve profile updates after loop-ch and cunroll
(2023-07-06)
[v2] d4c2e34deef - Improve scale_loop_profile (2023-07-06)
[v1] 224fd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #3 from Hao Liu ---
Sorry, it seems this case cannot be fixed by only adjusting the calculation of
"reduction latency". Even if it becomes smaller, the case still cannot be
vectorized as the "general operations" count is still too la
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #6 from Hao Liu ---
Thanks for the confirmation about the reduction latency. I'll create a simple
patch to fix this.
> Discounting the loads, we do have 15 general operations.
That's true, and there are indeed 8 general operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #8 from Hao Liu ---
Thanks for the explanation. I understand the root cause now, and it's reasonable.
So, do you have a plan to fix this (i.e. to separate the FP and integer types)?
I want to enable the new costs for Ampere1, which is sim
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
SPEC2017 525.x264 build failure. Options are: -O3 -mcpu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #26 from Hao Liu ---
But for now, the patch should fix the regression.
(In reply to Tamar Christina from comment #25)
> Is still pretty inefficient due to all the extends. If we generate better
> code here this may tip the scale back
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113446
--- Comment #6 from Hao Liu ---
Hi Jakub,
That's great. Thanks for the fix.