-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For this small case, GCC fails to detect the trailing-zero-count calculation, so
the x86 tzcnt instruction cannot be generated; Clang, however, can generate it.
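The original test case is cut off in this snippet, so the code below is a hypothetical example of a trailing-zero-count idiom, not the reporter's code. `ctz_loop` counts trailing zero bits by shifting; `ctz_builtin` computes the same value with GCC's `__builtin_ctz`, which is undefined for 0 and which x86 targets can lower to a single `tzcnt` (with BMI) or `bsf`. Recognizing that the loop is equivalent to the builtin is roughly what the report asks the compiler to do. Both helper names are illustrative, and the code assumes a 32-bit `unsigned`.

```c
/* Hypothetical hand-written trailing-zero count: the kind of loop a compiler
   must recognize before it can replace it with one tzcnt instruction.
   Assumes unsigned is 32 bits wide. */
static unsigned ctz_loop(unsigned x)
{
    unsigned n = 0;
    if (x == 0)
        return 32u;            /* tzcnt also returns the operand width for 0 */
    while ((x & 1u) == 0) {    /* shift until the lowest set bit reaches bit 0 */
        x >>= 1;
        n++;
    }
    return n;
}

/* Same result via the GCC builtin; __builtin_ctz(0) is undefined, so guard it. */
static unsigned ctz_builtin(unsigned x)
{
    return x == 0 ? 32u : (unsigned)__builtin_ctz(x);
}
```

For example, `ctz_loop(12u)` and `ctz_builtin(12u)` both give 2, since 12 is `0b1100`.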
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #12 from Jiangning Liu ---
Hi Richi,
> That said, "failure" to identify the common (vector) load is known
> and I do have experimental patches trying to address that but did
> not yet arrive at a conclusive "best" approach.
It was
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
--- Comment #11 from Jiangning Liu ---
Hi Wilco,
> "it means we will need a linker optimization to remove those redundant BTIs
> (eg. by changing them into NOPs)"
It would only be a performance optimization, right? If we don't care about
pe
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For the following small case,
#include
#include
#include
#define NANOSECS10L
int main
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For this small case, the back-end if-conversion optimization generated a csel
instruction for aarch64, which is unsafe. The address of variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430
--- Comment #17 from Jiangning Liu ---
Yes.
> -Original Message-
> From: tnfchris at gcc dot gnu.org
> Sent: Friday, November 11, 2022 4:48 PM
> To: JiangNing Liu
> Subject: [Bug tree-optimization/89430] A missing ifcvt optimization t
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
$ cat foo.cpp
extern "C" __attribute__((__warning__(""))) void _foo(int) {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #7 from Jiangning Liu ---
Without reverting the commit g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92, we
still see exchange2 performance issue for aarch64. BTW, we have been using
-fno-inline-functions-called-once to get the best perform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100511
--- Comment #5 from Jiangning Liu ---
If we change "c3 = a" to "c3 = x->b", GCC can optimize it without IPA. It seems
VRP is working for this case.
$ cat tt7.c
#include
int a;
typedef struct {
int b;
int count;
} XX;
int g;
__attrib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100511
--- Comment #2 from Jiangning Liu ---
Then why can't GCC optimize this case either? Here sizeof (XX) != sizeof (g).
#include
int a;
typedef struct {
int b;
int count;
} XX;
int g;
__attribute__((noinline)) void f(XX *x)
{
int c1
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For this simple case, GCC doesn't know that the if condition (i > c2) is always
false.
#include
typedef struct {
int count;
} XX;
int g;
__att
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99946
--- Comment #1 from Jiangning Liu ---
Is there any gcc pass that can deal with this simple optimization?
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For this simple case,
$ cat test_cond.c
#define likely(x) __builtin_expect((x),1)
#define
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #4 from Jiangning Liu ---
Hi Honza,
Do you see any other real-world problems if the patch
g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 is not applied?
If exchange2 is the only one affected by this patch so far, and because we have
obse
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #12 from Jiangning Liu ---
MGO RFC is at https://gcc.gnu.org/pipermail/gcc/2021-January/234682.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #11 from Jiangning Liu ---
(In reply to rguent...@suse.de from comment #8)
> On Sat, 9 Jan 2021, jiangning.liu at amperecomputing dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
> >
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #10 from Jiangning Liu ---
(In reply to Hongtao.liu from comment #9)
> It looks like a SOA/AOC opt opportunity which is discussed in
> https://gcc.gnu.org/wiki/
> cauldron2015?action=AttachFile&do=view&target=Olga+Golovanevsky_+Memor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #7 from Jiangning Liu ---
(In reply to rguent...@suse.de from comment #6)
> On January 9, 2021 4:17:17 AM GMT+01:00, "jiangning.liu at amperecomputing
> dot com" wrote:
> >https://gcc.gnu.org/bugzilla
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #5 from Jiangning Liu ---
> It has to be done with care of course, cost modeling is difficult
> (we need to have a good estimate of n and m or need to version
> the whole nest). That said, usually we attempt the reverse transform.
B
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #2 from Jiangning Liu ---
Loop distribution can only handle very simple cases. If the inner loop has
complicated control flow and other memory accesses with loop-carried
dependences, it would be hard to handle. For example,
int foo
Component: web
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
Option ipcp-unit-growth (9.1.0) has been renamed to ipa-cp-unit-growth
(10.1.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93163
Jiangning Liu changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93163
--- Comment #1 from Jiangning Liu ---
Created attachment 47591
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47591&action=edit
bad case from llvm build
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
Building LLVM trunk with GCC trunk exposed the failure "internal compiler error:
verify_gimple failed".
$ g++ -O3 -c bad.cpp
bad.cpp: In constructor
‘
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649
--- Comment #5 from Jiangning Liu ---
Unrolling 1024 iterations would increase code size a lot, so usually we don't
do that (1024 is only an example). Unless we know we can eliminate most of
them, we don't really want to do loop unrolling, I gu
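The test case in this report stores 5 into all 1024 elements of a local array but only ever reads a[0]. The sketch below pairs the report's function with a hand-folded version (`f_folded` is a hypothetical name) to show the result that full unrolling would have to expose: the stores to a[1] through a[1023] are dead, and the whole function reduces to `return 5;`. This illustrates why unrolling is a costly route here; the payoff appears only once the dead stores are recognized and removed.

```c
/* The test function from this report. */
int f(void)
{
    int i, a[1024];
    for (i = 0; i < 1024; i++)
        a[i] = 5;
    return a[0];
}

/* Hypothetical hand-folded equivalent: a[0] is the only element ever read
   and it always holds 5, so the loop and the array are dead code. */
int f_folded(void)
{
    return 5;
}
```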
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649
--- Comment #3 from Jiangning Liu ---
It is a contrived test, but it is reduced from a real application.
To solve even more complicated scenarios, this simple case needs to be addressed
first.
If we change the case as below,
int f(void)
{
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For this small case,
int f(void)
{
int i, a[1024];
for (i=0; i<1024; i++)
a[i] = 5;
return a[0];
}
"gcc -O3" can
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246
--- Comment #3 from Jiangning Liu ---
I expect the inner loop to be vectorized into code like the following for x86:
vpbroadcastd [mem], ymm0
vpaddd [mem], ymm0, ymm1
vpbroadcastd reg, ymm2
vpcmpeqd ymm2, ymm1, k0
kortestw k0, k0
cmovne ...
AArch64 s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246
--- Comment #2 from Jiangning Liu ---
Created attachment 46626
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46626&action=edit
A new test
Attached is a test case that more closely matches the real-world code.
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For the following simple case, the inner loop can be completely removed by
vectorization. GCC fails to do that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91195
Jiangning Liu changed:
What|Removed |Added
CC||msebor at gcc dot gnu.org
--- Comment #8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91195
--- Comment #6 from Jiangning Liu ---
It seems -Werror=maybe-uninitialized does not always work: it fails to report
the error for the case below. Since the option name is "maybe-xxx", I can
understand that this is acceptable, but for the same re
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91195
--- Comment #3 from Jiangning Liu ---
The difference in how GCC compiles the code when FOR_UP_LIMIT is 3 versus 4 is
that cunrolli can unroll the loop when FOR_UP_LIMIT is 3, which significantly
simplifies the control flow, so the conditional st
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134
--- Comment #13 from Jiangning Liu ---
Feng already sent out the 1st patch at
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00541.html .
But the 2nd one is related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89713 .
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430
--- Comment #8 from Jiangning Liu ---
It is related to https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
Bernd's patch is overkill.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430
--- Comment #7 from Jiangning Liu ---
To avoid the "readonly" issue, try this case:
unsigned test(unsigned k, unsigned b) {
unsigned a[2];
if (b < a[k]) {
a[k] = b;
}
return a[0]+a[1];
}
Variable a is
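The safety concern in this thread can be seen by writing the transformation out by hand. `test_cond_store` has the same shape as the test cases in this report, adapted so that `a` is passed in as a parameter; `test_uncond_store` is a hypothetical if-converted form corresponding to a csel-based sequence: it selects the same value but always stores to a[k]. Both function names are illustrative. The extra store is harmless for a writable, thread-private object, but it can fault on read-only memory or introduce a data race, which is why a dominating load alone does not prove that the store is safe.

```c
/* Form written in the source: a[k] is only written when b < a[k]. */
void test_cond_store(unsigned *a, unsigned k, unsigned b)
{
    if (b < a[k])
        a[k] = b;
}

/* Hypothetical if-converted form: the value is chosen with a select and the
   store becomes unconditional.  The final contents match test_cond_store, but
   a[k] is written even when the condition is false. */
void test_uncond_store(unsigned *a, unsigned k, unsigned b)
{
    unsigned old = a[k];
    a[k] = (b < old) ? b : old;
}
```

On ordinary writable memory the two functions leave identical contents; the difference only shows up when the unconditional write itself is not allowed, for example on a `const` object placed in read-only storage.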
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430
--- Comment #6 from Jiangning Liu ---
(In reply to Richard Biener from comment #5)
> (In reply to Jiangning Liu from comment #4)
> > >We need to be careful with loads
> > >or stores, for instance a load might not trap, while a store would
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430
--- Comment #4 from Jiangning Liu ---
>We need to be careful with loads
>or stores, for instance a load might not trap, while a store would,
>so if we see a dominating read access this doesn't mean that a later
>write access would
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For a small case,
unsigned *a;
void test(unsigned k, unsigned b) {
if (b < a[k]) {
a[k] = b;
}
}
"gc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134
--- Comment #10 from Jiangning Liu ---
(In reply to Martin Sebor from comment #9)
> But since GCC emits infinite loops regardless of whether or not
> they have any side-effects, whether inc() is pure or not may not matter.
I think "for (; it !
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134
--- Comment #5 from Jiangning Liu ---
The loop below should be treated as a finite loop,
for (iter = booktable.begin(); iter!=booktable.end(); ++iter) {
...
}
so there is a chance to optimize away the empty loop, in which do_something
doesn'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134
Jiangning Liu changed:
What|Removed |Added
Status|RESOLVED|UNCONFIRMED
Resolution|INVALID
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For this simple case,
__attribute__((pure)) __attribute__((noinline)) int inc(int i)
{
/* Do something
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For aarch64, SLP optimization generates ugly code for the case below,
int test_slp( unsigned char *b )
{
unsigned int tmp[4][4];
int sum = 0
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For the simple loop below, gcc -O3 fails to vectorize it.
unsigned int tmp[1024];
unsigned int test_vec(int n)
{
int sum = 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #4 from Jiangning Liu ---
I would expect "gcc -O3 -flto" to work.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #2 from Jiangning Liu ---
memcmp doesn't return the position where they differ.
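memcmp() only guarantees the sign of its result (negative, zero or positive), so code that needs the index of the first differing byte has to use its own loop. A hypothetical helper of that kind is sketched below; `first_mismatch` is an illustrative name, not the `func2` from the report, whose code is truncated here. Its data-dependent early exit is what makes such a loop harder to vectorize than a plain reduction.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first byte where p and q
   differ, or n if the first n bytes are equal.  memcmp() cannot report this
   position; it only tells which buffer compares lower. */
size_t first_mismatch(const unsigned char *p, const unsigned char *q, size_t n)
{
    size_t i = 0;
    while (i < n && p[i] == q[i])   /* stop at the first differing byte */
        i++;
    return i;
}
```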
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
For the small case below, GCC -O3 can't vectorize the small loop to do byte
comparison in func2.
void *malloc
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
GCC -O3 can't vectorize the following typical loop for getting max value and
index from an array.
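The loop from the report is cut off in this snippet. A typical form of the "max value and index" idiom is sketched below as a hypothetical stand-in (`max_and_index` is an illustrative name). It carries two coupled reductions, the running maximum and the index at which it was found, and the index update depends on the same comparison that updates the maximum, so a vectorizer has to handle both together.

```c
#include <stddef.h>

/* Hypothetical example of the "max value and index" idiom.  Returns the index
   of the first maximum element of a[0..n-1] and stores the maximum in *max.
   Assumes n > 0. */
size_t max_and_index(const int *a, size_t n, int *max)
{
    int best = a[0];
    size_t best_i = 0;
    for (size_t i = 1; i < n; i++) {
        if (a[i] > best) {      /* strict '>' keeps the first occurrence */
            best = a[i];
            best_i = i;
        }
    }
    *max = best;
    return best_i;
}
```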
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530
--- Comment #1 from Jiangning Liu ---
Created attachment 44396
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44396&action=edit
vectorization failure
Attached is the -O3 output for aarch64, in which no vectorized code is generated
at all.
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
GCC -O3 can't vectorize the following simple case.
$ cat test_loop_2.c
int test_loop_2(char *p1, char *p2)
{
int s = 0;
for(int i=0; i<4;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86504
--- Comment #1 from Jiangning Liu ---
Created attachment 44387
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44387&action=edit
bad vectorization result for boundary size 8
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
Created attachment 44386
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44386&action=edit
bad vectorization result for boundary size 16
For t