[Bug tree-optimization/114760] New: trailing zero count detection failure

2024-04-17 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For this small case, gcc failed to detect trailing zero count calculation, so the x86 instruction tzcnt cannot be generated, but clang can generate it
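
The test case from the report is cut off in this listing. As a rough illustration only, a trailing zero count can be written as a loop of the following shape; the function name, the 32-bit width, and the result returned for a zero input are assumptions made for this sketch, not details taken from the bug.

```c
#include <stdint.h>

/* Hypothetical example, not the test case from the bug report.
   Counts how many low-order zero bits x has. */
unsigned count_trailing_zeros(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;          /* assumed convention: all 32 bits are zero */

    while ((x & 1u) == 0) { /* shift right until the lowest bit is set */
        x >>= 1;
        n++;
    }
    return n;
}
```

A compiler that recognizes this idiom could replace the loop with a single count-trailing-zeros instruction such as tzcnt on x86.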

[Bug tree-optimization/98138] BB vect fail to SLP one case

2023-10-04 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #12 from Jiangning Liu --- Hi Richi, > That said, "failure" to identify the common (vector) load is known > and I do have experimental patches trying to address that but did > not yet arrive at a conclusive "best" approach. It was

[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls

2023-08-14 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #11 from Jiangning Liu --- Hi Wilco, > "it means we will need a linker optimization to remove those redundant BTIs > (eg. by changing them into NOPs)" It will be only for performance optimization, right? If we don't care about pe

[Bug tree-optimization/109603] New: Vectorization failure for a small loop containing a simple branch

2023-04-24 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For the following small case, #include #include #include #define NANOSECS10L int main

[Bug rtl-optimization/109343] New: invalid if conversion optimization for aarch64

2023-03-30 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For this small case, if-conversion optimization in back-end generated csel instruction for aarch64, which is unsafe. The address of variable

[Bug tree-optimization/89430] A missing ifcvt optimization to generate csel

2022-11-11 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430 --- Comment #17 from Jiangning Liu --- Yes. > -Original Message- > From: tnfchris at gcc dot gnu.org > Sent: Friday, November 11, 2022 4:48 PM > To: JiangNing Liu > Subject: [Bug tree-optimization/89430] A missing ifcvt optimization t

[Bug c/106823] New: #pragma GCC diagnostic ignored "-Wattribute-warning" doesn't work for -flto

2022-09-03 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- $ cat foo.cpp extern "C" __attribute__((__warning__(""))) void _foo(int) {

[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-11-28 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 --- Comment #7 from Jiangning Liu --- Without reverting the commit g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92, we still see exchange2 performance issue for aarch64. BTW, we have been using -fno-inline-functions-called-once to get the best perform

[Bug tree-optimization/100511] Fail to remove dead code in loop

2021-05-11 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100511 --- Comment #5 from Jiangning Liu --- If we change "c3 = a" to "c3 = x->b", GCC can optimize it without IPA. It seems VRP is working for this case. $ cat tt7.c #include int a; typedef struct { int b; int count; } XX; int g; __attrib

[Bug tree-optimization/100511] Fail to remove dead code in loop

2021-05-10 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100511 --- Comment #2 from Jiangning Liu --- Then why gcc can't optimize this case either? sizeof (XX) <> sizeof(g) here. #include int a; typedef struct { int b; int count; } XX; int g; __attribute__((noinline)) void f(XX *x) { int c1

[Bug tree-optimization/100511] New: Fail to remove dead code in loop

2021-05-10 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For this simple case, gcc doesn't know the if condition (i > c2) is always false. #include typedef struct { int count; } XX; int g; __att

[Bug tree-optimization/99946] fail to exchange if conditions in terms of likely/unlikely probability

2021-04-06 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99946 --- Comment #1 from Jiangning Liu --- Is there any gcc pass that can deal with this simple optimization?

[Bug tree-optimization/99946] New: fail to exchange if conditions in terms of likely/unlikely probability

2021-04-06 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For this simple case, $ cat test_cond.c #define likely(x) __builtin_expect((x),1) #define

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-02-22 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 --- Comment #4 from Jiangning Liu --- Hi Honza, Do you see any other real case problems if the patch g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 is not applied? If exchange2 is the only one affected by this patch so far, and because we have obse

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-14 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #12 from Jiangning Liu --- MGO RFC is at https://gcc.gnu.org/pipermail/gcc/2021-January/234682.html

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-11 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #11 from Jiangning Liu --- (In reply to rguent...@suse.de from comment #8) > On Sat, 9 Jan 2021, jiangning.liu at amperecomputing dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 > > >

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-11 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #10 from Jiangning Liu --- (In reply to Hongtao.liu from comment #9) > It looks like a SOA/AOC opt opportunity which is discussed in > https://gcc.gnu.org/wiki/ > cauldron2015?action=AttachFile&do=view&target=Olga+Golovanevsky_+Memor

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-09 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #7 from Jiangning Liu --- (In reply to rguent...@suse.de from comment #6) > On January 9, 2021 4:17:17 AM GMT+01:00, "jiangning.liu at amperecomputing > dot com" wrote: > >https://gcc.gnu.org/bugzilla

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-08 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #5 from Jiangning Liu --- > It has to be done with care of course, cost modeling is difficult > (we need to have a good estimate of n and m or need to version > the whole nest). That said, usually we attempt the reverse transform. B

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-08 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #2 from Jiangning Liu --- Loop distribution can only handle very simple case. If the inner loop has complicated control flow and other memory accesses with loop-carried dependence, it would be hard to handle it. For example, int foo

[Bug web/95380] New: ipcp-unit-growth was renamed to ipa-cp-unit-growth

2020-05-27 Thread jiangning.liu at amperecomputing dot com
Component: web Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options Option ipcp-unit-growth (9.1.0) has been renamed to ipa-cp-unit-growth (10.1.0

[Bug c++/93163] internal compiler error: verify_gimple failed

2020-01-05 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93163 Jiangning Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug c/93163] internal compiler error: verify_gimple failed

2020-01-05 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93163 --- Comment #1 from Jiangning Liu --- Created attachment 47591 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47591&action=edit bad case from llvm build

[Bug c/93163] New: internal compiler error: verify_gimple failed

2020-01-05 Thread jiangning.liu at amperecomputing dot com
Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- LLVM trunk build with gcc trunk exposed failure "internal compiler error: verify_gimple failed". $ g++ -O3 -c bad.cpp bad.cpp: In constructor ‘

[Bug tree-optimization/92649] dead store elimination

2019-11-25 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649 --- Comment #5 from Jiangning Liu --- Unrolling 1024 iterations would increase code size a lot, so usually we don't do that. 1024 is only an example. Without knowing we could eliminate most of them, we don't really want to do loop unrolling, I gu

[Bug tree-optimization/92649] dead store elimination

2019-11-25 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649 --- Comment #3 from Jiangning Liu --- It is a stupid test, but it is simplified from a real application. To solve even more complicated scenario, this simple case needs to be addressed first. If we change the case to be as below, int f(void) {

[Bug tree-optimization/92649] New: dead store elimination

2019-11-24 Thread jiangning.liu at amperecomputing dot com
Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For this small case, int f(void) { int i, a[1024]; for (i=0; i<1024; i++) a[i] = 5; return a[0]; } "gcc -O3" can
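
For readability, here is the function quoted in the snippet above, followed by a sketch of what it reduces to. Only a[0] is ever read, so the stores to a[1] through a[1023] are dead. The name f_reduced is made up for this illustration and does not come from the report.

```c
/* The function quoted in the report: fills a local array, reads one element. */
int f(void)
{
    int i, a[1024];

    for (i = 0; i < 1024; i++)
        a[i] = 5;
    return a[0];
}

/* Illustrative equivalent once every store except a[0] = 5 is treated as dead
   and the remaining value is forwarded to the return. */
int f_reduced(void)
{
    return 5;
}
```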

[Bug tree-optimization/91246] vectorization failure for a small loop to search array element

2019-07-24 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246 --- Comment #3 from Jiangning Liu --- Expect to vectorize the inner loop by generating the code below for x86, vpbroadcastd [mem], ymm0 vpaddd [mem], ymm0, ymm1 vpbroadcastd reg, ymm2 vpcmpeqd ymm2, ymm1, k0 kortestw k0, k0 cmovne ... AArch64 s

[Bug tree-optimization/91246] vectorization failure for a small loop to search array element

2019-07-24 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246 --- Comment #2 from Jiangning Liu --- Created attachment 46626 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46626&action=edit A new test Attached is a test case that is more closely matching the real-world code.

[Bug tree-optimization/91246] New: vectorization failure for a small loop to search array element

2019-07-24 Thread jiangning.liu at amperecomputing dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For the following simple case, the inner loop can be completely removed by vectorization. GCC fails to do that

[Bug middle-end/91195] [10 regression] incorrect may be used uninitialized smw (272711, 273474]

2019-07-23 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91195 Jiangning Liu changed: What|Removed |Added CC||msebor at gcc dot gnu.org --- Comment #8

[Bug middle-end/91195] [10 regression] incorrect may be used uninitialized smw (272711, 273474]

2019-07-22 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91195 --- Comment #6 from Jiangning Liu --- It seems -Werror=maybe-uninitialized cannot always work, and it fails to report the error message for the case below. However, the option name is "maybe-xxx", so I can understand it is OK, but for the same re

[Bug middle-end/91195] [10 regression] incorrect may be used uninitialized smw (272711, 273474]

2019-07-21 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91195 --- Comment #3 from Jiangning Liu --- The gcc compilation difference between FOR_UP_LIMIT is 3 and 4 is that, cunrolli can do loop unrolling when FOR_UP_LIMIT is 3, for which the control flow can be significantly simplified, so the conditional st

[Bug tree-optimization/89134] A missing optimization opportunity for a simple branch in loop

2019-03-29 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134 --- Comment #13 from Jiangning Liu --- Feng already sent out the 1st patch at https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00541.html . But the 2nd one is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89713 .

[Bug rtl-optimization/89430] A missing ifcvt optimization to generate csel

2019-02-26 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430 --- Comment #8 from Jiangning Liu --- It is related to https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html Bernd's patch is an overkill.

[Bug rtl-optimization/89430] A missing ifcvt optimization to generate csel

2019-02-21 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430 --- Comment #7 from Jiangning Liu --- To avoid "readonly" issue, try this case, unsigned test(unsigned k, unsigned b) { unsigned a[2]; if (b < a[k]) { a[k] = b; } return a[0]+a[2]; } Variable a is

[Bug rtl-optimization/89430] A missing ifcvt optimization to generate csel

2019-02-21 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430 --- Comment #6 from Jiangning Liu --- (In reply to Richard Biener from comment #5) > (In reply to Jiangning Liu from comment #4) > > >We need to be careful with loads > > >or stores, for instance a load might not trap, while a store would

[Bug rtl-optimization/89430] A missing ifcvt optimization to generate csel

2019-02-21 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430 --- Comment #4 from Jiangning Liu --- >We need to be careful with loads >or stores, for instance a load might not trap, while a store would, >so if we see a dominating read access this doesn't mean that a later >write access would

[Bug rtl-optimization/89430] New: A missing ifcvt optimization to generate csel

2019-02-21 Thread jiangning.liu at amperecomputing dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For a small case, unsigned *a; void test(unsigned k, unsigned b) { if (b < a[k]) { a[k] = b; } } "gc
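
The quoted function is laid out below next to a select-style rewrite, which is the shape that maps naturally onto a conditional-select instruction such as csel. This is only a sketch: test_select is a hypothetical name, and the rewrite stores to a[k] unconditionally, so it is valid only when that memory is known to be writable and the extra store is otherwise harmless, which is the concern raised in the other comments on this bug.

```c
unsigned *a;

/* Branchy form, as quoted in the report: store only when b is smaller. */
void test(unsigned k, unsigned b)
{
    if (b < a[k]) {
        a[k] = b;
    }
}

/* Hypothetical select form: compute the minimum, then always store it.
   Assumes a[k] is writable; the store now happens on every path. */
void test_select(unsigned k, unsigned b)
{
    unsigned old = a[k];

    a[k] = (b < old) ? b : old;
}
```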

[Bug tree-optimization/89134] A missing optimization opportunity for a simple branch in loop

2019-01-31 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134 --- Comment #10 from Jiangning Liu --- (In reply to Martin Sebor from comment #9) > But since GCC emits infinite loops regardless of whether or not > they have any side-effects, whether inc() is pure or not may not matter. I think "for (; it !

[Bug tree-optimization/89134] A missing optimization opportunity for a simple branch in loop

2019-01-31 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134 --- Comment #5 from Jiangning Liu --- The loop below should be treated as a finite loop, for (iter = booktable.begin(); iter!=booktable.end(); ++iter) { ... } so there is a chance to optimize away the empty loop, in which do_something doesn'

[Bug tree-optimization/89134] A missing optimization opportunity for a simple branch in loop

2019-01-31 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89134 Jiangning Liu changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|INVALID

[Bug tree-optimization/89134] New: A missing optimization opportunity for a simple branch in loop

2019-01-30 Thread jiangning.liu at amperecomputing dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For this simple case, __attribute__((pure)) __attribute__((noinline)) int inc(int i) { /* Do something

[Bug tree-optimization/88492] New: SLP optimization generates ugly code

2018-12-13 Thread jiangning.liu at amperecomputing dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For aarch64, SLP optimization generates ugly code for the case below, int test_slp( unsigned char *b ) { unsigned int tmp[4][4]; int sum = 0

[Bug tree-optimization/88459] New: vectorization failure for a simple sum reduction loop

2018-12-11 Thread jiangning.liu at amperecomputing dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For the simple loop below, gcc -O3 fails to vectorize it. unsigned int tmp[1024]; unsigned int test_vec(int n) { int sum = 0

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2018-12-07 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #4 from Jiangning Liu --- I expect "gcc -O3 -flto" could work.

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2018-12-06 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #2 from Jiangning Liu --- memcmp doesn't return the position where they differ.
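
To make the point concrete: memcmp reports only the ordering of the two buffers, while a byte comparison loop can report where they first differ. The helper below is a hypothetical sketch of such a loop; it is not func2 from the report, which is not shown in this listing.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first differing byte,
   or n when the first n bytes of p and q are identical. */
size_t first_mismatch(const unsigned char *p, const unsigned char *q, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != q[i])
            break;          /* early exit is what makes vectorizing this harder */
    }
    return i;
}
```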

[Bug tree-optimization/88398] New: vectorization failure for a small loop to do byte comparison

2018-12-06 Thread jiangning.liu at amperecomputing dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- For the small case below, GCC -O3 can't vectorize the small loop to do byte comparison in func2. void *malloc

[Bug tree-optimization/88259] New: vectorization failure for a typical loop for getting max value and index

2018-11-29 Thread jiangning.liu at amperecomputing dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- GCC -O3 can't vectorize the following typical loop for getting max value and index from an array.
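
The loop from the report is not included in this listing. A typical loop of the kind described might look like the sketch below; the function name, the signature, and the choice to keep the first maximum on ties are assumptions made for illustration.

```c
/* Hypothetical example, not the code from the bug report.
   Finds the largest element of v[0..n-1] and its index; requires n >= 1. */
int max_index(const int *v, int n, int *max_out)
{
    int best = v[0];
    int idx = 0;

    for (int i = 1; i < n; i++) {
        if (v[i] > best) {  /* strict comparison keeps the first maximum */
            best = v[i];
            idx = i;
        }
    }
    *max_out = best;
    return idx;
}
```

The value and the index are updated under the same condition, so a vectorizer has to carry both through the reduction together.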

[Bug tree-optimization/86530] Vectorization failure for a simple loop

2018-07-16 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530 --- Comment #1 from Jiangning Liu --- Created attachment 44396 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44396&action=edit vectorization failure Attached is -O3 result for aarch64, in which no vectorization code generated at all.

[Bug tree-optimization/86530] New: Vectorization failure for a simple loop

2018-07-16 Thread jiangning.liu at amperecomputing dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- GCC -O3 can't vectorize the following simple case. $ cat test_loop_2.c int test_loop_2(char *p1, char *p2) { int s = 0; for(int i=0; i<4;

[Bug tree-optimization/86504] vectorization failure for a nest loop

2018-07-12 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86504 --- Comment #1 from Jiangning Liu --- Created attachment 44387 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44387&action=edit bad vectorization result for boundary size 8

[Bug tree-optimization/86504] New: vectorization failure for a nest loop

2018-07-12 Thread jiangning.liu at amperecomputing dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jiangning.liu at amperecomputing dot com Target Milestone: --- Created attachment 44386 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44386&action=edit bad vectorization result for boundary size 16 For t