[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4161

2020-03-19 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709 Jiu Fu Guo changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #10 from Jiu Fu Guo --- For Power, the patch enables -funroll-loops (with the small-loop unroller in RTL), which also enables 'cunroll' (the complete unroller) on tree. For this loop (the inner loop), 'cunroll' figures out that the lo

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #11 from Jiu Fu Guo --- In general, 'cunroll' can visibly help performance on some workloads, such as SPEC. In this case, it is questionable whether performance improves.

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #12 from Jiu Fu Guo --- > executed at most 13 times. Then the complete unroller could handle this loop. Correction: 13+1 times

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #13 from Jiu Fu Guo --- In this case, the loop body executes at most a given number of times, not an exact number. Only some of the iterations may be executed at run time. As said above, this may be false for 'while (count[n] == extent[n]

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #15 from Jiu Fu Guo --- Hi Thomas, Are you using a test case to check the performance? If you have one, would you please share it? Then we can use it to tune a heuristic improvement for cunroll. Thanks.

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #17 from Jiu Fu Guo --- For this case, as you said, I also think it is better to avoid unrolling the loop. '#pragma GCC unroll 1' could help prevent the loop from being unrolled, even if someone compiles it with aggressive unroll opti
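
The pragma mentioned in comment #17 can be sketched as follows. This is only an illustration: the function `sum_ints` is hypothetical and is not the libgfortran loop under discussion. GCC documents that unroll values of 0 and 1 block unrolling of the loop that immediately follows the pragma.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from libgfortran): sums n ints.
   The pragma asks GCC to keep the following loop rolled,
   regardless of the unrolling options on the command line. */
long sum_ints(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
#pragma GCC unroll 1
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Compilers that do not recognize the pragma would normally just warn about it and compile the loop as usual.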

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #18 from Jiu Fu Guo --- Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the loop has multiple exits or its upper bound is not a fixed number, we may not do 'complete unroll' for the loop, unless -funroll-all-loops is specified.

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #25 from Jiu Fu Guo --- (In reply to Richard Biener from comment #23) > (In reply to Richard Biener from comment #20) > > (In reply to Jiu Fu Guo from comment #18) > > > Currently, I'm thinking to enhance GCC 'cunroll' as: > > > if th

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #26 from Jiu Fu Guo --- (In reply to Richard Biener from comment #20) > (In reply to Jiu Fu Guo from comment #18) > > Currently, I'm thinking to enhance GCC 'cunroll' as: > > if the loop has multi-exits or upbound is not a fixed numbe

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #27 from Jiu Fu Guo --- (In reply to Jiu Fu Guo from comment #26) > (In reply to Richard Biener from comment #20) > > (In reply to Jiu Fu Guo from comment #18) > > > Currently, I'm thinking to enhance GCC 'cunroll' as: > > > if the lo

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-13 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #31 from Jiu Fu Guo --- (In reply to Richard Biener from comment #28) > > For the loop which has multi-exits, it may not helpful to unroll it, > > especially "complete unroll" may be not helpful. Like loop in in_pack_i4.c. > > Since

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-13 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #33 from Jiu Fu Guo --- (In reply to Richard Biener from comment #32) > Note I don't think the unrolling is excessive - store motion then applying > to all count[] and all computations hoisted out of the loop may be a bit > too much f

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-18 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #34 from Jiu Fu Guo --- The previous patch 6d099a76a0f6a040a3e678f2bce7fc69cc3257d8 (rs6000: Enable limited unrolling at -O2) only affects simple loops on rs6000. We may also set limits for GIMPLE cunroll, as for the RTL unroller throug

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-19 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-20 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #26 from Jiu Fu Guo --- Ran a test on SPEC2017 xz_r on ppc64le, changing the specified loop manually. Original loop (this loop occurs three times in the code): while (++len != len_limit)

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-27 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #27 from Jiu Fu Guo --- (In reply to Wilco from comment #13) > So to add some real numbers to the discussion, the average number of > iterations is 4.31. Frequency stats (16 includes all iterations > 16 too): > > 1: 29.0 > 2: 4.2 > 3

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-27 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #28 from Jiu Fu Guo --- (In reply to Jiu Fu Guo from comment #27) > > 12: 1.2 > > 13: 0.9 > > 14: 0.8 > > 15: 0.7 > > 16: 2.1 > > > > Find one interesting thing: > If using widen reading for the run which > 16 iterations, we can se

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-27 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #30 from Jiu Fu Guo --- (In reply to Wilco from comment #29) > (In reply to Jiu Fu Guo from comment #28) > > > > > > Find one interesting thing: > > > If using widen reading for the run which > 16 iterations, we can see the > > > per

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-27 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #32 from Jiu Fu Guo --- (In reply to Wilco from comment #31) > (In reply to Jiu Fu Guo from comment #30) > > (In reply to Wilco from comment #29) > > > > The key question remains whether it is legal to assume the limit implies > > >

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-27 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #33 from Jiu Fu Guo --- It would be relatively easy if the target supports unaligned access (see read64ne in https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/common/memcmplen.h ). Then the alignment issue is relaxed. It may be

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-01 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #36 from Jiu Fu Guo --- (In reply to Jakub Jelinek from comment #10) > If the compiler knew say from PGO that pos is usually a multiple of certain > power of two and that the loop usually iterates many times (I guess the > latter can

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-04 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #38 from Jiu Fu Guo --- (In reply to rguent...@suse.de from comment #37) > On Tue, 2 Jun 2020, guojiufu at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 > > ... > > Unalig

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-07 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #39 from Jiu Fu Guo --- I’m thinking of drafting a patch for this optimization. If you have any suggestions, please point them out. Thanks.

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-06-07 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 Jiu Fu Guo changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-06-08 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #38 from Jiu Fu Guo --- (In reply to Thomas Koenig from comment #37) > (In reply to Jiu Fu Guo from comment #36) > > Will you also backport to gcc 10, the other affected branch? Yes, after it is stable on trunk, then backport to gcc

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-08 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #41 from Jiu Fu Guo --- (In reply to Wilco from comment #40) > (In reply to Jiu Fu Guo from comment #39) > > I’m thinking to draft a patch for this optimization. If any suggestions, > > please point out, thanks. > > Which optimizati

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-08 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #42 from Jiu Fu Guo --- (In reply to Jiu Fu Guo from comment #41) > (In reply to Wilco from comment #40) > > (In reply to Jiu Fu Guo from comment #39) > > > I’m thinking to draft a patch for this optimization. If any suggestions, > >

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-10 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #43 from Jiu Fu Guo --- To vectorize this kind of code, we need to overcome the hard issue mentioned in comment #5: the loop has 2 exits.
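
A minimal sketch of what "2 exits" means for a byte-comparison loop, next to a hand-written wider-read variant. The function names and the 8-byte step are assumptions for illustration; this is neither the xz source nor the proposed GCC transformation. The wide variant never reads past `limit`, so it does not rely on any assumption about memory beyond the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise form: the loop exits either when the limit is reached
   or when a mismatch is found (two exits). */
size_t match_len_bytes(const uint8_t *a, const uint8_t *b, size_t limit)
{
    size_t len = 0;
    while (len < limit && a[len] == b[len])
        len++;
    return len;
}

/* Wider-read form: compare 8 bytes per step while at least 8 bytes
   remain below the limit, then finish byte-wise. memcpy expresses
   an unaligned load portably. */
size_t match_len_wide(const uint8_t *a, const uint8_t *b, size_t limit)
{
    assert(a != NULL && b != NULL);
    size_t len = 0;
    while (limit - len >= 8) {
        uint64_t x, y;
        memcpy(&x, a + len, 8);
        memcpy(&y, b + len, 8);
        if (x != y)
            break;          /* mismatch is somewhere in these 8 bytes */
        len += 8;
    }
    /* Locate the exact mismatch (or consume the tail) byte by byte. */
    while (len < limit && a[len] == b[len])
        len++;
    return len;
}
```

Both functions should return the same length for the same inputs; whether the wide form is actually faster depends on the iteration-count distribution discussed in the thread.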

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #45 from Jiu Fu Guo --- (In reply to Wilco from comment #44) > (In reply to Jiu Fu Guo from comment #43) > > To handle vectorization for this kind of code, it needs to overcome the hard > > issue mentioned in comment #5: the loop has

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #46 from Jiu Fu Guo --- (In reply to Jiu Fu Guo from comment #45) > (In reply to Wilco from comment #44) > > (In reply to Jiu Fu Guo from comment #43) > > > To handle vectorization for this kind of code, it needs to overcome the > >

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-06-17 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #47 from Jiu Fu Guo --- In glibc, memcmp uses wider reads; strncmp does not. memcmp takes "void *" arguments, while strncmp takes "char *".

[Bug tree-optimization/96535] [10/11 Regression] GCC 10 ignoring function __attribute__ optimize for all x86 since r11-1019

2020-08-11 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96535 --- Comment #8 from Jiu Fu Guo --- (In reply to Jakub Jelinek from comment #7) > Created attachment 49043 [details] > gcc11-pr96535.patch > > Updated patch to only move handling of the loop unrolling options (but I > need changes on the rs6000 s

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-10-22 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #41 from Jiu Fu Guo --- for code: subroutine foo (i, i1, block) integer :: i, i1 integer :: block(9, 9, 9) block(i:9,1,i1) = block(i:9,1,i1) - 10 end subroutine foo "-funroll-loops --param max-unroll-times=2 --param

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-10-27 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #42 from Jiu Fu Guo --- Author: guojiufu Date: Mon Oct 28 05:23:24 2019 New Revision: 277501 URL: https://gcc.gnu.org/viewcvs?rev=277501&root=gcc&view=rev Log: rs6000: Enable limited unrolling at -O2 In PR88760, there are a few diss

[Bug target/70010] powerpc: -flto forgets 'no-vsx' function attributes

2019-10-28 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70010 --- Comment #9 from Jiu Fu Guo --- Author: guojiufu Date: Mon Oct 28 09:46:15 2019 New Revision: 277506 URL: https://gcc.gnu.org/viewcvs?rev=277506&root=gcc&view=rev Log: [rs6000] PR70010, avoid no-vsx function to be inlined to vsx function In

[Bug target/70010] powerpc: -flto forgets 'no-vsx' function attributes

2019-10-28 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70010 --- Comment #10 from Jiu Fu Guo --- Author: guojiufu Date: Mon Oct 28 13:55:41 2019 New Revision: 277518 URL: https://gcc.gnu.org/viewcvs?rev=277518&root=gcc&view=rev Log: [rs6000] PR70010, avoid no-vsx function to be inlined to vsx function In

[Bug target/70010] powerpc: -flto forgets 'no-vsx' function attributes

2019-10-28 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70010 --- Comment #11 from Jiu Fu Guo --- Author: guojiufu Date: Mon Oct 28 14:23:26 2019 New Revision: 277521 URL: https://gcc.gnu.org/viewcvs?rev=277521&root=gcc&view=rev Log: Backport from mainline PR target/70010 * gcc.tar

[Bug target/70010] powerpc: -flto forgets 'no-vsx' function attributes

2019-10-28 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70010 --- Comment #12 from Jiu Fu Guo --- Author: guojiufu Date: Mon Oct 28 14:30:05 2019 New Revision: 277523 URL: https://gcc.gnu.org/viewcvs?rev=277523&root=gcc&view=rev Log: Backport from mainline PR target/70010 * gcc.tar

[Bug target/92256] [10 regression] error in gcc.dg/unroll-and-jam.c after r277501

2019-10-29 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92256 --- Comment #2 from Jiu Fu Guo --- Just sent out a new patch for review. The new patch will make this case pass too.

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-11-10 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #43 from Jiu Fu Guo --- Author: guojiufu Date: Mon Nov 11 06:30:38 2019 New Revision: 278034 URL: https://gcc.gnu.org/viewcvs?rev=278034&root=gcc&view=rev Log: rs6000: Refine small loop unroll in loop_unroll_adjust hook In this pat

[Bug target/92465] [10 regression] r278034 breaks gcc.dg/pr47763.c

2019-11-12 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92465 --- Comment #2 from Jiu Fu Guo --- Author: guojiufu Date: Wed Nov 13 05:04:22 2019 New Revision: 278112 URL: https://gcc.gnu.org/viewcvs?rev=278112&root=gcc&view=rev Log: Add option -fweb for pr47763.c This case is testing 'web' on ignore nake

[Bug target/92465] [10 regression] r278034 breaks gcc.dg/pr47763.c

2019-11-13 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92465 Jiu Fu Guo changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/92256] [10 regression] error in gcc.dg/unroll-and-jam.c after r277501

2019-11-13 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92256 Jiu Fu Guo changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug c/93047] New: frename-registers does not work well with __builtin_return

2019-12-23 Thread guojiufu at gcc dot gnu.org
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: guojiufu at gcc dot gnu.org Target Milestone: --- There is a test case, builtin-return-1.c, which checks __builtin_return and __builtin_apply. This case fails with -frename-registers. It

[Bug target/93047] frename-registers does not work well with __builtin_return

2019-12-23 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93047 --- Comment #2 from Jiu Fu Guo --- Sorry, I missed -fpic. The command is: gcc builtin-return-1.c -O3 -fpic -frename-registers -o ./builtin-return-1.exe This issue can be reproduced with gcc 7.4; gcc 6.4 is OK.

[Bug target/93047] frename-registers does not work well with __builtin_return

2020-01-08 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93047 --- Comment #3 from Jiu Fu Guo --- On P9, "gcc $GCC_SRC/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c -O3 -frename-registers -o ./builtin-return-1.exe" could reproduce this issue without -fpic. On P8, to reproduce this issue, -fpic

[Bug target/93047] frename-registers does not work well with __builtin_return

2020-01-08 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93047 --- Comment #4 from Jiu Fu Guo --- Checking dumps, some info like below: Before rnreg, there are insns: 127: call [`foo'] argc 0 242: %0:DI=%31:DI+0x220 128: [%31:DI+0x200]=%3:DI 359: %2:TI=%2:TI<-<0x40 449: %3:DI=%0:DI 360: [%3:DI]=%2:TI<-<0x40

[Bug target/93047] frename-registers does not work well with __builtin_return

2020-02-16 Thread guojiufu at gcc dot gnu.org
||2020-02-17 Assignee|unassigned at gcc dot gnu.org |guojiufu at gcc dot gnu.org Ever confirmed|0 |1

[Bug target/93047] frename-registers does not work well with __builtin_return

2020-02-16 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93047 Jiu Fu Guo changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4161

2020-02-20 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709 --- Comment #3 from Jiu Fu Guo --- This issue may relate to cunroll and cunrolli; after cunroll, some special instructions were selected for power9. In RTL, for power9, 'smax' is generated at the ce1 pass, while for power8, 'smax' is not used.

[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4161

2020-02-20 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709 --- Comment #4 from Jiu Fu Guo --- This issue can be reproduced with GCC9 "-O2 -funroll-loops -mcpu=power9" or "-O3 -mcpu=power9".

[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4161

2020-02-21 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709 --- Comment #5 from Jiu Fu Guo --- The data/instructions differ between P8 and P9 as follows: (maxlocval_4.f90) f29=-inf f30=-inf f31=nan P9: xsmaxcdp vs31,vs29,vs31 ==> vs31/f31:nan (smax(-inf, nan)-->nan) b 0x10004b60 P8: f
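
Why a NaN operand makes a "max" sensitive to how it is computed can be sketched in plain C. The helper `max_select` is hypothetical and says nothing about the exact semantics of xsmaxcdp; it only shows that a compare-and-select maximum is not symmetric in its operands once NaN is involved, because every ordered comparison with NaN is false.

```c
#include <math.h>

/* Select-style maximum: returns a only when a > b compares true;
   otherwise (including when either operand is NaN) returns b. */
double max_select(double a, double b)
{
    return (a > b) ? a : b;
}
```

With this helper, `max_select(-INFINITY, NAN)` yields NaN while `max_select(NAN, -INFINITY)` yields -inf, so replacing the select with an instruction that treats NaN differently can change results. This assumes default floating-point semantics (no -ffast-math).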

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-10-10 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-10-12 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #39 from Jiu Fu Guo --- For a small loop (1-2 stmts), in GIMPLE and RTL form, it would be around 5-10 instructions: 2-4 insns per stmt, ~4 insns for idx. With the current unroller, here are statistics on SPEC2017. Using --param max-un

[Bug target/70010] powerpc: -flto forgets 'no-vsx' function attributes

2019-10-16 Thread guojiufu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70010 --- Comment #7 from Jiu Fu Guo --- Author: guojiufu Date: Wed Oct 16 13:35:41 2019 New Revision: 277065 URL: https://gcc.gnu.org/viewcvs?rev=277065&root=gcc&view=rev Log: In PR70010, a function is marked with target(no-vsx) to disable VSX code g

[Bug rtl-optimization/66552] Missed optimization when shift amount is result of signed modulus

2020-10-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
||guojiufu at gcc dot gnu.org Resolution|--- |FIXED --- Comment #16 from Jiu Fu Guo --- Just confirmed the fix is ready in the trunk.

[Bug rtl-optimization/66706] Redundant bitmask instruction on x >> (n & 32)

2020-10-20 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66706 Bug 66706 depends on bug 66552, which changed state. Bug 66552 Summary: Missed optimization when shift amount is result of signed modulus https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66552 What|Removed |Added -

[Bug tree-optimization/97901] ICE at -Os: verify_gimple failed

2020-11-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97901 --- Comment #4 from Jiu Fu Guo --- Hi Richard, thank you for handling this!

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2020-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2020-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #11 from Jiu Fu Guo --- And the patch (PR98137) also helps a lot for the code in comment 9, since vectorization happens.

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2020-12-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #13 from Jiu Fu Guo --- Hi Richard, Checking the changed code in comment 9, there seems to be another opportunity to improve performance: improving the locality of accesses to array A. Unroll and jam loop1 into loop4 (or unroll

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2021-01-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #15 from Jiu Fu Guo --- (In reply to Richard Biener from comment #14) > > I've only quickly tried to understand what you are proposing but I think > this is out-of scope of our "separate" distribution / interchange / > unroll-and-jam

[Bug tree-optimization/98813] New: loop is sub-optimized if index is unsigned int with offset

2021-01-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: guojiufu at gcc dot gnu.org Target Milestone: --- For the below code: ---t.c void foo (const double* __restrict__ A, const double* __restrict__ B, double* __restrict__ C

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #1 from Jiu Fu Guo --- Since the run-time check has additional cost, the benefit shows only if the upper bound `m` is large; if the upper bound is small (e.g. < 12), the vectorized code (from clang) is worse than the un-vectorized binary.

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #2 from Jiu Fu Guo --- For code: for (unsigned int k = 0; k < BS; k++) { s += A[k] * B[k]; } PR48052 handles this, and for this code, the additional run-time check does not seem to be required. If there is an offset in the code: f
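
One plausible reading of why an offset matters for an `unsigned int` index: unsigned arithmetic wraps modulo 2^N by definition, so `k + off` evaluated in `unsigned int` is not guaranteed to grow with `k`. The two helpers below are hypothetical and only illustrate that general C behavior; they are not the test case from the bug report.

```c
#include <limits.h>

/* Index computed in unsigned int and only then widened:
   the addition wraps if k + off exceeds UINT_MAX. */
unsigned long long index_unsigned(unsigned int k, unsigned int off)
{
    unsigned int idx = k + off;   /* well-defined wrap-around */
    return idx;
}

/* Index widened before the addition: no wrap for these operand ranges. */
unsigned long long index_wide(unsigned int k, unsigned int off)
{
    return (unsigned long long)k + off;
}
```

When the two can differ, a compiler that wants to treat the access as a simple linear address stream may need range information or a run-time check to rule out the wrap.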

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #4 from Jiu Fu Guo --- Thanks, Richard! One interesting thing: the code below is vectorized: void foo (const double *__restrict__ A, const double *__restrict__ B, double *__restrict__ C, int n, int k, int m) { if (n > 0 && m > 0

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #7 from Jiu Fu Guo --- (In reply to Richard Biener from comment #6) > (In reply to Andrew Pinski from comment #5) > > (In reply to Jiu Fu Guo from comment #0) > > > For the below code: > > > ---t.c > > > void > > > foo (const doub

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813 --- Comment #8 from Jiu Fu Guo --- The code in comment 4 is optimized because there is some range info for "_2 = l_m_34 + _54;" where _54 > 0.

[Bug ipa/100513] New: ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: guojiufu at gcc dot gnu.org CC: marxin at gcc dot gnu.org Target Milestone: --- With --with-build-config=bootstrap-O3, I encounter an ICE in bootstrap on ppc64le

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #1 from Jiu Fu Guo --- The error is raised after the ipa “inlining” pass, when doing ggc_collect at stage 2. At code: xlimit = ((*xlimit).next); The value of xlimit becomes 0xa5a5a5a5a5a5a5a5 before the crash. 0xa5 may come from poison_pa
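
How a repeated 0xa5 byte pattern ends up as a pointer-sized value like 0xa5a5a5a5a5a5a5a5 can be sketched as below. This is an assumption-laden illustration of memory poisoning in general: `struct node` and `poison_bytes` are hypothetical names, the `next` link is modeled as a 64-bit integer, and none of this is GCC's garbage collector code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a linked structure; the link is stored
   as a 64-bit value so the example does not depend on pointer size. */
struct node {
    uint64_t next;
};

/* Overwrite released memory with a recognizable byte pattern, so that
   a later use of stale data is easy to spot in a debugger. */
void poison_bytes(void *p, size_t n)
{
    memset(p, 0xa5, n);
}
```

After `poison_bytes(&n, sizeof n)`, reading `n.next` gives 0xa5a5a5a5a5a5a5a5, which is why such a value in a crash usually points to a use of already-released memory.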

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #2 from Jiu Fu Guo --- A similar bug was fixed ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447 ), but it may be a different issue.

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #4 from Jiu Fu Guo --- Created attachment 50787 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50787&action=edit t.ii /home/guojiufu/gcc/build/gcc-mainline-test/./prev-gcc/xg++ -B/home/guojiufu/gcc/build/gcc-mainline-test/./pr

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #5 from Jiu Fu Guo --- build command is: configure --enable-languages=c,c++,fortran,objc,obj-c++,go --with-cpu=native --disable-multilib --with-long-double-128 --prefix=$HOME/xx --with-build-config=bootstrap-O3 make -j

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #6 from Jiu Fu Guo --- cut.. > 0xa5a5a5a5a5a5a5a means the location has been GC'ed already; either from > ggc_free or from a previous ggc_collect. > What you can try is run with the following options: > --param ggc-min-expand=1 --par

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #7 from Jiu Fu Guo --- A similar issue was also reported on x86 before: https://gcc.gnu.org/pipermail/gcc-testresults/2021-April/677996.html However, when I bootstrap with -O3 on one x86 machine, it passes.

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #20 from Jiu Fu Guo --- Yes, with the patch, bootstrap-O3 passes on ppc64le too. Thanks!

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #21 from Jiu Fu Guo --- When building Go on trunk with the patch, an error occurs: In function 'syscall.forkExec': go1: error: address taken, but ADDRESSABLE bit not set PHI argument &go..C479; for PHI node err$__object_77 = PHI dur

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #23 from Jiu Fu Guo --- Created attachment 50791 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50791&action=edit the command to build syscall.o

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513 --- Comment #24 from Jiu Fu Guo --- (In reply to rguent...@suse.de from comment #22) > On Tue, 11 May 2021, guojiufu at gcc dot gnu.org wrote: > cut.. > > Makefile:3001: recipe for target 'syscall.lo' failed >

[Bug middle-end/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #5 from Jiu Fu Guo --- breakpoint at tree-ssa.c:1013 error ("address taken, but ADDRESSABLE bit not set"); if ((VAR_P (base) || TREE_CODE (base) == PARM_DECL || TREE_CODE (base) == RESU

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #6 from Jiu Fu Guo --- As Richard mentioned, one caller does mark the object addressable: the one for 'label' (Gcc_backend::label_address). I'm wondering whether all the other callers of build_fold_addr_expr_loc also need to mark the object addressable.

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #9 from Jiu Fu Guo --- Yes, diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc index 5d9dbb5d068..32637a44af1 100644 --- a/gcc/go/go-gcc.cc +++ b/gcc/go/go-gcc.cc @@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression* bexp

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #11 from Jiu Fu Guo --- Ran a quick regression test on the patch: issue4458.go, which passed before, fails with this patch. The compiler message changed from "error: method expression requires named type or pointer to named type" to "erro

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #13 from Jiu Fu Guo --- (In reply to Ian Lance Taylor from comment #12) > A change to go-gcc.cc should not change any of the error messages emitted by > the Go frontend. It should not change the way that issue4458.go is handled. > T

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #14 from Jiu Fu Guo --- Update/correct info: If bootstrap-O3, the message is "error: method 'foo' is ambiguous". It is "error: type has no method 'foo'".

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537 --- Comment #15 from Jiu Fu Guo --- (In reply to Jiu Fu Guo from comment #9) > Yes, > > diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc > index 5d9dbb5d068..32637a44af1 100644 > --- a/gcc/go/go-gcc.cc > +++ b/gcc/go/go-gcc.cc > @@ -1680,6 +168

[Bug target/59371] [9/10/11/12 Regression] Performance regression in GCC 4.8/9/10/11/12 and later versions.

2021-05-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59371 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug target/59371] [9/10/11/12 Regression] Performance regression in GCC 4.8/9/10/11/12 and later versions.

2021-05-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59371 --- Comment #28 from Jiu Fu Guo --- If the code is changed as below, so that 'i' does not start from '0' and the comparison code is '!=', then wrap/overflow on 'i' may happen, and optimizations (e.g. vectorization) are not applied. The patch below is trying to optim

[Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop

2021-06-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
|RESOLVED CC||guojiufu at gcc dot gnu.org --- Comment #6 from Jiu Fu Guo --- I ran a test: this issue has been fixed on trunk by r12-1202.

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145 --- Comment #3 from Jiu Fu Guo --- Yes, although the code in adjust_cond_for_loop_until_wrap seems somewhat tricky: /* Only support simple cases for the moment. */ if (TREE_CODE (iv0->base) != INTEGER_CST || TREE_CODE (iv1->base) != INTE

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145 --- Comment #5 from Jiu Fu Guo --- (In reply to bin cheng from comment #4) > (In reply to Jiu Fu Guo from comment #3) > > Yes, while the code in adjust_cond_for_loop_until_wrap seems somehow tricky: > > > > /* Only support simple cases for th

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145 --- Comment #6 from Jiu Fu Guo --- > As tests, for below loop, adjust_cond_for_loop_until_wrap return false: > > foo (int *__restrict__ a, int *__restrict__ b, unsigned i) > { > while (++i > 100) > *a++ = *b++ + 1; > } For the above code,
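
For readers unfamiliar with the "until-wrap" shape of the loop quoted above, the following is a minimal sketch of how many times its body runs. It is plain arithmetic on the quoted `while (++i > 100)` condition, not GCC's niter analysis and not a description of what adjust_cond_for_loop_until_wrap does. The helper names `until_wrap_niter`, `until_wrap_count`, and `until_wrap_self_check` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not a GCC function): closed-form count of how many
   times the body of `while (++i > 100) body;` runs for an unsigned `i`.
   The body runs for the incremented values i+1 .. UINT_MAX; the next
   increment wraps to 0, which is not > 100, so the loop exits.  If i+1 is
   already <= 100 the body never runs.  */
static unsigned until_wrap_niter(unsigned i)
{
    return i >= 100u ? UINT_MAX - i : 0u;
}

/* Reference version: actually run the loop and count body executions.
   Only call this with `i` below 100 or close to UINT_MAX; otherwise it
   iterates billions of times.  */
static unsigned until_wrap_count(unsigned i)
{
    unsigned n = 0;
    while (++i > 100u)
        n++;
    return n;
}

/* Compare the closed form against the reference on a few cheap inputs.  */
static void until_wrap_self_check(void)
{
    assert(until_wrap_niter(UINT_MAX - 5u) == until_wrap_count(UINT_MAX - 5u));
    assert(until_wrap_niter(UINT_MAX) == until_wrap_count(UINT_MAX));
    assert(until_wrap_niter(99u) == until_wrap_count(99u));
    assert(until_wrap_niter(0u) == until_wrap_count(0u));
}
```

The point of the sketch is only that the trip count is finite and computable because unsigned arithmetic wraps to 0; how the compiler derives it is the subject of the bug report itself.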

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2023-01-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743 Jiu Fu Guo changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/108338] New: use mtvsrws for lowpart DI->SF conversion on P9

2023-01-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
ent: target Assignee: unassigned at gcc dot gnu.org Reporter: guojiufu at gcc dot gnu.org Target Milestone: --- In a mail-list discussion, https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609054.html, as Segher points out, we could use 'mtvsrws' for the co

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-03-14 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-03-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743 Jiu Fu Guo changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |guojiufu at gcc dot gnu.org

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-03-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743 --- Comment #5 from Jiu Fu Guo --- It would also be OK for a constant that has only 16 relevant bits in the middle, e.g. 0x09876000ULL: we can rotate the constant to 0x9876.
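
The rotation idea in the comment above can be sketched in portable C. Because a rotation is a bijection on 64-bit values, comparing `x` against 0x09876000ULL gives the same answer as rotating `x` right by 12 bits and comparing against 0x9876, a value that fits in 16 bits. This is only an illustration of that equivalence: the helper names `rotr64`, `eq_direct`, `eq_rotated`, and `rotate_compare_self_check` are hypothetical, and the instruction sequence GCC would actually emit on Power is not shown or assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value right by n bits (n is reduced modulo 64).  */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63u;
    return n ? (x >> n) | (x << (64u - n)) : x;
}

/* Direct comparison against the wide constant from the comment.  */
static int eq_direct(uint64_t x)
{
    return x == UINT64_C(0x09876000);
}

/* Same test, expressed as rotate-then-compare against a 16-bit value.
   0x09876000 rotated right by 12 bits is 0x9876.  */
static int eq_rotated(uint64_t x)
{
    return rotr64(x, 12) == UINT64_C(0x9876);
}

/* Check that both forms agree on matching and non-matching inputs.  */
static void rotate_compare_self_check(void)
{
    assert(rotr64(UINT64_C(0x09876000), 12) == UINT64_C(0x9876));
    assert(eq_direct(UINT64_C(0x09876000)) && eq_rotated(UINT64_C(0x09876000)));
    assert(!eq_direct(UINT64_C(0x09876001)) && !eq_rotated(UINT64_C(0x09876001)));
    assert(!eq_direct(UINT64_C(0x9876)) && !eq_rotated(UINT64_C(0x9876)));
}
```

Note that the low 12 bits of `x` end up in the high bits after the rotation, so any stray low bit still makes the rotated comparison fail, which is what keeps the two forms equivalent.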

[Bug preprocessor/101168] gnu++14 complains about altivec types defined with using keyword in the same file with preprocessor macros

2022-03-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101168 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment
