from:"changpeng dot fang at amd dot com"

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com

--- Comment #12 from changpeng dot fang at amd dot com 2010-08-30 16:41 --- Fixed! -- changpeng dot fang at amd dot com changed: What|Removed |Added Status

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com

--- Comment #11 from changpeng dot fang at amd dot com 2010-08-30 16:40 --- r163286 - in /branches/gcc-4_5-branch/gcc: Chan... * From: cfang at gcc dot gnu dot org * To: gcc-cvs at gcc dot gnu dot org * Date: Mon, 16 Aug 2010 21:02:30 - * Subject: r163286 - in

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com

--- Comment #10 from changpeng dot fang at amd dot com 2010-08-30 16:39 --- r163207 - in /trunk/gcc: ChangeLog testsuite/Ch... * From: cfang at gcc dot gnu dot org * To: gcc-cvs at gcc dot gnu dot org * Date: Thu, 12 Aug 2010 22:18:34 - * Subject: r163207 - in

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com

--- Comment #9 from changpeng dot fang at amd dot com 2010-08-30 16:37 --- Review approval for the trunk: http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00931.html Review Approval for 4.5 branch: http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02112.html -- http://gcc.gnu.org/bugzilla

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-24 Thread changpeng dot fang at amd dot com

--- Comment #5 from changpeng dot fang at amd dot com 2010-08-24 22:13 --- For the test case in comment #2, if we don't vectorize the loop, the unroll_factor is incorrectly determined as 1, and insns-to-prefetch ratio (4) will then prevent prefetching, and thus no perfor

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-08-24 00:46 --- Oops, the open64-generated code posted in the last comment is for the non-vectorized loop; the vectorized one is similar: .LBB23_f: .loc1 7 0 movups 0(%r10),%xmm3# [0] id:65

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-08-24 00:22 --- I checked with open64 and did not find any regression. And for the above testcase, open64 generated 3 non-temporal prefetches. As a result, I am guessing that we are just unlucky that the prefetch kicks out

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-08-24 00:03 --- float f (float *x, float *y, float *z, unsigned n) { float ret = 0.0; unsigned i; for (i = 0; i < n; i++) { float diff = x[i] - y[i]; ret -= diff * diff * z[i]; } return ret; }

[Bug target/45391] New: CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com

prefetching of vectorized loop Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http

[Bug c/45390] New: CPU2006 434.zeusmp: gcc 4.6 7% regression from gcc 4.6

2010-08-23 Thread changpeng dot fang at amd dot com

UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45390

[Bug c/45389] New: CPU2006 cactusADM: gcc 4.6 15% regression from 4.5

2010-08-23 Thread changpeng dot fang at amd dot com

AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45389

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-23 Thread changpeng dot fang at amd dot com

--- Comment #6 from changpeng dot fang at amd dot com 2010-08-23 18:59 --- Committed to trunk as Revision: 163475: http://gcc.gnu.org/ml/gcc-cvs/2010-08/msg00688.html Committed to 4.5 branch as Revision: 163483 http://gcc.gnu.org/ml/gcc-cvs/2010-08/msg00696.html -- http

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-20 Thread changpeng dot fang at amd dot com

--- Comment #5 from changpeng dot fang at amd dot com 2010-08-20 22:48 --- I have a fix: http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01625.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45260

[Bug c++/45269] CPU2006 450.soplex: "verify_cgraph_node failed" with -fprofile-generate

2010-08-18 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-08-18 19:43 --- http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg00406.html Verified. If I back out the above change, the bug goes away. So it is a duplicate of bug 44206 *** This bug has been marked as a duplicate of 44206

[Bug middle-end/44206] [4.6 Regression] ICE: Inline clone with address taken

2010-08-18 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-08-18 19:43 --- *** Bug 45269 has been marked as a duplicate of this bug. *** -- changpeng dot fang at amd dot com changed: What|Removed |Added

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-16 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-08-16 22:39 --- This bug should be related to VIEW_CONVERT_EXPR. If I use the following statement to filter the prefetch, the bug will go away: if (contains_view_convert_expr_p (ref)) return false; Otherwise, the

[Bug c/45270] New: CPU2006 435.gromacs: Segmentation fault with -fprofile-generate

2010-08-12 Thread changpeng dot fang at amd dot com

at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45270

[Bug c++/45269] New: CPU2006 450.soplex: "verify_cgraph_node failed" with -fprofile-generate

2010-08-12 Thread changpeng dot fang at amd dot com

06 450.soplex: "verify_cgraph_node failed" with -fprofile-generate Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45269

[Bug c/45268] New: CPU2006 458.sjeng: type mismatch in array reference with -fwhole-program -combine

2010-08-12 Thread changpeng dot fang at amd dot com

gram -combine Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-11 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-08-12 00:38 --- (In reply to comment #2) > It was caused by revision 153878: > > http://gcc.gnu.org/ml/gcc-cvs/2009-11/msg00094.html > I think the same patch was also committed to 4.4 branch. Maybe some prefetc

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-10 Thread changpeng dot fang at amd dot com

--- Comment #7 from changpeng dot fang at amd dot com 2010-08-10 21:44 --- (In reply to comment #5) > (In reply to comment #1) > > This patch should be a valid fix, because the recognition of the dot_prod > > pattern is known to fail at this point if the stmt is o

[Bug tree-optimization/45241] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-09 Thread changpeng dot fang at amd dot com

--- Comment #1 from changpeng dot fang at amd dot com 2010-08-09 17:52 --- This patch should be a valid fix, because the recognition of the dot_prod pattern is known to fail at this point if the stmt is outside the loop. (I am not sure whether we should not see this case in the

[Bug tree-optimization/45241] New: CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-09 Thread changpeng dot fang at amd dot com

ssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45241

[Bug tree-optimization/45239] New: CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-09 Thread changpeng dot fang at amd dot com

ssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45239

[Bug tree-optimization/45022] No prefetch for the vectorized loop

2010-07-29 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-07-29 19:14 --- (In reply to comment #1) > The misaligned indirect-refs will vanish soon. > I saw your patch that removes ALIGNED_INDIRECT_REF. Do you also plan to remove MISALIGNED_INDIRECT_REF? Thanks. --

[Bug tree-optimization/45021] Redundant prefetches for some loops (vectorizer produced ones too)

2010-07-28 Thread changpeng dot fang at amd dot com

--- Comment #5 from changpeng dot fang at amd dot com 2010-07-28 18:28 --- Things are a little more complicated if we change the code to: a[i] = a[i+1] + beta * b[i]; The prefetch pass wants to group a[i] and a[i+1], i.e. they have the same base address with an offset of 4 bytes. -- http
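The 4-byte offset mentioned in this comment can be illustrated with a small C sketch. Everything here is an assumption for illustration: the helper names are hypothetical and are not functions from GCC's prefetch pass, the arrays are taken to be float (4 bytes on common targets), and the 64-byte cache line is only an example value. The sketch shows why two references with the same base and a small constant delta are natural candidates for one group: they may land in the same cache line.

```c
#include <assert.h>

/* Hypothetical helper: byte offset of element `index` of a float array
   relative to the array base. */
static long
elem_offset_bytes (long index)
{
  return index * (long) sizeof (float);
}

/* Hypothetical helper: byte distance between two references that share
   the same base, e.g. a[i] and a[i+1]. */
static long
group_delta_bytes (long idx_a, long idx_b)
{
  return elem_offset_bytes (idx_b) - elem_offset_bytes (idx_a);
}

/* Two addresses whose distance is smaller than a cache line MAY fall in
   the same line (it depends on alignment); at a distance of a full line
   or more they never do. */
static int
may_share_cache_line (long delta, long line_size)
{
  assert (line_size > 0);
  return delta > -line_size && delta < line_size;
}
```

For a[i] and a[i+1] the delta is sizeof(float), so a single prefetch can often cover both references.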

[Bug tree-optimization/45021] Redundant prefetches for some loops (vectorizer produced ones too)

2010-07-28 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-07-28 18:22 --- Andrew's example is exactly what the prefetch sees for the test case (in the bug description). Unfortunately, the prefetch pass could not recognize that vect_pa.6_24 and vect_pa.20_38 are exactly the

[Bug tree-optimization/45022] No prefetch for the vectorized loop

2010-07-22 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-07-22 20:52 --- (In reply to comment #1) > The misaligned indirect-refs will vanish soon. > From the prefetching point of view, is there any reason that we cannot prefetch for misaligned or indirect refs? -

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-07-21 Thread changpeng dot fang at amd dot com

--- Comment #23 from changpeng dot fang at amd dot com 2010-07-21 21:30 --- Fixed -- changpeng dot fang at amd dot com changed: What|Removed |Added Status

[Bug tree-optimization/45021] Redundant prefetches for the vectorized loop

2010-07-21 Thread changpeng dot fang at amd dot com

--- Comment #1 from changpeng dot fang at amd dot com 2010-07-21 18:26 --- The direct reason is that prefetching could not differentiate the base addresses of the vectorized load and store (of a[i]): *vect_pa.6_24 *vect_pa.19_37 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021

[Bug tree-optimization/45022] New: No prefetch for the vectorized loop

2010-07-21 Thread changpeng dot fang at amd dot com

P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45022

[Bug tree-optimization/45021] New: Redundant prefetches for the vectorized loop

2010-07-21 Thread changpeng dot fang at amd dot com

Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021

[Bug tree-optimization/44955] over-prefetched for arrays of complex number

2010-07-15 Thread changpeng dot fang at amd dot com

--- Comment #1 from changpeng dot fang at amd dot com 2010-07-15 17:20 --- This is a piece of code that shows the two prefetches for b. mulss %xmm4, %xmm5 addq $8, %rdx prefetcht0 96(%r11) prefetcht0 100(%r11) subss %xmm2, %xmm1

[Bug tree-optimization/44955] New: over-prefetched for arrays of complex number

2010-07-15 Thread changpeng dot fang at amd dot com

at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44955

[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

2010-07-14 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-07-15 01:50 --- Created an attachment (id=21205) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21205&action=view) Do not unroll pre and post loops I did a quick test on polyhedron before and after apply

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-07-08 Thread changpeng dot fang at amd dot com

--- Comment #20 from changpeng dot fang at amd dot com 2010-07-09 01:59 --- I submitted a patch for review to completely fix the problem. The patch is an extension to Christian's speedup.patch. It splits the cost analysis into three small functions and quits further prefet

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-07-07 Thread changpeng dot fang at amd dot com

--- Comment #19 from changpeng dot fang at amd dot com 2010-07-07 19:00 --- (In reply to comment #18) > Changpeng, should this PR be closed now? > No. I am still looking at the dependence computation cost. I just found that most of the time is spent in memory allocation and free

[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

2010-07-06 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-07-06 18:35 --- Here is the impact of loop unrolling on the compilation time and code size on polyhedron test_fpu.f90: -O3 -ftree-vectorize -fno-prefetch-loop-arrays -fno-unroll-loops: timing: 12.62s, size: 67069 bytes -O3

[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

2010-07-06 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-07-06 17:58 --- We also need to handle the post loop of unrolling. Suppose the unroll_factor is 16, then the post-loop should have up to 15 iterations. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
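The bound stated in this comment follows from simple remainder arithmetic, sketched below in C. The function name is hypothetical (it is not part of GCC's unroller), and the model assumes the main loop consumes iterations in whole groups of unroll_factor, leaving the rest to the post-loop.

```c
#include <assert.h>

/* Hypothetical helper: number of iterations left for the post-loop after
   the main loop has been unrolled by `unroll_factor`. */
static unsigned
post_loop_iterations (unsigned n, unsigned unroll_factor)
{
  assert (unroll_factor > 0);
  /* The unrolled body handles n / unroll_factor full groups; the
     remainder is always strictly less than unroll_factor. */
  return n % unroll_factor;
}
```

With unroll_factor 16 the result ranges over 0..15, which is why unrolling the post-loop again buys little and mostly costs code size.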

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-07-02 Thread changpeng dot fang at amd dot com

--- Comment #17 from changpeng dot fang at amd dot com 2010-07-02 23:58 --- (In reply to comment #15) I have opened PR44794 for the unrolling of pre- and post-loop issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576

[Bug tree-optimization/44794] New: pre- and post-loops should not be unrolled.

2010-07-02 Thread changpeng dot fang at amd dot com

dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-30 Thread changpeng dot fang at amd dot com

--- Comment #15 from changpeng dot fang at amd dot com 2010-07-01 00:34 --- Unrolling of the peeled loop is partially the reason for the test_fpu.f90 compilation time and code size increase. Vectorization peeled a few iterations of the loop; the prefetching and unrolling passes do not

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-29 Thread changpeng dot fang at amd dot com

--- Comment #14 from changpeng dot fang at amd dot com 2010-06-30 00:36 --- (In reply to comment #7) > A good chunk of time seems to be spent in the RTL loop unroller, triggered > by array prefetching (testing with -O3 -funroll-loops). Otherwise it might > as well be just

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-29 Thread changpeng dot fang at amd dot com

--- Comment #13 from changpeng dot fang at amd dot com 2010-06-30 00:23 --- Here is the current status of this work: patch1: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html patch2: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg03049.html On my system with -O3 zero_sized_1.f90

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-28 Thread changpeng dot fang at amd dot com

--- Comment #12 from changpeng dot fang at amd dot com 2010-06-29 00:49 --- Created an attachment (id=21034) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21034&action=view) Early return in miss rate computation The attached patch improves the computation of miss rate.

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-28 Thread changpeng dot fang at amd dot com

--- Comment #11 from changpeng dot fang at amd dot com 2010-06-29 00:07 --- I have a patch that partially fixes the problem: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html Note that for this test case, the compile time doubled even though I don't compute the miss rate a

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-25 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-06-25 17:08 --- (In reply to comment #3) > Created an attachment (id=21001) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21001&action=view) [edit] > Potential fix for compile time regression > > Here

[Bug tree-optimization/44503] "control flow in the middle of basic block" with -fprefetch-loop-arrays

2010-06-14 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-06-14 22:22 --- There is nothing wrong in the prefetch itself. The problem is the __builtin_prefetch call used for the prefetch instruction. Whenever there is a non-local label in the current function, the __builtin_prefetch

[Bug tree-optimization/44503] "control flow in the middle of basic block" with -fprefetch-loop-arrays

2010-06-14 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-06-14 18:28 --- Actually, the prefetching is for the following loop: for (i = 0; i < p[2]; i++) q[i] = 0; I do not understand why unrolling of this loop affects another part of the program that has long

[Bug tree-optimization/44503] "control flow in the middle of basic block" with -fprefetch-loop-arrays

2010-06-11 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-06-11 18:45 --- Bug 39398 looks similar, but that one seems to involve exception handling instead of setjmp. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44503

[Bug c/44503] "control flow in the middle of basic block" with -fprefetch-loop-arrays

2010-06-11 Thread changpeng dot fang at amd dot com

--- Comment #1 from changpeng dot fang at amd dot com 2010-06-11 16:32 --- Created an attachment (id=20894) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20894&action=view) prefetching for the while loop? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44503

[Bug c/44503] New: "control flow in the middle of basic block" with -fprefetch-loop-arrays

2010-06-11 Thread changpeng dot fang at amd dot com

oop-arrays Product: gcc Version: tree-ssa Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/sh

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-08 Thread changpeng dot fang at amd dot com

--- Comment #21 from changpeng dot fang at amd dot com 2010-06-08 16:23 --- Just for the record, non-constant step prefetching improves 459.GemsFDTD by 5.5% (under -O3 + prefetch) on amd-linux64 systems. And the gains are from the following set of loops: NFT.fppized.f90:1268

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com

--- Comment #19 from changpeng dot fang at amd dot com 2010-06-07 22:30 --- Created an attachment (id=20862) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20862&action=view) Account prefetch_mod and unroll_factor for the computation of the prefetch count Ooops. Attached

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com

--- Comment #17 from changpeng dot fang at amd dot com 2010-06-07 18:37 --- (In reply to comment #15) > Created an attachment (id=20860) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20860&action=view) [edit] > Don't consider effect of unrolling in the comp

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com

--- Comment #16 from changpeng dot fang at amd dot com 2010-06-07 18:32 --- Created an attachment (id=20861) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20861&action=view) Limit non-constant step prefetching only to the innermost loops -- http://gcc.gnu.org/b

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com

--- Comment #15 from changpeng dot fang at amd dot com 2010-06-07 18:30 --- Created an attachment (id=20860) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20860&action=view) Don't consider effect of unrolling in the computation of insn-to-prefetch ratio -- http:/

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com

--- Comment #14 from changpeng dot fang at amd dot com 2010-06-07 18:27 --- Here is the current status of my investigation: (1) 465.tonto regression (~9%): The regression mainly comes from loops which have array references with both constant (prefetch_mod = 8) and non-constant

[Bug tree-optimization/43529] G++ doesn't optimize away empty loop when index is a double

2010-06-04 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-06-04 23:29 --- (In reply to comment #2) > Interesting! What's the difference between 17 and 18? > > int main() > { > double i; > for(i=0; i<18; i+=1); /* gcc -O3, empty loop not

[Bug tree-optimization/43529] G++ doesn't optimize away empty loop when index is a double

2010-06-04 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-06-04 23:15 --- Interesting! What's the difference between 17 and 18? int main() { double i; for(i=0; i<18; i+=1); /* gcc -O3, empty loop not removed */ } int main() { double i; fo

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-01 Thread changpeng dot fang at amd dot com

--- Comment #13 from changpeng dot fang at amd dot com 2010-06-01 19:59 --- (In reply to comment #12) > Ok. So I will let you continue to look into that and wait for your results? > > Do you have any feedback on separate.patch and its influence on performance? > + f

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-01 Thread changpeng dot fang at amd dot com

--- Comment #11 from changpeng dot fang at amd dot com 2010-06-01 17:40 --- (In reply to comment #10) > Created an attachment (id=20783) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20783&action=view) [edit] > experimental patch to have separa

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com

--- Comment #9 from changpeng dot fang at amd dot com 2010-05-28 18:36 --- (In reply to comment #8) > Looks like this is a fix to the regressions. That is, the regressions are > actually caused by the wrong calculation. This bug could be considered fixed, > even though pe

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com

--- Comment #8 from changpeng dot fang at amd dot com 2010-05-28 18:30 --- (In reply to comment #4) > Created an attachment (id=20767) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view) [edit] > Patch that makes loop invariant prefetches backend specf

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com

--- Comment #7 from changpeng dot fang at amd dot com 2010-05-28 16:56 --- (In reply to comment #5) > An alternative approach might be have different values for > prefetch-min-insn-to-mem-ratio and min-insn-to-prefetch-ratio > depending on constant/non-constant step size. >

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com

--- Comment #6 from changpeng dot fang at amd dot com 2010-05-28 16:46 --- (In reply to comment #4) > Created an attachment (id=20767) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view) [edit] > Patch that makes loop invariant prefetches backend specfic

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-05-27 23:51 --- I did a quick look at 434.zeusmp and found that prefetching for the following simple loop is responsible: linpck.f: 131: c ccode for increment not equal to 1 c ix = 1 smax = abs(sx(1

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-05-27 20:55 --- To me, non-constant step prefetching does not seem to fit into the existing prefetching framework. A non-constant stride prevents any reuse analysis, and thus the prefetching is done more or less blindly. -- http://gcc.gnu.org

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com

--- Comment #1 from changpeng dot fang at amd dot com 2010-05-27 20:49 --- The regressions are most likely from the patch that added non-constant step prefetching: * From: Andreas Krebbel * To: Christian Borntraeger * Cc: gcc-patches * Date: Wed, 19 May 2010 12:40

[Bug middle-end/44297] New: Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com

dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297

[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion

2010-05-24 Thread changpeng dot fang at amd dot com

--- Comment #9 from changpeng dot fang at amd dot com 2010-05-24 22:47 --- (In reply to comment #8) > -fgraphite-identity does iteration splitting for this case. Do you know why it could not be vectorized after iteration range splitting? -- http://gcc.gnu.org/bugzi

[Bug middle-end/44185] [4.6 regression] New prefetch test failures

2010-05-21 Thread changpeng dot fang at amd dot com

--- Comment #6 from changpeng dot fang at amd dot com 2010-05-21 21:36 --- (In reply to comment #5) > The fix introduced: > > FAIL: gcc.dg/tree-ssa/prefetch-7.c scan-assembler-times movnti 18 > FAIL: gcc.dg/tree-ssa/prefetch-7.c scan-tree-dump-times optimized "={nt}&

[Bug middle-end/44185] [4.6 regression] New prefetch test failures

2010-05-18 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-05-18 19:39 --- I have a patch to fix the test cases: http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01359.html For prefetch-6.c, patch http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg00567.html applies the insn to prefetch ratio

[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion

2010-05-07 Thread changpeng dot fang at amd dot com

--- Comment #7 from changpeng dot fang at amd dot com 2010-05-07 21:41 --- (In reply to comment #4) > (In reply to comment #3) > > Subject: Re: gcc should vectorize this loop > > through "iteration range splitting" > > You mean that the prob

[Bug tree-optimization/43425] gcc should vectorize this loop by substitution

2010-05-07 Thread changpeng dot fang at amd dot com

--- Comment #3 from changpeng dot fang at amd dot com 2010-05-07 21:33 --- I just found that the test case is the same as (similar to) bug 35229. The subject of this bug is wrong. Scalar expansion is not appropriate for this case. Actually the loop can be transformed to: void foo(int n

[Bug tree-optimization/43543] New: Reorder the statements in the loop can vectorize it

2010-03-26 Thread changpeng dot fang at amd dot com

5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43543

[Bug tree-optimization/43428] New: vectorizer should invoke loop distribution to partially vectorize this loop

2010-03-18 Thread changpeng dot fang at amd dot com

chf...@pathscale:~/gcc$ cat foo.c float a[100], b[100], c[100]; void foo(int n) { int i; for(i=1; i<[...] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43428

[Bug c/43427] New: The loop is not interchanged and thus could not be vectorized.

2010-03-18 Thread changpeng dot fang at amd dot com

chf...@pathscale:~/gcc$ cat foo.c float a[100][100], b[100][100]; void foo(int n) { int i, j; for(j=0; j<[...] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43427

[Bug c/43425] New: enhance scalar expansion to vectorize this loop

2010-03-18 Thread changpeng dot fang at amd dot com

chf...@pathscale:~/gcc$ cat foo.c int a[100], b[100]; void foo(int n, int mid) { int i, t = 0; for(i=0; i<[...] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43425

[Bug c/43423] New: gcc should vectorize this loop through "iteration range splitting"

2010-03-18 Thread changpeng dot fang at amd dot com

chf...@pathscale:~/gcc$ cat foo.c int a[100], b[100], c[100]; void foo(int n, int mid) { int i; for(i=0; i<[...] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423

[Bug c/43422] New: reversed loop is not vectorized

2010-03-18 Thread changpeng dot fang at amd dot com

MED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43422

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-03-18 Thread changpeng dot fang at amd dot com

--- Comment #20 from changpeng dot fang at amd dot com 2010-03-18 17:24 --- (In reply to comment #19) > Splitting critical edges for CDDCE will probably also solve this problem. > > Richard. > Yes, splitting critical edges is an enhancement to CDDCE and can solve this pr

[Bug tree-optimization/32824] Missed reduction vectorizer after store to global is LIM'd

2010-03-17 Thread changpeng dot fang at amd dot com

--- Comment #8 from changpeng dot fang at amd dot com 2010-03-17 21:22 --- Created an attachment (id=20133) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20133&action=view) patch with the testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32824

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-03-16 Thread changpeng dot fang at amd dot com

--- Comment #18 from changpeng dot fang at amd dot com 2010-03-17 00:22 --- (In reply to comment #16) > > In this case, the loop itself is "empty" and we can replace every use of the > > phi with "n" (exit value of the iv). > > I don't thin
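The quoted remark about replacing uses of the phi with the exit value of the induction variable can be illustrated with a small sketch. The function names are hypothetical, and the closed form assumes a loop that counts up from 0 by 1; note that the exit value equals `n` only when `n` is non-negative:

```c
/* Empty counting loop: the body does nothing, and only the final
   value of i is observable afterwards. */
int count_with_loop(int n)
{
    int i;
    for (i = 0; i < n; ++i)
        ;
    return i;
}

/* Closed form of the exit value: the loop leaves i == n when n is
   positive and i == 0 when the loop never runs. */
int count_closed_form(int n)
{
    return n > 0 ? n : 0;
}
```

Once every use of `i` after the loop is rewritten this way, the loop has no remaining effect and can be deleted.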

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-03-16 Thread changpeng dot fang at amd dot com

--- Comment #17 from changpeng dot fang at amd dot com 2010-03-17 00:18 --- (In reply to comment #8) > And > > int foo (int b, int j) > { > if (b) > { > int i; > for (i = 0; i<1000; ++i) > ; > j = b; > } > r

[Bug middle-end/43238] GCC 4.5 ICE segfault on any -O flag

2010-03-02 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-03-02 21:56 --- I have verified that the patch proposed in bug 43209 did fix this problem. I am going to check in the change soon. Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43238

[Bug tree-optimization/43209] [4.5 Regression] ICE in try_improve_iv_set, at tree-ssa-loop-ivopts.c:5238

2010-03-01 Thread changpeng dot fang at amd dot com

--- Comment #5 from changpeng dot fang at amd dot com 2010-03-01 18:02 --- I have a fix for this problem. We should not decrease the cost if the cost is infinite. diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 74dadf7..9accda9 100644 --- a/gcc/tree-ssa-loop
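The diff itself is truncated in this archive, so the actual change is not shown. The stated rule, that an infinite cost must not be decreased, might look roughly like the sketch below. Everything in it (`INFINITE_COST`, `struct cost`, `apply_discount`) is a hypothetical stand-in, not the code in `tree-ssa-loop-ivopts.c`:

```c
/* Hypothetical sentinel meaning "this choice is not possible". */
#define INFINITE_COST 10000000

struct cost {
    int value;
};

static int cost_is_infinite(struct cost c)
{
    return c.value >= INFINITE_COST;
}

/* Apply a discount to a cost, but leave an infinite cost untouched.
   Subtracting from the sentinel would make an impossible choice look
   merely expensive, and later comparisons would treat it as valid. */
struct cost apply_discount(struct cost c, int discount)
{
    if (cost_is_infinite(c))
        return c;
    c.value -= discount;
    return c;
}
```

The guard matters because a sentinel that has been decremented no longer compares equal to, or above, the "infinite" threshold.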

[Bug middle-end/43182] GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]

2010-02-26 Thread changpeng dot fang at amd dot com

--- Comment #6 from changpeng dot fang at amd dot com 2010-02-26 19:06 --- > > Actually it is a totally different case. Please file a new bug with that > case; > though there might already be a bug about that one. > I could not see the difference even though j i

[Bug middle-end/43182] GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]

2010-02-26 Thread changpeng dot fang at amd dot com

--- Comment #4 from changpeng dot fang at amd dot com 2010-02-26 18:53 --- Here is another similar case but more general. We know that a(j) and a(i) never access the same memory location. Intel ifort can vectorize this triangular loop: do 10 j = 1,n do 20 i = j+1, n

[Bug middle-end/43184] gcc could not vectorize floating point reduction statements

2010-02-25 Thread changpeng dot fang at amd dot com

--- Comment #2 from changpeng dot fang at amd dot com 2010-02-26 00:28 --- Subject: RE: gcc could not vectorize floating point reduction statements Thanks for pointing this out. Actually I am working on a Fortran program and found the reduction statement. The Fortran code can
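What was pointed out is not visible in this truncated snippet. For background, vectorizing a floating-point reduction reorders the additions, and GCC generally does that only when reassociation is permitted (for example with `-ffast-math` or `-fassociative-math`). A minimal sketch of the difference, with hypothetical function names:

```c
/* Strict left-to-right sum: every addition depends on the previous one. */
float sum_sequential(const float *x, int n)
{
    float s = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Reassociated sum with four partial accumulators, which is the shape
   a 4-lane vectorized reduction would take. Because floating-point
   addition is not associative, the result may differ from
   sum_sequential in the low-order bits for general inputs. */
float sum_four_lanes(const float *x, int n)
{
    float p0 = 0.0f, p1 = 0.0f, p2 = 0.0f, p3 = 0.0f, s;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        p0 += x[i];
        p1 += x[i + 1];
        p2 += x[i + 2];
        p3 += x[i + 3];
    }
    s = (p0 + p1) + (p2 + p3);
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        s += x[i];
    return s;
}
```

For inputs whose partial sums are all exactly representable, such as small integers, the two functions agree; the possible mismatch on other inputs is why the compiler needs explicit permission to make the transformation.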

[Bug middle-end/43184] New: gcc could not vectorize floating point reduction statements

2010-02-25 Thread changpeng dot fang at amd dot com

Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43184

[Bug middle-end/43182] New: gcc could not vectorize this simple loop (un-handled data-ref)

2010-02-25 Thread changpeng dot fang at amd dot com

Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-02-16 Thread changpeng dot fang at amd dot com

--- Comment #15 from changpeng dot fang at amd dot com 2010-02-16 19:54 --- Hello, I am not sure whether CD-DCE can fully replace remove_empty_loop. However, I would prefer to keep the remove_empty_loop pass. There are two reasons for this proposal: (1) remove_empty_loop was at level -O1

93 matches
