[Bug tree-optimization/38258] New: IV-opts creating poor code for loop exit test
Noticed the following in 200.sixtrack benchmark from CPU2000.

      SUBROUTINE DFACT(N,A,IDIN,IR,IFAIL,DET,JFAIL)
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER IR(9),IPAIRF
      REAL*8 A(IDIN,9),DET, ZERO, ONE,X,Y,TF
      DO 144 J = 1, N
      K = IR(5)
  123 DO 124 L = 1, N
      TF = A(J,L)
      A(J,L) = A(K,L)
      A(K,L) = TF
  124 CONTINUE
  144 CONTINUE
      RETURN
      END

Compiling with -O2 -m64 results in poor code for the outer loop exit test (-m32 is fine, as are 4.3 and earlier versions).

4.3:
        ...
        bdnz .L4        #inner loop exit
        cmpw 7,7,3
        addi 11,11,1
        beqlr 7

mainline:
        ...
        bdnz .L4        #inner loop exit
        addi 11,8,-1
        addi 3,3,1
        rldicl 11,11,0,32
        addi 11,11,2
        add 11,12,11
        add 11,11,0
        cmpd 7,3,11
        bne 7,.L5

-- 
Summary: IV-opts creating poor code for loop exit test
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
GCC build triplet: powerpc64-linux
GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38258
[Bug tree-optimization/38258] [4.4 regression] IV-opts creating poor code for loop exit test
--- Comment #2 from pthaugen at gcc dot gnu dot org 2008-11-24 22:28 --- Working fine after updating my source. Must have been fixed within the past week. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38258
[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817
-- pthaugen at gcc dot gnu dot org changed: What|Removed |Added CC||pthaugen at gcc dot gnu dot ||org Priority|P2 |P3 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
[Bug middle-end/40281] -fprefetch-loop-arrays: ICE: in initialize_matrix_A, at tree-data-ref.c:1887
--- Comment #2 from pthaugen at gcc dot gnu dot org 2009-06-19 14:41 ---
Created an attachment (id=18024) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18024&action=view) testcase

The attached testcase is from cpu2006 component 464.h264ref and appears to fail in the same manner on PowerPC.

work/spec_err> /home/pthaugen/install/gcc/trunk/bin/gcc -c -m32 -O2 -fprefetch-loop-arrays q_matrix.c
q_matrix.c: In function 'ParseMatrix':
q_matrix.c:5:6: internal compiler error: in initialize_matrix_A, at tree-data-ref.c:1896
Please submit a full bug report, with preprocessed source if appropriate.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40281
[Bug tree-optimization/40496] New: ICE in verify_stmts with -fprefetch-loop-arrays
Current trunk ICEs as follows when -fprefetch-loop-arrays is specified. The testcase that will be attached is trimmed down from cpu2006 benchmark 483.xalancbmk.

work/spec_err> /home/pthaugen/install/gcc/trunk/bin/g++ -c -m64 -O2 -fprefetch-loop-arrays DOMString.cpp
DOMString.cpp: In static member function 'static void* DOMStringHandle::operator new(size_t)':
DOMString.cpp:26:9: error: Incompatible types in PHI argument 0
struct DOMStringHandle *
void *
freeListPtr.3_44 = PHI
DOMString.cpp:26:9: internal compiler error: verify_stmts failed
Please submit a full bug report, with preprocessed source if appropriate.

-- 
Summary: ICE in verify_stmts with -fprefetch-loop-arrays
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
GCC build triplet: powerpc64-linux
GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40496
[Bug tree-optimization/40496] ICE in verify_stmts with -fprefetch-loop-arrays
--- Comment #1 from pthaugen at gcc dot gnu dot org 2009-06-19 21:32 --- Created an attachment (id=18026) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18026&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40496
[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817
--- Comment #20 from pthaugen at gcc dot gnu dot org 2009-07-08 21:53 ---
Created an attachment (id=18165) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18165&action=view) Reduced testcase

The attached testcase exhibits the problem with the load-hit-store. It results from choosing a bad register class (GENERAL_REGS) for a pseudo that should get assigned to FLOAT_REGS. Since there is no FPR -> GPR move for -mcpu=power6, the copy must go through memory.

I compiled the testcase with -m64 -O3 -mcpu=power6 using trunk revision 149376. The pseudo in question is 361. Following are the 3 insns referencing reg 361 in the sched1 dump (before ira):

(insn 51 238 241 8 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ]) (reg:DF 358 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 47 46 231 9 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ]) (reg:DF 179 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 196 194 198 11 thin6d_reduced.f:169 (set (mem/c/i:DF (plus:DI (reg/f:DI 477) (const_int 56 [0x38])) [2 crkve+0 S8 A64]) (reg:DF 361 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (expr_list:REG_DEAD (reg:DF 361 [ prephitmp.35 ]) (nil)))

And from the ira dump:

Pass1 cost computation:
a71 (r361,l1) best GENERAL_REGS, cover GENERAL_REGS
a3 (r361,l0) best GENERAL_REGS, cover GENERAL_REGS
a3(r361,l0) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0 LINK_REGS:156,1836 CTR_REGS:156,1836 SPECIAL_REGS:156,1836 MEM:156
a71(r361,l1) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0 LINK_REGS:1680,1680 CTR_REGS:1680,1680 SPECIAL_REGS:1680,1680 MEM:1120

Pass 2 cost computation:
r361: preferred GENERAL_REGS, alternative NO_REGS
a3(r361,l0) costs: BASE_REGS:0,2240 GENERAL_REGS:0,2240 FLOAT_REGS:312,2552 LINK_REGS:234,4154 CTR_REGS:234,4154 SPECIAL_REGS:234,4154 MEM:156
a71(r361,l1) costs: BASE_REGS:2240,2240 GENERAL_REGS:2240,2240 FLOAT_REGS:2240,2240 LINK_REGS:3920,3920 CTR_REGS:3920,3920 SPECIAL_REGS:3920,3920 MEM:3360

Not sure yet what's causing the FLOAT cost to be higher than the GENERAL cost.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817
--- Comment #22 from pthaugen at gcc dot gnu dot org 2009-07-14 21:15 --- The original problem, multi-block loop preventing movement of loads, was reintroduced with revision 149206, Jan's CD-DCE patch to remove empty loops. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added CC||jh at suse dot cz http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
[Bug rtl-optimization/40842] New: Poor register class choice in IRA
Moving this issue from bz 39976 since it is a separate problem from the original one documented there. Verified the behavior still exists using current trunk revision (150020). The testcase comes from cpu2000 sixtrack benchmark. Following is the original comment I posted:

===
The attached testcase exhibits the problem with the load-hit-store. It results from choosing a bad register class (GENERAL_REGS) for a pseudo that should get assigned to FLOAT_REGS. Since there is no FPR -> GPR move for -mcpu=power6, the copy must go through memory.

I compiled the testcase with -m64 -O3 -mcpu=power6 using trunk revision 149376. The pseudo in question is 361. Following are the 3 insns referencing reg 361 in the sched1 dump (before ira):

(insn 51 238 241 8 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ]) (reg:DF 358 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 47 46 231 9 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ]) (reg:DF 179 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 196 194 198 11 thin6d_reduced.f:169 (set (mem/c/i:DF (plus:DI (reg/f:DI 477) (const_int 56 [0x38])) [2 crkve+0 S8 A64]) (reg:DF 361 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (expr_list:REG_DEAD (reg:DF 361 [ prephitmp.35 ]) (nil)))

And from the ira dump:

Pass1 cost computation:
a71 (r361,l1) best GENERAL_REGS, cover GENERAL_REGS
a3 (r361,l0) best GENERAL_REGS, cover GENERAL_REGS
a3(r361,l0) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0 LINK_REGS:156,1836 CTR_REGS:156,1836 SPECIAL_REGS:156,1836 MEM:156
a71(r361,l1) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0 LINK_REGS:1680,1680 CTR_REGS:1680,1680 SPECIAL_REGS:1680,1680 MEM:1120

Pass 2 cost computation:
r361: preferred GENERAL_REGS, alternative NO_REGS
a3(r361,l0) costs: BASE_REGS:0,2240 GENERAL_REGS:0,2240 FLOAT_REGS:312,2552 LINK_REGS:234,4154 CTR_REGS:234,4154 SPECIAL_REGS:234,4154 MEM:156
a71(r361,l1) costs: BASE_REGS:2240,2240 GENERAL_REGS:2240,2240 FLOAT_REGS:2240,2240 LINK_REGS:3920,3920 CTR_REGS:3920,3920 SPECIAL_REGS:3920,3920 MEM:3360

-- 
Summary: Poor register class choice in IRA
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
GCC build triplet: powerpc64*-*-*
GCC host triplet: powerpc64*-*-*
GCC target triplet: powerpc64*-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40842
[Bug rtl-optimization/40842] Poor register class choice in IRA
--- Comment #1 from pthaugen at gcc dot gnu dot org 2009-07-23 21:33 --- Created an attachment (id=18248) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18248&action=view) Testcase The reduced testcase (since you can't attach one when opening a new bz). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40842
[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817
--- Comment #23 from pthaugen at gcc dot gnu dot org 2009-07-23 23:48 --- I opened a new bugzilla, 40842, for the load-hit-store RA issue discussed in comments 17-20 since that's a separate problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
[Bug rtl-optimization/33151] New: Invalid insn with pre_inc
I'm hitting the following on 6 SPEC CPU2006 benchmarks, some with pre_inc others with pre_modify. run/build_base_gcc411-base.0001> cat junk.c extern double cos (double x); extern double sin (double x); double pre_inc_ice (int n) { int i; double sc, ss, arg; sc = ss = 0.0; for (i = 0; i < n; i++) { arg = 1.0 / n; sc += cos (arg); ss += sin (arg); } return sc + ss; } run/build_base_gcc411-base.0001> /opt/biarch/gcc433-xlc-perf-2/bin/gcc -S -m32 -O2 junk.c junk.c: In function 'pre_inc_ice': junk.c:18: error: unrecognizable insn: (insn 84 80 85 4 junk.c:11 (set (reg/f:SI 155) (pre_inc:SI (reg/f:SI 139))) -1 (expr_list:REG_INC (reg/f:SI 139) (nil))) junk.c:18: internal compiler error: in extract_insn, at recog.c:1990 Please submit a full bug report, with preprocessed source if appropriate. -- Summary: Invalid insn with pre_inc Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: pthaugen at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33151
[Bug rtl-optimization/30113] [4.1 Regression] ICE in trunc_int_for_mode
--- Comment #6 from pthaugen at gcc dot gnu dot org 2007-09-21 20:17 --- Subject: Bug 30113 Author: pthaugen Date: Fri Sep 21 20:17:04 2007 New Revision: 128656 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128656 Log: 2007-09-21 Pat Haugen <[EMAIL PROTECTED]> Backport the following patch: 2006-12-11 Zdenek Dvorak <[EMAIL PROTECTED]> PR rtl-optimization/30113 * loop-iv.c (implies_p): Require the mode of the operands to be scalar. Modified: branches/ibm/gcc-4_1-branch/gcc/ChangeLog branches/ibm/gcc-4_1-branch/gcc/loop-iv.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30113
[Bug tree-optimization/33576] New: segfault in extract_muldiv for cpu2006 benchmark
The following trimmed testcase from 464.h264ref is segfaulting.

int a1[6][4][4];
short b1[16];
int c1;

void CalculateQuantParam(void)
{
  int i, j, k, temp;

  for(k=0; k<6; k++)
    for(j=0; j<4; j++)
      for(i=0; i<4; i++)
        {
          temp = (i<<2)+j;
          a1[k][j][i] = c1/b1[temp];
        }
}

$ /home/pthaugen/install/gcc/trunk/bin/gcc -c -m32 -O2 -ftree-loop-linear junk.c
junk.c: In function 'CalculateQuantParam':
junk.c:6: internal compiler error: Segmentation fault
Please submit a full bug report,

Program received signal SIGSEGV, Segmentation fault.
extract_muldiv (t=0xf6efa1c0, c=0xf7de0f60, code=MULT_EXPR, wide_type=0xf6e60150, strict_overflow_p=0xffaaee08 "") at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6008
6008 > GET_MODE_SIZE (TYPE_MODE (type)))
(gdb) bt 10
#0 extract_muldiv (t=0xf6efa1c0, c=0xf7de0f60, code=MULT_EXPR, wide_type=0xf6e60150, strict_overflow_p=0xffaaee08 "") at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6008
#1 0x42084484 in ?? ()
#2 0x10212c6c in extract_muldiv (t=, c=0xf7de0f60, code=MULT_EXPR, wide_type=0xf6e60460, strict_overflow_p=0xffaaeee0 "") at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6150
#3 0x22084442 in ?? ()
#4 0x10212f58 in extract_muldiv (t=0xf6f2adc0, c=0xf7de0480, code=MULT_EXPR, wide_type=0xf6e60150, strict_overflow_p=0xffaaeee0 "") at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6066
#5 0x42084442 in ?? ()
#6 0x101f1e34 in fold_binary (code=MULT_EXPR, type=0xf6e60150, op0=0xf6f2adc0, op1=0xf7de0480) at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:10359
#7 0x42008482 in ?? ()
#8 0x101fcb40 in fold_build2_stat (code=ERROR_MARK, type=0xf7de0f60, op0=0x43, op1=0xf6e60150) at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:13560
#9 0x106d5730 in chrec_fold_multiply (type=0xf6e60150, op0=0xf6f2adc0, op1=0xf7de0480) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-chrec.c:426

-- 
Summary: segfault in extract_muldiv for cpu2006 benchmark
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
GCC build triplet: powerpc64-linux
GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33576
[Bug tree-optimization/30565] ICE with -O1 -ftree-pre -ftree-loop-linear
--- Comment #7 from pthaugen at gcc dot gnu dot org 2007-10-01 15:15 --- Subject: Bug 30565 Author: pthaugen Date: Mon Oct 1 15:14:57 2007 New Revision: 128907 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128907 Log: 2007-10-01 Pat Haugen <[EMAIL PROTECTED]> Backport the following patch: 2007-05-03 Zdenek Dvorak <[EMAIL PROTECTED]> PR tree-optimization/30565 * lambda-code.c (perfect_nestify): Fix updating of dominators. Modified: branches/ibm/gcc-4_1-branch/gcc/ChangeLog branches/ibm/gcc-4_1-branch/gcc/lambda-code.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30565
[Bug tree-optimization/24309] [4.1/4.2/4.3 Regression] ICE with -O3 -ftree-loop-linear
--- Comment #18 from pthaugen at gcc dot gnu dot org 2007-10-10 22:28 ---
I appear to be seeing the same thing trying to build 445.gobmk from cpu2006 on PowerPC.

run/build_base_gcc_64.> cat junk.c
int board_size;
void print_eye(int eye[400])
{
  int m, n;
  int mini, maxi;
  int minj, maxj;

  mini = 1;
  minj = 1;
  for (m = 0; m < board_size; m++)
    for (n = 0; n < board_size; n++) {
      if (eye[m + n] != 5)
        continue;
      if (m < mini) mini = m;
      if (n < minj) minj = n;
    }
  board_size = mini + minj;
}

run/build_base_gcc_64.> /home/pthaugen/install/gcc/trunk/bin/gcc -c -m64 -O2 -ftree-loop-linear junk.c
junk.c: In function 'print_eye':
junk.c:5: internal compiler error: in lambda_loopnest_to_gcc_loopnest, at lambda-code.c:1840
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

-- 
pthaugen at gcc dot gnu dot org changed:

What|Removed|Added
CC||pthaugen at gcc dot gnu dot org, bergner at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24309
[Bug tree-optimization/33742] New: Segfault in vectorizable_operation
Seeing a segfault trying to build 164.gzip from cpu2000. Noticed it with -O3, but also occurs for "-O2/-O1 -ftree-vectorize". run/0001> cat junk.c typedef unsigned short ush; extern ush prev[]; void fill_window() { register unsigned n, m; for (n = 0; n < 32768; n++) { m = prev[n]; prev[n] = (ush)(m >= 0x8000 ? m-0x8000 : 0); } } run/0001> /home/pthaugen/install/gcc/trunk/bin/gcc -c -m32 -O3 junk.c junk.c: In function 'fill_window': junk.c:4: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. Program received signal SIGSEGV, Segmentation fault. vectorizable_operation (stmt=0xf7f69860, bsi=0x0, vec_stmt=0x0, slp_node=0x0) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-transform.c:3807 3807 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out); (gdb) bt 5 #0 vectorizable_operation (stmt=0xf7f69860, bsi=0x0, vec_stmt=0x0, slp_node=0x0) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-transform.c:3807 #1 0x106efc68 in vect_analyze_operations (loop_vinfo=0x109d8d20) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-analyze.c:484 #2 0x106f7060 in vect_analyze_loop (loop=0xf7ec43f0) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-analyze.c:4341 #3 0x104d010c in vectorize_loops () at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vectorizer.c:2501 #4 0x1045e000 in tree_vectorize () at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-loop.c:216 (More stack frames follow...) -- Summary: Segfault in vectorizable_operation Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: pthaugen at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33742
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #11 from pthaugen at gcc dot gnu dot org 2007-10-17 16:14 --- Created an attachment (id=14364) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14364&action=view) Testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #12 from pthaugen at gcc dot gnu dot org 2007-10-17 16:18 ---
And now some comments to go with the prior attachment...

This checkin is causing a 75% degradation on leslie3d for PowerPC. As HJ observed earlier, it depends on a second function accessing some of the global data.

Generated code for simple copy loop in EXTRAPI(), compiled with -O2.

revision 126325:
        .p2align 4,,15
.L3:
        lfd 0,0(11)     #* ivtmp.71, tmp198
        add 11,11,6     # ivtmp.71, ivtmp.71, ivtmp.77
        stfd 0,0(9)     #* ivtmp.76, tmp198
        add 9,9,5       # ivtmp.76, ivtmp.76, ivtmp.73
        bdnz .L3        #

revision 126326:
        .p2align 4,,15
.L6:
        lwz 10,12(27)   # .stride, prephitmp.66
        lwz 3,12(28)    # .stride, prephitmp.56
.L3:
        mullw 10,10,12  # tmp141, prephitmp.66, i
        lwz 9,36(30)    # .stride, .stride
        lwz 0,36(31)    # .stride, .stride
        lwz 8,4(30)     # uav.offset, uav.offset
        lwz 11,4(31)    # qav.offset, qav.offset
        lwz 6,24(30)    # .stride, .stride
        lwz 7,24(31)    # .stride, .stride
        lwz 4,0(30)     # uav.data, uav.data
        lwz 5,0(31)     # qav.data, qav.data
        mullw 3,3,12    # tmp160, prephitmp.56, i
        slwi 9,9,1      # tmp142, .stride,
        slwi 0,0,1      # tmp161, .stride,
        add 9,9,8       # tmp143, tmp142, uav.offset
        add 0,0,11      # tmp162, tmp161, qav.offset
        add 9,9,6       # tmp145, tmp143, .stride
        add 0,0,7       # tmp164, tmp162, .stride
        add 9,9,10      # tmp147, tmp145, tmp141
        add 0,0,3       # tmp166, tmp164, tmp160
        slwi 9,9,3      # tmp148, tmp147,
        slwi 0,0,3      # tmp167, tmp166,
        addi 12,12,1    # i, i,
        lfdx 0,5,0      #* qav.data, tmp169
        cmpw 7,12,29    # imax.3, tmp170, i
        stfdx 0,9,4     #* uav.data, tmp169
        bne 7,.L6       #

-- 
pthaugen at gcc dot gnu dot org changed:

What|Removed|Added
CC||pthaugen at gcc dot gnu dot org, bergner at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug tree-optimization/37194] New: Autovectorization of constant iteration loop degrades performance
Seeing a degradation in cpu2000 benchmark 252.eon that is caused by autovectorization of a simple loop in function ggSpectrum::Set(float). Here's a simple C version. void ggSpectrum_Set(float * data, float d) { int i; for (i = 0; i < 8; i++) data[i] = d; } When compiled with -O3 -mcpu=970 the following code is generated: ggSpectrum_Set: mfvrsave 0 stwu 1,-48(1) stw 0,44(1) oris 0,0,0x8000 mtvrsave 0 li 10,0 rlwinm 0,3,30,30,31 subfic 0,0,4 andi. 9,0,3 beq- 0,.L16 mtctr 9 .p2align 4,,15 .L10: slwi 0,10,2 addi 10,10,1 stfsx 1,3,0 subfic 8,10,8 bdnz .L10 .L3: subfic 6,9,8 srwi 0,6,2 slwi. 7,0,2 beq- 0,.L5 mtctr 0 stfs 1,16(1) cmpwi 7,0,0 li 0,16 slwi 9,9,2 li 11,0 add 9,3,9 lvewx 0,1,0 vspltw 0,0,0 beq- 7,.L17 .p2align 4,,15 .L6: slwi 0,11,4 addi 11,11,1 stvx 0,9,0 bdnz .L6 cmpw 7,6,7 subf 8,7,8 add 10,10,7 beq- 7,.L9 .L5: mtctr 8 slwi 0,10,2 add 3,3,0 .p2align 4,,15 .L8: stfs 1,0(3) addi 3,3,4 bdnz .L8 .L9: lwz 12,44(1) mtvrsave 12 addi 1,1,48 blr .L16: mr 10,9 li 8,8 b .L3 .L17: li 0,1 mtctr 0 b .L6 Adding -mno-altivec results in this simpler sequence, and a significant boost in performance (~40% speedup for the benchmark): ggSpectrum_Set: stfs 1,28(3) stfs 1,0(3) stfs 1,4(3) stfs 1,8(3) stfs 1,12(3) stfs 1,16(3) stfs 1,20(3) stfs 1,24(3) blr Another thing that stood out from the benchmark run was that the code was taking a pretty big hit on a couple of the statically predicted branches (apparently the address was already 16 byte aligned a lot of the time). So it seems like it would be best to remove the static prediction and let the hardware prediction take over. 
-- Summary: Autovectorization of constant iteration loop degrades performance Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: pthaugen at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37194
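For readers following along, here is a small C sketch of the two source-level shapes being compared in this report: the 8-iteration loop, and a hand-unrolled equivalent that mirrors the eight scalar stores produced with -mno-altivec. The names set_loop, set_unrolled, set_forms_agree and set_self_check are hypothetical and are not part of 252.eon; the sketch only shows that the two forms store the same values and measures nothing.

```c
#include <assert.h>
#include <string.h>

/* The loop from the report: constant trip count of 8. */
void set_loop(float *data, float d)
{
    int i;
    for (i = 0; i < 8; i++)
        data[i] = d;
}

/* Hand-unrolled form (hypothetical helper): eight scalar stores,
   the same work the -mno-altivec code does with eight stfs. */
void set_unrolled(float *data, float d)
{
    data[0] = d; data[1] = d; data[2] = d; data[3] = d;
    data[4] = d; data[5] = d; data[6] = d; data[7] = d;
}

/* Returns 1 when both forms leave identical array contents. */
int set_forms_agree(float d)
{
    float a[8], b[8];
    set_loop(a, d);
    set_unrolled(b, d);
    return memcmp(a, b, sizeof a) == 0;
}

/* Optional self-check a caller can run once. */
void set_self_check(void)
{
    assert(set_forms_agree(0.25f));
}
```

Whether a given compiler emits the vectorized sequence or the eight stores for set_loop depends on the target and options, as described above.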
[Bug tree-optimization/34976] New: verify_ssa ICE with -ftree-loop-linear
Discovered the following trying to build the cpu2006 benchmark 454.calculix. work/temp> cat umat_aniso_creep.f subroutine umat_aniso_creep() ! real*8 gr(6,6) ! ! do i=1,6 do j=1,6 gr(i,j)=0.d0 enddo if(i.le.3) then gr(i,i)=1.d0 else gr(i,i)=0.5d0 endif enddo call dgesv(gr) return end work/temp> ~/install/gcc/trunk/bin/gfortran -c -m32 -O2 -ftree-loop-linear umat_aniso_creep.f umat_aniso_creep.f: In function 'umat_aniso_creep': umat_aniso_creep.f:1: error: definition in block 14 does not dominate use in block 3 for SSA_NAME: i_15 in statement: i_27 = PHI PHI argument i_15 for PHI node i_27 = PHI umat_aniso_creep.f:1: internal compiler error: verify_ssa failed Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. -- Summary: verify_ssa ICE with -ftree-loop-linear Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: pthaugen at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34976
[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown
--- Comment #60 from pthaugen at gcc dot gnu dot org 2009-02-02 19:16 --- I tried leslie3d on PPC. The alias-improvements branch does indeed seem to fix the issue. The version of leslie3d built with the alias-improvements branch is about 10% faster than a version built with trunk, and is equivalent to a trunk version built with --param max-aliased-vops=1. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug debug/41295] [4.5 Regression] gfortran.dg/loc_2.f90 -O3 -g fails on SH with orphaned debug_insn
--- Comment #5 from pthaugen at gcc dot gnu dot org 2009-09-09 20:23 --- Didn't mean to change the component when CC'ing myself, changing back. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added Component|rtl-optimization|debug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41295
[Bug fortran/41494] New: temp and memcpy used when zeroing array
The following is causing a 7% degradation on cpu2000 benchmark 191.fma3d. The problem was introduced somewhere between revisions 151544 and 151586. Compile the following with -m32 -O2. MODULE force_ TYPE :: force_type REAL(KIND(0D0)) Xext! X direction external force REAL(KIND(0D0)) Yext! Y direction external force REAL(KIND(0D0)) Zext! Z direction external force REAL(KIND(0D0)) Xint! X direction internal force REAL(KIND(0D0)) Yint! Y direction internal force REAL(KIND(0D0)) Zint! Z direction internal force END TYPE TYPE (force_type), DIMENSION(:), ALLOCATABLE :: FORCE INTEGER NUMRT END MODULE force_ SUBROUTINE SOLVE USE force_ DO i = 1,NUMRT FORCE(i) = force_type (0,0,0,0,0,0) ENDDO END The latter revision now zeroes out a temp and uses that as the source on a memcpy call. rev 151544: .L3: stfd 0,0(9) stfd 0,8(9) stfd 0,16(9) stfd 0,24(9) stfd 0,32(9) stfd 0,40(9) addi 9,9,48 bdnz .L3 rev 151586: .L3: mr 3,30 mr 4,28 stfd 31,8(1) stfd 31,16(1) li 5,48 addi 31,31,1 stfd 31,24(1) stfd 31,32(1) stfd 31,40(1) stfd 31,48(1) bl memcpy cmpw 7,31,29 addi 30,30,48 bne 7,.L3 -- Summary: temp and memcpy used when zeroing array Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: pthaugen at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41494
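As a rough C analogue of the two code shapes in this report (an illustration only; the regression is in the Fortran front end's output, and a C compiler may well compile both forms below identically), the first function stores zeros directly into each element and the second builds a zeroed temporary and copies it with memcpy. The struct and function names are hypothetical stand-ins for the Fortran derived type and loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C stand-in for force_type: six doubles. */
struct force_type {
    double xext, yext, zext;
    double xint, yint, zint;
};

/* Shape of the rev 151544 code: zero stored directly into each field. */
void zero_direct(struct force_type *force, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        force[i].xext = 0.0; force[i].yext = 0.0; force[i].zext = 0.0;
        force[i].xint = 0.0; force[i].yint = 0.0; force[i].zint = 0.0;
    }
}

/* Shape of the rev 151586 code: fill a temporary, then memcpy it
   into the destination element on every iteration. */
void zero_via_temp(struct force_type *force, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        struct force_type tmp = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
        memcpy(&force[i], &tmp, sizeof tmp);
    }
}

/* Returns 1 when both forms produce the same bytes. */
int zero_forms_agree(void)
{
    struct force_type a[4], b[4];
    memset(a, 0xff, sizeof a);
    memset(b, 0xff, sizeof b);
    zero_direct(a, 4);
    zero_via_temp(b, 4);
    return memcmp(a, b, sizeof a) == 0;
}

/* Optional self-check a caller can run once. */
void zero_self_check(void)
{
    assert(zero_forms_agree());
}
```

The results are identical; the cost difference reported above comes from the extra stores to the temporary and the memcpy call inside the loop.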
[Bug target/42431] [4.5 Regression] wrong code for 200.sixtrack with vectorization and -fdata-sections
--- Comment #7 from pthaugen at gcc dot gnu dot org 2010-02-12 23:16 --- This looks like an ira/reload bug. The ira dump is where the offending insns first show up. (insn 17 26 244 3 pr42431.f:27 (set (mem:V4SI (reg/f:DI 29 29 [255]) [4 S16 A128]) (reg:V4SI 108 31 [269])) 942 {*altivec_movv4si} (expr_list:REG_EQUAL (const_vector:V4SI [ (const_int 0 [0x0]) (const_int 0 [0x0]) (const_int 0 [0x0]) (const_int 0 [0x0]) ]) (nil))) (insn 244 17 245 3 pr42431.f:27 (set (reg:DI 9 9) (mem/u/c:DI (plus:DI (reg:DI 2 2) (const:DI (unspec:DI [ (symbol_ref/u:DI ("*.LC2") [flags 0x2]) ] 49))) [7 S8 A8])) 373 {*movdi_internal64} (nil)) (insn 245 244 20 3 pr42431.f:27 (set (reg:DI 9 9) (plus:DI (reg:DI 9 9) (reg:DI 9 9))) 83 {*adddi3_internal1} (nil)) (insn 20 245 246 3 pr42431.f:27 (set (mem:V4SI (reg:DI 9 9) [4 S16 A128]) (reg:V4SI 108 31 [269])) 942 {*altivec_movv4si} (expr_list:REG_EQUAL (const_vector:V4SI [ (const_int 0 [0x0]) (const_int 0 [0x0]) (const_int 0 [0x0]) (const_int 0 [0x0]) ]) (nil))) insns 244/245 are not present in the prior (sched1) dump. insn 245 should be adding the constant 16 to gpr9 instead of adding r9. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added CC| |pthaugen at gcc dot gnu dot | |org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42431
[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817
--- Comment #25 from pthaugen at gcc dot gnu dot org 2009-11-30 21:29 --- I am still seeing the 2-block loop using revision 154838, both 32 and 64 bit, compile options -O3 -mcpu=power6 -funroll-loops. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
[Bug tree-optimization/42248] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns
--- Comment #3 from pthaugen at gcc dot gnu dot org 2010-01-13 22:11 --- Created an attachment (id=19585) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19585&action=view) Testcase Adding a shortened executable testcase that still fails. This looks like an aliasing issue. We're not creating a dependency between a store and load of a memory location which then allows the scheduler to reorder the insns. (insn 2 9 3 2 test.c:10 (set (mem/s/c:DI (plus:DI (reg/f:DI 67 ap) (const_int 48 [0x30])) [0 x+0 S8 A64]) (reg:DI 3 3)) 373 {*movdi_internal64} (expr_list:REG_DEAD (reg:DI 3 3) (nil))) ... (insn 11 8 12 2 test.c:10 (set (reg:DF 126 [ x$a ]) (mem/s/j/c:DF (plus:DI (reg/f:DI 67 ap) (const_int 48 [0x30])) [0 x.a+0 S8 A64])) 360 {*movdf_hardfloat64} (nil)) Some debug data along the dependency creation/alias checking path: Breakpoint 4, true_dependence (mem=0x4357068, mem_mode=DImode, x=0x43571d0, vari...@0x112aa168: 0x106737c4 ) at /home/pthaugen/src/gcc/trunk/gcc/gcc/alias.c:2367 2367 return rtx_refs_may_alias_p (x, mem, true); (gdb) pr x (mem/s/j/c:DF (plus:DI (reg/f:DI 67 ap) (const_int 48 [0x30])) [0 x.a+0 S8 A64]) (gdb) pr mem (mem/s/c:DI (plus:DI (reg/f:DI 67 ap) (const_int 48 [0x30])) [0 x+0 S8 A64]) We then go through rtx_refs_may_alias_p() -> refs_may_alias_p_1() -> decl_refs_may_alias_p() which contains the following code where we return false. /* If both references are based on different variables, they cannot alias. 
*/ if (!operand_equal_p (base1, base2, 0)) return false; Breakpoint 7, operand_equal_p (arg0=0x4370110, arg1=0x4370660, flags=0) at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:3160 3160 if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK) (gdb) prt arg0 unit size align 64 symtab 0 alias set -1 canonical type 0x442ca20 fields DC file test.c line 2 col 19 size unit size align 64 offset_align 128 offset bit offset context chain > context pointer_to_this chain > used BLK file test.c line 9 col 14 size unit size align 64 context (mem/s/c:BLK (plus:DI (reg/f:DI 67 ap) (const_int 48 [0x30])) [0 x+0 S32 A64]) arg-type incoming-rtl (reg:DI 3 3 [ x+-8 ]) chain > (gdb) prt arg1 unit size align 64 symtab 0 alias set -1 canonical type 0x442ca20 fields DC file test.c line 2 col 19 size unit size align 64 offset_align 128 offset bit offset context chain > context pointer_to_this chain > used BLK file test.c line 9 col 14 size unit size align 64 context (mem/s/c:BLK (plus:DI (reg/f:DI 67 ap) (const_int 48 [0x30])) [0 x+0 S32 A64]) arg-type incoming-rtl (reg:DI 3 3 [ x+-8 ]) chain > In operand_equal_p() we get to the following which returns false: case tcc_declaration: /* Consider __builtin_sqrt equal to sqrt. */ return (TREE_CODE (arg0) == FUNCTION_DECL && DECL_BUILT_IN (arg0) && DECL_BUILT_IN (arg1) && DECL_BUILT_IN_CLASS (arg0) == DECL_BUILT_IN_CLASS (arg1) && DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1)); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248
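To make the ordering requirement concrete, here is a minimal C sketch of the access pattern in the two insns above: an 8-byte integer store to the start of an object, followed by a double load of the first field at the same address. The load must observe the store, so the two may not be reordered. The struct layout and all names are hypothetical (the real testcase is the attachment, which is not reproduced here), and memcpy stands in for the DImode register store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the struct in the testcase. */
struct s {
    double a;
    double b;
};

/* Store 8 bytes over the start of x as an integer (like the DImode
   store in insn 2), then load x.a as a double (like the DFmode load
   in insn 11). Both accesses touch the same 8 bytes. */
double store_then_load(uint64_t bits)
{
    struct s x;
    x.b = 0.0;
    memcpy(&x, &bits, sizeof bits);  /* the store */
    return x.a;                      /* the dependent load */
}

/* Returns 1 when the load sees the value written by the store. */
int load_sees_store(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return store_then_load(bits) == d;
}
```

If a scheduler treated the two references as independent and moved the load above the store, store_then_load would return stale contents, which is the kind of failure the compat test is showing.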
[Bug tree-optimization/42248] [4.5 Regression] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns
--- Comment #11 from pthaugen at gcc dot gnu dot org 2010-01-14 15:45 --- I'll test on PPC. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added CC||pthaugen at gcc dot gnu dot | |org Component|middle-end |tree-optimization Target Milestone|4.5.0 |--- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248
[Bug middle-end/42248] [4.5 Regression] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns
--- Comment #12 from pthaugen at gcc dot gnu dot org 2010-01-14 17:29 --- The first patch appeared to work, resulting in correct ordering, but the second patch had some issues. It also corrected the original ordering, but introduced some new incorrect code for handling the args coming in in FPR regs. Check: .L.check: mflr 0 std 0,16(1) stdu 1,-128(1) std 3,176(1) std 4,184(1) std 5,192(1) std 6,200(1) lfd 13,184(1) lfd 0,176(1) >>std 8,112(1) >>lfd 12,112(1) >>fcmpu 7,0,12 crnot 30,30 beq 7,.L4 >>fcmpu 7,13,1 beq 7,.L1 The first compare should be comparing FP12 against FP1, not sure how the GPR8 stuff got in there, nothing is even passed in GPR8. Init: .L.init: std 5,0(3) stfd 1,8(3) blr Should just be storing FPR1/FPR2 into consecutive locations. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248
[Bug middle-end/42248] [4.5 Regression] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns
--- Comment #15 from pthaugen at gcc dot gnu dot org 2010-01-17 16:54 --- Updated patch works for testcase and bootstrap/regtest on PPC passed. Thx. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248
[Bug testsuite/44932] gcc.dg/uninit-pred-9_b.c fails
--- Comment #2 from pthaugen at gcc dot gnu dot org 2010-07-14 15:11 --- Created an attachment (id=21200) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21200&action=view) dump file -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932
[Bug testsuite/44932] gcc.dg/uninit-pred-9_b.c fails
--- Comment #3 from pthaugen at gcc dot gnu dot org 2010-07-14 15:12 --- Created an attachment (id=21201) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21201&action=view) dump file -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932
[Bug testsuite/42855] FAIL: gcc.dg/tree-ssa/pr42585.c scan-tree-dump-times optimized *
--- Comment #4 from pthaugen at gcc dot gnu dot org 2010-07-14 18:18 --- Based on the last post in the patch thread, should the patch be committed so that the testsuite failures go away and this can be closed? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42855
[Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
--- Comment #33 from pthaugen at gcc dot gnu dot org 2010-07-16 19:14 --- gcc.dg/tree-ssa/loop-19.c started failing on powerpc with -m64 between 7/5 and 7/7. The tree dump now looks like the following: : ivtmp.10_12 = (long unsigned int) &a[-1]; ivtmp.16_15 = (long unsigned int) &c[-1]; a.21_18 = (long unsigned int) &a; D.2035_19 = a.21_18 + 1592; : # ivtmp.10_9 = PHI # ivtmp.16_13 = PHI ivtmp.10_5 = ivtmp.10_9 + 8; D.2032_16 = (void *) ivtmp.10_5; D.2007_3 = MEM[(double[200] *)D.2032_16]; ivtmp.16_14 = ivtmp.16_13 + 8; D.2033_17 = (void *) ivtmp.16_14; MEM[(double[200] *)D.2033_17] = D.2007_3; if (ivtmp.10_5 != D.2035_19) goto ; else goto ; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
[Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
--- Comment #39 from pthaugen at gcc dot gnu dot org 2010-07-21 21:51 ---
(In reply to comment #38)
>
> .L2:
>         addi 11,8,9216
>         ldx 0,10,9
>         stdx 0,11,9
>         addi 9,9,8
>         bdnz .L2
>
> and in r161844:
>
> .L2:
>         ldu 0,8(11)
>         stdu 0,8(9)
>         bdnz .L2
>
> I'm no expert on powerpc architecture, but 3 instructions versus 5 looks like
> a win to me. Bit-rotten test case?
>

The 'addi 11,8,9216' in the first loop is invariant and should be hoisted out of the loop. Separate issue?

As for the issue of indexed ld/st+addi vs. update-form ld/st: the update forms are cracked into ld/st+addi, which imposes a scheduling restriction on them (cracked insns start a dispatch group). It may not make any difference in this simple loop, but indexed ld/st+addi may have better scheduling opportunities were there more insns in the loop.

This testcase also appears to be dependent on the -mcpu value. With -mcpu=power7 the testcase passes (although there's still the issue of the invariant addi in the loop). And if I change to -m32, then it only fails for -mcpu=power6.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
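The remark about the invariant 'addi' can be sketched at the source level. This is only an illustration, not the loop-19.c testcase: the function names are hypothetical, and the element offset 1152 is an assumption derived from the 9216-byte displacement in the quoted assembly divided by an 8-byte element size.

```c
#include <stddef.h>

/* Assumed element offset: 9216 bytes / 8 bytes per element. */
#define DST_OFFSET 1152

/* The destination address is recomputed on every iteration even though
   it does not depend on i, analogous to the 'addi 11,8,9216' that stays
   inside the first quoted loop. */
void copy_inloop (long *base, const long *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      long *dst = base + DST_OFFSET;   /* loop-invariant */
      dst[i] = src[i];
    }
}

/* The same copy with the invariant computed once before the loop,
   which is what hoisting would produce. */
void copy_hoisted (long *base, const long *src, size_t n)
{
  long *dst = base + DST_OFFSET;       /* hoisted out of the loop */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Both functions store the same values; the only difference is where the invariant address computation happens.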
[Bug middle-end/45663] [4.6 regression] New test failures
--- Comment #3 from pthaugen at gcc dot gnu dot org 2010-09-14 20:04 --- Not sure I understand everything involved here, but isn't the test a little suspect any time higher optimization levels and instruction scheduling are enabled? -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added Target Milestone|4.6.0 |--- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45663
[Bug tree-optimization/45671] New: Reassociate expressions for greater parallelism
The following testcase demonstrates where reassociation/regrouping of expressions could result in greater parallelism for processors that have multiple arithmetic execution units.

int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
{
  int ret;
  ret = a + b + c + d + e + f + g + h;
  return ret;
}

Compiling with -O3 results in a series of dependent add instructions to accumulate the sum.

        add 4,3,4
        add 4,4,5
        add 4,4,6
        add 4,4,7
        add 4,4,8
        add 4,4,9
        add 4,4,10

If we regrouped to (a+b)+(c+d)+... we can do multiple adds in parallel on different execution units.

--
           Summary: Reassociate expressions for greater parallelism
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45671
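A minimal source-level sketch of the regrouping suggested in the report. The names `sum_chain` and `sum_tree` are hypothetical and not part of the original testcase. As long as the sum does not overflow, both groupings give the same value for `int` operands; the second simply exposes independent additions.

```c
/* Left-to-right grouping: each add depends on the previous result,
   so the seven additions form one serial dependency chain. */
int sum_chain (int a, int b, int c, int d, int e, int f, int g, int h)
{
  return ((((((a + b) + c) + d) + e) + f) + g) + h;
}

/* Balanced grouping: the four pairwise sums are independent of each
   other, so the dependency depth is three adds instead of seven. */
int sum_tree (int a, int b, int c, int d, int e, int f, int g, int h)
{
  return ((a + b) + (c + d)) + ((e + f) + (g + h));
}
```

The total number of additions is unchanged (seven in both forms); only the length of the longest dependency chain differs.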
[Bug fortran/45648] [4.6 regression] Unnecessary temporary for transpose calls as actual argument.
--- Comment #3 from pthaugen at gcc dot gnu dot org 2010-09-20 20:00 --- As Steven mentioned in the mailing list, this did introduce a degradation for cpu2000 benchmark galgel. I'm seeing about -10% on PowerPC. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added CC||pthaugen at gcc dot gnu dot | |org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45648
[Bug fortran/45648] [4.6 regression] Unnecessary temporary for transpose calls as actual argument.
--- Comment #4 from pthaugen at gcc dot gnu dot org 2010-09-20 20:45 --- (In reply to comment #2) > Created an attachment (id=21777) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21777&action=view) [edit] > patch restoring the previous behaviour. > Applying this patch restored the performance. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45648
[Bug target/44199] ppc64 glibc miscompilation
--- Comment #9 from pthaugen at gcc dot gnu dot org 2010-05-20 16:23 --- Spec testing went fine, differences in the noise range. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199
[Bug target/44199] ppc64 glibc miscompilation
--- Comment #11 from pthaugen at gcc dot gnu dot org 2010-05-20 17:59 --- No, I didn't bootstrap/regtest. I can take care of trunk if you want to do 4.4. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199
[Bug target/44199] ppc64 glibc miscompilation
--- Comment #13 from pthaugen at gcc dot gnu dot org 2010-05-21 02:32 ---
Bootstrap of trunk went fine. Regression test (contrib/test_summary) showed the following difference between base/patched versions, but didn't have any testcases show up in the diff, so not sure what to make of that. This is for 32-bit gcc testsuite.

< # of expected passes 59078
---
> # of expected passes 59075
98c98
< # of unsupported tests 872
---
> # of unsupported tests 875

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199
[Bug target/44199] ppc64 glibc miscompilation
--- Comment #22 from pthaugen at gcc dot gnu dot org 2010-05-26 15:51 --- The 4.4 patch isn't complete. /home/gccbuild/gcc_4.4_anonsvn/gcc/gcc/config/rs6000/rs6000.c:17166: undefined reference to `offset_below_red_zone_p' /home/gccbuild/gcc_4.4_anonsvn/gcc/gcc/config/rs6000/rs6000.c:17188: undefined reference to `offset_below_red_zone_p' -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199
[Bug testsuite/44701] [4.6 regression] PR44492 fix broke gcc.target/powerpc/asm-es-2.c
--- Comment #4 from pthaugen at gcc dot gnu dot org 2010-07-12 21:03 ---
Adding '<>' to the "=m" constraint fixes the testcase, but adding a single '>' (or '<') results in an error for impossible constraints. This is caused by the following snippet that was added to reload1.c:

  switch (GET_CODE (XEXP (recog_data.operand[opno], 0)))
    {
    case PRE_INC:
    case POST_INC:
    case PRE_DEC:
    case POST_DEC:
    case PRE_MODIFY:
    case POST_MODIFY:
      if (strchr (recog_data.constraints[opno], '<') == NULL
          || strchr (recog_data.constraints[opno], '>') == NULL)
        return 0;
      break;

Should the code be using '&&' instead of '||'?

--
pthaugen at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu dot org,
                   |                            |pthaugen at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44701
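The difference between the two conditions can be checked in isolation. The sketch below is not GCC code: `rejects_with_or` and `rejects_with_and` are hypothetical helpers that apply the quoted test, and its '&&' variant, to a plain constraint string. It only shows which strings each form rejects and does not settle which behaviour is intended.

```c
#include <string.h>

/* Mirrors the quoted condition: reject (return 1) unless the constraint
   string contains BOTH '<' and '>'. */
int rejects_with_or (const char *constraint)
{
  return strchr (constraint, '<') == NULL
         || strchr (constraint, '>') == NULL;
}

/* The '&&' variant asked about: reject only when NEITHER '<' nor '>'
   is present. */
int rejects_with_and (const char *constraint)
{
  return strchr (constraint, '<') == NULL
         && strchr (constraint, '>') == NULL;
}
```

With the '||' form, "=m<" and "=m>" are rejected and only "=m<>" is accepted, which matches the behaviour described in the comment. With the '&&' form, a single '<' or '>' is enough to be accepted.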
[Bug testsuite/44701] [4.6 regression] PR44492 fix broke gcc.target/powerpc/asm-es-2.c
--- Comment #5 from pthaugen at gcc dot gnu dot org 2010-07-12 21:05 --- Sorry, recog.c is where the prior code snippet came from. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44701
[Bug testsuite/44932] New: gcc.dg/uninit-pred-9_b.c fails
Subject testcase fails on powerpc64.

FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 24)

Compiling standalone I see the following:

pthaugen/work> ~/install/gcc/trunk/bin/gcc -O2 -S -m32 -Wuninitialized ~/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c: In function 'foo':
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c:24:11: warning: 'v' may be used uninitialized in this function [-Wuninitialized]
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c: In function 'foo_2':
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c:41:11: warning: 'v' may be used uninitialized in this function [-Wuninitialized]

--
           Summary: gcc.dg/uninit-pred-9_b.c fails
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #27 from pthaugen at gcc dot gnu dot org 2007-10-22 21:11 --- I tried a recent mainline on PowerPC for leslie3d, revision 129550 improved over revision 129454 by 67%. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #29 from pthaugen at gcc dot gnu dot org 2007-10-23 20:29 ---
Found another example on PowerPC in the same benchmark that is not fixed by the checked in patches. Compiled with -m32 -O2. From the loop in procedure FLUXI:

revision 126325:

.L47:
        lfd 0,0(9)       #* ivtmp.277, tmp346
        lfd 13,-8(9)     #, tmp347
        add 9,9,0        # ivtmp.277, ivtmp.277, ivtmp.309
        fsub 0,0,13      # tmp345, tmp346, tmp347
        fmul 0,0,12      # tmp348, tmp345, dtvol.23
        fneg 0,0         # tmp349, tmp348
        stfd 0,0(11)     #* ivtmp.306, tmp349
        add 11,11,10     # ivtmp.306, ivtmp.306, ivtmp.280
        bdnz .L47        #

revision 126326 (and current mainline):

.L83:
        lwz 0,48(7)      # .stride, .stride
        lfd 0,0(10)      #* ivtmp.277, tmp416
        lfd 13,-8(10)    #, tmp417
        lfd 12,0(5)      # dtvol, dtvol
        add 10,10,6      # ivtmp.277, ivtmp.277, ivtmp.306
        mullw 0,8,0      # tmp409, l, .stride
        fsub 0,0,13      # tmp415, tmp416, tmp417
        lwz 9,4(7)       # du.offset, du.offset
        lwz 11,0(7)      # du.data, du.data
        fmul 0,0,12      # tmp420, tmp415, dtvol
        fneg 0,0         # tmp422, tmp420
        add 0,0,9        # tmp412, tmp409, du.offset
        addi 8,8,1       # l, l,
        slwi 0,0,3       # tmp413, tmp412,
        stfdx 0,11,0     #, tmp422
        bdnz .L83        #

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #30 from pthaugen at gcc dot gnu dot org 2007-10-23 20:30 --- Created an attachment (id=14402) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14402&action=view) Testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #35 from pthaugen at gcc dot gnu dot org 2007-10-24 23:15 --- I reran leslie3d on PowerPC with --param max-aliased-vops=1. The result was a 90% improvement over my prior revision 129550 run (218% improvement over the original mainline run). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug middle-end/32429] [PPC, missing optimization] stack space not optimized when stack not used
--- Comment #1 from pthaugen at gcc dot gnu dot org 2007-10-31 19:05 --- Looks like the -fpic/PIC/pie/PIE issue was fixed on mainline with the following patch. 2007-09-04 Daniel Jacobowitz <[EMAIL PROTECTED]> * config/rs6000/rs6000.c (rs6000_stack_info): Allocate space for the GOT pointer only if there is a constant pool. Use the allocated space for SPE also. See bug 28966 for the -maltivec fix. -- pthaugen at gcc dot gnu dot org changed: What|Removed |Added CC| |pthaugen at gcc dot gnu dot | |org, drow at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32429
[Bug target/28966] -maltivec -m32 causes the stack to be saved and restored even though there is no need for it
--- Comment #4 from pthaugen at gcc dot gnu dot org 2007-10-31 19:48 --- Subject: Bug 28966 Author: pthaugen Date: Wed Oct 31 19:48:19 2007 New Revision: 129806 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129806 Log: 2007-10-01 Pat Haugen <[EMAIL PROTECTED]> Backport the following patches: 2007-09-04 Daniel Jacobowitz <[EMAIL PROTECTED]> * config/rs6000/rs6000.c (rs6000_stack_info): Allocate space for the GOT pointer only if there is a constant pool. Use the allocated space for SPE also. 2006-12-21 Nathan Sidwell <[EMAIL PROTECTED]> PR target/28966 PR target/29248 * reload1.c (reload): Realign stack after it changes size. Modified: branches/ibm/gcc-4_1-branch/gcc/ChangeLog branches/ibm/gcc-4_1-branch/gcc/config/rs6000/rs6000.c branches/ibm/gcc-4_1-branch/gcc/reload1.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28966
[Bug target/29248] Stack pointer is modified in functions that don't use the stack
--- Comment #7 from pthaugen at gcc dot gnu dot org 2007-10-31 19:48 --- Subject: Bug 29248 Author: pthaugen Date: Wed Oct 31 19:48:19 2007 New Revision: 129806 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129806 Log: 2007-10-01 Pat Haugen <[EMAIL PROTECTED]> Backport the following patches: 2007-09-04 Daniel Jacobowitz <[EMAIL PROTECTED]> * config/rs6000/rs6000.c (rs6000_stack_info): Allocate space for the GOT pointer only if there is a constant pool. Use the allocated space for SPE also. 2006-12-21 Nathan Sidwell <[EMAIL PROTECTED]> PR target/28966 PR target/29248 * reload1.c (reload): Realign stack after it changes size. Modified: branches/ibm/gcc-4_1-branch/gcc/ChangeLog branches/ibm/gcc-4_1-branch/gcc/config/rs6000/rs6000.c branches/ibm/gcc-4_1-branch/gcc/reload1.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29248
[Bug target/34087] New: ICE in regrename.c for movdf_hardfloat64_mfpgpr
The following attachment from the cpu2006 benchmark leslie3d ICEs in the following manner. The problem appears to be a latent bug that has come and gone a couple of times (I saw it earlier on the gamess benchmark too). Recent mainlines compile it fine, but revision 128829 exhibits the problem. Opening a bugzilla since I haven't been able to spend as much time digging into it as I'd thought/hoped, and don't want to lose track of it.

> gfortran -c -m64 -O2 -mcpu=power6x -fpeel-loops -ffast-math junk.f
junk.f: In function 'setiv':
junk.f:913: error: insn does not satisfy its constraints:
(insn 5218 2288 4792 191 (set (reg:DF 46 14 [2631])
        (reg:DF 65 lr [1528])) 335 {*movdf_hardfloat64_mfpgpr}
     (expr_list:REG_DEAD (reg:DF 65 lr [1528])
        (nil)))
junk.f:913: internal compiler error: in build_def_use, at regrename.c:766
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

--
           Summary: ICE in regrename.c for movdf_hardfloat64_mfpgpr
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34087
[Bug target/34087] ICE in regrename.c for movdf_hardfloat64_mfpgpr
--- Comment #1 from pthaugen at gcc dot gnu dot org 2007-11-13 21:08 --- Created an attachment (id=14549) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14549&action=view) Testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34087
[Bug tree-optimization/34976] verify_ssa ICE with -ftree-loop-linear
--- Comment #4 from pthaugen at gcc dot gnu dot org 2008-04-18 14:47 --- Is the workaround still appropriate, and if so, can it go in 4.3 and trunk now? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34976
[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown
--- Comment #43 from pthaugen at gcc dot gnu dot org 2008-04-30 18:49 ---
Created an attachment (id=15553) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15553&action=view)
Testcase

I tried a mainline with the latest patch. While we no longer have problems with the prior testcases, there is no improvement for leslie3d on ppc64. I can still double the performance of the benchmark by specifying --param max-aliased-vops=1.

Including a new trimmed down testcase from the benchmark where I'm still seeing poor code when max-aliased-vops is not increased, compiled with 'gfortran -m32 -O2'. Refer to the first nested loop in procedure FLUXK():

      DO I = I1, I2
         QS(I) = WAV(I,J,K) * ZAREA
      END DO

Base:

.L150:
        lwz 0,24(18)     # .stride, .stride
        lwz 9,36(18)     # .stride, .stride
        lwz 11,12(18)    # .stride, .stride
        lwz 10,4(18)     # wav.offset, wav.offset
        mullw 0,17,0     # tmp660, ivtmp.602, .stride
        lwz 8,0(18)      # wav.data, wav.data
        mullw 9,30,9     # tmp666, ivtmp.590, .stride
        mullw 11,6,11    # tmp670, i, .stride
        add 0,0,9        # tmp672, tmp660, tmp666
        addi 6,6,1       # i, i,
        add 0,0,11       # tmp673, tmp672, tmp670
        add 0,0,10       # tmp674, tmp673, wav.offset
        slwi 0,0,3       # tmp676, tmp674,
        lfdx 0,8,0       #, tmp678
        fmul 0,0,8       # tmp679, tmp678, zarea.64
        stfdx 0,15,7     #* ivtmp.589, tmp679
        addi 7,7,8       # ivtmp.589, ivtmp.589,
        bdnz .L150       #

With --param max-aliased-vops=1000:

.L150:
        lfd 0,0(11)      #* ivtmp.599, tmp739
        add 11,11,30     # ivtmp.599, ivtmp.599, D.2783
        fmul 0,0,8       # tmp740, tmp739, zarea.64
        stfdx 0,22,9     #* ivtmp.602, tmp740
        addi 9,9,8       # ivtmp.602, ivtmp.602,
        bdnz .L150       #

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
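The two loop shapes in this comment can be sketched in C. This is a rough analogy rather than what gfortran generates: `struct desc` is a hypothetical, simplified one-dimensional stand-in for an array descriptor, and the function names are invented for illustration. The first function re-reads the descriptor fields on every iteration, as the "Base" loop does with its `lwz` loads of `.stride`, `wav.offset` and `wav.data`; the second reads them once and steps an index, as the shorter loop does.

```c
#include <stddef.h>

/* Hypothetical, simplified one-dimensional array descriptor. */
struct desc
{
  double *data;
  ptrdiff_t offset;
  ptrdiff_t stride;
};

/* Shape of the "Base" loop: data, offset and stride are fetched from
   the descriptor and the element index is recomputed with a multiply
   on every iteration. */
void scale_reload (double *qs, const struct desc *wav, double zarea,
                   ptrdiff_t n)
{
  for (ptrdiff_t i = 0; i < n; i++)
    qs[i] = wav->data[i * wav->stride + wav->offset] * zarea;
}

/* Shape of the shorter loop: the descriptor is read once before the
   loop and the index advances by a fixed step. */
void scale_hoisted (double *qs, const struct desc *wav, double zarea,
                    ptrdiff_t n)
{
  const double *data = wav->data;
  const ptrdiff_t step = wav->stride;
  ptrdiff_t idx = wav->offset;

  for (ptrdiff_t i = 0; i < n; i++, idx += step)
    qs[i] = data[idx] * zarea;
}
```

Both functions compute the same results; the second form is only valid when the stores to `qs` cannot modify the descriptor, which is the kind of aliasing question the comment is about.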
[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown
--- Comment #46 from pthaugen at gcc dot gnu dot org 2008-04-30 22:35 --- Just following up with comments posted on irc. The patch does fix the problem I was seeing. Spec ratio improved from 5.18 to 9.07 with the patch (75%), not quite the full 100% improvement I was seeing with --param max-aliased-vops=1, but getting awfully close. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
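The 75% figure follows directly from the two Spec ratios quoted in this comment. A small helper (the name `percent_gain` is hypothetical) makes the arithmetic explicit:

```c
/* Relative improvement of 'after' over 'before', in percent. */
double percent_gain (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Calling `percent_gain (5.18, 9.07)` gives a little over 75, consistent with the improvement stated above.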
[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown
--- Comment #47 from pthaugen at gcc dot gnu dot org 2008-05-01 19:26 ---
Created an attachment (id=15557) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15557&action=view)
Testcase

Found some more cases, which are contributing to the missing 25%. Looks like similar code, but obviously something different about it since it's not getting caught with the latest patch. Compiled with -O2 -m64.

From first nested loop in FLUXI():

      DO I = I1,I2
         QS(I) = UAV(I,J,K) * XAREA
      END DO

Base:

.L185:
        ld 10,168(1)     # pretmp.1065,
        ld 4,424(1)      # D.1331,
        sldi 11,9,3      # tmp667, i,
        mulld 0,10,9     # tmp669,, i
        add 0,0,25       # tmp670, tmp669, ivtmp.1093
        addi 9,9,1       # tmp675, i,
        sldi 0,0,3       # tmp671, tmp670,
        lfdx 0,4,0       #, tmp673
        extsw 9,9        # i.1073, tmp675
        fmul 0,0,28      # tmp674, tmp673, xarea.63
        stfdx 0,26,11    #, tmp674
        bdnz .L185       #

With --param max-aliased-vops=1:

.L185:
        lfd 0,0(11)      #* ivtmp.886, tmp667
        add 11,11,18     # ivtmp.886, ivtmp.886, D.3744
        fmul 0,0,27      # tmp668, tmp667, xarea.63
        stfdx 0,31,9     #* ivtmp.889, tmp668
        addi 9,9,8       # ivtmp.889, ivtmp.889,
        cmpd 7,9,29      # tmp787, tmp673, ivtmp.889
        bne 7,.L185      #

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug tree-optimization/36922] New: ICE in tree-data-ref.c with -ftree-loop-linear
Current trunk is ICE'ing when trying to build CPU2006 benchmark 416.gamess with -ftree-loop-linear.

run/build_base_gcc_64.0001> cat junk.f
      SUBROUTINE GETCOF2(NP,FLM,ZLL,COEFF)
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      PARAMETER (MAXCOF=23821)
      DIMENSION COEFF(MAXCOF),
     *          ZLL(0:2*NP+1),FLM(0:2*NP)
C
      LENG=0
      DO L=0,NP
         DO M=0,L
            DO LK=M,L
               LENG=LENG+1
               COEFF(LENG)=FLM(L+M)*FLM(L-M)*ZLL(L-LK)/(FLM(LK+M)*
     *                     FLM(LK-M)*FLM(L-LK)*FLM(L-LK))
            ENDDO
         ENDDO
      ENDDO
C
      RETURN
      END

run/build_base_gcc_64.0001> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -O2 -ftree-loop-linear junk.f
junk.f: In function 'getcof2':
junk.f:1: internal compiler error: in initialize_matrix_A, at tree-data-ref.c:1885
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

--
           Summary: ICE in tree-data-ref.c with -ftree-loop-linear
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36922