[Bug tree-optimization/38258] New: IV-opts creating poor code for loop exit test

2008-11-24 Thread pthaugen at gcc dot gnu dot org
Noticed the following in 200.sixtrack benchmark from CPU2000.

      SUBROUTINE DFACT(N,A,IDIN,IR,IFAIL,DET,JFAIL)
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER IR(9),IPAIRF
      REAL*8 A(IDIN,9),DET,ZERO,ONE,X,Y,TF
      DO 144 J = 1, N
         K = IR(5)
  123    DO 124 L = 1, N
            TF = A(J,L)
            A(J,L) = A(K,L)
            A(K,L) = TF
  124    CONTINUE
  144 CONTINUE
      RETURN
      END

Compiling with -O2 -m64 results in poor code for the outer loop exit test (-m32
is fine, as are 4.3 and earlier versions).

4.3:
...
bdnz .L4 #inner loop exit
cmpw 7,7,3
addi 11,11,1
beqlr 7


mainline:
...
bdnz .L4   #inner loop exit
addi 11,8,-1
addi 3,3,1
rldicl 11,11,0,32
addi 11,11,2
add 11,12,11
add 11,11,0
cmpd 7,3,11
bne 7,.L5


-- 
   Summary: IV-opts creating poor code for loop exit test
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38258



[Bug tree-optimization/38258] [4.4 regression] IV-opts creating poor code for loop exit test

2008-11-24 Thread pthaugen at gcc dot gnu dot org


--- Comment #2 from pthaugen at gcc dot gnu dot org  2008-11-24 22:28 
---
Working fine after updating my source. Must have been fixed within the past
week.


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38258



[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817

2009-05-18 Thread pthaugen at gcc dot gnu dot org


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu dot
   ||org
   Priority|P2  |P3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976



[Bug middle-end/40281] -fprefetch-loop-arrays: ICE: in initialize_matrix_A, at tree-data-ref.c:1887

2009-06-19 Thread pthaugen at gcc dot gnu dot org


--- Comment #2 from pthaugen at gcc dot gnu dot org  2009-06-19 14:41 
---
Created an attachment (id=18024)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18024&action=view)
testcase

The attached testcase is from cpu2006 component 464.h264ref and appears to
fail in the same manner on PowerPC.

work/spec_err> /home/pthaugen/install/gcc/trunk/bin/gcc -c -m32 -O2
-fprefetch-loop-arrays q_matrix.c
q_matrix.c: In function 'ParseMatrix':
q_matrix.c:5:6: internal compiler error: in initialize_matrix_A, at
tree-data-ref.c:1896
Please submit a full bug report,
with preprocessed source if appropriate.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40281



[Bug tree-optimization/40496] New: ICE in verify_stmts with -fprefetch-loop-arrays

2009-06-19 Thread pthaugen at gcc dot gnu dot org
Current trunk ICEs as follows when -fprefetch-loop-arrays is specified. The
testcase that will be attached is trimmed down from cpu2006 benchmark
483.xalancbmk.

work/spec_err> /home/pthaugen/install/gcc/trunk/bin/g++ -c -m64 -O2
-fprefetch-loop-arrays DOMString.cpp
DOMString.cpp: In static member function 'static void*
DOMStringHandle::operator new(size_t)':
DOMString.cpp:26:9: error: Incompatible types in PHI argument 0
struct DOMStringHandle *

void *

freeListPtr.3_44 = PHI 

DOMString.cpp:26:9: internal compiler error: verify_stmts failed
Please submit a full bug report,
with preprocessed source if appropriate.


-- 
   Summary: ICE in verify_stmts with -fprefetch-loop-arrays
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40496



[Bug tree-optimization/40496] ICE in verify_stmts with -fprefetch-loop-arrays

2009-06-19 Thread pthaugen at gcc dot gnu dot org


--- Comment #1 from pthaugen at gcc dot gnu dot org  2009-06-19 21:32 
---
Created an attachment (id=18026)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18026&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40496



[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817

2009-07-08 Thread pthaugen at gcc dot gnu dot org


--- Comment #20 from pthaugen at gcc dot gnu dot org  2009-07-08 21:53 
---
Created an attachment (id=18165)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18165&action=view)
Reduced testcase

The attached testcase exhibits the load-hit-store problem. It results from
choosing a bad register class (GENERAL_REGS) for a pseudo that should be
assigned to FLOAT_REGS. Since there is no FPR -> GPR move for -mcpu=power6,
the copy must go through memory.  I compiled the testcase with -m64 -O3
-mcpu=power6 using trunk revision 149376.  The pseudo in question is 361.

Following are the 3 insns referencing reg 361 in the sched1 dump (before ira):

(insn 51 238 241 8 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ])
(reg:DF 358 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 47 46 231 9 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ])
(reg:DF 179 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 196 194 198 11 thin6d_reduced.f:169 (set (mem/c/i:DF (plus:DI (reg/f:DI
477)
(const_int 56 [0x38])) [2 crkve+0 S8 A64])
(reg:DF 361 [ prephitmp.35 ])) 351 {*movdf_hardfloat64}
(expr_list:REG_DEAD (reg:DF 361 [ prephitmp.35 ])
(nil)))


And from the ira dump:

Pass1 cost computation:
a71 (r361,l1) best GENERAL_REGS, cover GENERAL_REGS
a3 (r361,l0) best GENERAL_REGS, cover GENERAL_REGS
  a3(r361,l0) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0
LINK_REGS:156,1836 CTR_REGS:156,1836 SPECIAL_REGS:156,1836 MEM:156
  a71(r361,l1) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0
LINK_REGS:1680,1680 CTR_REGS:1680,1680 SPECIAL_REGS:1680,1680 MEM:1120


Pass 2 cost computation:
r361: preferred GENERAL_REGS, alternative NO_REGS
  a3(r361,l0) costs: BASE_REGS:0,2240 GENERAL_REGS:0,2240 FLOAT_REGS:312,2552
LINK_REGS:234,4154 CTR_REGS:234,4154 SPECIAL_REGS:234,4154 MEM:156
  a71(r361,l1) costs: BASE_REGS:2240,2240 GENERAL_REGS:2240,2240
FLOAT_REGS:2240,2240 LINK_REGS:3920,3920 CTR_REGS:3920,3920
SPECIAL_REGS:3920,3920 MEM:3360

Not sure what's causing the FLOAT cost to be higher than the GENERAL cost yet.
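
To make the Pass 2 numbers for a3 easier to read, here is a minimal C sketch
that picks the cheapest entry from a small cost table. It is not IRA's
selection algorithm. It assumes that the first number of each pair in the dump
is the cost to compare, and it looks only at the three candidates discussed
here. The names class_cost, cheapest and a3_pass2 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative pairing of a register class name with one cost value. */
struct class_cost {
    const char *name;
    int cost;
};

/* Returns the entry with the lowest cost; ties go to the earlier entry. */
const struct class_cost *cheapest(const struct class_cost *tab, size_t n)
{
    assert(tab != NULL && n > 0);
    const struct class_cost *best = &tab[0];
    for (size_t i = 1; i < n; i++)
        if (tab[i].cost < best->cost)
            best = &tab[i];
    return best;
}

/* Pass 2 costs of a3(r361,l0) from the dump above.  Assumption: the
   first number of each "x,y" pair is the one that gets compared.  */
const struct class_cost a3_pass2[] = {
    { "GENERAL_REGS", 0 },
    { "FLOAT_REGS", 312 },
    { "MEM", 156 },
};
```

With these inputs, cheapest(a3_pass2, 3) returns the GENERAL_REGS entry, which
agrees with the "preferred GENERAL_REGS" line in the dump. A FLOAT_REGS cost
of 0 would be needed just to tie.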


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976



[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817

2009-07-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #22 from pthaugen at gcc dot gnu dot org  2009-07-14 21:15 
---
The original problem, multi-block loop preventing movement of loads, was
reintroduced with revision 149206, Jan's CD-DCE patch to remove empty loops.


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||jh at suse dot cz


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976



[Bug rtl-optimization/40842] New: Poor register class choice in IRA

2009-07-23 Thread pthaugen at gcc dot gnu dot org
Moving this issue from bz 39976 since it is a separate problem from the
original one documented there.

Verified the behavior still exists using current trunk revision (150020).  The
testcase comes from the cpu2000 sixtrack benchmark. Following is the original
comment I posted:

===
The attached testcase exhibits the load-hit-store problem. It results from
choosing a bad register class (GENERAL_REGS) for a pseudo that should be
assigned to FLOAT_REGS. Since there is no FPR -> GPR move for -mcpu=power6,
the copy must go through memory.  I compiled the testcase with -m64 -O3
-mcpu=power6 using trunk revision 149376.  The pseudo in question is 361.

Following are the 3 insns referencing reg 361 in the sched1 dump (before ira):

(insn 51 238 241 8 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ])
(reg:DF 358 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 47 46 231 9 thin6d_reduced.f:178 (set (reg:DF 361 [ prephitmp.35 ])
(reg:DF 179 [ prephitmp.35 ])) 351 {*movdf_hardfloat64} (nil))
...
(insn 196 194 198 11 thin6d_reduced.f:169 (set (mem/c/i:DF (plus:DI (reg/f:DI
477)
(const_int 56 [0x38])) [2 crkve+0 S8 A64])
(reg:DF 361 [ prephitmp.35 ])) 351 {*movdf_hardfloat64}
(expr_list:REG_DEAD (reg:DF 361 [ prephitmp.35 ])
(nil)))


And from the ira dump:

Pass1 cost computation:
a71 (r361,l1) best GENERAL_REGS, cover GENERAL_REGS
a3 (r361,l0) best GENERAL_REGS, cover GENERAL_REGS
  a3(r361,l0) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0
LINK_REGS:156,1836 CTR_REGS:156,1836 SPECIAL_REGS:156,1836 MEM:156
  a71(r361,l1) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0
LINK_REGS:1680,1680 CTR_REGS:1680,1680 SPECIAL_REGS:1680,1680 MEM:1120


Pass 2 cost computation:
r361: preferred GENERAL_REGS, alternative NO_REGS
  a3(r361,l0) costs: BASE_REGS:0,2240 GENERAL_REGS:0,2240 FLOAT_REGS:312,2552
LINK_REGS:234,4154 CTR_REGS:234,4154 SPECIAL_REGS:234,4154 MEM:156
  a71(r361,l1) costs: BASE_REGS:2240,2240 GENERAL_REGS:2240,2240
FLOAT_REGS:2240,2240 LINK_REGS:3920,3920 CTR_REGS:3920,3920
SPECIAL_REGS:3920,3920 MEM:3360


-- 
   Summary: Poor register class choice in IRA
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64*-*-*
  GCC host triplet: powerpc64*-*-*
GCC target triplet: powerpc64*-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40842



[Bug rtl-optimization/40842] Poor register class choice in IRA

2009-07-23 Thread pthaugen at gcc dot gnu dot org


--- Comment #1 from pthaugen at gcc dot gnu dot org  2009-07-23 21:33 
---
Created an attachment (id=18248)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18248&action=view)
Testcase

The reduced testcase (since you can't attach one when opening a new bz).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40842



[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817

2009-07-23 Thread pthaugen at gcc dot gnu dot org


--- Comment #23 from pthaugen at gcc dot gnu dot org  2009-07-23 23:48 
---
I opened a new bugzilla, 40842, for the load-hit-store RA issue discussed in
comments 17-20 since that's a separate problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976



[Bug rtl-optimization/33151] New: Invalid insn with pre_inc

2007-08-22 Thread pthaugen at gcc dot gnu dot org
I'm hitting the following on 6 SPEC CPU2006 benchmarks, some with pre_inc
others with pre_modify.

run/build_base_gcc411-base.0001> cat junk.c
extern double cos (double x);
extern double sin (double x);

double
pre_inc_ice (int n)
{
  int i;
  double sc, ss, arg;

  sc = ss = 0.0;
  for (i = 0; i < n; i++)
    {
      arg = 1.0 / n;
      sc += cos (arg);
      ss += sin (arg);
    }
  return sc + ss;
}
run/build_base_gcc411-base.0001> /opt/biarch/gcc433-xlc-perf-2/bin/gcc -S -m32
-O2 junk.c
junk.c: In function 'pre_inc_ice':
junk.c:18: error: unrecognizable insn:
(insn 84 80 85 4 junk.c:11 (set (reg/f:SI 155)
(pre_inc:SI (reg/f:SI 139))) -1 (expr_list:REG_INC (reg/f:SI 139)
(nil)))
junk.c:18: internal compiler error: in extract_insn, at recog.c:1990
Please submit a full bug report,
with preprocessed source if appropriate.


-- 
   Summary: Invalid insn with pre_inc
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33151



[Bug rtl-optimization/30113] [4.1 Regression] ICE in trunc_int_for_mode

2007-09-21 Thread pthaugen at gcc dot gnu dot org


--- Comment #6 from pthaugen at gcc dot gnu dot org  2007-09-21 20:17 
---
Subject: Bug 30113

Author: pthaugen
Date: Fri Sep 21 20:17:04 2007
New Revision: 128656

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128656
Log:
2007-09-21  Pat Haugen  <[EMAIL PROTECTED]>

Backport the following patch:

2006-12-11  Zdenek Dvorak <[EMAIL PROTECTED]>

PR rtl-optimization/30113
* loop-iv.c (implies_p): Require the mode of the operands to be
 scalar.


Modified:
branches/ibm/gcc-4_1-branch/gcc/ChangeLog
branches/ibm/gcc-4_1-branch/gcc/loop-iv.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30113



[Bug tree-optimization/33576] New: segfault in extract_muldiv for cpu2006 benchmark

2007-09-27 Thread pthaugen at gcc dot gnu dot org
The following trimmed testcase from 464.h264ref is segfaulting.

int a1[6][4][4];
short b1[16];

int c1;
void CalculateQuantParam(void)
{
  int i, j, k, temp;

  for(k=0; k<6; k++)
    for(j=0; j<4; j++)
      for(i=0; i<4; i++)
      {
        temp = (i<<2)+j;
        a1[k][j][i]  = c1/b1[temp];
      }
}


$ /home/pthaugen/install/gcc/trunk/bin/gcc -c -m32 -O2 -ftree-loop-linear
junk.c
junk.c: In function 'CalculateQuantParam':
junk.c:6: internal compiler error: Segmentation fault
Please submit a full bug report,


Program received signal SIGSEGV, Segmentation fault.
extract_muldiv (t=0xf6efa1c0, c=0xf7de0f60, code=MULT_EXPR,
wide_type=0xf6e60150, 
strict_overflow_p=0xffaaee08 "")
at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6008
6008   > GET_MODE_SIZE (TYPE_MODE (type)))
(gdb) bt 10
#0  extract_muldiv (t=0xf6efa1c0, c=0xf7de0f60, code=MULT_EXPR,
wide_type=0xf6e60150, 
strict_overflow_p=0xffaaee08 "")
at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6008
#1  0x42084484 in ?? ()
#2  0x10212c6c in extract_muldiv (t=, c=0xf7de0f60, 
code=MULT_EXPR, wide_type=0xf6e60460, strict_overflow_p=0xffaaeee0 "")
at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6150
#3  0x22084442 in ?? ()
#4  0x10212f58 in extract_muldiv (t=0xf6f2adc0, c=0xf7de0480, code=MULT_EXPR, 
wide_type=0xf6e60150, strict_overflow_p=0xffaaeee0 "")
at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:6066
#5  0x42084442 in ?? ()
#6  0x101f1e34 in fold_binary (code=MULT_EXPR, type=0xf6e60150, op0=0xf6f2adc0, 
op1=0xf7de0480) at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:10359
#7  0x42008482 in ?? ()
#8  0x101fcb40 in fold_build2_stat (code=ERROR_MARK, type=0xf7de0f60, op0=0x43, 
op1=0xf6e60150) at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:13560
#9  0x106d5730 in chrec_fold_multiply (type=0xf6e60150, op0=0xf6f2adc0,
op1=0xf7de0480)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-chrec.c:426


-- 
   Summary: segfault in extract_muldiv for cpu2006 benchmark
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33576



[Bug tree-optimization/30565] ICE with -O1 -ftree-pre -ftree-loop-linear

2007-10-01 Thread pthaugen at gcc dot gnu dot org


--- Comment #7 from pthaugen at gcc dot gnu dot org  2007-10-01 15:15 
---
Subject: Bug 30565

Author: pthaugen
Date: Mon Oct  1 15:14:57 2007
New Revision: 128907

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128907
Log:
2007-10-01  Pat Haugen  <[EMAIL PROTECTED]>

Backport the following patch:

2007-05-03  Zdenek Dvorak  <[EMAIL PROTECTED]>

PR tree-optimization/30565
* lambda-code.c (perfect_nestify): Fix updating of dominators.


Modified:
branches/ibm/gcc-4_1-branch/gcc/ChangeLog
branches/ibm/gcc-4_1-branch/gcc/lambda-code.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30565



[Bug tree-optimization/24309] [4.1/4.2/4.3 Regression] ICE with -O3 -ftree-loop-linear

2007-10-10 Thread pthaugen at gcc dot gnu dot org


--- Comment #18 from pthaugen at gcc dot gnu dot org  2007-10-10 22:28 
---
I appear to be seeing the same thing trying to build 445.gobmk from cpu2006
on PowerPC.

run/build_base_gcc_64.> cat junk.c

int  board_size;

void print_eye(int eye[400])
{
  int m, n;
  int mini, maxi;
  int minj, maxj;

  mini = 1;
  minj = 1;
  for (m = 0; m < board_size; m++)
    for (n = 0; n < board_size; n++) {
      if (eye[m  + n] != 5) continue;

      if (m < mini) mini = m;
      if (n < minj) minj = n;
    }

  board_size = mini + minj;
}
run/build_base_gcc_64.> /home/pthaugen/install/gcc/trunk/bin/gcc -c -m64
-O2 -ftree-loop-linear junk.c
junk.c: In function 'print_eye':
junk.c:5: internal compiler error: in lambda_loopnest_to_gcc_loopnest, at
lambda-code.c:1840
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

     CC|                    |pthaugen at gcc dot gnu dot
       ||org, bergner at gcc dot gnu
   ||dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24309



[Bug tree-optimization/33742] New: Segfault in vectorizable_operation

2007-10-11 Thread pthaugen at gcc dot gnu dot org
Seeing a segfault trying to build 164.gzip from cpu2000.  Noticed it with -O3,
but also occurs for "-O2/-O1 -ftree-vectorize".

run/0001> cat junk.c
typedef unsigned short ush;
extern ush prev[];
void fill_window()
{
    register unsigned n, m;

    for (n = 0; n < 32768; n++) {
        m = prev[n];
        prev[n] = (ush)(m >= 0x8000 ? m-0x8000 : 0);
    }
}
run/0001> /home/pthaugen/install/gcc/trunk/bin/gcc -c -m32 -O3 junk.c
junk.c: In function 'fill_window':
junk.c:4: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


Program received signal SIGSEGV, Segmentation fault.
vectorizable_operation (stmt=0xf7f69860, bsi=0x0, vec_stmt=0x0, slp_node=0x0)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-transform.c:3807
3807  nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
(gdb) bt 5 
#0  vectorizable_operation (stmt=0xf7f69860, bsi=0x0, vec_stmt=0x0,
slp_node=0x0)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-transform.c:3807
#1  0x106efc68 in vect_analyze_operations (loop_vinfo=0x109d8d20)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-analyze.c:484
#2  0x106f7060 in vect_analyze_loop (loop=0xf7ec43f0)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vect-analyze.c:4341
#3  0x104d010c in vectorize_loops ()
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-vectorizer.c:2501
#4  0x1045e000 in tree_vectorize ()
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-loop.c:216
(More stack frames follow...)


-- 
   Summary: Segfault in vectorizable_operation
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33742



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-17 Thread pthaugen at gcc dot gnu dot org


--- Comment #11 from pthaugen at gcc dot gnu dot org  2007-10-17 16:14 
---
Created an attachment (id=14364)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14364&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-17 Thread pthaugen at gcc dot gnu dot org


--- Comment #12 from pthaugen at gcc dot gnu dot org  2007-10-17 16:18 
---
And now some comments to go with the prior attachment...

This checkin is causing a 75% degradation on leslie3d for PowerPC. As HJ
observed earlier, it depends on a second function accessing some of the global
data.

Generated code for simple copy loop in EXTRAPI(), compiled with -O2.

revision 126325:

.p2align 4,,15
.L3:
lfd 0,0(11)  #* ivtmp.71, tmp198
add 11,11,6  # ivtmp.71, ivtmp.71, ivtmp.77
stfd 0,0(9)  #* ivtmp.76, tmp198
add 9,9,5# ivtmp.76, ivtmp.76, ivtmp.73
bdnz .L3 #


revision 126326:

.p2align 4,,15
.L6:
lwz 10,12(27)# .stride, prephitmp.66
lwz 3,12(28) # .stride, prephitmp.56
.L3:
mullw 10,10,12   # tmp141, prephitmp.66, i
lwz 9,36(30) # .stride, .stride
lwz 0,36(31) # .stride, .stride
lwz 8,4(30)  # uav.offset, uav.offset
lwz 11,4(31) # qav.offset, qav.offset
lwz 6,24(30) # .stride, .stride
lwz 7,24(31) # .stride, .stride
lwz 4,0(30)  # uav.data, uav.data
lwz 5,0(31)  # qav.data, qav.data
mullw 3,3,12 # tmp160, prephitmp.56, i
slwi 9,9,1   # tmp142, .stride,
slwi 0,0,1   # tmp161, .stride,
add 9,9,8# tmp143, tmp142, uav.offset
add 0,0,11   # tmp162, tmp161, qav.offset
add 9,9,6# tmp145, tmp143, .stride
add 0,0,7# tmp164, tmp162, .stride
add 9,9,10   # tmp147, tmp145, tmp141
add 0,0,3# tmp166, tmp164, tmp160
slwi 9,9,3   # tmp148, tmp147,
slwi 0,0,3   # tmp167, tmp166,
addi 12,12,1 # i, i,
lfdx 0,5,0   #* qav.data, tmp169
cmpw 7,12,29 # imax.3, tmp170, i
stfdx 0,9,4  #* uav.data, tmp169
bne 7,.L6#
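
The difference between the two listings can be sketched in C. This is an
illustration under assumptions, not the leslie3d source. The struct desc is a
made-up stand-in for an array descriptor, and its field names do not match
gfortran's real layout. copy_reload mirrors the revision 126326 shape, where
the descriptor fields are read again on every iteration, which a compiler has
to do if it cannot prove the store leaves them untouched. copy_hoisted mirrors
the revision 126325 shape, where the fields are read once and only two running
indexes are updated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array descriptor; names are illustrative only. */
struct desc {
    double *data;
    ptrdiff_t offset;
    ptrdiff_t stride;
};

/* Slow shape: data, offset and stride are re-read each iteration. */
void copy_reload(const struct desc *q, const struct desc *u, int n)
{
    assert(q != NULL && u != NULL);
    for (int i = 0; i < n; i++)
        u->data[u->offset + i * u->stride] =
            q->data[q->offset + i * q->stride];
}

/* Fast shape: descriptor fields are loaded once before the loop and
   two induction variables advance by a constant step.  */
void copy_hoisted(const struct desc *q, const struct desc *u, int n)
{
    assert(q != NULL && u != NULL);
    const double *src = q->data;
    double *dst = u->data;
    const ptrdiff_t sstep = q->stride, dstep = u->stride;
    ptrdiff_t si = q->offset, di = u->offset;
    for (int i = 0; i < n; i++) {
        dst[di] = src[si];
        si += sstep;
        di += dstep;
    }
}
```

Both functions store the same values as long as the descriptors themselves are
not overwritten by the copy. That is the property the optimizer needs to
establish before it can turn the first form into the second.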


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu dot
   ||org, bergner at gcc dot gnu
   ||dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/37194] New: Autovectorization of constant iteration loop degrades performance

2008-08-21 Thread pthaugen at gcc dot gnu dot org
Seeing a degradation in cpu2000 benchmark 252.eon that is caused by
autovectorization of a simple loop in function ggSpectrum::Set(float).

Here's a simple C version.

void ggSpectrum_Set(float * data, float d) {
    int i;
    for (i = 0; i < 8; i++)
        data[i] = d;
}


When compiled with -O3 -mcpu=970 the following code is generated:

ggSpectrum_Set:
mfvrsave 0
stwu 1,-48(1)
stw 0,44(1)
oris 0,0,0x8000
mtvrsave 0
li 10,0
rlwinm 0,3,30,30,31
subfic 0,0,4
andi. 9,0,3
beq- 0,.L16
mtctr 9
.p2align 4,,15
.L10:
slwi 0,10,2
addi 10,10,1
stfsx 1,3,0
subfic 8,10,8
bdnz .L10
.L3:
subfic 6,9,8
srwi 0,6,2
slwi. 7,0,2
beq- 0,.L5
mtctr 0
stfs 1,16(1)
cmpwi 7,0,0
li 0,16
slwi 9,9,2
li 11,0
add 9,3,9
lvewx 0,1,0
vspltw 0,0,0
beq- 7,.L17
.p2align 4,,15
.L6:
slwi 0,11,4
addi 11,11,1
stvx 0,9,0
bdnz .L6
cmpw 7,6,7
subf 8,7,8
add 10,10,7
beq- 7,.L9
.L5:
mtctr 8
slwi 0,10,2
add 3,3,0
.p2align 4,,15
.L8:
stfs 1,0(3)
addi 3,3,4
bdnz .L8
.L9:
lwz 12,44(1)
mtvrsave 12
addi 1,1,48
blr
.L16:
mr 10,9
li 8,8
b .L3
.L17:
li 0,1
mtctr 0
b .L6


Adding -mno-altivec results in this simpler sequence, and a significant boost
in performance (~40% speedup for the benchmark):

ggSpectrum_Set:
stfs 1,28(3)
stfs 1,0(3)
stfs 1,4(3)
stfs 1,8(3)
stfs 1,12(3)
stfs 1,16(3)
stfs 1,20(3)
stfs 1,24(3)
blr


Another thing that stood out from the benchmark run was that the code was
taking a pretty big hit on a couple of the statically predicted branches
(apparently the address was already 16 byte aligned a lot of the time). So it
seems like it would be best to remove the static prediction and let the
hardware prediction take over.
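
The arithmetic at the top of the AltiVec listing (rlwinm, subfic, andi.) can be
read as an alignment-peeling computation. The C sketch below reproduces that
split for the 8-iteration loop. It is my reading of the listing, not code
taken from the vectorizer, and the names loop_split and split_iterations are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { TRIP_COUNT = 8, FLOATS_PER_VECTOR = 4 };

/* How the 8 iterations are divided among the three generated loops. */
struct loop_split {
    unsigned prologue;  /* scalar stores before the first 16-byte boundary */
    unsigned vector;    /* 16-byte vector stores */
    unsigned epilogue;  /* scalar stores left over at the end */
};

/* Computes the split for a float array starting at addr.  The address
   is assumed to be at least 4-byte aligned.  */
struct loop_split split_iterations(uintptr_t addr)
{
    struct loop_split s;
    assert(addr % 4 == 0);
    /* (addr >> 2) & 3 is the element position inside a 16-byte line. */
    s.prologue = (FLOATS_PER_VECTOR - (unsigned)((addr >> 2) & 3)) & 3;
    s.vector = (TRIP_COUNT - s.prologue) / FLOATS_PER_VECTOR;
    s.epilogue = TRIP_COUNT - s.prologue - s.vector * FLOATS_PER_VECTOR;
    return s;
}
```

For a 16-byte aligned address the split is 0 prologue, 2 vector and 0 epilogue
iterations, so both scalar loops are skipped. In that case the branches that
jump around them are taken, which is the situation the static prediction
hints guess wrong.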


-- 
   Summary: Autovectorization of constant iteration loop degrades
performance
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37194



[Bug tree-optimization/34976] New: verify_ssa ICE with -ftree-loop-linear

2008-01-25 Thread pthaugen at gcc dot gnu dot org
Discovered the following trying to build the cpu2006 benchmark 454.calculix.

work/temp> cat umat_aniso_creep.f 
      subroutine umat_aniso_creep()
!
      real*8 gr(6,6)
!
!
      do i=1,6
         do j=1,6
            gr(i,j)=0.d0
         enddo
         if(i.le.3) then
            gr(i,i)=1.d0
         else
            gr(i,i)=0.5d0
         endif
      enddo
      call dgesv(gr)
      return
      end
work/temp> ~/install/gcc/trunk/bin/gfortran -c -m32 -O2 -ftree-loop-linear
umat_aniso_creep.f
umat_aniso_creep.f: In function 'umat_aniso_creep':
umat_aniso_creep.f:1: error: definition in block 14 does not dominate use in
block 3
for SSA_NAME: i_15 in statement:
i_27 = PHI 
PHI argument
i_15
for PHI node
i_27 = PHI 
umat_aniso_creep.f:1: internal compiler error: verify_ssa failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


-- 
   Summary: verify_ssa ICE with -ftree-loop-linear
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34976



[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown

2009-02-02 Thread pthaugen at gcc dot gnu dot org


--- Comment #60 from pthaugen at gcc dot gnu dot org  2009-02-02 19:16 
---
I tried leslie3d on PPC. The alias-improvements branch does indeed seem to fix
the issue. The version of leslie3d built with the alias-improvements branch is
about 10% faster than a version built with trunk, and is equivalent to a trunk
version built with --param max-aliased-vops=1.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug debug/41295] [4.5 Regression] gfortran.dg/loc_2.f90 -O3 -g fails on SH with orphaned debug_insn

2009-09-09 Thread pthaugen at gcc dot gnu dot org


--- Comment #5 from pthaugen at gcc dot gnu dot org  2009-09-09 20:23 
---
Didn't mean to change the component when CC'ing myself, changing back.


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

  Component|rtl-optimization|debug


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41295



[Bug fortran/41494] New: temp and memcpy used when zeroing array

2009-09-28 Thread pthaugen at gcc dot gnu dot org
The following is causing a 7% degradation on cpu2000 benchmark 191.fma3d. The
problem was introduced somewhere between revisions 151544 and 151586.

Compile the following with -m32 -O2.

      MODULE force_

      TYPE :: force_type
        REAL(KIND(0D0))  Xext   ! X direction external force
        REAL(KIND(0D0))  Yext   ! Y direction external force
        REAL(KIND(0D0))  Zext   ! Z direction external force
        REAL(KIND(0D0))  Xint   ! X direction internal force
        REAL(KIND(0D0))  Yint   ! Y direction internal force
        REAL(KIND(0D0))  Zint   ! Z direction internal force
      END TYPE

      TYPE (force_type), DIMENSION(:), ALLOCATABLE :: FORCE
      INTEGER  NUMRT

      END MODULE force_


      SUBROUTINE SOLVE

        USE force_

        DO i = 1,NUMRT
          FORCE(i) = force_type (0,0,0,0,0,0)
        ENDDO

      END


The latter revision now zeroes out a temp and uses that as the source on a
memcpy call.


rev 151544:

.L3:
stfd 0,0(9)
stfd 0,8(9)
stfd 0,16(9)
stfd 0,24(9)
stfd 0,32(9)
stfd 0,40(9)
addi 9,9,48
bdnz .L3



rev 151586:

.L3:
mr 3,30
mr 4,28
stfd 31,8(1)
stfd 31,16(1)
li 5,48
addi 31,31,1
stfd 31,24(1)
stfd 31,32(1)
stfd 31,40(1)
stfd 31,48(1)
bl memcpy
cmpw 7,31,29
addi 30,30,48
bne 7,.L3


-- 
   Summary: temp and memcpy used when zeroing array
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41494



[Bug target/42431] [4.5 Regression] wrong code for 200.sixtrack with vectorization and -fdata-sections

2010-02-12 Thread pthaugen at gcc dot gnu dot org


--- Comment #7 from pthaugen at gcc dot gnu dot org  2010-02-12 23:16 
---
This looks like an ira/reload bug.  The ira dump is where the offending insns
first show up.

(insn 17 26 244 3 pr42431.f:27 (set (mem:V4SI (reg/f:DI 29 29 [255]) [4 S16
A128])
(reg:V4SI 108 31 [269])) 942 {*altivec_movv4si} (expr_list:REG_EQUAL
(const_vector:V4SI [
(const_int 0 [0x0])
(const_int 0 [0x0])
(const_int 0 [0x0])
(const_int 0 [0x0])
])
(nil)))

(insn 244 17 245 3 pr42431.f:27 (set (reg:DI 9 9)
(mem/u/c:DI (plus:DI (reg:DI 2 2)
(const:DI (unspec:DI [
(symbol_ref/u:DI ("*.LC2") [flags 0x2])
] 49))) [7 S8 A8])) 373 {*movdi_internal64} (nil))

(insn 245 244 20 3 pr42431.f:27 (set (reg:DI 9 9)
(plus:DI (reg:DI 9 9)
(reg:DI 9 9))) 83 {*adddi3_internal1} (nil))

(insn 20 245 246 3 pr42431.f:27 (set (mem:V4SI (reg:DI 9 9) [4 S16 A128])
(reg:V4SI 108 31 [269])) 942 {*altivec_movv4si} (expr_list:REG_EQUAL
(const_vector:V4SI [
(const_int 0 [0x0])
(const_int 0 [0x0])
(const_int 0 [0x0])
(const_int 0 [0x0])
])
(nil)))


Insns 244/245 are not present in the prior (sched1) dump. Insn 245 should be
adding the constant 16 to r9 instead of adding r9 to itself.
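
As a small worked illustration of the address arithmetic: insn 245 is
(plus:DI (reg:DI 9 9) (reg:DI 9 9)), so the address used by insn 20 is twice
the value loaded by insn 244 rather than that value plus 16. The function
names below are hypothetical and only restate the two computations.

```c
#include <stdint.h>

/* Address the second vector store is expected to use: base + 16. */
uint64_t addr_expected(uint64_t base)
{
    return base + 16;
}

/* Address actually computed by insn 245: r9 + r9. */
uint64_t addr_generated(uint64_t base)
{
    return base + base;
}
```

The two agree only when the loaded value happens to be 16, so for any
realistic address the second store goes to the wrong location.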


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

 CC|    |pthaugen at gcc dot gnu dot
   |        |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42431



[Bug middle-end/39976] [4.5 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817

2009-11-30 Thread pthaugen at gcc dot gnu dot org


--- Comment #25 from pthaugen at gcc dot gnu dot org  2009-11-30 21:29 
---
I am still seeing the 2-block loop using revision 154838, both 32 and 64 bit,
compile options -O3 -mcpu=power6 -funroll-loops.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976



[Bug tree-optimization/42248] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns

2010-01-13 Thread pthaugen at gcc dot gnu dot org


--- Comment #3 from pthaugen at gcc dot gnu dot org  2010-01-13 22:11 
---
Created an attachment (id=19585)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19585&action=view)
Testcase

Adding a shortened executable testcase that still fails.

This looks like an aliasing issue. We're not creating a dependency between a
store and load of a memory location which then allows the scheduler to reorder
the insns.

(insn 2 9 3 2 test.c:10 (set (mem/s/c:DI (plus:DI (reg/f:DI 67 ap)
(const_int 48 [0x30])) [0 x+0 S8 A64])
(reg:DI 3 3)) 373 {*movdi_internal64} (expr_list:REG_DEAD (reg:DI 3 3)
(nil)))
...
(insn 11 8 12 2 test.c:10 (set (reg:DF 126 [ x$a ])
(mem/s/j/c:DF (plus:DI (reg/f:DI 67 ap)
(const_int 48 [0x30])) [0 x.a+0 S8 A64])) 360
{*movdf_hardfloat64} (nil))


Some debug data along the dependency creation/alias checking path:

Breakpoint 4, true_dependence (mem=0x4357068, mem_mode=DImode,
x=0x43571d0, 
vari...@0x112aa168: 0x106737c4 ) at
/home/pthaugen/src/gcc/trunk/gcc/gcc/alias.c:2367
2367  return rtx_refs_may_alias_p (x, mem, true);
(gdb) pr x
(mem/s/j/c:DF (plus:DI (reg/f:DI 67 ap)
(const_int 48 [0x30])) [0 x.a+0 S8 A64])
(gdb) pr mem
(mem/s/c:DI (plus:DI (reg/f:DI 67 ap)
(const_int 48 [0x30])) [0 x+0 S8 A64])

We then go through rtx_refs_may_alias_p() -> refs_may_alias_p_1() ->
decl_refs_may_alias_p() which contains the following code where we return
false.

  /* If both references are based on different variables, they cannot alias.  */
  if (!operand_equal_p (base1, base2, 0))
    return false;

Breakpoint 7, operand_equal_p (arg0=0x4370110, arg1=0x4370660, flags=0)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/fold-const.c:3160
3160  if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK)

(gdb) prt arg0
 
unit size 
align 64 symtab 0 alias set -1 canonical type 0x442ca20
fields 
DC file test.c line 2 col 19
size 
unit size 
align 64 offset_align 128
offset 
bit offset  context
 chain > context

pointer_to_this  chain >
used BLK file test.c line 9 col 14 size 
unit size 
align 64 context 
(mem/s/c:BLK (plus:DI (reg/f:DI 67 ap)
(const_int 48 [0x30])) [0 x+0 S32 A64]) arg-type 
incoming-rtl (reg:DI 3 3 [ x+-8 ]) chain >
(gdb) prt arg1
 
unit size 
align 64 symtab 0 alias set -1 canonical type 0x442ca20
fields 
DC file test.c line 2 col 19
size 
unit size 
align 64 offset_align 128
offset 
bit offset  context
 chain > context

pointer_to_this  chain >
used BLK file test.c line 9 col 14 size 
unit size 
align 64 context 
(mem/s/c:BLK (plus:DI (reg/f:DI 67 ap)
(const_int 48 [0x30])) [0 x+0 S32 A64]) arg-type 
incoming-rtl (reg:DI 3 3 [ x+-8 ]) chain >


In operand_equal_p() we get to the following which returns false:

case tcc_declaration:
  /* Consider __builtin_sqrt equal to sqrt.  */
  return (TREE_CODE (arg0) == FUNCTION_DECL
  && DECL_BUILT_IN (arg0) && DECL_BUILT_IN (arg1)
  && DECL_BUILT_IN_CLASS (arg0) == DECL_BUILT_IN_CLASS (arg1)
  && DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1));
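
To make the failure mode concrete, here is a toy model in C. It is not GCC's
tree or alias code; every `toy_` name is made up for illustration. It assumes,
based on the two different addresses passed to operand_equal_p() above, that
the same parameter ends up described by two distinct nodes, so an
identity-style comparison reports "different variables" and the disambiguator
concludes "cannot alias".

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a declaration node (hypothetical, not GCC's tree). */
struct toy_decl {
  const char *name;
  int line;
  int col;
};

/* Identity-only comparison: two copies of one declaration compare unequal. */
static bool toy_same_node(const struct toy_decl *a, const struct toy_decl *b)
{
  return a == b;
}

/* Structural comparison: copies that describe the same declaration match. */
static bool toy_same_decl(const struct toy_decl *a, const struct toy_decl *b)
{
  return a == b
         || (strcmp(a->name, b->name) == 0
             && a->line == b->line && a->col == b->col);
}

/* Mirrors the shape of the check quoted above: references whose bases are
   judged to be different variables are reported as non-aliasing.  */
static bool toy_may_alias(const struct toy_decl *base1,
                          const struct toy_decl *base2,
                          bool (*equal)(const struct toy_decl *,
                                        const struct toy_decl *))
{
  if (!equal(base1, base2))
    return false;
  return true;
}
```

With two copies of the node for `x`, the identity comparison wrongly reports no
alias, which is the condition that lets the scheduler move the load above the
store.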


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248



[Bug tree-optimization/42248] [4.5 Regression] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns

2010-01-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #11 from pthaugen at gcc dot gnu dot org  2010-01-14 15:45 
---
I'll test on PPC.


-- 

pthaugen at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |pthaugen at gcc dot gnu dot org
          Component|middle-end                  |tree-optimization
   Target Milestone|4.5.0                       |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248



[Bug middle-end/42248] [4.5 Regression] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns

2010-01-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #12 from pthaugen at gcc dot gnu dot org  2010-01-14 17:29 
---
The first patch appeared to work, resulting in correct ordering, but the second
patch had some issues. It also corrected the original ordering, but introduced
some new incorrect code for handling the args passed in FPRs.

Check:

.L.check:
mflr 0
std 0,16(1)
stdu 1,-128(1)
std 3,176(1)
std 4,184(1)
std 5,192(1)
std 6,200(1)
lfd 13,184(1)
lfd 0,176(1)
>>std 8,112(1)
>>lfd 12,112(1)
>>fcmpu 7,0,12
crnot 30,30
beq 7,.L4
>>fcmpu 7,13,1
beq 7,.L1

The first compare should be comparing FP12 against FP1. I'm not sure how the
GPR8 stuff got in there; nothing is even passed in GPR8.


Init:

.L.init:
std 5,0(3)
stfd 1,8(3)
blr

Should just be storing FPR1/FPR2 into consecutive locations.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248



[Bug middle-end/42248] [4.5 Regression] compat test struct-by-value-17 fails execution with -O1 -fschedule-insns

2010-01-17 Thread pthaugen at gcc dot gnu dot org


--- Comment #15 from pthaugen at gcc dot gnu dot org  2010-01-17 16:54 
---
Updated patch works for testcase and bootstrap/regtest on PPC passed.  Thx.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42248



[Bug testsuite/44932] gcc.dg/uninit-pred-9_b.c fails

2010-07-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #2 from pthaugen at gcc dot gnu dot org  2010-07-14 15:11 
---
Created an attachment (id=21200)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21200&action=view)
dump file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932



[Bug testsuite/44932] gcc.dg/uninit-pred-9_b.c fails

2010-07-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #3 from pthaugen at gcc dot gnu dot org  2010-07-14 15:12 
---
Created an attachment (id=21201)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21201&action=view)
dump file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932



[Bug testsuite/42855] FAIL: gcc.dg/tree-ssa/pr42585.c scan-tree-dump-times optimized *

2010-07-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #4 from pthaugen at gcc dot gnu dot org  2010-07-14 18:18 
---
Based on the last post in the patch thread, should the patch be committed so
the testsuite failures go away and this can be closed?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42855



[Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression

2010-07-16 Thread pthaugen at gcc dot gnu dot org


--- Comment #33 from pthaugen at gcc dot gnu dot org  2010-07-16 19:14 
---
gcc.dg/tree-ssa/loop-19.c started failing on powerpc with -m64 between 7/5 and
7/7. The tree dump now looks like the following:

:
  ivtmp.10_12 = (long unsigned int) &a[-1];
  ivtmp.16_15 = (long unsigned int) &c[-1];
  a.21_18 = (long unsigned int) &a;
  D.2035_19 = a.21_18 + 1592;

:
  # ivtmp.10_9 = PHI 
  # ivtmp.16_13 = PHI 
  ivtmp.10_5 = ivtmp.10_9 + 8;
  D.2032_16 = (void *) ivtmp.10_5;
  D.2007_3 = MEM[(double[200] *)D.2032_16];
  ivtmp.16_14 = ivtmp.16_13 + 8;
  D.2033_17 = (void *) ivtmp.16_14;
  MEM[(double[200] *)D.2033_17] = D.2007_3;
  if (ivtmp.10_5 != D.2035_19)
goto ;
  else
goto ;
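
For anyone who finds the dump hard to read, the loop can be transcribed into C
roughly as follows. This is only a sketch: it assumes the source loop is a
plain copy of 200 doubles (inferred from `double[200]` and 1592 = 199 * 8), the
function and variable names are made up, and the uintptr_t arithmetic mirrors
the dump rather than portable C (it relies on a flat address space and 8-byte
doubles).

```c
#include <stdint.h>

#define N 200

double a[N], c[N];

/* Assumed source form of the loop. */
void copy_source_form(void)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i];
}

/* Shape of the loop in the dump: two integer induction variables that start
   one element before each array, are bumped by 8 before each access, and an
   exit test against the precomputed bound &a + 1592.  */
void copy_dump_form(void)
{
  uintptr_t iv_a = (uintptr_t) a - 8;     /* ivtmp.10 = &a[-1] */
  uintptr_t iv_c = (uintptr_t) c - 8;     /* ivtmp.16 = &c[-1] */
  uintptr_t bound = (uintptr_t) a + 1592; /* D.2035 = &a + 1592 */

  do
    {
      iv_a += 8;
      double tmp = *(double *) iv_a;
      iv_c += 8;
      *(double *) iv_c = tmp;
    }
  while (iv_a != bound);
}
```

Both functions copy all 200 elements; the second keeps two pointer-like IVs and
compares one of them against an address instead of using a counted loop.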


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression

2010-07-21 Thread pthaugen at gcc dot gnu dot org


--- Comment #39 from pthaugen at gcc dot gnu dot org  2010-07-21 21:51 
---
(In reply to comment #38)
> 
> .L2:
> addi 11,8,9216
> ldx 0,10,9
> stdx 0,11,9
> addi 9,9,8
> bdnz .L2
> 
> and in r161844:
> 
> .L2:
> ldu 0,8(11)
> stdu 0,8(9)
> bdnz .L2
> 
> I'm no expert on powerpc architecture, but 3 instructions versus 5 looks like 
> a
> win to me.  Bit-rotten test case?
> 

The 'addi 11,8,9216' in the first loop is invariant and should be hoisted out
of the loop. Separate issue?

As for the issue of indexed ld/st+addi vs. update-form ld/st: the update forms
are cracked into ld/st+addi, which imposes a scheduling restriction on them
(cracked insns start a dispatch group). It may not make any difference in this
simple loop, but indexed ld/st+addi may have better scheduling opportunities
if there were more insns in the loop.

This testcase also appears to be dependent on the -mcpu value. With
-mcpu=power7 the testcase passes (although there's still the issue of the
invariant addi in the loop). And if I change to use -m32, then it only fails
for -mcpu=power6.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/45663] [4.6 regression] New test failures

2010-09-14 Thread pthaugen at gcc dot gnu dot org


--- Comment #3 from pthaugen at gcc dot gnu dot org  2010-09-14 20:04 
---
Not sure I understand everything involved here, but isn't the test a little
suspect any time higher optimization levels and instruction scheduling are
enabled?


-- 

pthaugen at gcc dot gnu dot org changed:

   What|Removed |Added

   Target Milestone|4.6.0   |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45663



[Bug tree-optimization/45671] New: Reassociate expressions for greater parallelism

2010-09-14 Thread pthaugen at gcc dot gnu dot org
The following testcase demonstrates where reassociation/regrouping of
expressions could result in greater parallelism for processors that have
multiple arithmetic execution units.

int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) {
  int ret;

  ret = a + b + c + d + e + f + g + h;
  return ret;

}

Compiling with -O3 results in a series of dependent add instructions to
accumulate the sum.

add 4,3,4
add 4,4,5
add 4,4,6
add 4,4,7
add 4,4,8
add 4,4,9
add 4,4,10


If we regrouped to (a+b)+(c+d)+... we can do multiple adds in parallel on
different execution units.
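
As a sketch of the regrouping (function names are made up, and this is written
by hand rather than taken from any compiler output), the two shapes look like
this in C. The chain has a dependency depth of 7 adds; the tree has a depth of
3, with up to four adds independent of each other at the first level.

```c
/* Linear chain: every add depends on the result of the previous one. */
int sum_chain(int a, int b, int c, int d, int e, int f, int g, int h)
{
  return ((((((a + b) + c) + d) + e) + f) + g) + h;
}

/* Balanced tree: the four first-level adds have no dependencies on each
   other, and neither do the two second-level adds.  */
int sum_tree(int a, int b, int c, int d, int e, int f, int g, int h)
{
  int ab = a + b;
  int cd = c + d;
  int ef = e + f;
  int gh = g + h;
  int abcd = ab + cd;
  int efgh = ef + gh;
  return abcd + efgh;
}
```

Both return the same value for inputs that do not overflow; the difference is
only in how many adds can be in flight at once.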


-- 
           Summary: Reassociate expressions for greater parallelism
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45671



[Bug fortran/45648] [4.6 regression] Unnecessary temporary for transpose calls as actual argument.

2010-09-20 Thread pthaugen at gcc dot gnu dot org


--- Comment #3 from pthaugen at gcc dot gnu dot org  2010-09-20 20:00 
---
As Steven mentioned in the mailing list, this did introduce a degradation for
cpu2000 benchmark galgel. I'm seeing about -10% on PowerPC.


-- 

pthaugen at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |pthaugen at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45648



[Bug fortran/45648] [4.6 regression] Unnecessary temporary for transpose calls as actual argument.

2010-09-20 Thread pthaugen at gcc dot gnu dot org


--- Comment #4 from pthaugen at gcc dot gnu dot org  2010-09-20 20:45 
---
(In reply to comment #2)
> Created an attachment (id=21777)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21777&action=view)
> patch restoring the previous behaviour. 
> 

Applying this patch restored the performance.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45648



[Bug target/44199] ppc64 glibc miscompilation

2010-05-20 Thread pthaugen at gcc dot gnu dot org


--- Comment #9 from pthaugen at gcc dot gnu dot org  2010-05-20 16:23 
---
Spec testing went fine, differences in the noise range.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199



[Bug target/44199] ppc64 glibc miscompilation

2010-05-20 Thread pthaugen at gcc dot gnu dot org


--- Comment #11 from pthaugen at gcc dot gnu dot org  2010-05-20 17:59 
---
No I didn't bootstrap/regtest. I can take care of trunk if you want to do 4.4.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199



[Bug target/44199] ppc64 glibc miscompilation

2010-05-20 Thread pthaugen at gcc dot gnu dot org


--- Comment #13 from pthaugen at gcc dot gnu dot org  2010-05-21 02:32 
---
Bootstrap of trunk went fine. Regression test (contrib/test_summary) showed the
following difference between base/patched versions, but didn't have any
testcases show up in the diff, so not sure what to make of that. This is for
32-bit gcc testsuite.

< # of expected passes  59078
---
> # of expected passes  59075
98c98
< # of unsupported tests  872
---
> # of unsupported tests  875


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199



[Bug target/44199] ppc64 glibc miscompilation

2010-05-26 Thread pthaugen at gcc dot gnu dot org


--- Comment #22 from pthaugen at gcc dot gnu dot org  2010-05-26 15:51 
---
The 4.4 patch isn't complete.

/home/gccbuild/gcc_4.4_anonsvn/gcc/gcc/config/rs6000/rs6000.c:17166: undefined
reference to `offset_below_red_zone_p'
/home/gccbuild/gcc_4.4_anonsvn/gcc/gcc/config/rs6000/rs6000.c:17188: undefined
reference to `offset_below_red_zone_p'


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44199



[Bug testsuite/44701] [4.6 regression] PR44492 fix broke gcc.target/powerpc/asm-es-2.c

2010-07-12 Thread pthaugen at gcc dot gnu dot org


--- Comment #4 from pthaugen at gcc dot gnu dot org  2010-07-12 21:03 
---
Adding '<>' to the "=m" constraint fixes the testcase, but adding a single '>'
(or '<') results in an error for impossible constraints. This is caused by the
following snippet that was added to reload1.c:

  switch (GET_CODE (XEXP (recog_data.operand[opno], 0)))
    {
    case PRE_INC:
    case POST_INC:
    case PRE_DEC:
    case POST_DEC:
    case PRE_MODIFY:
    case POST_MODIFY:
      if (strchr (recog_data.constraints[opno], '<') == NULL
          || strchr (recog_data.constraints[opno], '>') == NULL)
        return 0;
      break;

Should the code be using '&&' instead of '||'?
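
To show the difference between the two conditions, here is a small standalone
C sketch. The function names are made up and this is not the recog.c code
itself, just the same strchr test pulled out so it can be tried on a few
constraint strings.

```c
#include <stdbool.h>
#include <string.h>

/* Condition as posted: reject unless BOTH '<' and '>' are present. */
bool autoinc_ok_as_posted(const char *constraint)
{
  if (strchr(constraint, '<') == NULL
      || strchr(constraint, '>') == NULL)
    return false;
  return true;
}

/* Condition with '&&': reject only when NEITHER '<' nor '>' is present. */
bool autoinc_ok_with_and(const char *constraint)
{
  if (strchr(constraint, '<') == NULL
      && strchr(constraint, '>') == NULL)
    return false;
  return true;
}
```

With the posted condition "=m<>" is accepted while "=m<" and "=m>" are
rejected, which matches the behaviour described above; with '&&' a single
'<' or '>' is enough.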


-- 

pthaugen at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |jakub at gcc dot gnu dot org,
                   |                            |pthaugen at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44701



[Bug testsuite/44701] [4.6 regression] PR44492 fix broke gcc.target/powerpc/asm-es-2.c

2010-07-12 Thread pthaugen at gcc dot gnu dot org


--- Comment #5 from pthaugen at gcc dot gnu dot org  2010-07-12 21:05 
---
Sorry, recog.c is where the prior code snippet came from.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44701



[Bug testsuite/44932] New: gcc.dg/uninit-pred-9_b.c fails

2010-07-13 Thread pthaugen at gcc dot gnu dot org
Subject testcase fails on powerpc64.

FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 24)


Compiling standalone I see the following:

pthaugen/work> ~/install/gcc/trunk/bin/gcc -O2 -S -m32 -Wuninitialized
~/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c: In
function 'foo':
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c:24:11:
warning: 'v' may be used uninitialized in this function [-Wuninitialized]
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c: In
function 'foo_2':
/home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c:41:11:
warning: 'v' may be used uninitialized in this function [-Wuninitialized]


-- 
           Summary: gcc.dg/uninit-pred-9_b.c fails
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-22 Thread pthaugen at gcc dot gnu dot org


--- Comment #27 from pthaugen at gcc dot gnu dot org  2007-10-22 21:11 
---
I tried a recent mainline on PowerPC for leslie3d, revision 129550 improved
over revision 129454 by 67%.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-23 Thread pthaugen at gcc dot gnu dot org


--- Comment #29 from pthaugen at gcc dot gnu dot org  2007-10-23 20:29 
---
Found another example on PowerPC in the same benchmark that is not fixed by
the checked in patches.  Compiled with -m32 -O2. From the loop in procedure
FLUXI:

revision 126325:

.L47:
lfd 0,0(9)       #* ivtmp.277, tmp346
lfd 13,-8(9)     #, tmp347
add 9,9,0        # ivtmp.277, ivtmp.277, ivtmp.309
fsub 0,0,13      # tmp345, tmp346, tmp347
fmul 0,0,12      # tmp348, tmp345, dtvol.23
fneg 0,0         # tmp349, tmp348
stfd 0,0(11)     #* ivtmp.306, tmp349
add 11,11,10     # ivtmp.306, ivtmp.306, ivtmp.280
bdnz .L47        #



revision 126326 (and current mainline):

.L83:
lwz 0,48(7)      # .stride, .stride
lfd 0,0(10)      #* ivtmp.277, tmp416
lfd 13,-8(10)    #, tmp417
lfd 12,0(5)      # dtvol, dtvol
add 10,10,6      # ivtmp.277, ivtmp.277, ivtmp.306
mullw 0,8,0      # tmp409, l, .stride
fsub 0,0,13      # tmp415, tmp416, tmp417
lwz 9,4(7)       # du.offset, du.offset
lwz 11,0(7)      # du.data, du.data
fmul 0,0,12      # tmp420, tmp415, dtvol
fneg 0,0         # tmp422, tmp420
add 0,0,9        # tmp412, tmp409, du.offset
addi 8,8,1       # l, l,
slwi 0,0,3       # tmp413, tmp412,
stfdx 0,11,0     #, tmp422
bdnz .L83        #


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-23 Thread pthaugen at gcc dot gnu dot org


--- Comment #30 from pthaugen at gcc dot gnu dot org  2007-10-23 20:30 
---
Created an attachment (id=14402)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14402&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-24 Thread pthaugen at gcc dot gnu dot org


--- Comment #35 from pthaugen at gcc dot gnu dot org  2007-10-24 23:15 
---
I reran leslie3d on PowerPC with --param max-aliased-vops=1. The result was
a 90% improvement over my prior revision 129550 run (218% improvement over the
original mainline run).
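
For reference, the two percentages compound rather than add. The sketch below
assumes that the "original mainline run" is the revision 129454 run from my
earlier comment (the one that revision 129550 beat by 67%), and that each
percentage is a ratio of run times; the function name is made up.

```c
/* Combine two successive improvements, each given in percent, into a single
   overall percent improvement over the starting point.  */
double compound_improvement_pct(double first_pct, double second_pct)
{
  double ratio = (1.0 + first_pct / 100.0) * (1.0 + second_pct / 100.0);
  return (ratio - 1.0) * 100.0;
}
```

compound_improvement_pct(67.0, 90.0) gives roughly 217, since 1.67 * 1.90 =
3.173; the small gap to the 218% quoted above would be explained by rounding
of the two inputs.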


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug middle-end/32429] [PPC, missing optimization] stack space not optimized when stack not used

2007-10-31 Thread pthaugen at gcc dot gnu dot org


--- Comment #1 from pthaugen at gcc dot gnu dot org  2007-10-31 19:05 
---
Looks like the -fpic/PIC/pie/PIE issue was fixed on mainline with the following
patch.

2007-09-04  Daniel Jacobowitz  <[EMAIL PROTECTED]>

* config/rs6000/rs6000.c (rs6000_stack_info): Allocate space for the
GOT pointer only if there is a constant pool.  Use the allocated space
for SPE also.


See bug 28966 for the -maltivec fix.


-- 

pthaugen at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |pthaugen at gcc dot gnu dot org,
                   |                            |drow at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32429



[Bug target/28966] -maltivec -m32 causes the stack to be saved and restored even though there is no need for it

2007-10-31 Thread pthaugen at gcc dot gnu dot org


--- Comment #4 from pthaugen at gcc dot gnu dot org  2007-10-31 19:48 
---
Subject: Bug 28966

Author: pthaugen
Date: Wed Oct 31 19:48:19 2007
New Revision: 129806

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129806
Log:
2007-10-01  Pat Haugen  <[EMAIL PROTECTED]>

Backport the following patches:

2007-09-04  Daniel Jacobowitz  <[EMAIL PROTECTED]>

* config/rs6000/rs6000.c (rs6000_stack_info): Allocate space for the
GOT pointer only if there is a constant pool.  Use the allocated space
for SPE also.

2006-12-21  Nathan Sidwell  <[EMAIL PROTECTED]>

PR target/28966
PR target/29248
* reload1.c (reload): Realign stack after it changes size.


Modified:
branches/ibm/gcc-4_1-branch/gcc/ChangeLog
branches/ibm/gcc-4_1-branch/gcc/config/rs6000/rs6000.c
branches/ibm/gcc-4_1-branch/gcc/reload1.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28966



[Bug target/29248] Stack pointer is modified in functions that don't use the stack

2007-10-31 Thread pthaugen at gcc dot gnu dot org


--- Comment #7 from pthaugen at gcc dot gnu dot org  2007-10-31 19:48 
---
Subject: Bug 29248

Author: pthaugen
Date: Wed Oct 31 19:48:19 2007
New Revision: 129806

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129806
Log:
2007-10-01  Pat Haugen  <[EMAIL PROTECTED]>

Backport the following patches:

2007-09-04  Daniel Jacobowitz  <[EMAIL PROTECTED]>

* config/rs6000/rs6000.c (rs6000_stack_info): Allocate space for the
GOT pointer only if there is a constant pool.  Use the allocated space
for SPE also.

2006-12-21  Nathan Sidwell  <[EMAIL PROTECTED]>

PR target/28966
PR target/29248
* reload1.c (reload): Realign stack after it changes size.


Modified:
branches/ibm/gcc-4_1-branch/gcc/ChangeLog
branches/ibm/gcc-4_1-branch/gcc/config/rs6000/rs6000.c
branches/ibm/gcc-4_1-branch/gcc/reload1.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29248



[Bug target/34087] New: ICE in regrename.c for movdf_hardfloat64_mfpgpr

2007-11-13 Thread pthaugen at gcc dot gnu dot org
The following attachment from the CPU2006 benchmark leslie3d ICEs in the
following manner. The problem appears to be a latent bug that has come and gone
a couple times (I saw it earlier on the gamess benchmark too).  Recent
mainlines compile it fine, but revision 128829 exhibits the problem.  Opening a
bugzilla since I haven't been able to spend as much time digging into it as I'd
thought/hoped, and don't want to lose track of it.

> gfortran -c -m64 -O2 -mcpu=power6x -fpeel-loops -ffast-math junk.f
junk.f: In function 'setiv':
junk.f:913: error: insn does not satisfy its constraints:
(insn 5218 2288 4792 191 (set (reg:DF 46 14 [2631])
(reg:DF 65 lr [1528])) 335 {*movdf_hardfloat64_mfpgpr}
(expr_list:REG_DEAD (reg:DF 65 lr [1528])
(nil)))
junk.f:913: internal compiler error: in build_def_use, at regrename.c:766
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


-- 
           Summary: ICE in regrename.c for movdf_hardfloat64_mfpgpr
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34087



[Bug target/34087] ICE in regrename.c for movdf_hardfloat64_mfpgpr

2007-11-13 Thread pthaugen at gcc dot gnu dot org


--- Comment #1 from pthaugen at gcc dot gnu dot org  2007-11-13 21:08 
---
Created an attachment (id=14549)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14549&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34087



[Bug tree-optimization/34976] verify_ssa ICE with -ftree-loop-linear

2008-04-18 Thread pthaugen at gcc dot gnu dot org


--- Comment #4 from pthaugen at gcc dot gnu dot org  2008-04-18 14:47 
---
Is the workaround still appropriate, and if so, can it go in 4.3 and trunk now?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34976



[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown

2008-04-30 Thread pthaugen at gcc dot gnu dot org


--- Comment #43 from pthaugen at gcc dot gnu dot org  2008-04-30 18:49 
---
Created an attachment (id=15553)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15553&action=view)
Testcase

I tried a mainline with the latest patch.  While we no longer have problems
with the prior testcases, there is no improvement for leslie3d on ppc64. I can
still double the performance of the benchmark by specifying --param
max-aliased-vops=1.

Including a new trimmed down testcase from the benchmark where I'm still seeing
poor code when max-aliased-vops is not increased, compiled with 'gfortran -m32
-O2'.

Refer to the first nested loop in procedure FLUXK():

DO I = I1, I2
   QS(I) = WAV(I,J,K) * ZAREA
END DO


Base:

.L150:
lwz 0,24(18)     # .stride, .stride
lwz 9,36(18)     # .stride, .stride
lwz 11,12(18)    # .stride, .stride
lwz 10,4(18)     # wav.offset, wav.offset
mullw 0,17,0     # tmp660, ivtmp.602, .stride
lwz 8,0(18)      # wav.data, wav.data
mullw 9,30,9     # tmp666, ivtmp.590, .stride
mullw 11,6,11    # tmp670, i, .stride
add 0,0,9        # tmp672, tmp660, tmp666
addi 6,6,1       # i, i,
add 0,0,11       # tmp673, tmp672, tmp670
add 0,0,10       # tmp674, tmp673, wav.offset
slwi 0,0,3       # tmp676, tmp674,
lfdx 0,8,0       #, tmp678
fmul 0,0,8       # tmp679, tmp678, zarea.64
stfdx 0,15,7     #* ivtmp.589, tmp679
addi 7,7,8       # ivtmp.589, ivtmp.589,
bdnz .L150       #



With --param max-aliased-vops=1000:

.L150:
lfd 0,0(11)      #* ivtmp.599, tmp739
add 11,11,30     # ivtmp.599, ivtmp.599, D.2783
fmul 0,0,8       # tmp740, tmp739, zarea.64
stfdx 0,22,9     #* ivtmp.602, tmp740
addi 9,9,8       # ivtmp.602, ivtmp.602,
bdnz .L150       #
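
In C terms, the difference between the two listings is roughly the following.
This is a hand-written sketch: `struct toy_desc` is a hypothetical stand-in
for the Fortran array descriptor (data, offset, per-dimension stride), not
gfortran's real layout, and the function names are invented. My reading,
which is an assumption, is that the base code cannot prove the store to QS
leaves the descriptor alone, so it reloads the fields and redoes the full
index computation every iteration.

```c
#include <stddef.h>

/* Hypothetical descriptor: element (i,j,k) lives at
   data[i*stride[0] + j*stride[1] + k*stride[2] + offset].  */
struct toy_desc {
  double *data;
  ptrdiff_t offset;
  ptrdiff_t stride[3];
};

/* Shape of the base code: descriptor fields are read and the whole index is
   recomputed on every iteration.  Results go to qs[0..i2-i1].  */
void scale_reload(double *qs, const struct toy_desc *wav,
                  ptrdiff_t i1, ptrdiff_t i2, ptrdiff_t j, ptrdiff_t k,
                  double zarea)
{
  for (ptrdiff_t i = i1; i <= i2; i++)
    {
      ptrdiff_t idx = i * wav->stride[0] + j * wav->stride[1]
                      + k * wav->stride[2] + wav->offset;
      qs[i - i1] = wav->data[idx] * zarea;
    }
}

/* Shape of the better code: everything invariant is computed once, and the
   loop only steps an index by the innermost stride.  */
void scale_hoisted(double *qs, const struct toy_desc *wav,
                   ptrdiff_t i1, ptrdiff_t i2, ptrdiff_t j, ptrdiff_t k,
                   double zarea)
{
  const double *data = wav->data;
  ptrdiff_t step = wav->stride[0];
  ptrdiff_t idx = i1 * step + j * wav->stride[1]
                  + k * wav->stride[2] + wav->offset;

  for (ptrdiff_t n = 0; n <= i2 - i1; n++)
    {
      qs[n] = data[idx] * zarea;
      idx += step;
    }
}
```

The second form corresponds to the 6-instruction loop: one load, one add for
the strided IV, the multiply, the store, and the IV bump for QS.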


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown

2008-04-30 Thread pthaugen at gcc dot gnu dot org


--- Comment #46 from pthaugen at gcc dot gnu dot org  2008-04-30 22:35 
---
Just following up with comments posted on irc.  The patch does fix the problem
I was seeing.  Spec ratio improved from 5.18 to 9.07 with the patch (75%), not
quite the full 100% improvement I was seeing with --param
max-aliased-vops=1, but getting awfully close.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/32921] [4.3/4.4 Regression] Revision 126326 causes 12% slowdown

2008-05-01 Thread pthaugen at gcc dot gnu dot org


--- Comment #47 from pthaugen at gcc dot gnu dot org  2008-05-01 19:26 
---
Created an attachment (id=15557)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15557&action=view)
Testcase

Found some more cases, which are contributing to the missing 25%. Looks like
similar code, but obviously something different about it since it's not getting
caught with the latest patch.  Compiled with -O2 -m64.

From the first nested loop in FLUXI():

DO I = I1,I2
   QS(I) = UAV(I,J,K) * XAREA
END DO


Base:

.L185:
ld 10,168(1)     # pretmp.1065,
ld 4,424(1)      # D.1331,
sldi 11,9,3      # tmp667, i,
mulld 0,10,9     # tmp669,, i
add 0,0,25       # tmp670, tmp669, ivtmp.1093
addi 9,9,1       # tmp675, i,
sldi 0,0,3       # tmp671, tmp670,
lfdx 0,4,0       #, tmp673
extsw 9,9        # i.1073, tmp675
fmul 0,0,28      # tmp674, tmp673, xarea.63
stfdx 0,26,11    #, tmp674
bdnz .L185       #


With --param max-aliased-vops=1:

.L185:
lfd 0,0(11)      #* ivtmp.886, tmp667
add 11,11,18     # ivtmp.886, ivtmp.886, D.3744
fmul 0,0,27      # tmp668, tmp667, xarea.63
stfdx 0,31,9     #* ivtmp.889, tmp668
addi 9,9,8       # ivtmp.889, ivtmp.889,
cmpd 7,9,29      # tmp787, tmp673, ivtmp.889
bne 7,.L185      #


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug tree-optimization/36922] New: ICE in tree-data-ref.c with -ftree-loop-linear

2008-07-24 Thread pthaugen at gcc dot gnu dot org
Current trunk is ICE'ing when trying to build CPU2006 benchmark 416.gamess with
-ftree-loop-linear.

run/build_base_gcc_64.0001> cat junk.f
      SUBROUTINE GETCOF2(NP,FLM,ZLL,COEFF)
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      PARAMETER (MAXCOF=23821)
      DIMENSION COEFF(MAXCOF),
     *          ZLL(0:2*NP+1),FLM(0:2*NP)
C
      LENG=0
      DO L=0,NP
         DO M=0,L
            DO LK=M,L
               LENG=LENG+1
               COEFF(LENG)=FLM(L+M)*FLM(L-M)*ZLL(L-LK)/(FLM(LK+M)*
     *                     FLM(LK-M)*FLM(L-LK)*FLM(L-LK))
            ENDDO
         ENDDO
      ENDDO
C
      RETURN
      END

run/build_base_gcc_64.0001> /home/pthaugen/install/gcc/trunk/bin/gfortran -c
-O2 -ftree-loop-linear junk.f
junk.f: In function 'getcof2':
junk.f:1: internal compiler error: in initialize_matrix_A, at
tree-data-ref.c:1885
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


-- 
           Summary: ICE in tree-data-ref.c with -ftree-loop-linear
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36922