[Bug tree-optimization/30735] 50% slow down due to mem-ssa merge

2007-02-15 Thread pthaugen at us dot ibm dot com


--- Comment #5 from pthaugen at us dot ibm dot com  2007-02-15 23:36 ---
Seeing similar behavior on PPC for CPU2000. Comparing revisions 119759 and
119760, 179.art degrades 24% and 200.sixtrack degrades 50%.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30735



[Bug tree-optimization/30562] [4.3 Regression] remove unused variable is removing a referenced variable (in STORED_SYMS or LOADED_SYMS)

2007-03-20 Thread pthaugen at us dot ibm dot com


--- Comment #9 from pthaugen at us dot ibm dot com  2007-03-20 15:31 ---
Looks like I can reproduce with mainline using -O2 -ftree-loop-linear when
building galgel benchmark from cpu2000.

(My FORTRAN skills are lacking, so couldn't whittle down to a single testcase,
but got close)

178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64
-ffixed-form -O2 -ftree-loop-linear modules.f90
178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64
-ffixed-form -O2 -ftree-loop-linear sysnsL.f90
sysnsL.f90: In function 'sysnsl':
sysnsL.f90:6: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

(gdb) run
Starting program:
/home/pthaugen/install/gcc/trunk/libexec/gcc/powerpc64-linux/4.3.0/f951
sysnsL.f90 -quiet -dumpbase sysnsL.f90 -m64 -auxbase sysnsL -O2 -version
-ffixed-form -ftree-loop-linear -fintrinsic-modules-path
/home/pthaugen/install/gcc/trunk/lib/gcc/powerpc64-linux/4.3.0/finclude -o
sysnsL.s
GNU F95 version 4.3.0 20070314 (experimental) (powerpc64-linux)
compiled by GNU C version 4.1.0 (SUSE Linux).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

Program received signal SIGSEGV, Segmentation fault.
remove_referenced_var (var=)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
771   ggc_free (*loc);
(gdb) bt 5
#0  remove_referenced_var (var=)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
#1  0x103d1238 in remove_unused_locals ()
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-live.c:518
#2  0x1026f204 in execute_function_todo (data=)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:865
#3  0x1026ee44 in do_per_function (callback=0x1026efe0 ,
data=0x21)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:757
#4  0x1026ef4c in execute_todo (flags=33) at
/home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:935
(More stack frames follow...)


-- 

pthaugen at us dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at us dot ibm dot
                   |                            |com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30562



[Bug tree-optimization/31280] New: segfault in remove_referenced_var

2007-03-20 Thread pthaugen at us dot ibm dot com
Seeing the following with mainline using -O2 -ftree-loop-linear when
building galgel benchmark from cpu2000.

I couldn't whittle this down to a single testcase, so I will attach both source files.

178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64
-ffixed-form -O2 -ftree-loop-linear modules.f90
178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64
-ffixed-form -O2 -ftree-loop-linear sysnsL.f90
sysnsL.f90: In function 'sysnsl':
sysnsL.f90:6: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

(gdb) run
Starting program:
/home/pthaugen/install/gcc/trunk/libexec/gcc/powerpc64-linux/4.3.0/f951
sysnsL.f90 -quiet -dumpbase sysnsL.f90 -m64 -auxbase sysnsL -O2 -version
-ffixed-form -ftree-loop-linear -fintrinsic-modules-path
/home/pthaugen/install/gcc/trunk/lib/gcc/powerpc64-linux/4.3.0/finclude -o
sysnsL.s
GNU F95 version 4.3.0 20070314 (experimental) (powerpc64-linux)
compiled by GNU C version 4.1.0 (SUSE Linux).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

Program received signal SIGSEGV, Segmentation fault.
remove_referenced_var (var=)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
771   ggc_free (*loc);
(gdb) bt 5
#0  remove_referenced_var (var=)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
#1  0x103d1238 in remove_unused_locals ()
at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-live.c:518
#2  0x1026f204 in execute_function_todo (data=)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:865
#3  0x1026ee44 in do_per_function (callback=0x1026efe0 ,
data=0x21)
at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:757
#4  0x1026ef4c in execute_todo (flags=33) at
/home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:935
(More stack frames follow...)


-- 
           Summary: segfault in remove_referenced_var
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280



[Bug tree-optimization/31280] segfault in remove_referenced_var

2007-03-20 Thread pthaugen at us dot ibm dot com


--- Comment #1 from pthaugen at us dot ibm dot com  2007-03-20 15:53 ---
Having trouble attaching the source files, will keep trying...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280



[Bug tree-optimization/30562] [4.3 Regression] remove unused variable is removing a referenced variable (in STORED_SYMS or LOADED_SYMS)

2007-03-20 Thread pthaugen at us dot ibm dot com


--- Comment #11 from pthaugen at us dot ibm dot com  2007-03-20 15:54 ---
31280 opened.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30562



[Bug tree-optimization/31280] segfault in remove_referenced_var

2007-03-20 Thread pthaugen at us dot ibm dot com


--- Comment #3 from pthaugen at us dot ibm dot com  2007-03-20 18:09 ---
Created an attachment (id=13238)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13238&action=view)
Fortran source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280



[Bug tree-optimization/31280] segfault in remove_referenced_var

2007-03-20 Thread pthaugen at us dot ibm dot com


--- Comment #4 from pthaugen at us dot ibm dot com  2007-03-20 18:10 ---
Created an attachment (id=13239)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13239&action=view)
Fortran source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280



[Bug tree-optimization/31343] ICE in data-refs dependence testing

2007-03-25 Thread pthaugen at us dot ibm dot com


--- Comment #2 from pthaugen at us dot ibm dot com  2007-03-25 18:38 ---
I hit the same ICE building the 191.fma3d benchmark from cpu2000 when compiling
lsold.f90 with -O2 -ftree-loop-linear.


-- 

pthaugen at us dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at us dot ibm dot
                   |                            |com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31343



[Bug middle-end/32940] REG_POINTER attribute on DECL_ARTIFICIAL pointers

2007-08-03 Thread pthaugen at us dot ibm dot com


--- Comment #5 from pthaugen at us dot ibm dot com  2007-08-03 15:30 ---
This patch gives us an additional 2-3% overall on CPU2000, running on Power6.
Facerec was the big winner with about a 40% improvement. There were a few
improvements in the 5-10% range and a few degradations in the 1-2% range that
could be attributed to other issues such as loop alignment, or possibly to an
issue with r0 getting assigned to the base reg by renaming, for which I'm
currently testing a patch.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32940



[Bug regression/20372] New: Reversed branch-on-count loop (DoLoop)

2005-03-07 Thread pthaugen at us dot ibm dot com
Noticed a regression relative to 4.0 on the code generated for one of the loops
in zaxpy.f of the SPEC wupwise benchmark. The loop is transformed into a
branch-on-count loop, but the target on the branch is the loop exit (return in
this case) instead of the loop start. The 'bct' is immediately followed by an
unconditional branch to the loop start.

Here's a snippet of the generated code (this is for the first loop in the
source, even though it appears as the second loop in the generated code). 'L18'
is the epilog block.

.L15:
        addi %r0,%r7,-1         # D.511, ix,
        ...
        ...
        stfdx %f11,%r11,%r31    # (* zy), SR.12
        bdz .L18
        b .L15
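
For readers without the attachment, a rough C analogue of a zaxpy-style loop is sketched below. It is an illustration only, not the SPEC source: the function name zaxpy_like and the split real/imaginary array layout are assumptions. A counted loop of this shape is the kind a PowerPC compiler may turn into a branch-on-count loop, where the expected form ends with a single bdnz back to the loop start rather than a bdz to the exit followed by an unconditional branch to the start.

```c
#include <stddef.h>

/* Hypothetical zaxpy-style kernel: y[n] += a * x[n] for complex values
   stored as separate real and imaginary arrays.  The trip count is
   known on loop entry, so the loop is a candidate for a count-register
   (branch-on-count) loop. */
void zaxpy_like(size_t count, double ar, double ai,
                const double *xr, const double *xi,
                double *yr, double *yi)
{
    for (size_t n = 0; n < count; n++) {
        /* Complex product: real part ar*xr - ai*xi,
           imaginary part ar*xi + ai*xr. */
        yr[n] += ar * xr[n] - ai * xi[n];
        yi[n] += ar * xi[n] + ai * xr[n];
    }
}
```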

-- 
           Summary: Reversed branch-on-count loop (DoLoop)
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc*-*-*
  GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20372


[Bug regression/20372] Reversed branch-on-count loop (DoLoop)

2005-03-07 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2005-03-07 21:59 ---
Created an attachment (id=8357)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8357&action=view)
Source file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20372


[Bug rtl-optimization/20372] [4.0 Regression] Reversed branch-on-count loop (DoLoop)

2005-03-09 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2005-03-09 20:55 ---
I only noticed the regression on 4.1, things look fine to me on 4.0 (for both
zaxpy.f and Andrew's C example).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20372


[Bug tree-optimization/20470] New: Branching sequence generated for ABS(x-y)

2005-03-14 Thread pthaugen at us dot ibm dot com
Noticed that gcc/g++ generate a branching sequence instead of straight-line code
for expressions of the form ABS(x-y). Changing the operator to something else
(+, *, /) results in straight-line code.


#define ABS(value)   ( (value)>=0 ? (value) : -(value) )
int i,j,k,l,m,n;

void f1()
{
  l = ABS(m+n);
  i = ABS(j-k);
}
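
For reference, one straight-line way to compute the absolute value of a difference is sketched below. It only illustrates what "straight-line code" means here and is not the sequence GCC emits. The helper name abs_diff_branchless is hypothetical, and the sketch assumes that right-shifting a negative int is an arithmetic shift (implementation-defined in C, true for GCC) and that j - k does not overflow.

```c
#include <limits.h>

/* Branch-free |j - k|.
   mask is 0 when d >= 0 and -1 (all ones) when d < 0, assuming an
   arithmetic right shift.  (d ^ mask) - mask then leaves d unchanged
   when it is non-negative and negates it when it is negative. */
int abs_diff_branchless(int j, int k)
{
    int d = j - k;                                  /* assumed not to overflow */
    int mask = d >> (sizeof(int) * CHAR_BIT - 1);   /* 0 or -1 */
    return (d ^ mask) - mask;
}
```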

-- 
           Summary: Branching sequence generated for ABS(x-y)
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc*-*-*
  GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470


[Bug tree-optimization/20470] Branching sequence generated for ABS(x-y)

2005-03-14 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2005-03-14 19:13 ---
I noticed this in the twolf benchmark of SPEC, dimbox.c:new_dbox_a().



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470


[Bug tree-optimization/20470] Branching sequence generated for ABS(x-y)

2005-03-14 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2005-03-14 22:29 ---
Decided to look into this myself, currently testing a patch.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470


[Bug tree-optimization/20470] Branching sequence generated for ABS(x-y)

2005-03-15 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2005-03-15 18:22 ---
Ditto here, except I added an additional call to negate_expr_p(arg2) to make
sure it was OK to flip the expression back to its original form. I'm not sure
that was necessary; I was assuming I'd get comments when I posted a patch
(hopefully soon, I had some problems with testing).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470


[Bug tree-optimization/18431] Code for arrays and pointers are not the same

2005-06-02 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2005-06-02 18:46 ---
Does this need to be reopened?  I see the following for mainline.

  -m32:
.L4:
        slwi %r9,%r11,1         #, tmp130, i
        addi %r11,%r11,1        # i, i,
        sthx %r0,%r9,%r10       #* q.1, tmp132
        bdnz .L4                #

  -m64:
.L4:
        sthx %r0,%r11,%r10      #* ivtmp.11, tmp132
        addi %r10,%r10,2        # ivtmp.11, ivtmp.11,
        bdnz .L4                #


Both can be improved to use a non-indexed store and increment of that base reg,
similar to what Zdenek said he saw in comment #9.
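
As a source-level sketch of the suggested improvement (the original test case is not reproduced in this comment, so the functions below are hypothetical stand-ins): the first form indexes from a fixed base, which maps naturally onto an indexed store such as sthx, while the second walks a pointer, which maps onto a non-indexed store plus an increment of the base register.

```c
/* Indexed form: store to q[i] using a separate index variable. */
void fill_indexed(short *q, int n, short v)
{
    int i;
    for (i = 0; i < n; i++)
        q[i] = v;
}

/* Pointer-walking form: one variable serves as both the address and
   the induction variable.  Assumes n >= 0. */
void fill_pointer(short *q, int n, short v)
{
    short *end = q + n;
    while (q != end)
        *q++ = v;
}
```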


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18431


[Bug other/22067] New: Inconsistent multiply by immediate

2005-06-14 Thread pthaugen at us dot ibm dot com
PR17103 states that mulli is better than decomposing into a shift/add sequence.
The following is an example where we are inconsistent about that decision.

Compiled with gcc -O2 -mcpu=power4 -m32


struct S {
  int i1,i2,i3,i4,i5,i6;
}s[10];

int y;

int test1(int j, int x)
{

 y = y * 24;        // shift/sub
 s[j].i1 = 1;       // mulli

 return (x * 24);   // mulli
}
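
The two strategies named in the comments can be written out by hand. This is a sketch of the arithmetic only; the function names are hypothetical, and unsigned types are used so that the shifts are well defined. Since 24 = 32 - 8, x * 24 equals (x << 5) - (x << 3), which is presumably the shift/sub form chosen for y, whereas mulli multiplies by the immediate directly.

```c
/* Single multiply by the constant, as a mulli would do. */
unsigned mul24_multiply(unsigned x)
{
    return x * 24u;
}

/* Decomposed form: 24 = 32 - 8, so x * 24 = (x << 5) - (x << 3). */
unsigned mul24_shift_sub(unsigned x)
{
    return (x << 5) - (x << 3);
}
```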

-- 
           Summary: Inconsistent multiply by immediate
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22067


[Bug other/22068] New: Multiply-immediate opportunity

2005-06-14 Thread pthaugen at us dot ibm dot com
Examples where we could use mulli instead of li/mulld. The array indexing
example shows up in the bzip2 benchmark (when compiled with -m64).

Compiled with gcc -O2 -m64

struct S {
  int i1,i2,i3,i4,i5,i6;
}s[10];

long test1(int j, long x)
{

 s[j].i1 = 1;

 return (x * 24);
}
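
The multiply in the array-indexing case comes from scaling the index by the element size. A small sketch follows, assuming a 4-byte int so that struct S is 24 bytes (as on powerpc64-linux); the helper elem_offset is hypothetical.

```c
#include <stddef.h>

struct S {
    int i1, i2, i3, i4, i5, i6;
};

/* Byte offset of s[j] from the start of the array: j * sizeof(struct S).
   With a 4-byte int this is j * 24, the constant that could be used as
   the immediate of a mulli instead of being loaded with li and
   multiplied with mulld. */
ptrdiff_t elem_offset(int j)
{
    return (ptrdiff_t)j * (ptrdiff_t)sizeof(struct S);
}
```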

-- 
           Summary: Multiply-immediate opportunity
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: other
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22068


[Bug tree-optimization/18768] New: Missed ivopts opportunity

2004-12-01 Thread pthaugen at us dot ibm dot com
Opening bug report per Zdenek's request; snippets of the email exchange follow:

=
void f1 (void * coefPtr, double * dd)
{
  int  i,j;

  /* Cast of "coefPtr" results in poor code for this loop (missed strength
     reduction). */
  for (i = 0; i < 16; i++) {
    *(((double *) coefPtr) + i) = +0.0;
  }

  for (j = 0; j < 16; j++) {
    *(dd + j) = +0.0;
  }
}

=

> this seems to be a problem with the cost function:
>
> Cost of strength reduction of the access =
>  Cost for incrementing the new induction variable:  8
>  Cost for increased register pressure: 4
>  Cost for the memory reference: 1
>
> Cost for expressing the access using i =
>  Cost for multiplication by 8:  12
>  Cost for the memory reference: 1 (addition of the result of
>    multiplications takes place in the address).
>
> The result is that it is not worthwhile to perform strength reduction.
> I will try to do something with the code that estimates cost of memory
> references, since it is quite wrong here.

could you please create a bugreport for this?  The things are a bit more
complicated than what I expected; there is actually no way how the
target could let ivopts know that DFmode address for (reg + reg) is more
expensive than just reg, in the current state.
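
To make the transformation under discussion concrete, here is a hand-written sketch of the first loop before and after strength reduction. The function names are hypothetical and this is not ivopts output. If the costs quoted above are simply added, they come to 8 + 4 + 1 = 13 for the strength-reduced form and 12 + 1 = 13 for the indexed form, so under that cost model the transformation shows no gain.

```c
/* Indexed form: the address is recomputed from i on every iteration
   (base + i * sizeof(double)). */
void clear_indexed(void *coefPtr)
{
    int i;
    for (i = 0; i < 16; i++)
        *(((double *) coefPtr) + i) = +0.0;
}

/* Strength-reduced form: the multiplication is replaced by a pointer
   that is advanced by one element each iteration. */
void clear_strength_reduced(void *coefPtr)
{
    double *p = (double *) coefPtr;
    double *end = p + 16;
    for (; p != end; p++)
        *p = +0.0;
}
```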

-- 
           Summary: Missed ivopts opportunity
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18768


[Bug tree-optimization/18768] Missed ivopts opportunity

2004-12-02 Thread pthaugen at us dot ibm dot com

--- Additional Comments From pthaugen at us dot ibm dot com  2004-12-02 21:30 ---
Here's another example of the same thing, minus the cast/double issue. This one
is strength reduced for 3.4 but not for 4.0.

typedef struct {
  union {
    unsigned int Init[9];
    unsigned char Link[37];
  } u;
} BitStreamLinkStruct;
void f2(BitStreamLinkStruct *);

void f1 ()
{
  int  ixBlock;
  BitStreamLinkStruct bitStreamLink;

  for(ixBlock=0; ixBlock<=7; ixBlock++){
    bitStreamLink.u.Init[ixBlock] = 0x1e1e1e1e;
  }

  f2(&bitStreamLink);

}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18768


[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-12-05 Thread pthaugen at us dot ibm dot com


--- Comment #32 from pthaugen at us dot ibm dot com  2006-12-05 16:12 ---
Another example, pared down from ammp benchmark in cpu2000. 

void f2(int *, int *);
void mm_fv_update_nonbon(void)
{
 int j, nx;
 int naybor[27];

 f2(naybor, &nx);

 for(j=0; j< 27; j++)
 if( naybor[j]) break; /* Indexed load problem */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc

2006-12-05 Thread pthaugen at us dot ibm dot com


--- Comment #33 from pthaugen at us dot ibm dot com  2006-12-05 16:30 ---
My prior comment is missing the closing bracket for the procedure, but example
is otherwise complete.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690



[Bug fortran/30113] New: ICE in trunc_int_for_mode

2006-12-07 Thread pthaugen at us dot ibm dot com
The following Fortran code, from the gamess CPU2006 benchmark, fails with an
ICE on the 4.1 branch when compiled with -m32 -O3.


      SUBROUTINE RD2PDM(INFILE,TPDM,GBUF,LABS,NINTMX,IPIN)
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      LOGICAL READMORE
      INTEGER LABS(*),IPIN(*)
      DOUBLE PRECISION  TPDM(*),GBUF(*)
      COMMON /PCKLAB/ LABSIZ
C
C  CALL SEQREW(INFILE)
      READMORE = .TRUE.
      DO WHILE (READMORE)
        NG = 0
        CALL PREAD(INFILE,GBUF,LABS,NG,NINTMX)
        IF (NG.LE.0) READMORE = .FALSE.
        MG = IABS(NG)
        DO M = 1, MG
C
C  UNPACK LABELS
C
           NPACK = M
           IF (LABSIZ .EQ. 2) THEN
             LABEL = LABS( 2*NPACK - 1 )
             IPACK = ISHFT( LABEL, -16 )
             JPACK = IAND(  LABEL, 65535 )
             LABEL = LABS( 2*NPACK )
             KPACK = ISHFT( LABEL, -16 )
             LPACK = IAND(  LABEL, 65535 )
           ELSE IF (LABSIZ .EQ. 1) THEN
             LABEL = LABS(NPACK)
             IPACK = ISHFT( LABEL, -24 )
             JPACK = IAND( ISHFT( LABEL, -16 ), 255 )
             KPACK = IAND( ISHFT( LABEL,  -8 ), 255 )
             LPACK = IAND( LABEL, 255 )
           END IF
           I = IPACK
           J = JPACK
           K = KPACK
           L = LPACK
           IJ = IPIN(MAX(I,J)) + MIN(I,J)
           KL = IPIN(MAX(K,L)) + MIN(K,L)
           IJKL = IPIN(MAX(IJ,KL)) + MIN(IJ,KL)
           TPDM(IJKL) = GBUF(M)
        END DO  ! CURRENT BATCH
      END DO  ! NEXT BATCH
      RETURN
      END


temp> gfortran -c -O3 -m32 gamess-failure.f
gamess-failure.f: In function 'rd2pdm':
gamess-failure.f:45: internal compiler error: in trunc_int_for_mode, at
explow.c:54
Please submit a full bug report,


-- 
           Summary: ICE in trunc_int_for_mode
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30113