[Bug tree-optimization/30735] 50% slow down due to mem-ssa merge
--- Comment #5 from pthaugen at us dot ibm dot com 2007-02-15 23:36 --- Seeing similar behavior on PPC for CPU2000. Comparing revisions 119759 and 119760, 179.art degrades 24% and 200.sixtrack degrades 50%. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30735
[Bug tree-optimization/30562] [4.3 Regression] remove unused variable is removing a referenced variable (in STORED_SYMS or LOADED_SYMS)
--- Comment #9 from pthaugen at us dot ibm dot com 2007-03-20 15:31 ---
Looks like I can reproduce with mainline using -O2 -ftree-loop-linear when building galgel benchmark from cpu2000. (My FORTRAN skills are lacking, so couldn't whittle down to a single testcase, but got close)

178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64 -ffixed-form -O2 -ftree-loop-linear modules.f90
178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64 -ffixed-form -O2 -ftree-loop-linear sysnsL.f90
sysnsL.f90: In function 'sysnsl':
sysnsL.f90:6: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

(gdb) run
Starting program: /home/pthaugen/install/gcc/trunk/libexec/gcc/powerpc64-linux/4.3.0/f951 sysnsL.f90 -quiet -dumpbase sysnsL.f90 -m64 -auxbase sysnsL -O2 -version -ffixed-form -ftree-loop-linear -fintrinsic-modules-path /home/pthaugen/install/gcc/trunk/lib/gcc/powerpc64-linux/4.3.0/finclude -o sysnsL.s
GNU F95 version 4.3.0 20070314 (experimental) (powerpc64-linux)
        compiled by GNU C version 4.1.0 (SUSE Linux).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

Program received signal SIGSEGV, Segmentation fault.
remove_referenced_var (var=) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
771       ggc_free (*loc);
(gdb) bt 5
#0  remove_referenced_var (var=) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
#1  0x103d1238 in remove_unused_locals () at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-live.c:518
#2  0x1026f204 in execute_function_todo (data=) at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:865
#3  0x1026ee44 in do_per_function (callback=0x1026efe0 , data=0x21) at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:757
#4  0x1026ef4c in execute_todo (flags=33) at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:935
(More stack frames follow...)

--

pthaugen at us dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at us dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30562
[Bug tree-optimization/31280] New: segfault in remove_referenced_var
Seeing the following with mainline using -O2 -ftree-loop-linear when building galgel benchmark from cpu2000. I couldn't whittle down to a single testcase so will attach both source files.

178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64 -ffixed-form -O2 -ftree-loop-linear modules.f90
178.galgel/run> /home/pthaugen/install/gcc/trunk/bin/gfortran -c -m64 -ffixed-form -O2 -ftree-loop-linear sysnsL.f90
sysnsL.f90: In function 'sysnsl':
sysnsL.f90:6: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

(gdb) run
Starting program: /home/pthaugen/install/gcc/trunk/libexec/gcc/powerpc64-linux/4.3.0/f951 sysnsL.f90 -quiet -dumpbase sysnsL.f90 -m64 -auxbase sysnsL -O2 -version -ffixed-form -ftree-loop-linear -fintrinsic-modules-path /home/pthaugen/install/gcc/trunk/lib/gcc/powerpc64-linux/4.3.0/finclude -o sysnsL.s
GNU F95 version 4.3.0 20070314 (experimental) (powerpc64-linux)
        compiled by GNU C version 4.1.0 (SUSE Linux).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

Program received signal SIGSEGV, Segmentation fault.
remove_referenced_var (var=) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
771       ggc_free (*loc);
(gdb) bt 5
#0  remove_referenced_var (var=) at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-dfa.c:771
#1  0x103d1238 in remove_unused_locals () at /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-live.c:518
#2  0x1026f204 in execute_function_todo (data=) at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:865
#3  0x1026ee44 in do_per_function (callback=0x1026efe0 , data=0x21) at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:757
#4  0x1026ef4c in execute_todo (flags=33) at /home/pthaugen/src/gcc/trunk/gcc/gcc/passes.c:935
(More stack frames follow...)

--
           Summary: segfault in remove_referenced_var
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280
[Bug tree-optimization/31280] segfault in remove_referenced_var
--- Comment #1 from pthaugen at us dot ibm dot com 2007-03-20 15:53 --- Having trouble attaching source files, will keep trying... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280
[Bug tree-optimization/30562] [4.3 Regression] remove unused variable is removing a referenced variable (in STORED_SYMS or LOADED_SYMS)
--- Comment #11 from pthaugen at us dot ibm dot com 2007-03-20 15:54 --- 31280 opened. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30562
[Bug tree-optimization/31280] segfault in remove_referenced_var
--- Comment #3 from pthaugen at us dot ibm dot com 2007-03-20 18:09 --- Created an attachment (id=13238) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13238&action=view) Fortran source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280
[Bug tree-optimization/31280] segfault in remove_referenced_var
--- Comment #4 from pthaugen at us dot ibm dot com 2007-03-20 18:10 --- Created an attachment (id=13239) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13239&action=view) Fortran source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31280
[Bug tree-optimization/31343] ICE in data-refs dependence testing
--- Comment #2 from pthaugen at us dot ibm dot com 2007-03-25 18:38 ---
I hit the same ICE building the 191.fma3d benchmark from cpu2000 when compiling lsold.f90 with -O2 -ftree-loop-linear.

--

pthaugen at us dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at us dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31343
[Bug middle-end/32940] REG_POINTER attribute on DECL_ARTIFICIAL pointers
--- Comment #5 from pthaugen at us dot ibm dot com 2007-08-03 15:30 --- This patch gives us an additional 2-3% overall on CPU2000, running on Power6. Facerec was the big winner with about 40% improvement; there were a few improvements in the 5-10% range and a few degradations in the 1-2% range that could be attributed to other issues like loop alignment, or possibly an issue with r0 getting assigned to the base reg by renaming, which I'm currently testing a patch for. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32940
[Bug regression/20372] New: Reversed branch-on-count loop (DoLoop)
Noticed a regression relative to 4.0 on the code generated for one of the loops in zaxpy.f of the SPEC wupwise benchmark. The loop is transformed into a branch-on-count loop, but the target of the branch is the loop exit (return in this case) instead of the loop start. The 'bct' is immediately followed by an unconditional branch to the loop start. Here's a snippet of the generated code (this is for the first loop in the source, even though it appears as the second loop in the generated code). 'L18' is the epilog block.

.L15:
        addi %r0,%r7,-1          # D.511, ix,
        ...
        ...
        stfdx %f11,%r11,%r31     # (* zy), SR.12
        bdz .L18
        b .L15

--
           Summary: Reversed branch-on-count loop (DoLoop)
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc*-*-*
  GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20372
[Bug regression/20372] Reversed branch-on-count loop (DoLoop)
--- Additional Comments From pthaugen at us dot ibm dot com 2005-03-07 21:59 --- Created an attachment (id=8357) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8357&action=view) Source file -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20372
[Bug rtl-optimization/20372] [4.0 Regression] Reversed branch-on-count loop (DoLoop)
--- Additional Comments From pthaugen at us dot ibm dot com 2005-03-09 20:55 --- I only noticed the regression on 4.1, things look fine to me on 4.0 (for both zaxpy.f and Andrew's C example). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20372
[Bug tree-optimization/20470] New: Branching sequence generated for ABS(x-y)
Noticed that gcc/g++ generate a branching sequence instead of straight-line code for expressions of the form ABS(x-y). Changing the operator to something else (+, *, /) results in straight-line code.

#define ABS(value) ( (value)>=0 ? (value) : -(value) )

int i,j,k,l,m,n;

void f1()
{
  l = ABS(m+n);
  i = ABS(j-k);
}

--
           Summary: Branching sequence generated for ABS(x-y)
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc*-*-*
  GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470
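For anyone wanting to compare the generated assembly, here is a minimal sketch of the same testcase with the globals turned into parameters so each form can be inspected on its own (e.g. with gcc -O2 -S). The function names and the abs() reference version are my additions for illustration, not part of the original report.

```c
#include <assert.h>
#include <stdlib.h>

#define ABS(value) ( (value)>=0 ? (value) : -(value) )

/* Form reported as compiling to straight-line code. */
int abs_sum(int m, int n)
{
    return ABS(m + n);
}

/* Form reported as compiling to a branching sequence. */
int abs_diff(int j, int k)
{
    return ABS(j - k);
}

/* Reference using the library abs(), for comparing code generation. */
int abs_diff_ref(int j, int k)
{
    return abs(j - k);
}

/* Hypothetical helper: both spellings of |j - k| must agree. */
void check_abs_forms(int j, int k)
{
    assert(abs_diff(j, k) == abs_diff_ref(j, k));
}
```

All three functions compute an absolute value, so any difference in the emitted code between abs_sum and abs_diff comes from how the subtraction is handled, not from the source semantics.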
[Bug tree-optimization/20470] Branching sequence generated for ABS(x-y)
--- Additional Comments From pthaugen at us dot ibm dot com 2005-03-14 19:13 --- I noticed this in the twolf benchmark of SPEC, dimbox.c:new_dbox_a(). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470
[Bug tree-optimization/20470] Branching sequence generated for ABS(x-y)
--- Additional Comments From pthaugen at us dot ibm dot com 2005-03-14 22:29 --- Decided to look into this myself, currently testing a patch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470
[Bug tree-optimization/20470] Branching sequence generated for ABS(x-y)
--- Additional Comments From pthaugen at us dot ibm dot com 2005-03-15 18:22 --- Ditto here, except added additional call to negate_expr_p(arg2) to make sure it was ok to flip the expression back to its original form. Not sure if that was necessary, was assuming I'd get comments when I posted a patch (hopefully soon, had some problems with testing). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20470
[Bug tree-optimization/18431] Code for arrays and pointers are not the same
--- Additional Comments From pthaugen at us dot ibm dot com 2005-06-02 18:46 ---
Does this need to be reopened? I see the following for mainline.

-m32:
.L4:
        slwi %r9,%r11,1          #, tmp130, i
        addi %r11,%r11,1         # i, i,
        sthx %r0,%r9,%r10        #* q.1, tmp132
        bdnz .L4                 #

-m64:
.L4:
        sthx %r0,%r11,%r10       #* ivtmp.11, tmp132
        addi %r10,%r10,2         # ivtmp.11, ivtmp.11,
        bdnz .L4                 #

Both can be improved to use a non-indexed store and increment of that base reg, similar to what Zdenek said he saw in comment #9.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18431
[Bug other/22067] New: Inconsistent multiply by immediate
PR17103 states that mulli is better than decomposing into a shift/add sequence. The following is an example where we are being inconsistent about that decision. Compiled with gcc -O2 -mcpu=power4 -m32

struct S {
  int i1,i2,i3,i4,i5,i6;
}s[10];
int y;

int test1(int j, int x) {
  y = y * 24;           // shift/sub
  s[j].i1 = 1;          // mulli
  return (x * 24);      // mulli
}

--
           Summary: Inconsistent multiply by immediate
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22067
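To make the two code shapes concrete: the "shift/sub" annotation presumably refers to decomposing 24 as 32 - 8, i.e. two shifts and a subtract in place of a single multiply-immediate. The sketch below writes both forms out in C. The function names are made up for illustration, and unsigned arithmetic is used so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Single multiply by the constant (the mulli-style form). */
uint32_t mul24_direct(uint32_t x)
{
    return x * 24u;
}

/* Decomposed form: 24 = 32 - 8, so x*24 = (x << 5) - (x << 3). */
uint32_t mul24_shift_sub(uint32_t x)
{
    return (x << 5) - (x << 3);
}

/* Hypothetical helper: the two forms agree modulo 2^32. */
void check_mul24(uint32_t x)
{
    assert(mul24_direct(x) == mul24_shift_sub(x));
}
```

Both functions return the same value for every input, so which one the compiler emits is purely a cost decision. The complaint in this report is that the decision differs between otherwise similar multiplications in the same function.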
[Bug other/22068] New: Multiply-immediate opportunity
Examples where we could use mulli instead of li/mulld. The array indexing example shows up in the bzip2 benchmark (when compiled with -m64). Compiled with gcc -O2 -m64

struct S {
  int i1,i2,i3,i4,i5,i6;
}s[10];

long test1(int j, long x) {
  s[j].i1 = 1;
  return (x * 24);
}

--
           Summary: Multiply-immediate opportunity
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: other
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22068
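For readers wondering where the multiply in the array indexing case comes from: with six int members, struct S is 24 bytes when int is 4 bytes (an assumption that holds on the targets discussed here), so the address of s[j] is the array base plus j * 24. A small sketch that makes the offset visible; element_offset is a hypothetical helper, not something from the benchmark.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int i1, i2, i3, i4, i5, i6;
};

static struct S s[10];

/* Byte offset of s[j] from the start of the array.  The compiler has to
   compute j * sizeof(struct S) to form this address, which is the
   multiply-by-constant discussed in the report. */
size_t element_offset(int j)
{
    assert(j >= 0 && j < 10);
    return (size_t) ((char *) &s[j] - (char *) s);
}
```

With a 4-byte int, element_offset(j) equals j * 24, the same constant as in the explicit x * 24 in test1.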
[Bug tree-optimization/18768] New: Missed ivopts opportunity
Opening bug report per Zdenek's request; snippets of the email exchange follow:

=
void f1 (void * coefPtr, double * dd)
{
  int i,j;

  /* Cast of "coefPtr" results in poor code for this loop
     (missed strength reduction). */
  for (i = 0; i < 16; i++) {
    *(((double *) coefPtr) + i) = +0.0;
  }

  for (j = 0; j < 16; j++) {
    *(dd + j) = +0.0;
  }
}
=

> this seems to be a problem with the cost function:
>
> Cost of strength reduction of the access =
>   Cost for incrementing the new induction variable: 8
>   Cost for increased register pressure: 4
>   Cost for the memory reference: 1
>
> Cost for expressing the access using i =
>   Cost for multiplication by 8: 12
>   Cost for the memory reference: 1 (addition of the result of
>   multiplications takes place in the address).
>
> The result is that it is not worthwhile to perform strength reduction.
> I will try to do something with the code that estimates cost of memory
> references, since it is quite wrong here.

could you please create a bugreport for this? The things are a bit more complicated than what I expected; there is actually no way how the target could let ivopts know that DFmode address for (reg + reg) is more expensive than just reg, in the current state.

--
           Summary: Missed ivopts opportunity
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18768
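Summing the quoted figures gives 8 + 4 + 1 = 13 for the strength-reduced form and 12 + 1 = 13 for the form that uses i directly, so under this cost model strength reduction shows no gain. For reference, here is a hand-written sketch of the two loop shapes being compared. The function names and the size parameter are mine, for illustration only; this is not what ivopts emits, just the source-level equivalent.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: each store address is base + i * sizeof(double),
   i.e. the "multiplication by 8" in the quoted cost breakdown. */
void clear_indexed(void *coefPtr, size_t n)
{
    assert(coefPtr != NULL);
    for (size_t i = 0; i < n; i++)
        *(((double *) coefPtr) + i) = +0.0;
}

/* Strength-reduced form: a new induction variable (the pointer p)
   is incremented by one element per iteration, so no multiply is needed. */
void clear_strength_reduced(void *coefPtr, size_t n)
{
    assert(coefPtr != NULL);
    double *p = (double *) coefPtr;
    double *end = p + n;
    for (; p != end; p++)
        *p = +0.0;
}
```

The second form trades the per-iteration multiply for an extra live pointer, which is the "increased register pressure" entry in the quoted costs.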
[Bug tree-optimization/18768] Missed ivopts opportunity
--- Additional Comments From pthaugen at us dot ibm dot com 2004-12-02 21:30 ---
Here's another example of the same thing, minus the cast/double issue. This one is strength reduced for 3.4 but not for 4.0.

typedef struct {
  union {
    unsigned int Init[9];
    unsigned char Link[37];
  } u;
} BitStreamLinkStruct;

void f2(BitStreamLinkStruct *);

void f1 ()
{
  int ixBlock;
  BitStreamLinkStruct bitStreamLink;

  for(ixBlock=0; ixBlock<=7; ixBlock++){
    bitStreamLink.u.Init[ixBlock] = 0x1e1e1e1e;
  }
  f2(&bitStreamLink);
}

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18768
[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc
--- Comment #32 from pthaugen at us dot ibm dot com 2006-12-05 16:12 ---
Another example, pared down from ammp benchmark in cpu2000.

void f2(int *, int *);

void mm_fv_update_nonbon(void)
{
  int j, nx;
  int naybor[27];

  f2(naybor, &nx);
  for(j=0; j< 27; j++)
    if( naybor[j]) break;   /* Indexed load problem */

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690
[Bug middle-end/28690] [4.2/4.3 Regression] Performace problem with indexed load/stores on powerpc
--- Comment #33 from pthaugen at us dot ibm dot com 2006-12-05 16:30 --- My prior comment is missing the closing bracket for the procedure, but example is otherwise complete. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28690
[Bug fortran/30113] New: ICE in trunc_int_for_mode
The following Fortran code, from the gamess CPU2006 benchmark, fails with an ICE on the 4.1 branch when compiled with -m32 -O3.

      SUBROUTINE RD2PDM(INFILE,TPDM,GBUF,LABS,NINTMX,IPIN)
C
      IMPLICIT DOUBLE PRECISION(A-H,O-Z)
      LOGICAL READMORE
      INTEGER LABS(*),IPIN(*)
      DOUBLE PRECISION TPDM(*),GBUF(*)
      COMMON /PCKLAB/ LABSIZ
C
C
      CALL SEQREW(INFILE)
      READMORE = .TRUE.
      DO WHILE (READMORE)
        NG = 0
        CALL PREAD(INFILE,GBUF,LABS,NG,NINTMX)
        IF (NG.LE.0) READMORE = .FALSE.
        MG = IABS(NG)
        DO M = 1, MG
C
C     UNPACK LABELS
C
          NPACK = M
          IF (LABSIZ .EQ. 2) THEN
            LABEL = LABS( 2*NPACK - 1 )
            IPACK = ISHFT( LABEL, -16 )
            JPACK = IAND( LABEL, 65535 )
            LABEL = LABS( 2*NPACK )
            KPACK = ISHFT( LABEL, -16 )
            LPACK = IAND( LABEL, 65535 )
          ELSE IF (LABSIZ .EQ. 1) THEN
            LABEL = LABS(NPACK)
            IPACK = ISHFT( LABEL, -24 )
            JPACK = IAND( ISHFT( LABEL, -16 ), 255 )
            KPACK = IAND( ISHFT( LABEL, -8 ), 255 )
            LPACK = IAND( LABEL, 255 )
          END IF
          I = IPACK
          J = JPACK
          K = KPACK
          L = LPACK
          IJ = IPIN(MAX(I,J)) + MIN(I,J)
          KL = IPIN(MAX(K,L)) + MIN(K,L)
          IJKL = IPIN(MAX(IJ,KL)) + MIN(IJ,KL)
          TPDM(IJKL) = GBUF(M)
        END DO ! CURRENT BATCH
      END DO ! NEXT BATCH
      RETURN
      END

temp> gfortran -c -O3 -m32 gamess-failure.f
gamess-failure.f: In function 'rd2pdm':
gamess-failure.f:45: internal compiler error: in trunc_int_for_mode, at explow.c:54
Please submit a full bug report,

--
           Summary: ICE in trunc_int_for_mode
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at us dot ibm dot com
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30113
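For those less used to ISHFT/IAND, here is a rough C rendering of the label unpacking in the inner loop, which is where the shifts and masks occur. It assumes a 32-bit default INTEGER and treats ISHFT with a negative count as a logical right shift. The type and function names are invented for illustration and are not from gamess.

```c
#include <assert.h>
#include <stdint.h>

/* The four indices packed into the label word(s). */
typedef struct {
    uint32_t i, j, k, l;
} labels4;

/* LABSIZ == 1 case: four 8-bit fields in one 32-bit word. */
labels4 unpack_one_word(uint32_t label)
{
    labels4 r;
    r.i = label >> 24;            /* ISHFT(LABEL, -24)             */
    r.j = (label >> 16) & 255u;   /* IAND(ISHFT(LABEL, -16), 255)  */
    r.k = (label >> 8) & 255u;    /* IAND(ISHFT(LABEL, -8), 255)   */
    r.l = label & 255u;           /* IAND(LABEL, 255)              */
    return r;
}

/* LABSIZ == 2 case: two 16-bit fields in each of two 32-bit words. */
labels4 unpack_two_words(uint32_t first, uint32_t second)
{
    labels4 r;
    r.i = first >> 16;            /* ISHFT(LABEL, -16)  */
    r.j = first & 65535u;         /* IAND(LABEL, 65535) */
    r.k = second >> 16;
    r.l = second & 65535u;
    return r;
}

/* Hypothetical helper: every unpacked 8-bit field fits in a byte. */
void check_one_word(uint32_t label)
{
    labels4 r = unpack_one_word(label);
    assert(r.i <= 255u && r.j <= 255u && r.k <= 255u && r.l <= 255u);
}
```

This only mirrors the unpacking arithmetic; it does not attempt to reproduce the ICE, which depends on how the Fortran front end and the 4.1 branch handle the original source.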