[Bug regression/47037] New: 465.tonto Segmentation Fault in memset

2010-12-21 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47037

   Summary: 465.tonto Segmentation Fault in memset
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: changpeng.f...@amd.com


We have a new system and I saw a segmentation fault in memset (with current gcc
trunk).

gfortran -O2 -static (-march=generic64):

(gdb) r < stdin
Starting program:
/local/home/chfang/cpu2006/benchspec/CPU2006/465.tonto/build/build_base_cfangO3./tonto
< stdin

Program received signal SIGSEGV, Segmentation fault.
memset () at ../sysdeps/x86_64/memset.S:496
496	../sysdeps/x86_64/memset.S: No such file or directory.
	in ../sysdeps/x86_64/memset.S
Current language:  auto
The current source language is "auto; currently asm".
(gdb) bt
#0  memset () at ../sysdeps/x86_64/memset.S:496
#1  0x2020202020202020 in ?? ()
#2  0x2020202020202020 in ?? ()
#3  0x2020202020202020 in ?? ()
#4  0x2020202020202020 in ?? ()
#5  0x2020202020202020 in ?? ()
#6  0x2020202020202020 in ?? ()
#7  0x2020202020202020 in ?? ()
#8  0x2020202020202020 in ?? ()
#9  0x2020202020202020 in ?? ()
#10 0x2020202020202020 in ?? ()
#11 0x2020202020202020 in ?? ()
#12 0x2020202020202020 in ?? ()
#13 0x2020202020202020 in ?? ()
#14 0x2020202020202020 in ?? ()
#15 0x2020202020202020 in ?? ()
#16 0x2020202020202020 in ?? ()
#17 0x2020202020202020 in ?? ()
#18 0x00b613a0 in ?? ()
#19 0x2020202020202020 in ?? ()
#20 0x000120202020 in ?? ()
#21 0x20202021 in ?? ()
#22 0x2020202020202020 in ?? ()
#23 0x00b60eb0 in ?? ()
#24 0x00b60b20 in ?? ()
#25 0x0080 in ?? ()
#26 0x0001 in ?? ()
#27 0x00411d6c in read_label (self=...) at atom.fppized.f90:1155
#28 0x00415318 in process_keyword (self=..., keyword=, _keyword=-11776) at atom.fppized.f90:1028
#29 0x00415b74 in process_keys (self=...) at atom.fppized.f90:1440
#30 0x0042d689 in data_length (self=) at
atomvec.fppized.f90:1388
#31 0x0042ed3a in read_data (self=..., ignore_braces=Cannot access
memory at address 0x0
) at atomvec.fppized.f90:1351
#32 0x004314f8 in read_list_keywords (self=...) at
atomvec.fppized.f90:1306
#33 0x006256e5 in read_atoms (self=...) at mol.fppized.f90:9579
#34 0x00647d3e in process_keyword (self=0xb5b490, keyword="atom",
_keyword=) at mol_main.fppized.f90:3836
#35 0x00648418 in read_keywords (self=0xb5b490) at
mol_main.fppized.f90:3807
#36 0x00648489 in main (self=0xb5b490) at mol_main.fppized.f90:3744
#37 0x006b9ea2 in run_mol () at run_mol.fppized.f90:125
#38 main () at run_mol.fppized.f90:22
#39 0x in ?? ()


[Bug regression/47037] 465.tonto Segmentation Fault in memset

2010-12-21 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47037

--- Comment #1 from Changpeng Fang 2010-12-22 00:55:35 UTC ---
Initially I thought it was a glibc bug, but it seems it is not:

(1) A workaround flag is -fno-caller-saves
(2) The compiled binary (NOTE: with -static) runs correctly on other systems

The bad code is in atom.fppized:

 subroutine set_label_and_atomic_number(self,label)
   type(atom_type) :: self
   !Set an type(atom_type) "label" and extract the atomic number from it.
   character(*) :: label
   integer(kind=kind(1)) :: lensym,z
   character(128) :: symbol
   logical(kind=kind(.true.)) :: error

   self%label = label


The memset is for the label copy:

.LBB633:
        .loc 1 967 0 discriminator 2
        movq    %r13, %rdx
        movq    %rbx, %rsi
        movq    %rsp, %rdi
        call    memcpy
        movl    $128, %edx
        leaq    (%rsp,%r13), %rdi ## < bad address
        movl    $32, %esi
        subq    %r13, %rdx
        movq    %rsp, %r12
        call    memset
        jmp     .L707
.LVL646:
        .p2align 4,,10
        .p2align 3
.L717:


Looks like %rsp value is not correct (stack corrupted).


[Bug regression/47037] 465.tonto Segmentation Fault in memset with -fcaller-saves (default at -O2 or higher)

2010-12-23 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47037

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|465.tonto Segmentation      |465.tonto Segmentation
                   |Fault in memset             |Fault in memset with
                   |                            |-fcaller-saves (default at
                   |                            |-O2 or higher)

--- Comment #3 from Changpeng Fang 2010-12-23 18:05:02 UTC ---
.LBB633:
        .loc 1 967 0 discriminator 2
        movq    %r13, %rdx
        movq    %rbx, %rsi
        movq    %rsp, %rdi
        call    memcpy
        movl    $128, %edx
        leaq    (%rsp,%r13), %rdi ## < bad address
        movl    $32, %esi
        subq    %r13, %rdx
        movq    %rsp, %r12
        call    memset
        jmp     .L707
.LVL646:
        .p2align 4,,10
        .p2align 3


Actually, the segfault is in copying label to symbol at line 967:

character(128) :: symbol
symbol = label(1:lensym)

The memset sets the remainder of the 128 bytes to blanks (the fill value 32
is an ASCII space). The local code seems good to me. It might be that %rsp is
not set appropriately. Anyway, it is not likely to be a Fortran bug, because
it only occurs at -O2 or higher, when -fcaller-saves is turned on.


[Bug regression/47037] 465.tonto Segmentation Fault in memset with -fcaller-saves (default at -O2 or higher)

2010-12-23 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47037

--- Comment #4 from Changpeng Fang 2010-12-23 18:08:44 UTC ---
(In reply to comment #2)
> Can you supply a simplified test case?
> 

The difficulty is that the bug only shows up on a new AMD system (bobcat). The
compiled binary on bobcat can run correctly on other systems.


[Bug regression/47037] 465.tonto Segmentation Fault in memset with -fcaller-saves (default at -O2 or higher)

2011-01-03 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47037

--- Comment #6 from Changpeng Fang 2011-01-03 21:59:44 UTC ---
(In reply to comment #5)
> Does your glibc have CPU specific optimizations?

I don't think so. 

The OS is SLES 11, SP1. The machine (bobcat) indeed does not
support some instructions that K8 supports.

The gcc that ships with the system (4.3) works fine.


[Bug regression/47037] 465.tonto Segmentation Fault in memset with -fcaller-saves (default at -O2 or higher)

2011-01-03 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47037

--- Comment #8 from Changpeng Fang 2011-01-03 22:30:22 UTC ---

> 
> Which instructions are missing in Bobcat?

At least 3DNow instructions.


[Bug c/48487] New: Multiple Definition of Labels in Inlining Assembler

2011-04-06 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48487

   Summary: Multiple Definition of Labels in Inlining Assembler
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: changpeng.f...@amd.com


Created attachment 23904
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23904
labels.c

For the attached test case labels.c
gcc -O2 -c labels.c

labels.c: Assembler messages:
labels.c:9: Error: symbol `done' is already defined

The compilation is good with -O0 or -O1.

Apparently, after inlining, the label should be renamed.

Note that this is not a regression since it occurred in gcc4, 5, 6, 7


[Bug target/49089] New: Regression on CFP2006 on Bulldozer From Splitting AVX 32-byte Unaligned Loads

2011-05-20 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089

   Summary: Regression on CFP2006 on Bulldozer From Splitting AVX
32-byte Unaligned Loads
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: changpeng.f...@amd.com


The regression is caused by the following patch that splits AVX 32-byte
unaligned load and store:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01839.html

Here is the performance impact on a Bulldozer System:

                 store-split  load-split
410.bwaves              0.48       -0.48
416.gamess              0.55        0.00
433.milc                1.76       -3.96
434.zeusmp              3.48       -3.48
435.gromacs             0.51        1.54
436.cactusADM          -0.72       -0.72
437.leslie3d           10.33       -0.94
444.namd                1.03        0.00
447.dealII              0.70       -1.41
450.soplex              0.79        0.40
453.povray             -0.50       -0.50
454.calculix            5.07       -1.84
459.GemsFDTD            4.33       -6.25
465.tonto               1.27        0.00
470.lbm                -0.86        1.44
481.wrf                 1.35       -3.59
482.sphinx3             0.00       -2.11
geomean                 1.71       -1.31

While splitting stores is good, Bulldozer does not seem to like unaligned
load splitting.


[Bug target/49089] Regression on CFP2006 on Bulldozer From Splitting AVX 32-byte Unaligned Loads

2011-05-20 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089

--- Comment #1 from Changpeng Fang 2011-05-20 18:01:29 UTC ---
Apparently, this default option setting should apply only to systems on which
splitting loads is beneficial:

config/i386/i386.c(ix86_option_override_internal):

if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
  target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;


[Bug tree-optimization/43657] [4.3/4.4/4.5/4.6 Regression] -ftree-loop-linear causes FAIL: gcc.dg/vect/vect-cond-5.c execution test

2010-10-19 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43657

--- Comment #4 from Changpeng Fang 2010-10-19 21:27:46 UTC ---
  for (k = 0; k < 32; k++)
    {
      res = 0;
      for (j = 0; j < 32; j++)
        for (i = 0; i < 32; i++)
          {
            next = a[i][j];
            res = c > cond_array[i+k][j] ? next : res;
          }

      out[k] = res;
    }


gcc interchanges i and j loops, which is not legal in this case.
Apparently, res takes the last value of a[i][j] that satisfies the 
condition c > cond_array[i+k][j]. As a result, change in the 
reference order will get a different value for res.

Does anyone know where this legality check should be done?

What about the interchange in Graphite for this case?


[Bug tree-optimization/44503] "control flow in the middle of basic block" with -fprefetch-loop-arrays

2010-10-19 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44503

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

--- Comment #5 from Changpeng Fang 2010-10-19 21:54:27 UTC ---
This bug is no longer valid for the current gcc trunk. It is
possibly fixed by Honza's CFG work related to leaf attribute.

Anyway, I am closing this bug.


[Bug middle-end/45270] CPU2006 435.gromacs: Segmentation fault with -fprofile-generate

2010-10-29 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45270

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |WORKSFORME

--- Comment #2 from Changpeng Fang 2010-10-29 16:56:06 UTC ---
This bug does not happen in the current 4.6 trunk. So I am closing it.


[Bug tree-optimization/45022] No prefetch for the vectorized loop

2010-11-01 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45022

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

--- Comment #6 from Changpeng Fang 2010-11-01 22:21:56 UTC ---
Not a surprise: Richard's check-in of the MISALIGNED_INDIRECT_REF removal
fixed the bug.

Surprise: only two prefetches are generated (which is right). However,
if we could align the references, as in the following case (PR 45021):

float a[1024], b[1024];
void foo(int beta)
{
  int i;
  for(i=0; i<1024; i++)
 a[i] = a[i] + beta * b[i];
}

Three prefetches will be generated, one for b, one for load a, and one for
store a.

Anyway, I am closing this bug, and we should work on PR 45021.


[Bug rtl-optimization/46793] New: -fschedule-insns causes ICE in compiling zlib/trees.c

2010-12-03 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46793

   Summary: -fschedule-insns causes ICE in compiling zlib/trees.c
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: changpeng.f...@amd.com


-fschedule-insns causes ICE in compiling zlib/trees.c: 
trees.c:701:1: error: unable to find a register to spill in class ‘Q_REGS’
This failure only occurs for -m32 compilation.

To reproduce:
cd trunk/zlib
gcc trees.c -c -o ~/tree.o -O2 -fschedule-insns -m32

[~/gcc-4.6-20101120/zlib]$ gcc trees.c -c -o ~/tree.o -O2 -fschedule-insns -m32
trees.c: In function ‘build_tree’:
trees.c:701:1: error: unable to find a register to spill in class ‘Q_REGS’
trees.c:701:1: error: this is the insn:
(insn 473 429 164 18 (set (subreg:SI (reg:QI 83 [ iftmp.7 ]) 0)
        (if_then_else:SI (ltu (reg:CC 17 flags)
                (const_int 0 [0]))
            (subreg:SI (reg:QI 388) 0)
            (subreg:SI (reg:QI 83 [ iftmp.7 ]) 0))) trees.c:677 854 {*movsicc_noc}
     (expr_list:REG_DEAD (reg:QI 388)
        (expr_list:REG_DEAD (reg:CC 17 flags)
            (nil
trees.c:701:1: internal compiler error: in spill_failure, at reload1.c:2105
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

This should be a regression (I only know that 4.3 works).


[Bug rtl-optimization/46793] -fschedule-insns causes ICE in compiling zlib/trees.c

2010-12-03 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46793

--- Comment #2 from Changpeng Fang 2010-12-03 20:52:11 UTC ---
This bug should be fixed.

We are trying to make -fschedule-insns the default for x86, and enabling it
currently causes a bootstrap failure.

Of course, we can work around this by enabling it only for 64-bit.


[Bug rtl-optimization/46793] -fschedule-insns causes ICE in compiling zlib/trees.c

2010-12-03 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46793

--- Comment #4 from Changpeng Fang 2010-12-03 21:24:06 UTC ---
Thanks, I understand the issue now.
Yes, -fschedule-insns and -fsched-pressure should be paired options for
x86. -fsched-pressure does not solve the -m32 issue.


[Bug rtl-optimization/46829] ICE: in spill_failure, at reload1.c:2105 with -fschedule-insns -fsched-pressure and variadic function

2010-12-06 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |changpeng.fang at amd dot
                   |                            |com

--- Comment #3 from Changpeng Fang 2010-12-06 22:41:29 UTC ---
Actually, I opened bug 46793 last week, and it seems to be the same bug.
Bug 46793 has been marked as WONTFIX.


[Bug fortran/46842] New: 465.tonto test run miscompares (even with -O0)

2010-12-07 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46842

   Summary: 465.tonto test run miscompares (even with -O0)
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: changpeng.f...@amd.com


Tests were performed on 4.6 trunk @ 167317 on a linux64 system. The 465.tonto
test run miscompares with any options.

The train and ref runs are fine.

LINK: gfortran   -O0  -DSPEC_CPU_LP64   -o options
C: LD="gfortran"
O: FOPTIMIZE="-O0"
P: PORTABILITY="-DSPEC_CPU_LP64"
C: LDOUT="-o options"

Build successes: 465.tonto(base)

Setting Up Run Directories
  Setting up 465.tonto test base cfang default: existing
(run_base_test_cfang.)
Running Benchmarks
  Running 465.tonto test base cfang default
/home/chfang/cpu2006/bin/specinvoke -d
/home/chfang/cpu2006/benchspec/CPU2006/465.tonto/run/run_base_test_cfang.
-e speccmds.err -o speccmds.stdout -f speccmds.cmd -r -C
/home/chfang/cpu2006/bin/specinvoke -E -d
/home/chfang/cpu2006/benchspec/CPU2006/465.tonto/run/run_base_test_cfang.
-c 1 -e compare.err -o compare.stdout -f compare.cmd

*** Miscompare of stdout; for details see
   
/home/chfang/cpu2006/benchspec/CPU2006/465.tonto/run/run_base_test_cfang./stdout.mis
Error: 1x465.tonto


[Bug rtl-optimization/46829] ICE: in spill_failure, at reload1.c:2105 with -fschedule-insns -fsched-pressure and variadic function

2010-12-07 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829

--- Comment #7 from Changpeng Fang 2010-12-07 22:42:26 UTC ---
If you compile with dispatch scheduling enabled (together with -march=bdver1),
both test cases attached PASS.

gcc -O2 -fschedule-insns -fsched-pressure -mdispatch-scheduler -march=bdver1


[Bug fortran/46842] 465.tonto test run miscompares (even with -O0)

2010-12-07 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46842

--- Comment #2 from Changpeng Fang 2010-12-07 23:12:23 UTC ---
(In reply to comment #1)
> It does not seem to affect all SPEC testers as Rev. 167317 is already quite
> some old (= 2010-11-30) and I do not have seen any other report. (Cf. for
> instance http://gcc.gnu.org/ml/gcc-testresults/2010-12/msg00579.html)

This bug is filed against the test data set. The train and ref data sets are
fine.

> I assume it worked before - can you find out which version caused the
> regression? Or do some other narrowing down of the problem?
The last working rev I observed is 162788 (very, very old).
The first bad rev I observed is 166096 (very old).


[Bug fortran/46842] 465.tonto test run miscompares (even with -O0)

2010-12-07 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46842

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #4 from Changpeng Fang 2010-12-08 00:50:31 UTC ---
Yes, a duplicate. A fortran frontend bug?

*** This bug has been marked as a duplicate of bug 46506 ***


[Bug middle-end/46506] GCC miscompiled 465.tonto in SPEC CPU 2006

2010-12-07 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46506

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |changpeng.fang at amd dot
                   |                            |com

--- Comment #8 from Changpeng Fang 2010-12-08 00:50:31 UTC ---
*** Bug 46842 has been marked as a duplicate of this bug. ***


[Bug middle-end/46506] GCC miscompiled 465.tonto in SPEC CPU 2006

2010-12-07 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46506

--- Comment #9 from Changpeng Fang 2010-12-08 00:54:08 UTC ---
This is not an optimization bug because it fails with -O0.
I am seeking a working src_alt. Thanks.


[Bug rtl-optimization/46829] ICE: in spill_failure, at reload1.c:2105 with -fschedule-insns -fsched-pressure and variadic function

2010-12-09 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #9 from Changpeng Fang 2010-12-09 19:16:57 UTC ---
(In reply to comment #5)
> It should work for x86_64, not necessarily i?86.

Do you mean -fsched-pressure should be able to solve the problem completely
for x86-64?

Vladimir: Do you have any idea which direction to go in order to solve this
problem?


[Bug fortran/46842] [4.6 Regression] wrong results with MATMUL(..., TRANSPOSE (func ())) -- 465.tonto test run miscompares

2010-12-10 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46842

--- Comment #22 from Changpeng Fang 2010-12-10 22:20:26 UTC ---
(In reply to comment #21)
> Does this fix tonto ?

The patch fixed the 465.tonto test miscompare when applied to
the current gcc 4.6 trunk!

Thanks,


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-14 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #5 from Changpeng Fang 2011-06-14 22:22:11 UTC ---
It seems there is a prefetch generation bug on Bulldozer.

With -O3 -ffast-math -funroll-loops -fpeel-loops -march=bdver1
-fprefetch-loop-arrays, I got a normal timing of 795s.

However, when "--param min-insn-to-prefetch-ratio=9" is added, the timing
becomes 2853s.

This may be a different bug, in the opposite direction to the one on amdfam10.

I also want to mention here that software prefetching was actually enabled
at -O3 and higher for Bulldozer, when Honza cleaned up the code in i386.c
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00573.html


[Bug target/47315] ICE: in extract_insn, at recog.c:2109 (unrecognizable insn) with -mvzeroupper and __attribute__((target("avx")))

2011-06-27 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47315

Changpeng Fang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |changpeng.fang at amd dot
                   |                            |com

--- Comment #4 from Changpeng Fang 2011-06-27 22:54:47 UTC ---
(In reply to comment #2)
> A patch is posted at
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01649.html

H.J., since this bug shows up in gcc 4.6, could you backport the patch to the
gcc 4.6 branch? Thanks,

Changpeng