On Thu, 3 May 2007, Orion Poplawski wrote:


Okay, I have a test case for the problem I reported before that I've attached.

We have two pairs of identical machines:

- 2 Tyan S2882, dual processor Opteron 244 stepping 10
- 2 Tyan S2882-D, dual processor dual core Opteron 275 stepping 2

The attached code, when compiled with the Portland Group Fortran compiler with -O2 and run on either of the 244's, will abort in random locations:

What about gfortran?  Or pathscale?

Mind you, I made myself actually look at the code below (shudder) in
spite of it being fortran, and it looks ok as far as >>I<< can tell
after not doing fortran unless my life depends on it for twenty years or
so.  To me it is weird to use a(1) both as the address of a(1) (as an
argument to the subroutines) and as the contents of a(1) = 1, but hey.

It seems really really odd that any compiler or any program would fail
on this piece of code, though.  I wonder if a C memcpy would fail?  Or
what does stream (with a check) do?  Stream's copy isn't much more than
this.
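
For anyone who wants to try the memcpy question directly, here is a rough
sketch of what such a check might look like in C.  It is not a translation
of the attached testatob.f90 (which isn't reproduced here); the function
names, the fill pattern and the round count are all made up for
illustration.  Only the array length 246500 is taken from the output
quoted below.  It fills one array, memcpy's it to a second, compares them
element by element, and repeats.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fill a[] with a known pattern, memcpy it into b[],
 * and return the index of the first element that differs, or -1 if the
 * copy came through intact. */
long copy_and_check(float *a, float *b, size_t n, float base)
{
    for (size_t i = 0; i < n; i++)
        a[i] = base + (float)i;

    memcpy(b, a, n * sizeof *a);

    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}

/* Hypothetical driver: repeat the copy-and-check `rounds` times.
 * Returns 0 if every round verified, 1 on the first mismatch,
 * 2 if the arrays could not be allocated. */
int run_rounds(size_t n, long rounds)
{
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    int status = 0;

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 2;
    }

    for (long r = 0; r < rounds && status == 0; r++) {
        long bad = copy_and_check(a, b, n, (float)r);
        if (bad >= 0) {
            fprintf(stderr, "mismatch round=%ld n=%zu i=%ld a(i)=%.0f b(i)=%.0f\n",
                    r, n, bad, a[bad], b[bad]);
            status = 1;
        }
    }

    free(a);
    free(b);
    return status;
}

#ifdef COPYCHECK_MAIN
/* Build with e.g.  cc -O2 -DCOPYCHECK_MAIN copycheck.c -o copycheck
 * n matches the array length in the reported output; the round count
 * is an arbitrary choice. */
int main(void)
{
    return run_rounds(246500, 100000);
}
#endif
```

On healthy hardware this should run to completion and exit 0.  If it also
trips on the 244's, that would point away from the PGI compiler and
toward the memory or CPU.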

Maybe somebody who has used fortran more recently than the mid-eighties
can comment further on the code, but to me it looks like a very odd
compiler bug.

   rgb


[EMAIL PROTECTED] rams.debug]$ pgf95 -O2 -o testatob testatob.f90
[EMAIL PROTECTED] rams.debug]$ ./testatob
checkatob abort n=       246500 , i=         4685  a(i)=    8712085.
 b(i)=    8465585.
Abort
[EMAIL PROTECTED] rams.debug]$ ./testatob
checkatob abort n=       246500 , i=       145817  a(i)=    9592717.
 b(i)=    8853217.
Abort

[EMAIL PROTECTED] rams.debug]$ time ./testatob
checkatob abort n=       246500 , i=       118169  a(i)=    9565069.
 b(i)=    8825569.
Aborted

real    0m31.842s
user    0m16.476s
sys     0m0.060s


Haven't seen it run longer than 1 minute yet.

However, it runs fine on the 275's (or at least I haven't seen it crash yet). It also runs fine on the 244's when compiled with -O1.

So, I guess this points to a hardware issue, but it may be a somewhat generalized hardware issue. I'd love to hear reports on other (particularly other Tyan S2882 dual 244's) systems.



--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]

