Okay, I've attached a test case for the problem I reported before.

We have two pairs of identical machines:

- 2 Tyan S2882, dual-processor Opteron 244, stepping 10
- 2 Tyan S2882-D, dual-processor dual-core Opteron 275, stepping 2

The attached code, when compiled with the Portland Group Fortran compiler with -O2 and run on either of the 244's, aborts at random locations:

[EMAIL PROTECTED] rams.debug]$ pgf95 -O2 -o testatob testatob.f90
[EMAIL PROTECTED] rams.debug]$ ./testatob
 checkatob abort n=       246500 , i=         4685  a(i)=    8712085.
  b(i)=    8465585.
Abort
[EMAIL PROTECTED] rams.debug]$ ./testatob
 checkatob abort n=       246500 , i=       145817  a(i)=    9592717.
  b(i)=    8853217.
Abort

[EMAIL PROTECTED] rams.debug]$ time ./testatob
 checkatob abort n=       246500 , i=       118169  a(i)=    9565069.
  b(i)=    8825569.
Aborted

real    0m31.842s
user    0m16.476s
sys     0m0.060s


I haven't seen it run longer than 1 minute before aborting.

However, it runs fine on the 275's (or at least I haven't seen it crash yet). It also runs fine on the 244's when compiled with -O1.

So I guess this points to a hardware issue, but possibly a somewhat generalized one. I'd love to hear reports from other systems, particularly other Tyan S2882 dual 244's.

--
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  [EMAIL PROTECTED]
Boulder, CO 80301              http://www.cora.nwra.com
!
! ATOB: copy the first N elements of A into B.
SUBROUTINE ATOB(N,A,B)
DIMENSION A(*),B(*)
DO 100 I=1,N
B(I)=A(I)
100 CONTINUE
RETURN
END
!
!     ******************************************************************
!
! CHECKATOB: abort, printing the first mismatch, if B differs from A in elements 1..N.
SUBROUTINE CHECKATOB(N,A,B)
DIMENSION A(*),B(*)
DO 100 I=1,N
IF (B(I).NE.A(I)) then
 write(6,*) 'checkatob abort n=',n,', i=',i,' a(i)=',a(i),' b(i)=',b(i)
 call abort
end if
100 CONTINUE
RETURN
END
!
!     ******************************************************************
!
! Repeatedly copy three source windows of A into A(1:246500) and verify each copy.
program test
    real, allocatable :: a(:)
    allocate(a(11665401))
    do i=1,11665401
       a(i)=i
    end do
    do
       call atob(246500,a(8460901),a(1))
       call checkatob(246500,a(8460901),a(1))
       call atob(246500,a(8707401),a(1))
       call checkatob(246500,a(8707401),a(1))
       call atob(246500,a(9446901),a(1))
       call checkatob(246500,a(9446901),a(1))
    end do
end
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf