Dear All,

I have a problem running the HPL benchmark at large problem sizes. The total memory on our cluster (15 nodes, each with 2x Opteron 275, i.e. 60 cores with 1 GB of memory per core) is 60 GB. This should allow problem sizes of up to about 80,000, based on using 80% of the available memory. HPL, whether compiled standalone or as part of HPCC, crashes when the problem size is such that the memory required exceeds 8 GB (which, coincidentally or not, is the total memory available on the master (submit) node). The error messages are:

running /home/shel/Benchmarks/RUN/./xhpl on 60 LINUX ch_p4 processors
Created /home/shel/Benchmarks/RUN/PI11051
p20_28603: (1845.558594) net_send: could not write to fd=15, errno = 110
p12_29208: p4_error: interrupt SIGx: 13
p40_26441: p4_error: interrupt SIGx: 13
p48_28494: p4_error: interrupt SIGx: 13
P4 procgroup file is /home/shel/Benchmarks/RUN/PI11051.
p20_28603: p4_error: net_send write: -1
p23_28690: (1842.156250) net_send: could not write to fd=16, errno = 104
    p4_error: latest msg from perror: Connection timed out

This does not seem to be an actual memory problem, since HPL crashes at a problem size of 40000, for example, where each process uses approx. 345 MB. P4_GLOBMEMSIZE is set to 134217728.

Thanks for your help!
Evgeniy

Configuration is as follows:

mpich 1.2.7p1 - compiled with gcc 3.4.5
hpl 1.0a - compiled with gcc 3.4.5
ATLAS - compiled with gcc 3.4.5

HPL.dat:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
40000        Ns
1            # of NBs
60           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
10           Ps
6            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

HPL.out:

============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   40000
NB     :      60
PMAP   : Row-major process mapping
P      :      10
Q      :       6
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L2       40000    60    10     6             456.11          9.355e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0137911 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0126099 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0024762 ...... PASSED
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L4       40000    60    10     6             451.57          9.449e+01
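
As a cross-check of the sizing arithmetic in the post (80% of 60 GB giving N of roughly 80,000), here is a minimal Python sketch. It assumes the usual rule of thumb that the dominant cost is the N x N coefficient matrix stored as 8-byte doubles, and it treats 1 GB as 2**30 bytes. The function names are made up for illustration and are not part of HPL. HPL workspace and MPI buffers are ignored, so the per-process figure it prints is only the matrix share and will not match the approx. 345 MB reported in the post.

```python
import math

BYTES_PER_DOUBLE = 8
GIB = 2**30  # assumption: "1 GB" in the post means 2**30 bytes


def max_problem_size(total_mem_bytes, fraction=0.8, nb=1):
    """Hypothetical helper: largest N, rounded down to a multiple of nb,
    whose N x N matrix of doubles fits in `fraction` of the given memory."""
    usable_doubles = int(fraction * total_mem_bytes) // BYTES_PER_DOUBLE
    n = math.isqrt(usable_doubles)
    return (n // nb) * nb


def matrix_bytes(n):
    """Bytes needed for the N x N coefficient matrix alone."""
    return BYTES_PER_DOUBLE * n * n


cluster_mem = 60 * GIB  # 60 processes with 1 GB each, as described in the post

# N at 80% of cluster memory, rounded down to a multiple of NB = 60
print(max_problem_size(cluster_mem, 0.8, nb=60))
# N at which the matrix alone reaches 8 GB in total
print(max_problem_size(8 * GIB, 1.0))
# Matrix share per process, in MiB, for N = 40000 on 60 processes
print(matrix_bytes(40000) / 60 / 2**20)
```

Under these assumptions the first call gives 80220, in line with the "up to 80,000" estimate, and the second gives 32768, so the N = 40000 run already needs more than 8 GB in total for the matrix alone.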