On Wed, 2008-07-16 at 16:00 -0400, Adam C Powell IV wrote:
> On Wed, 2008-07-16 at 13:42 -0400, Adam C Powell IV wrote:
> > Package: blacs-mpi
> > Version: 1.1-28
> > Severity: wishlist
> >
> > Greetings,
> >
> > Please add OpenMPI to the existing LAM and MPICH builds for blacs-mpi.
> > As you may know, LAM is deprecated in favor of OpenMPI, so this will be
> > a prominent MPI implementation moving forward.
> >
> > I would be happy to provide a patch if needed.
>
> Okay, so, I got impatient, and went ahead and made a patch, which is
> attached.  The "if [ -e ... ]" in the openmpi target is to make sure it
> skips those parts on arches which don't have openmpi.
>
> The one problem is: the openmpi test package depends on liblam4.
> Perhaps the MPI=openmpi bit doesn't quite work in the TESTING dir?

Indeed.  I've attached an additional patch, which you'd need to apply
after the previous patch, to actually make it build using OpenMPI.  For
explanation:
      * Bmake.inc needs a new openmpi section, and the Fortran programs
        need -lmpi_f77 as well as -lmpi to link properly
      * OpenMPI needs three mpif*.h files, so the two makefiles need to
        link all of them to the working directory
      * In debian/rules, the lam build no longer comes first, so it
        needs a clean step before it can build

> The packages install fine, and have about the same contents as the -lam
> and -mpich packages.  I haven't run the tests yet.

It builds and installs fine now, and everything has the right
dependencies.  And the openmpi Fortran test runs fine.  However, the
openmpi C test segfaults early on. :-(  Here's the output (orted is
similar to lamboot):

252 workhorse% orted
253 workhorse% mpirun -np 4 ./cblacs_test_shared-openmpi
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
[workhorse:23590] *** Process received signal ***
[workhorse:23590] Signal: Segmentation fault (11)
[workhorse:23590] Signal code: Address not mapped (1)
[workhorse:23590] Failing at address: 0xb08a0cf8
[workhorse:23590] [ 0] /lib/libc.so.6 [0x7f4daf93ff80]
[workhorse:23589] *** Process received signal ***
[workhorse:23589] Signal: Segmentation fault (11)
[workhorse:23589] Signal code: Address not mapped (1)
[workhorse:23589] Failing at address: 0x1fb95cf8
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
[workhorse:23589] [ 0] /lib/libc.so.6 [0x7f6f1ec34f80]
[workhorse:23589] [ 1] /usr/lib/libmpi.so.0(PMPI_Comm_group+0x50) [0x7f6f1f954a20]
[workhorse:23589] [ 2] /usr/lib/libblacs-openmpi.so.1(BI_TransUserComm+0x25) [0x7f6f1fbbcc05]
[workhorse:23589] [ 3] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridmap+0x132) [0x7f6f1fbae7b2]
[workhorse:23589] [ 4] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridinit+0x2ea) [0x7f6f1fbb239a]
[workhorse:23589] [ 5] ./cblacs_test_shared-openmpi [0x4036c4]
[workhorse:23589] [ 6] ./cblacs_test_shared-openmpi [0x478dcc]
[workhorse:23589] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f6f1ec211a6]
[workhorse:23589] [ 8] ./cblacs_test_shared-openmpi [0x403559]
[workhorse:23589] *** End of error message ***
[workhorse:23592] *** Process received signal ***
[workhorse:23592] Signal: Segmentation fault (11)
[workhorse:23592] Signal code: Address not mapped (1)
[workhorse:23592] Failing at address: 0x9a823cf8
[workhorse:23590] [ 1] /usr/lib/libmpi.so.0(PMPI_Comm_group+0x50) [0x7f4db065fa20]
[workhorse:23590] [ 2] /usr/lib/libblacs-openmpi.so.1(BI_TransUserComm+0x25) [0x7f4db08c7c05]
[workhorse:23590] [ 3] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridmap+0x132) [0x7f4db08b97b2]
[workhorse:23590] [ 4] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridinit+0x2ea) [0x7f4db08bd39a]
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
[workhorse:23591] *** Process received signal ***
[workhorse:23591] Signal: Segmentation fault (11)
[workhorse:23591] Signal code: Address not mapped (1)
[workhorse:23591] Failing at address: 0xb940ccf8
[workhorse:23591] [ 0] /lib/libc.so.6 [0x7f6bb84abf80]
[workhorse:23591] [ 1] /usr/lib/libmpi.so.0(PMPI_Comm_group+0x50) [0x7f6bb91cba20]
[workhorse:23591] [ 2] /usr/lib/libblacs-openmpi.so.1(BI_TransUserComm+0x25) [0x7f6bb9433c05]
[workhorse:23591] [ 3] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridmap+0x132) [0x7f6bb94257b2]
[workhorse:23591] [ 4] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridinit+0x2ea) [0x7f6bb942939a]
[workhorse:23591] [ 5] ./cblacs_test_shared-openmpi [0x4036c4]
[workhorse:23591] [ 6] ./cblacs_test_shared-openmpi [0x478dcc]
[workhorse:23591] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f6bb84981a6]
[workhorse:23591] [ 8] ./cblacs_test_shared-openmpi [0x403559]
[workhorse:23591] *** End of error message ***
[workhorse:23592] [ 0] /lib/libc.so.6 [0x7fe0998c2f80]
[workhorse:23592] [ 1] /usr/lib/libmpi.so.0(PMPI_Comm_group+0x50) [0x7fe09a5e2a20]
[workhorse:23592] [ 2] /usr/lib/libblacs-openmpi.so.1(BI_TransUserComm+0x25) [0x7fe09a84ac05]
[workhorse:23592] [ 3] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridmap+0x132) [0x7fe09a83c7b2]
[workhorse:23592] [ 4] /usr/lib/libblacs-openmpi.so.1(Cblacs_gridinit+0x2ea) [0x7fe09a84039a]
[workhorse:23592] [ 5] ./cblacs_test_shared-openmpi [0x4036c4]
[workhorse:23592] [ 6] ./cblacs_test_shared-openmpi [0x478dcc]
[workhorse:23592] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) [0x7fe0998af1a6]
[workhorse:23592] [ 8] ./cblacs_test_shared-openmpi [0x403559]
[workhorse:23592] *** End of error message ***
[workhorse:23586] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[workhorse:23586] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[workhorse:23586] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun noticed that job rank 0 with PID 23589 on node workhorse exited on signal 11 (Segmentation fault).
1 additional process aborted (not shown)
[workhorse:23586] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[workhorse:23586] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

Do you want to send this upstream, or shall I?

Cheers,
-Adam
-- 
GPG fingerprint: D54D 1AEE B11C CE9B  A02B C5DD 526F 01E8 564E E4B6

Engineering consulting with open source tools
http://www.opennovation.com/

--- blacs-mpi-1.1/Bmake.inc~	2008-08-13 22:23:52.000000000 +0000
+++ blacs-mpi-1.1/Bmake.inc	2008-08-13 22:42:47.000000000 +0000
@@ -55,12 +55,20 @@
    MPILIBdir = $(MPIdir)/lib
    MPIINCdir = $(MPIdir)/include
    MPILIB = $(MPILIBdir)/shared/libmpich.so $(MPILIBdir)/shared/libpmpich.so $(MPILIBdir)/libmpich.a
-else
+endif
+ifeq ($(MPI),lam)
 # for compilation with lam:
    MPILIBdir = /usr/lib/lam/lib
    MPIINCdir = /usr/include/lam
    MPILIB = -L/usr/lib/lam/lib -llam
 endif
+ifeq ($(MPI),openmpi)
+# for compilation with openmpi:
+   MPIdir = /usr/lib/openmpi
+   MPILIBdir = $(MPIdir)/lib
+   MPIINCdir = $(MPIdir)/include
+   MPILIB = -L/usr/lib/openmpi/lib -lmpi -lmpi_f77
+endif
 
 
 #  -------------------------------------
--- blacs-mpi-1.1/SRC/MPI/Makefile~	2008-08-13 22:23:52.000000000 +0000
+++ blacs-mpi-1.1/SRC/MPI/Makefile	2008-08-13 22:55:32.000000000 +0000
@@ -194,8 +194,8 @@
 	$(F77) -c $(F77FLAGS) $*.f
 
 mpif.h: $(MPIINCdir)/mpif.h
-	rm -f mpif.h
-	ln -s $< $@
+	rm -f mpif*
+	ln -s $(MPIINCdir)/mpif* .
 
 #  ------------------------------------------------------------------------
 #  We move C .o files to .C so that we can use the portable suffix rule for
--- blacs-mpi-1.1/TESTING/Makefile~	2008-08-13 22:23:52.000000000 +0000
+++ blacs-mpi-1.1/TESTING/Makefile	2008-08-13 23:01:46.000000000 +0000
@@ -59,8 +59,8 @@
 	$(F77) -c $(F77FLAGS) $*.f
 
 mpif.h: $(MPIINCdir)/mpif.h
-	rm -f mpif.h
-	ln -s $< $@
+	rm -f mpif*
+	ln -s $(MPIINCdir)/mpif* .
 
 fpvm3.h : $(PVMINCdir)/fpvm3.h
 	rm -f fpvm3.h
--- blacs-mpi-1.1/debian/rules~	2008-08-13 23:15:42.000000000 +0000
+++ blacs-mpi-1.1/debian/rules	2008-08-13 23:16:19.000000000 +0000
@@ -56,6 +56,9 @@
 build-stamp-lam:
 	dh_testdir
 	[ -d TESTING/EXE ] || mkdir TESTING/EXE
+# next is a clean
+	BASEDIR=$(topdir) make cleanall
+	cd TESTING && BASEDIR=$(topdir) make clean
 # build the static libraries
 	BASEDIR=$(topdir) MPI=lam make mpi
 # the testing binaries
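
For anyone who wants to try the header-linking change outside the package build, here is a minimal sketch of what the patched mpif.h rule does. It is not part of the patch. The scratch directories and the three header names are placeholders invented for the illustration, standing in for $(MPIINCdir) and its real contents.

```shell
#!/bin/sh
# Sketch of the patched "rm -f mpif*; ln -s $(MPIINCdir)/mpif* ." recipe,
# run in scratch directories. All paths and header names are stand-ins.
set -eu

scratch=$(mktemp -d)
incdir="$scratch/include"   # stands in for $(MPIINCdir)
workdir="$scratch/work"     # stands in for SRC/MPI or TESTING
mkdir -p "$incdir" "$workdir"

# Three empty placeholder headers, mirroring "three mpif*.h files".
for h in mpif.h mpif-common.h mpif-config.h; do
    : > "$incdir/$h"
done

cd "$workdir"
rm -f mpif*                 # same cleanup as the patched rule
ln -s "$incdir"/mpif* .     # link every matching header, not only mpif.h

# Count the links that ended up in the working directory.
linked=$(ls mpif* | wc -l | tr -d ' ')
echo "linked $linked header(s)"
```

With the old recipe ("ln -s $< $@") only mpif.h itself would be linked, so an mpif.h that includes sibling headers could not find them from the working directory.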