Thanks for the follow-up, Phil, on both the problem and the way that you
tracked it down. Another good note for the Slurm Users' Toolkit!
Andy
On 12/7/2020 9:02 PM, Yuengling, Philip J. wrote:
Thanks everyone for your replies!
It turned out that a library dependency, libevent, wasn’t being
found as needed. While I thought I was using a shared-location
library, I was not. I had apparently set up /etc/ld.so.conf.d on the
build host to use /usr/local/lib which… has libevent in it. But none
of the other nodes had this. The problem became apparent after going
to each node and running pmix_info.
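For anyone hitting the same thing, a missing dependency like this can also be spotted from ldd output. The following is only a minimal sketch: the helper name missing_libs is made up for illustration, and the plugin path in the usage comment is an assumption based on the --libdir=/usr/lib64 setting in the config.log quoted later in this thread.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it works on saved output as well as live output.
# (missing_libs is a hypothetical helper name, not part of Slurm or PMIx.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible usage on each compute node. The plugin path below is an assumption
# and depends on where Slurm's plugins are installed on your system:
#   ldd /usr/lib64/slurm/mpi_pmix_v3.so | missing_libs
```

An empty result on the build host and a non-empty result on the other nodes would point to a library that is only resolvable on the build host.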
This means I should remove the ld.so.conf.d entry and rebuild
everything against the preferred set of libraries. Otherwise pmix
appears to work as expected now.
Cheers!
Phil
*From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
of Philip Kovacs <pkde...@yahoo.com>
*Reply-To: *Philip Kovacs <pkde...@yahoo.com>, Slurm User Community
List <slurm-users@lists.schedmd.com>
*Date: *Monday, December 7, 2020 at 10:55 AM
*To: *"a...@candooz.com" <a...@candooz.com>, Slurm User Community List
<slurm-users@lists.schedmd.com>
*Subject: *Re: [slurm-users] [EXT] Re: pmix issue
*APL external email warning: *Verify sender
slurm-users-boun...@lists.schedmd.com before clicking links or attachments
Make sure the .so symlink for the pmix lib is available -- not just
the versioned .so, e.g. .so.2. Slurm requires that .so symlink. Some
distros split packages into base/devel, so you may need to install a
pmix-devel package, if available, in order to add the .so symlink
(which is considered a "development" file).
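A quick way to check for that symlink is sketched below. The helper name has_unversioned_so is hypothetical, and the lib subdirectory in the usage comment is an assumption about how the pmix install prefix is laid out.

```shell
# Succeed only if an unversioned libpmix.so is present in the given directory.
# -e follows symlinks, so a dangling libpmix.so symlink counts as missing.
# (has_unversioned_so is a hypothetical helper name used for illustration.)
has_unversioned_so() {
    libdir=$1
    [ -e "$libdir/libpmix.so" ]
}

# Possible usage; the "lib" subdirectory under the prefix is an assumption:
#   has_unversioned_so /share/local/pmix-3.2.1/lib || echo "libpmix.so missing"
```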
On Monday, December 7, 2020, 09:22:06 AM EST, Yuengling, Philip J.
<philip.yuengl...@jhuapl.edu> wrote:
Thanks Andy,
Slurm was compiled with --with-pmix=/share/local/pmix-3.2.1. The
build of pmix is installed under /share/local/pmix-3.2.1 which is an
NFS share across all the nodes. I should also note I used
devtoolset-10 (gcc 10) on RHEL7 and confirmed that everything was
compiled with that version of compiler.
I also set LD_LIBRARY_PATH to include /share/local/pmix-3.2.1.
Cheers!
Phil
*From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
of Andy Riebs <a...@candooz.com>
*Reply-To: *"a...@candooz.com" <a...@candooz.com>, Slurm User
Community List <slurm-users@lists.schedmd.com>
*Date: *Friday, December 4, 2020 at 3:07 PM
*To: *"slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
*Subject: *[EXT] Re: [slurm-users] pmix issue
Also, Slurm was built with "/fs/local/pmix-3.2.1" -- does that
translate well to "/share/local/pmix-3.2.1"?
Andy
On 12/4/2020 2:59 PM, Andy Riebs wrote:
Are you sure that /share/local/pmix-3.2.1 exists on the compute nodes?
On 12/4/2020 2:54 PM, Yuengling, Philip J. wrote:
Hi everyone,
I’ve been having difficulty getting the --mpi=pmix_v3 option
to work for me. I can get --mpi=pmi2 to work ok, but I really
want to understand what I’m doing wrong here. Everything
seems to build ok.
$ srun --mpi=list
srun: MPI types are...
srun: pmix
srun: pmix_v3
srun: cray_shasta
srun: none
srun: pmi2
$ srun --mpi=pmix_v3 -N5 date
srun: error: task 1 launch failed: Invalid MPI plugin name
srun: error: task 2 launch failed: Invalid MPI plugin name
srun: error: task 3 launch failed: Invalid MPI plugin name
srun: error: task 4 launch failed: Invalid MPI plugin name
srun: error: task 0 launch failed: Invalid MPI plugin name
$ srun --mpi=pmi2 -N5 date
Fri Dec 4 13:52:39 EST 2020
Fri Dec 4 13:52:39 EST 2020
Fri Dec 4 13:52:39 EST 2020
Fri Dec 4 13:52:39 EST 2020
Fri Dec 4 13:52:39 EST 2020
openpmix:
CC=/opt/rh/devtoolset-10/root/usr/bin/gcc ./configure
--prefix=/share/local/pmix-3.2.1
--with-hwloc=/share/local/hwloc-2.4.0
Slurm 20.11.0:
rpmbuild --define "_with_pmix
--with-pmix=/fs/local/pmix-3.2.1" -ta slurm-20.11.0.tar.bz2
From config.log:
./configure --build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu --program-prefix=
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm
--datadir=/usr/share --includedir=/usr/include
--libdir=/usr/lib64 --libexecdir=/usr/libexec
--localstatedir=/var --sharedstatedir=/var/lib
--mandir=/usr/share/man --infodir=/usr/share/info
--with-pmix=/fs/local/pmix-3.2.1 --disable-slurmrestd
Open MPI 4.0.5:
./configure '--prefix=/share/openmpi-4.0.5' '--with-cuda'
'--with-pmix=/share/local/pmix-3.2.1' '--with-pmi=/usr'
'--with-slurm' '--without-ucx' '--without-verbs'
--
Philip J. Yuengling
Johns Hopkins University