Hi Phil,

From a distance, it feels like there may be a mismatch in Slurm versions (an auxiliary build hiding out somewhere?). You might try something like

$ which srun; srun which srun

That should confirm whether the submit and execute nodes are running the same Slurm instance.
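
If it helps, here is a rough sketch of that comparison as a small shell helper. The name `compare_srun` is made up for illustration and is not part of Slurm; it only compares the two path strings it is given.

```shell
# compare_srun: hypothetical helper, not a Slurm command.
# $1 = path of srun as seen on the submit node
# $2 = path of srun as seen on an execute node
compare_srun() {
    submit_path=$1
    execute_path=$2
    if [ "$submit_path" = "$execute_path" ]; then
        echo "same"
    else
        echo "differ: submit=$submit_path execute=$execute_path"
    fi
}

# On the cluster (not run here), one might call it as:
#   compare_srun "$(which srun)" "$(srun -N1 which srun)"
```

Matching paths do not prove the builds are identical, so comparing the output of `srun --version` on both sides the same way would be a reasonable follow-up.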

Andy

On 12/7/2020 9:19 AM, Yuengling, Philip J. wrote:

Thanks Andy,

Slurm was compiled with --with-pmix=/share/local/pmix-3.2.1. The build of pmix is installed under /share/local/pmix-3.2.1, which is an NFS share across all the nodes. I should also note that I used devtoolset-10 (gcc 10) on RHEL7 and confirmed that everything was compiled with that version of the compiler.

I also set LD_LIBRARY_PATH to include /share/local/pmix-3.2.1

Cheers!

Phil

*From: *slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Andy Riebs <a...@candooz.com>
*Reply-To: *"a...@candooz.com" <a...@candooz.com>, Slurm User Community List <slurm-users@lists.schedmd.com>
*Date: *Friday, December 4, 2020 at 3:07 PM
*To: *"slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
*Subject: *[EXT] Re: [slurm-users] pmix issue

Also, Slurm was built with "/fs/local/pmix-3.2.1" -- does that translate well to "/share/local/pmix-3.2.1"?
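
A quick way to see which of the two prefixes actually exists is sketched below. `report_dir` is a hypothetical helper written for this message, and the paths in the comments are simply the two that appear in the build lines.

```shell
# report_dir: hypothetical helper; reports whether a directory exists
# on the host where it runs.
report_dir() {
    if [ -d "$1" ]; then
        echo "present $1"
    else
        echo "missing $1"
    fi
}

# Local check on the submit node (not run here):
#   report_dir /fs/local/pmix-3.2.1
#   report_dir /share/local/pmix-3.2.1
#
# A similar check across the compute nodes, using plain ls through srun
# (-l labels each output line with its task number):
#   srun -l -N5 ls -d /fs/local/pmix-3.2.1 /share/local/pmix-3.2.1
```

If only one of the two prefixes shows up on the compute nodes, that would point at the path given to --with-pmix as the thing to look at first.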

Andy

On 12/4/2020 2:59 PM, Andy Riebs wrote:

    Are you sure that /share/local/pmix-3.2.1 exists on the compute nodes?

    On 12/4/2020 2:54 PM, Yuengling, Philip J. wrote:

        Hi everyone,

        I’ve been having difficulty getting the --mpi=pmix_v3 option
        to work for me.  I can get --mpi=pmi2 to work ok, but I really
        want to understand what I’m doing wrong here.  Everything
        seems to build ok.

        $ srun --mpi=list
        srun: MPI types are...
        srun: pmix
        srun: pmix_v3
        srun: cray_shasta
        srun: none
        srun: pmi2

        $ srun --mpi=pmix_v3 -N5 date
        srun: error: task 1 launch failed: Invalid MPI plugin name
        srun: error: task 2 launch failed: Invalid MPI plugin name
        srun: error: task 3 launch failed: Invalid MPI plugin name
        srun: error: task 4 launch failed: Invalid MPI plugin name
        srun: error: task 0 launch failed: Invalid MPI plugin name

        $ srun --mpi=pmi2 -N5 date
        Fri Dec  4 13:52:39 EST 2020
        Fri Dec  4 13:52:39 EST 2020
        Fri Dec  4 13:52:39 EST 2020
        Fri Dec  4 13:52:39 EST 2020
        Fri Dec  4 13:52:39 EST 2020

        openpmix:

        CC=/opt/rh/devtoolset-10/root/usr/bin/gcc ./configure
        --prefix=/share/local/pmix-3.2.1
        --with-hwloc=/share/local/hwloc-2.4.0

        Slurm 20.11.0:

        rpmbuild --define "_with_pmix
        --with-pmix=/fs/local/pmix-3.2.1" -ta slurm-20.11.0.tar.bz2

        From config.log:

        ./configure --build=x86_64-redhat-linux-gnu
        --host=x86_64-redhat-linux-gnu --program-prefix=
        --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
        --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm
        --datadir=/usr/share --includedir=/usr/include
        --libdir=/usr/lib64 --libexecdir=/usr/libexec
        --localstatedir=/var --sharedstatedir=/var/lib
        --mandir=/usr/share/man --infodir=/usr/share/info
        --with-pmix=/fs/local/pmix-3.2.1 --disable-slurmrestd

        Open MPI 4.0.5:

        ./configure  '--prefix=/share/openmpi-4.0.5' '--with-cuda'
        '--with-pmix=/share/local/pmix-3.2.1' '--with-pmi=/usr'
        '--with-slurm' '--without-ucx' '--without-verbs'

--
        Philip J. Yuengling

        Johns Hopkins University
