Oop, sorry, I meant to also include the following:

# srun --mpi=list
srun: MPI types are...
srun: none
srun: pmi2
srun: openmpi
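Given that list, this packaged Slurm build only knows the none, pmi2, and openmpi plugin types, so pmix/pmix_v3 can never match. A minimal slurm.conf fragment that stays within what the package offers would be the following (a sketch; whether a job actually launches with it depends on how OpenMPI was built):

```
# slurm.conf -- MpiDefault must be one of the types srun --mpi=list reports
MpiDefault=pmi2
```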
Running srun with --mpi=openmpi gives the same errors as with MpiDefault=none.

~Avery Grieve
They/Them/Theirs please!
University of Michigan

On Thu, Dec 10, 2020 at 11:34 AM Avery Grieve <agri...@umich.edu> wrote:

> Hi Chris,
>
> Thank you for the offer. Here's some quick information on my system:
>
> All nodes are on Debian 10 (Armbian buster converted to DietPi v6.33.3).
> sinfo --version: slurm-wlm 18.08.5-2
>
> With MpiDefault=pmix I get the following srun errors:
>
> srun: error: Couldn't find the specified plugin name for mpi/pmix looking at all files
> srun: error: cannot find mpi plugin for mpi/pmix
> srun: error: cannot create mpi context for mpi/pmix
> srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types
>
> With MpiDefault=none, OpenMPI yells at me and gives me two options, only one of which is relevant to the version of Slurm I'm running:
>
>   version 16.05 or later: you can use SLURM's PMIx support. This
>   requires that you configure and build SLURM --with-pmix.
>
> However, as I stated, I'm using the slurm-wlm package, which does not seem to include the PMIx functionality by default.
>
> The other option provided:
>
>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>   to the SLURM PMI library location.
>
> Similar issue: since I didn't build Slurm from source, the PMI library isn't included. I've installed some development-level packages, including the libpmi2-0 package <https://packages.debian.org/buster/libpmi2-0>, which didn't seem to install anything useful as far as I can tell using the "find" command.
>
> It's sort of looking like I should be looking at building Slurm from source again, I guess.
>
> Thanks,
>
> ~Avery Grieve
> They/Them/Theirs please!
> University of Michigan
>
>
> On Thu, Dec 10, 2020 at 11:16 AM Christopher J Cawley <ccawl...@gmu.edu> wrote:
>
>> I have a 7 node Jetson Nano cluster running at home.
>>
>> Send me what you want me to take a look at. If it's not a big deal, then I can let you know.
>>
>> Ubuntu 18 / slurm <some version from rpm>
>>
>> Thanks,
>> Chris
>>
>> Christopher J. Cawley
>> Systems Engineer/Linux Engineer, Information Technology Services
>> 223 Aquia Building, Ffx, MSN: 1B5
>> George Mason University
>> Phone: (703) 993-6397
>> Email: ccawl...@gmu.edu
>>
>> ------------------------------
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Avery Grieve <agri...@umich.edu>
>> Sent: Thursday, December 10, 2020 10:51 AM
>> To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
>> Subject: [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>>
>> Hi Forum,
>>
>> I've been putting together an ARM cluster for fun/learning, and I've been a bit lost about how to get OpenMPI and Slurm to behave together.
>>
>> I have installed the slurm-wlm package <https://packages.debian.org/buster/slurm-wlm> from the Debian apt search and compiled OpenMPI from source on my compute nodes. OpenMPI has been compiled with the option --with-slurm, and the configure-time log indicates OpenMPI has PMIx v3 built in.
>> I thought that would be enough for Slurm, and that calling a job with "srun -n 4 -N1 executable" (with slurm.conf having MpiDefault=pmix_v3) would work.
>>
>> Not the case, unfortunately, as Slurm doesn't have any idea what pmix_v3 means without being compiled against it, I guess. I have also attempted to compile OpenMPI from source with the --with-pmi option, but the slurm-wlm package doesn't install any of the libraries/headers (pmi.h, pmi2.h, pmix.h, etc.). Neither do any of the slurm-llnl development packages, so I'm at a loss for what to do here.
>>
>> A few notes: OpenMPI is working across my compute nodes. I'm able to ssh to a compute node and start a job manually with mpirun that executes successfully across the nodes. My slurmctld and slurmd daemons work for single-thread resource allocation (and presumably OpenMP multithreading, though I haven't tested this).
>>
>> Beyond compiling Slurm from source (assuming this installs the PMI headers I can use to build OpenMPI), which I have tried with no luck on my devices, is there a way to get Slurm and OpenMPI to behave together using the precompiled slurm-wlm package?
>>
>> Thank you,
>>
>> ~Avery Grieve
>> They/Them/Theirs please!
>> University of Michigan
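Since `srun --mpi=list` shows the packaged Slurm does support pmi2, the route the quoted OpenMPI message describes (build OpenMPI `--with-pmi` against Slurm's PMI-2 library) may avoid rebuilding Slurm at all. A hedged sketch follows, assuming the Debian libpmi2-0-dev package supplies Slurm's pmi2 header/library under /usr; the package name, OpenMPI version, and paths are assumptions, not confirmed by this thread:

```shell
# Write the rebuild steps out as a script; the commands are a sketch, not
# verified against this exact Debian/DietPi setup.
cat > rebuild-openmpi-pmi2.sh <<'EOF'
#!/bin/sh
set -e
# Dev package assumed to carry Slurm's pmi2.h and libpmi2 (name is a guess)
apt-get install -y libpmi2-0-dev
# Your OpenMPI source tree; the version here is assumed
cd openmpi-4.0.5
./configure --with-slurm --with-pmi=/usr
make -j4
make install
EOF
chmod +x rebuild-openmpi-pmi2.sh
```

If a rebuild along these lines works, jobs would then be launched with `srun --mpi=pmi2` (or MpiDefault=pmi2 in slurm.conf) rather than pmix_v3.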