Luke,

Thanks again. I'll have a look at your links. I think I'm going to have to go and compile Slurm from source. Hopefully it goes better than it did last time!
I'm just trying to get myself a functional test cluster, really. Technically I already have that: I can log into my compute node and run things with mpirun and a defined hostfile. But having a scheduler is a good learning experience and makes usability a lot nicer! Again, I appreciate the help; hopefully I can wrestle this into working.

~Avery Grieve
They/Them/Theirs please!
University of Michigan

On Thu, Dec 10, 2020 at 1:30 PM Luke Yeager <lyea...@nvidia.com> wrote:
> The Ubuntu package is here: https://packages.ubuntu.com/focal/libpmix-dev
>
> Yes, we rewrote the service files (see here
> <https://github.com/NVIDIA/nephele-packages/blob/master/slurm/debian/PACKAGE-control.slurmctld.service>)
> and we let debhelper install them to the appropriate location.
>
> It seems like you’re wanting to simply get a development build going
> rather than building packages for distribution. Nonetheless, reading
> through the packaging files here might help, because they show how to build
> recent Slurm on recent Ubuntu/Debian:
> https://github.com/NVIDIA/nephele-packages/tree/master/slurm
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Avery Grieve
> *Sent:* Thursday, December 10, 2020 10:18 AM
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>
> *External email: Use caution opening links or attachments*
>
> Hey Luke,
>
> Thanks for the response. I should have mentioned I'm on Debian. What's the
> name of the Ubuntu package for PMIx? I'll see if I can track down the
> Debian equivalent.
>
> When you build Slurm from scratch, you have to place the .service files
> into /etc/init.d and the daemon files in /etc/systemd/system, right? When I
> tried building from source, it didn't do that for me (even as root). Not
> sure if that's intended or if I was missing something.
>
> Thanks
>
> -ave
>
> On Thu, Dec 10, 2020, 1:11 PM Luke Yeager <lyea...@nvidia.com> wrote:
>
> Hi Avery,
>
> - pmix: we just use the standard Ubuntu packages on 20.04.
>   Unfortunately, the standard packages on 18.04 are too out of date for us.
> - openmpi: we build our own, using ./configure --with-pmix=internal …
> - slurm: we build our own, using ./configure --with-pmix=PATH … (see here
>   <https://github.com/NVIDIA/nephele-packages/blob/42145aef4bbe2cff335a1fca222766232dab7aa7/slurm/debian/rules#L41>)
>
> Then we can set MpiDefault=pmix (see here
> <https://github.com/NVIDIA/nephele/blob/1d79977164d5ef1418466bfb322d59d502c18e8f/ansible/roles/slurm/templates/etc/slurm/slurm.conf.default#L87>)
> and it works.
>
> $ srun --mpi=list
> srun: MPI types are...
> srun: cray_shasta
> srun: pmi2
> srun: pmix_v3
> srun: pmix
> srun: none
>
> Hope that helps,
> Luke
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Avery Grieve
> *Sent:* Thursday, December 10, 2020 7:52 AM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [slurm-users] slurm-wlm package OpenMPI PMIx implementation
>
> *External email: Use caution opening links or attachments*
>
> Hi Forum,
>
> I've been putting together an ARM cluster for fun/learning, and I've been a
> bit lost on how to get OpenMPI and Slurm to behave together.
>
> I have installed the slurm-wlm package
> <https://packages.debian.org/buster/slurm-wlm> that I found through Debian's
> apt search, and compiled OpenMPI from source on my compute nodes. OpenMPI has
> been compiled with the option --with-slurm, and the configure-time log
> indicates OpenMPI has PMIx v3 built in. I thought that would be enough for
> Slurm, and that calling a job with "srun -n 4 -N1 executable" (with
> slurm.conf having MpiDefault=pmix_v3) would just work.
>
> Not the case, unfortunately: I guess Slurm doesn't have any idea what
> pmix_v3 means unless it was compiled against PMIx.
> I have also attempted to compile OpenMPI from source with the --with-pmi
> option, but the slurm-wlm package doesn't install any of the libraries or
> headers (pmi.h, pmi2.h, pmix.h, etc.). Neither do any of the slurm-llnl
> development packages, so I'm at a loss as to what to do here.
>
> A few notes: OpenMPI is working across my compute nodes. I'm able to ssh
> to my compute node and start a job manually with mpirun that executes
> successfully across the nodes. My slurmctld and slurmd daemons work for
> single-thread resource allocation (and presumably OpenMP multithreading,
> though I haven't tested this).
>
> Beyond compiling Slurm from source (assuming this installs the PMI headers
> that I can use to build OpenMPI), which I have tried with no luck on my
> devices, is there a way to get Slurm and OpenMPI to behave together using
> the precompiled slurm-wlm package?
>
> Thank you,
>
> ~Avery Grieve
> They/Them/Theirs please!
> University of Michigan
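Luke's recipe configures Slurm with ./configure --with-pmix=PATH, where PATH is the PMIx installation prefix. A distribution PMIx development package (Luke links Ubuntu's libpmix-dev) may install its headers somewhere other than /usr/include, so it can help to search for them before configuring. The sketch below assumes PATH is the directory whose include/ subdirectory contains pmix.h; the function name pmix_prefixes and the search roots in the usage line are illustrative, not taken from the thread.

```shell
# pmix_prefixes (hypothetical helper): print every directory under the given
# search roots that looks like a PMIx install prefix, i.e. that contains
# include/pmix.h. Assumption: such a directory is a candidate for Slurm's
# ./configure --with-pmix=PATH.
pmix_prefixes() {
  for root in "$@"; do
    # Silence permission errors from unreadable directories.
    find "$root" -type f -name pmix.h 2>/dev/null
  done | grep '/include/pmix\.h$' | sed 's#/include/pmix\.h$##' | sort -u
}
```

Usage: run `pmix_prefixes /usr /opt` and pass one of the printed directories to `./configure --with-pmix=...` when building Slurm. Headers nested one level deeper (for example include/pmix/pmix.h) are deliberately skipped, since their parent is not an install prefix.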
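On the .service question: systemd unit files belong in /etc/systemd/system (for locally installed units), while /etc/init.d holds SysV-style init scripts, so a source build only needs unit files in the systemd location. Recent Slurm source trees ship etc/slurmctld.service.in and etc/slurmd.service.in, which configure turns into etc/slurmctld.service and etc/slurmd.service in the build directory; Avery's experience suggests `make install` leaves them there. A minimal sketch of the manual copy step, assuming those generated files exist; install_slurm_units is a hypothetical name, and the target directory is a parameter so the copy can be tried in a scratch directory first.

```shell
# install_slurm_units (hypothetical helper): copy the unit files that Slurm's
# configure generated in BUILD_DIR/etc into UNIT_DIR
# (default /etc/systemd/system). Assumes a configured Slurm build tree;
# writing to the default UNIT_DIR requires root.
install_slurm_units() {
  local build_dir=$1 unit_dir=${2:-/etc/systemd/system} unit src
  for unit in slurmctld slurmd; do
    src="$build_dir/etc/$unit.service"
    if [ ! -f "$src" ]; then
      echo "install_slurm_units: $src not found (was configure run?)" >&2
      return 1
    fi
    # install(1) copies the file into unit_dir with mode 644.
    install -m 644 "$src" "$unit_dir/" || return 1
  done
}
```

Afterwards, `systemctl daemon-reload` makes systemd pick up the new units, and `systemctl enable --now slurmctld` (on the controller) or `systemctl enable --now slurmd` (on compute nodes) starts them.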
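Luke's `srun --mpi=list` output is the quickest way to tell whether an installed Slurm can launch PMIx jobs: if no pmix entry appears, MpiDefault=pmix (or pmix_v3) cannot work with that build, which matches what Avery saw with the packaged slurm-wlm. A minimal shell sketch of that check; the function name has_pmix is hypothetical, and it assumes each plugin is printed on its own line as `srun: <name>`, as in the output quoted above.

```shell
# has_pmix (hypothetical helper): read `srun --mpi=list` output on stdin and
# exit 0 if a PMIx plugin ("pmix" or "pmix_vN") is listed, 1 otherwise.
# Assumes the "srun: <plugin>" line format shown in the thread.
has_pmix() {
  grep -Eq '^srun:[[:space:]]+pmix(_v[0-9]+)?[[:space:]]*$'
}
```

On a live cluster: `srun --mpi=list 2>&1 | has_pmix && echo "PMIx plugin available"`. The `2>&1` is there because srun may print this list on stderr rather than stdout.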