Hi Brian,
> try:
> export SLURM_OVERLAP=1
> export SLURM_WHOLE=1
> before your salloc and see if that helps. I have seen some mpi issues
> that were resolved with that.
Unfortunately no dice:
andrej@terra:~$ export SLURM_OVERLAP=1
andrej@terra:~$ export SLURM_WHOLE=1
andrej@terra:~$ salloc -N2 -n2
s
try:
export SLURM_OVERLAP=1
export SLURM_WHOLE=1
before your salloc and see if that helps. I have seen some mpi issues
that were resolved with that.
You can also try it using just the regular mpirun on the nodes
allocated. That will give us another data point as well.
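A minimal sketch of that kind of test, assuming Open MPI and a placeholder
binary ./mpi_hello, run from inside the salloc shell:

# expand the allocation's node list into a plain hostfile
scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.txt
# let mpirun handle process launch and wire-up instead of srun --mpi=pmix
mpirun -np 2 --hostfile hosts.txt ./mpi_hello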
Brian Andrus
On 2/4/2021, Andrej Prsa wrote:
Hi Brian,
Thanks for your response!
> Did you compile slurm with mpi support?
Yep:
andrej@terra:~$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: none
srun: pmi2
srun: pmix
srun: pmix_v4
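A quick smoke test of that plugin, assuming a small MPI hello-world binary
./mpi_hello (a placeholder name) staged identically on both nodes:

# launch two tasks across two nodes through the pmix plugin
srun -N2 -n2 --mpi=pmix ./mpi_hello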
> Your mpi libraries should be the same as that version and they should
> be available in the same locations for all nodes.
Did you compile slurm with mpi support?
Your mpi libraries should be the same as that version and they should be
available in the same locations for all nodes.
Also, ensure they are accessible (PATH, LD_LIBRARY_PATH, etc are set)
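A rough way to check that consistency across the allocation, using nothing
but srun and a login shell (adjust the variables you echo to match your setup):

# one task per node; each node prints its own view of the MPI install
srun -N2 bash -lc 'hostname; which mpirun; echo "$LD_LIBRARY_PATH"'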
Brian Andrus
On 2/4/2021 1:20 PM, Andrej Prsa wrote:
Gentle bump on this, if anyone has suggestions as I weed through the
scattered slurm docs. :)
Thanks,
Andrej
On February 2, 2021 00:14:37 Andrej Prsa wrote:
Dear list,
I'm struggling with what seems to be very similar to this thread:
https://lists.schedmd.com/pipermail/slurm-users/2019-July/003746.html
I'm using slurm 20.11.3 patched with this fix to detect pmixv4:
https://bugs.schedmd.com/show_bug.cgi?id=10683
and this is what I'm seeing:
a
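One way to double-check that the patched build actually installed the PMIx v4
plugin (substitute the directory that PluginDir reports; it depends on the
configure prefix):

# ask slurm where its plugins are installed
scontrol show config | grep -i plugindir
# then check that the pmix plugins (e.g. mpi_pmix_v4.so) are present
ls <PluginDir>/mpi_pmix*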