I am having trouble running a python-mpi program involving more than one
node. The python-mpi program is very simple,

do you think there's something unique about the python program?
(also, you mean mpi4py, right?)
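for comparison, a minimal mpi4py test might look like the sketch below. it assumes mpi4py is what "python-mpi" refers to; the helper name format_rank_line is made up for illustration. each rank prints its rank, the total size and its hostname, and a node where mpi4py can't be imported reports that instead of dying silently.

```python
import socket
import sys


def format_rank_line(rank, size, host):
    """Build the one-line report each MPI rank prints (hypothetical helper)."""
    return f"rank {rank} of {size} on {host}"


def main():
    host = socket.gethostname()
    try:
        # Assumption: the program uses mpi4py. Importing MPI initializes MPI.
        from mpi4py import MPI
    except ImportError as exc:
        # A missing or broken mpi4py install on one node shows up here.
        print(f"{host}: cannot import mpi4py: {exc}", file=sys.stderr)
        return
    comm = MPI.COMM_WORLD
    print(format_rank_line(comm.Get_rank(), comm.Get_size(), host))


if __name__ == "__main__":
    main()
```

if something this small also fails on two nodes, the python program itself is probably not the issue.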

Since authentication with Slurm is used via munge, do I need a passwordless
SSH communication between the slurmctl and the nodes? (I found a

no, you don't need it.  the combination of slurmd (the actual spawning)
and munge (for credentials/authentication) is how slurmctld starts jobs.

guide, probably outdated, stating that passwordless SSH communication is a
necessity for slurm,
HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).

I suspect that's an editing escape: you do usually want mutual access among
user-accessible nodes (login, compute, but not usually admin things like slurmctld or slurmdb nodes).

"srun -N2 -n8 python3 python-mpi.py",

using srun does not depend on ssh.  if you use mpirun/mpiexec, it *might*
depend on ssh (but only among the compute nodes).
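one way to separate "srun can't launch across nodes" from "MPI can't start across nodes" is a script with no MPI in it at all. the sketch below only reads environment variables that srun sets for each task (SLURM_PROCID, SLURM_NTASKS, SLURM_NODEID); the function name and the output format are made up for illustration.

```python
import os
import socket


def slurm_task_report(env, host):
    """Summarize where Slurm placed this task, using srun's environment."""
    # Fall back to "?" when the script is run outside of srun.
    procid = env.get("SLURM_PROCID", "?")
    ntasks = env.get("SLURM_NTASKS", "?")
    nodeid = env.get("SLURM_NODEID", "?")
    return f"task {procid}/{ntasks} on node {nodeid} ({host})"


if __name__ == "__main__":
    print(slurm_task_report(os.environ, socket.gethostname()))
```

run it the same way as the failing job, with this script in place of python-mpi.py. if all eight tasks report in from two hostnames, slurm's launching is fine and the problem is in the MPI layer.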

It works fine running on a single node (with "-N1" instead of "-N2"), but it is
aborted or stopped when running on two nodes.

I would guess you need to look at slurmd logs on the nodes.

regards, mark hahn.
