Do you have a firewall running?
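If it helps, here is a rough sketch of one way to test that from the master. It assumes the stock Slurm ports (6817 for slurmctld, 6818 for slurmd); if your slurm.conf sets SlurmctldPort or SlurmdPort to something else, substitute those values. The helper names port_open and report are made up for this example.

```python
import socket

# Assumed defaults for SlurmctldPort / SlurmdPort; check slurm.conf for overrides.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def report(nodes, port=SLURMD_PORT):
    """Build one status line per node for the given TCP port."""
    lines = []
    for node in nodes:
        state = "reachable" if port_open(node, port) else "blocked or not listening"
        lines.append(f"{node}:{port} {state}")
    return lines
```

Running print("\n".join(report(["triumph02", "triumph03"]))) on triumph01 probes slurmd on both hosts. Running report(["triumph01"], SLURMCTLD_PORT) on each host checks the reverse direction. A "blocked or not listening" result while the daemon is known to be running on that machine would point toward a firewall, though it does not prove it.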
On 05/21/2018 11:05 AM, Turner, Heath wrote:
If anyone has advice, I would really appreciate...
I am running (just installed) slurm-17.11.6, with a master + 2 hosts. It works
locally on the master (controller + execution). However, I cannot establish
communication between the master [triumph01] and the 2 hosts [triumph02, triumph03].
Here is some more info:
1. munge is running, and munge verification tests all pass.
2. system clocks are in sync on master/hosts.
3. identical slurm.conf files are on master/hosts.
4. configuration of resources (memory/cpus/etc) is correct and has been
confirmed on all machines (all hardware is identical).
5. I have attached:
a) slurm.conf
b) log file from master slurmctld
c) log file from host slurmd
Any ideas about what to try next?
Heath Turner
Professor
Graduate Coordinator
Chemical and Biological Engineering
http://che.eng.ua.edu
University of Alabama
3448 SEC, Box 870203
Tuscaloosa, AL 35487
(205) 348-1733 (phone)
(205) 561-7450 (cell)
(205) 348-7558 (fax)
htur...@eng.ua.edu
http://turnerresearchgroup.ua.edu
--
Andy Riebs
andy.ri...@hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE
May the source be with you!