Re: [slurm-users] slurm_persist_conn_open_without_init: failed to open persistent connection to host

2022-11-30 Thread William Brown
If this is a single-host machine, I suggest checking the /etc/hosts file to make sure that ‘mannose’ is listed as you expect. It is generally advised to use FQDNs for host names; the fact that the message “connection to host:mannose:6819: Connection refused” used a short name may mean that in a
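One way to follow the suggestion to check /etc/hosts is to parse the file and see which addresses a name such as ‘mannose’ maps to, and whether an FQDN is listed next to it. The sketch below uses only the Python standard library; the function name parse_hosts and the sample entries (including the domain example.org) are hypothetical and are not part of Slurm or of the original message.

```python
from __future__ import annotations


def parse_hosts(text: str) -> dict[str, list[str]]:
    """Map each hostname or alias in /etc/hosts-style text to its IP addresses.

    Hypothetical helper for illustration; it is not a Slurm tool.
    """
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank or comment-only lines
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed entry: an address with no names
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, []).append(ip)
    return mapping


# Hypothetical sample; on a real node, read the file instead:
#   with open("/etc/hosts") as f: hosts = parse_hosts(f.read())
sample = """
127.0.0.1   localhost
127.0.1.1   mannose.example.org mannose   # hypothetical entry
"""
hosts = parse_hosts(sample)
print(hosts.get("mannose"))  # ['127.0.1.1']
```

Listing the FQDN first and the short name as an alias, as in the sample, is the usual /etc/hosts layout; the sketch only reports what the file says and does not test whether anything is listening on port 6819.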

[slurm-users] slurm_persist_conn_open_without_init: failed to open persistent connection to host

2022-11-30 Thread Sushil Mishra
Hi all, I installed Slurm and enabled accounting on a single-node machine, i.e., the same server is both the master and the compute node. I mainly followed this page for instructions: https://southgreenplatform.github.io/trainings/hpc/slurminstallation/ After enabling accounting I am having problems starting

[slurm-users] run issue

2022-11-30 Thread Feng Zhang
Hello all, I am doing some tests with Slurm. I just found that when I run the srun command with the -n and -c options, and -n and -c are odd numbers, the srun job hangs and no shell is given to me. When I check using "squeue", it reports that the job is actually running. When -c = even number

Re: [slurm-users] Licenses: Remote vs Reservation

2022-11-30 Thread Brian Andrus
Richard, If you don't have a large cluster, the local license method is actually feasible. The biggest issues are the effort needed to ensure all the nodes have the same slurm.conf, and then the traffic when they are all re-queried to report their status on a reconfigure. That said, I have had

[slurm-users] Licenses: Remote vs Reservation

2022-11-30 Thread Richard Ems
Hi all, I have to change our setup to be able to update the total number of available licenses, because users check out licenses interactively. We now use Local Licenses, and could just regularly update slurm.conf and reconfigure, but I don't think that is the best solution. I see there are at

Re: [slurm-users] ABI Stability

2022-11-30 Thread Ward Poelmans
Hi Michael, On 30/11/2022 07:29, Michael Milton wrote: Considering this, my question is about which APIs (ABI, CLI, other?) are considered stable and worth targeting from a third-party application. In addition, is there any initiative to make the ABI stable, because it seems like it would

Re: [slurm-users] salloc problem

2022-11-30 Thread Gizo Nanava
Sorry for this very late response. The directory where job containers are to be created is of course already there; it is on the local filesystem. We also start slurmd as the very last process, once a node is ready to accept jobs. That seems to be either a feature of salloc or a bug in Slurm, presumab