Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Christopher Samuel
On 12/1/21 5:51 am, Gestió Servidors wrote: I can’t synchronize beforehand with “ntpdate” because when I run “ntpdate -s my_NTP_server”, I only receive the message “ntpdate: no server suitable for synchronization found”… Yeah, you'll need to make sure your NTP infrastructure is working first. There

Re: [slurm-users] random allocation of resources

2021-12-01 Thread Christopher Samuel
On 12/1/21 3:27 pm, Brian Andrus wrote: If you truly want something like this, you could have a wrapper script look at available nodes, pick a random one and set the job to use that node. Alternatively, you could have a cron job that periodically adjusts the nodes' `Weight` to change which ones S
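A minimal sketch of the cron-job idea, assuming Python 3 on the controller host and that your Slurm version accepts `Weight=` in `scontrol update NodeName=...` (check `man scontrol`). The script and node names are hypothetical. Each run gives the listed nodes the weights 1..N in random order, so the node Slurm prefers (lowest weight) changes from run to run.

```python
#!/usr/bin/env python3
# Hypothetical cron helper: shuffle node weights so that Slurm's preference for
# the lowest-weight node rotates among otherwise identical nodes.
import random
import subprocess
import sys


def build_weight_commands(nodes, rng=random):
    """Return one `scontrol update` argv per node, using weights 1..N in random order."""
    weights = list(range(1, len(nodes) + 1))
    rng.shuffle(weights)
    return [
        ["scontrol", "update", f"NodeName={node}", f"Weight={weight}"]
        for node, weight in zip(nodes, weights)
    ]


def main(argv):
    # Node names come from the command line, e.g. node01 node02 node03.
    for cmd in build_weight_commands(argv[1:]):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(sys.argv)
```

A hypothetical crontab entry such as `*/15 * * * * /usr/local/sbin/shuffle_weights.py node01 node02 node03` would reshuffle every 15 minutes. Check the scontrol man page for whether weights set this way persist across a slurmctld restart or reconfigure.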

Re: [slurm-users] random allocation of resources

2021-12-01 Thread Brian Andrus
That would make sense, as slurm would not be aware of anything else. Slurmd does not report any ongoing status of resources. It is slurmctld that keeps track of what it has allocated. If you truly want something like this, you could have a wrapper script look at available nodes, pick a random
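A minimal sketch of such a wrapper, assuming Python 3.7+ with `sinfo` and `sbatch` on the PATH; the function names are hypothetical. It lists idle nodes with `sinfo -h -N -t idle -o %N`, picks one at random, and passes it to `sbatch --nodelist=`, forwarding the rest of the command line unchanged.

```python
#!/usr/bin/env python3
# Hypothetical sbatch wrapper: pin the job to a randomly chosen idle node.
import random
import subprocess
import sys


def parse_node_list(sinfo_output):
    """Node names from `sinfo -h -N -o %N` output, one per line.

    -N prints a line per node *per partition*, so duplicates are dropped."""
    nodes = []
    for line in sinfo_output.splitlines():
        name = line.strip()
        if name and name not in nodes:
            nodes.append(name)
    return nodes


def pick_random_node(nodes, rng=random):
    """Uniformly random choice, or None when no node is idle."""
    return rng.choice(nodes) if nodes else None


def main(argv):
    # -h: no header, -N: node-oriented, -t idle: idle nodes only, -o %N: names only.
    out = subprocess.run(
        ["sinfo", "-h", "-N", "-t", "idle", "-o", "%N"],
        capture_output=True, text=True, check=True,
    ).stdout
    node = pick_random_node(parse_node_list(out))
    cmd = ["sbatch"]
    if node is not None:
        cmd.append(f"--nodelist={node}")
    # Everything after the wrapper name (options, then the job script) goes to sbatch.
    subprocess.run(cmd + argv[1:], check=True)


if __name__ == "__main__":
    main(sys.argv)
```

Usage would be something like `random_sbatch.py job.sh` (hypothetical name). There is a race: the chosen node can become busy between the `sinfo` call and the submission, in which case the job simply waits for that node. If no node is idle, the job is submitted without `--nodelist`.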

Re: [slurm-users] random allocation of resources

2021-12-01 Thread Benjamin Nacar
Based on some quick experiments, that doesn't do what I'm looking for. I set LLN=YES for the default partition and ran my test job several times, waiting each time for it to finish before submitting it again (so that all compute nodes were idle), and it still ended up on the same (first in the
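A toy model of why LLN alone may not help here, assuming ties between equally loaded nodes are broken by node order. That assumption matches what Benjamin observed, but it is not taken from Slurm's source. With every node idle, all loads are equal, so a "least loaded" rule with a deterministic tie-break keeps returning the first node; only a random tie-break spreads jobs out.

```python
# Toy model only (not Slurm's implementation): "least loaded" selection
# with a deterministic tie-break versus a random one.
import random


def pick_least_loaded(nodes):
    """nodes: (name, allocated_cpus) pairs in slurm.conf order.

    min() returns the first of several equal minima, so ties always go to
    the earliest node in the list."""
    return min(nodes, key=lambda n: n[1])[0]


def pick_least_loaded_random(nodes, rng=random):
    """Same ranking, but ties among the least-loaded nodes are broken at random."""
    lowest = min(load for _, load in nodes)
    return rng.choice([name for name, load in nodes if load == lowest])


idle = [("node01", 0), ("node02", 0), ("node03", 0)]
print(pick_least_loaded(idle))         # node01, every time
print(pick_least_loaded_random(idle))  # any of the three
```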

Re: [slurm-users] random allocation of resources

2021-12-01 Thread mercan
Hi, Slurm selects nodes according to their Weight parameter. I don't know of any setting that changes how nodes are selected, other than changing the weight values, and that is not suitable for selecting nodes randomly. Fortunately, absolutely there is not a
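Per the slurm.conf documentation, all things being equal, jobs are allocated the lowest-weight nodes that satisfy their requirements. A toy model of that ordering follows, assuming that nodes with equal weight keep their slurm.conf order (an assumption, not read from Slurm's source):

```python
# Toy model (assumption, not Slurm's source): lowest Weight is preferred, and
# nodes with equal Weight keep their slurm.conf order.


def order_by_weight(nodes):
    """nodes: (name, weight) pairs in slurm.conf order; returns names in preference order.

    sorted() is stable, so equal weights keep their original relative order."""
    return [name for name, _ in sorted(nodes, key=lambda n: n[1])]


nodes = [("node01", 10), ("node02", 10), ("node03", 5)]
print(order_by_weight(nodes))  # ['node03', 'node01', 'node02']
```

This is why changing weights moves the first choice, but on its own it never randomizes among nodes with the same weight.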

Re: [slurm-users] random allocation of resources

2021-12-01 Thread Guillaume COCHARD
Hello, I think you are looking for the LLN option (Least Loaded Nodes): https://slurm.schedmd.com/slurm.conf.html#OPT_LLN Guillaume - Original Message - From: "Benjamin Nacar" To: slurm-users@lists.schedmd.com Sent: Wednesday, 1 December 2021 20:07:23 Subject: [slurm-users] random allocation o

[slurm-users] random allocation of resources

2021-12-01 Thread Benjamin Nacar
Hi, Is there a scheduling option such that, when there are multiple nodes that are equivalent in terms of available and allocated resources, Slurm would select randomly from among those nodes? I've noticed that if no other jobs are running, and I submit a single job via srun, with no paramet

Re: [slurm-users] Preferential scheduling on a subset of nodes

2021-12-01 Thread Paul Edmon
If you set up a higher-priority partition, with Preemption OFF on the lower-priority partition, you should be able to accomplish this. If you have preemption turned off for the specific partitions in question, Slurm will not preempt but will schedule jobs from the higher-priority partition first
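A toy model of the behaviour Paul describes, meant as an illustration rather than Slurm's actual scheduler: free nodes go to pending jobs from the higher-priority partition first, and running jobs are never preempted. In slurm.conf this presumably corresponds to a higher `PriorityTier` on the owners' partition plus `PreemptMode=OFF`; the partition names and priority values below are hypothetical.

```python
# Toy scheduler for illustration, not Slurm's algorithm: higher-priority
# partitions are served first and running jobs are never preempted.
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    partition: str


def schedule(pending, free_nodes, partition_priority):
    """Assign free nodes to pending jobs (one whole node per job), highest
    partition priority first.

    Returns {job_id: node}. Jobs that don't fit keep waiting; nothing that is
    already running is touched, i.e. preemption is off."""
    # sorted() is stable, so submission order is kept within a partition.
    queue = sorted(pending, key=lambda job: -partition_priority[job.partition])
    nodes = list(free_nodes)
    started = {}
    for job in queue:
        if not nodes:
            break
        started[job.job_id] = nodes.pop(0)
    return started


priority = {"long": 10, "general": 1}  # hypothetical values
pending = [Job(1, "general"), Job(2, "long"), Job(3, "general")]
print(schedule(pending, ["n1", "n2"], priority))  # {2: 'n1', 1: 'n2'}
```

Job 3 waits until a node frees up; the owners' job 2 jumps ahead even though it was submitted after job 1.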

[slurm-users] Preferential scheduling on a subset of nodes

2021-12-01 Thread Sean McGrath
Hi, apologies for having to ask such a basic question. We want to be able to give some users preferential access to some nodes. They bought the nodes, which are currently in a 'long' partition because their jobs need a longer walltime. When the purchasing user group is not using the nodes I would lik

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Gestió Servidors
Hi, I can't synchronize beforehand with "ntpdate" because when I run "ntpdate -s my_NTP_server", I only receive the message "ntpdate: no server suitable for synchronization found"... Thanks. -- Daniel Ruiz Molina Tècnic Mitjà Informàtic Arquitec

Re: [slurm-users] nvml autodetect is ignoring gpus

2021-12-01 Thread Benjamin Nacar
Confirmed that adding just the "Gres=" bit in slurm.conf works. That's what I get for reading the documentation too fast... thanks all! ~~ bnacar On Wed, 1 Dec 2021 14:05:09 +0100 Quirin Lohr wrote: > Hi, > > you still need to specify the gpus in the node definition in slurm.conf. > At least

Re: [slurm-users] nvml autodetect is ignoring gpus

2021-12-01 Thread Quirin Lohr
Hi, you still need to specify the GPUs in the node definition in slurm.conf. At least the number, and perhaps even the type reported by NVML, must match the node definition (Gres=gpu:geforce_gtx_1080:4). I think the error message can be ignored; the 1080 just does not support this feature. Am
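A small hypothetical check of Quirin's point, assuming only the simple `gpu:<type>:<count>` and `gpu:<count>` forms of `Gres=` (socket bindings, several comma-separated GRES and other suffixes are not handled). `detected_type` and `detected_count` stand for whatever you read off the node; they are hypothetical inputs here.

```python
# Hypothetical check: does a slurm.conf Gres=gpu entry agree with what the node reports?


def parse_gpu_gres(gres):
    """Parse 'gpu:<type>:<count>' or 'gpu:<count>' into (type or None, count).

    Other forms (socket bindings, several comma-separated GRES) are not handled."""
    parts = gres.split(":")
    if parts[0] != "gpu":
        raise ValueError(f"not a gpu GRES: {gres!r}")
    if len(parts) == 2:
        return None, int(parts[1])
    if len(parts) == 3:
        return parts[1], int(parts[2])
    raise ValueError(f"unexpected GRES format: {gres!r}")


def gres_matches(gres, detected_type, detected_count):
    """True if the count matches and, when slurm.conf names a type, the type matches too."""
    gres_type, count = parse_gpu_gres(gres)
    if count != detected_count:
        return False
    return gres_type is None or gres_type == detected_type


print(parse_gpu_gres("gpu:geforce_gtx_1080:4"))                       # ('geforce_gtx_1080', 4)
print(gres_matches("gpu:geforce_gtx_1080:4", "geforce_gtx_1080", 4))  # True
```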

Re: [slurm-users] WTERMSIG 15

2021-12-01 Thread Yair Yarom
I guess they won't be killed, but having them there could cause other issues. For example, any limit that systemd places on the slurmd service will be applied to the jobs as well, and probably cumulatively. Do you use cgroup for Slurm resource management (the TaskPlugin)? If so, it means this is not wor
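One way to see where job processes actually live is to read `/proc/<pid>/cgroup` for a process started by a job. The sketch below assumes Linux and the `hierarchy-ID:controllers:path` line format; it lists which hierarchies place the process under `slurmd.service`. A hit on a resource controller such as `memory` would suggest that limits on the slurmd unit also apply to the job; that interpretation is an assumption, and the sample input is illustrative rather than captured from a real node.

```python
# Hypothetical diagnostic (Linux only): which cgroup hierarchies put a process
# under slurmd's systemd service?
from pathlib import Path


def slurmd_service_controllers(cgroup_text):
    """Parse /proc/<pid>/cgroup text ('hierarchy-ID:controllers:path' per line).

    Returns the controller list of every line whose path is under slurmd.service;
    cgroup v2 lines have an empty controller field and are labelled as such."""
    hits = []
    for line in cgroup_text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        if "slurmd.service" in path:
            hits.append(controllers or "(cgroup v2 unified)")
    return hits


def check_pid(pid):
    """Read the live cgroup membership of `pid` and run the check above."""
    return slurmd_service_controllers(Path(f"/proc/{pid}/cgroup").read_text())


# Illustrative input, not captured from a real node:
sample = "4:memory:/system.slice/slurmd.service\n1:name=systemd:/system.slice/slurmd.service\n"
print(slurmd_service_controllers(sample))  # ['memory', 'name=systemd']
```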

Re: [slurm-users] nvml autodetect is ignoring gpus

2021-12-01 Thread Fernando Guillén Camba
I also compiled Slurm 20.11.8 with GPU support on AlmaLinux 8.4 and don't have any problem with NVML detecting our A100s. Maybe the NVML library version used to compile Slurm has to match the library version on the compute node where the GPU is? Also, I see that you're using Geforce

Re: [slurm-users] slurmstepd: error: Too many levels of symbolic links

2021-12-01 Thread Bjørn-Helge Mevik
Adrian Sevcenco writes: > Hi! Does anyone know what could be the cause of such an error? > I have a shared home, Slurm 20.11.8, and I try a simple script in the submit > directory > which is in the home that is NFS-shared... We had the "Too many levels of symbolic links" error some years ago, whil
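The excerpt cuts off before the cause, so this is only a generic check: "Too many levels of symbolic links" is the message for `ELOOP`, which a path lookup returns when it follows too many symlinks, typically because of a loop. A minimal sketch, assuming you run it on the compute node (ideally as the job's user) against the submit directory; the helper name is hypothetical, and the demo builds its own loop in a temporary directory.

```python
# Reproduce and detect ELOOP ("Too many levels of symbolic links").
import errno
import os
import tempfile


def has_symlink_loop(path):
    """True if resolving `path` fails with ELOOP; False if it resolves or fails for another reason."""
    try:
        os.stat(path)  # os.stat follows symlinks
    except OSError as exc:
        return exc.errno == errno.ELOOP
    return False


with tempfile.TemporaryDirectory() as d:
    a, b = os.path.join(d, "a"), os.path.join(d, "b")
    os.symlink(b, a)  # a -> b
    os.symlink(a, b)  # b -> a, closing the loop
    print(has_symlink_loop(a))  # True
    print(has_symlink_loop(d))  # False
```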