Re: [slurm-users] Slurm fair share priority not being applied

2017-11-30 Thread Loris Bennett
Hi Bruno, Bruno Santos writes: > Hi everyone, > > I have recently set up Slurm to use priority/multifactor, giving fair share > the major weight. > > Since the queue is up I have submitted about 1500 small jobs and just today 2 > other users jumped on the queue with their first job. However, i

Re: [slurm-users] Slurm fair share priority not being applied

2017-11-30 Thread Chris Samuel
On Friday, 1 December 2017 4:21:43 AM AEDT Brian W. Johanson wrote: > Almost there, add in PriorityFlags=FAIR_TREE +1 on fair tree. To see the state of your fairshare config run: sshare -l -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm fair share priority not being applied

2017-11-30 Thread Bruno Santos
I added it as suggested and restarted both the controller and the node daemon, but there was no change. Priority is still the same. On 30 Nov 2017 17:24, "Brian W. Johanson" wrote: > Almost there, add in PriorityFlags=FAIR_TREE > > If you missed it, check out https://slurm.schedmd.com/fair_tree.html > -b >

Re: [slurm-users] '--x11' or no '--x11' when using srun when both methods work for X11 graphical applications

2017-11-30 Thread Jeffrey Frey
> Also FWIW, in setting up 17.11 on CentOS 7, I encountered these minor > gotchas: > > - Your head/login node's sshd MUST be configured with "X11UseLocalhost no" so > the X11 TCP port isn't bound to the loopback interface alone Anyone who read this, please ignore. Red herring thanks to slu

Re: [slurm-users] Strange problem with Slurm 17.11.0: "batch job complete failure"

2017-11-30 Thread Matthieu Hautreux
Hi, You should look at this bug: https://bugs.schedmd.com/show_bug.cgi?id=4412 I thought it would be resolved in 17.11.0. Regards, Matthieu On 30 Nov 2017 00:56, "Andy Riebs" wrote: > We've just installed 17.11.0 on our 100+ node x86_64 cluster running > CentOS 7.4 this afternoon, and per

Re: [slurm-users] Slurm fair share priority not being applied

2017-11-30 Thread Brian W. Johanson
Almost there, add in PriorityFlags=FAIR_TREE If you missed it, check out https://slurm.schedmd.com/fair_tree.html -b On 11/30/2017 12:10 PM, Bruno Santos wrote: Hi everyone, I have recently set up Slurm to use priority/multifactor, giving fair share the major weight. Since the queue is up I
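
Before restarting the daemons, it can help to confirm that the flag actually landed in the configuration file the controller reads. A minimal sketch, assuming a POSIX shell; the function name has_fair_tree and the default path /etc/slurm/slurm.conf are illustrative assumptions, not something stated in this thread:

```shell
# Return success if a slurm.conf-style file has an uncommented PriorityFlags
# line whose value list includes FAIR_TREE.
# The default path is an assumption; pass the real location as the first argument.
has_fair_tree() {
    conf="${1:-/etc/slurm/slurm.conf}"
    # An unreadable or missing file counts as "not set".
    [ -r "$conf" ] || return 1
    grep -Eiq '^[[:space:]]*PriorityFlags=([^#]*,)?FAIR_TREE([,[:space:]#]|$)' "$conf"
}
```

For example: has_fair_tree /etc/slurm/slurm.conf && echo "FAIR_TREE is set". This inspects only the text of the file; "scontrol show config" reports what the running controller has loaded, and "sshare -l" shows the resulting fair-share state.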

[slurm-users] Slurm fair share priority not being applied

2017-11-30 Thread Bruno Santos
Hi everyone, I have recently set up Slurm to use priority/multifactor, giving fair share the major weight. Since the queue is up I have submitted about 1500 small jobs and just today 2 other users jumped on the queue with their first job. However, it seems that slurm is not taking into account the

Re: [slurm-users] '--x11' or no '--x11' when using srun when both methods work for X11 graphical applications

2017-11-30 Thread Jeffrey Frey
FWIW, though the "--x11" flag is available to srun in 17.11.0, neither the man page nor the built-in --help mentions its presence or how to use it. Also FWIW, in setting up 17.11 on CentOS 7, I encountered these minor gotchas: - Your head/login node's sshd MUST be configured with "X11UseLoc

Re: [slurm-users] slurm conf with single machine with multi cores.

2017-11-30 Thread Le Biot, Pierre-Marie
Hello David, The slurmd daemon is not running (while slurmctld and slurmdbd are). slurmd.log (different from slurmctld.log) should contain more information. Regards, Pierre-Marie Le Biot From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of david vilanova Sent: Thursday, No

Re: [slurm-users] jobs stuck in ReqNodeNotAvail,

2017-11-30 Thread Christian Anthon
I now realised I probably need some kind of job preemption to make things work the way I want them to. I'll take a look at how slurm does that. Cheers, Christian. On 30-11-2017 13:29, Chris Samuel wrote: On Thursday, 30 November 2017 9:40:53 PM AEDT Christian Anthon wrote: The queue has a t

Re: [slurm-users] Missing systemd unit files in SLURM 17.11.0 RPMs

2017-11-30 Thread Alan Orth
Dear Ole, You are absolutely right! Thank you for pointing this out. I hadn't noticed the RPMs were re-arranged so much as of 17.11. Thanks again, On Thu, Nov 30, 2017 at 4:04 PM Ole Holm Nielsen wrote: > On 11/30/2017 01:40 PM, Alan Orth wrote: > > I just built SLURM 17.11.0 on a CentOS 7 mac

[slurm-users] Strange problem with Slurm 17.11.0: "batch job complete failure"

2017-11-30 Thread Andy Riebs
We've just installed 17.11.0 on our 100+ node x86_64 cluster running CentOS 7.4 this afternoon, and periodically see a single node (perhaps the first node in an allocation?) get drained with the message "batch job complete failure". On one node in question, slurmd.log reports pam_unix(slur

Re: [slurm-users] Missing systemd unit files in SLURM 17.11.0 RPMs

2017-11-30 Thread Ole Holm Nielsen
On 11/30/2017 01:40 PM, Alan Orth wrote: I just built SLURM 17.11.0 on a CentOS 7 machine and was surprised to see that several systemd unit files were missing from the RPMs. For some reason the slurmdbd.service file is present though: $ rpmbuild -ta slurm-17.11.0.tar.bz2 $ rpm -qlp slurm-17.1

[slurm-users] Missing systemd unit files in SLURM 17.11.0 RPMs

2017-11-30 Thread Alan Orth
Hello, I just built SLURM 17.11.0 on a CentOS 7 machine and was surprised to see that several systemd unit files were missing from the RPMs. For some reason the slurmdbd.service file is present though: $ rpmbuild -ta slurm-17.11.0.tar.bz2 $ rpm -qlp slurm-17.11.0-1.el7.centos.x86_64.rpm | egrep "

Re: [slurm-users] Query about Compute + GPUs

2017-11-30 Thread Chris Samuel
On Thursday, 30 November 2017 9:58:08 PM AEDT Markus Köberl wrote: > I also saw the wrong Sockets, CPU and Threads. I did not recognize the wrong > values for RAM. Therefore I did define Sockets, CoresPerSocket, > ThreadsPerCore and RealMemory. You can run "slurmd -C" to get it to tell you the co
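
The NodeName line that "slurmd -C" prints uses the same KEY=value layout as slurm.conf, so individual values can be pulled out and compared with what the node definition declares. A small sketch, assuming a POSIX shell; node_field is a hypothetical helper name used only for illustration:

```shell
# Print the value of one KEY=value field from "slurmd -C"-style text on stdin.
# node_field is a hypothetical helper name used only for this sketch.
node_field() {
    key="$1"
    # Put each KEY=value pair on its own line, then print the first match.
    tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}
```

Typical use on a compute node would be: slurmd -C | node_field RealMemory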

Re: [slurm-users] jobs stuck in ReqNodeNotAvail,

2017-11-30 Thread Chris Samuel
On Thursday, 30 November 2017 9:40:53 PM AEDT Christian Anthon wrote: > The queue has a ton of single-core jobs and somebody submits a high > priority multi-core job, will the multi-core job not run before all > single-core jobs are done or will slurm free up a node? I can see you are weightin

[slurm-users] Actual job running time

2017-11-30 Thread Tomislav Subic
Hi everyone, is there any way to see the actual job running time with sacct, and not the time resources were allocated to that job? I see there are many fields like Elapsed (which to my understanding and testing is the total time resources have been allocated to the job), but I can't figure ou
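
One way to compare the timing fields sacct reports is to request several of them together and convert the durations to seconds. A rough sketch, assuming the [DD-]HH:MM:SS duration format; to_seconds is a hypothetical helper, and whether any single field matches the "actual running time" asked about here is not settled in this thread:

```shell
# Convert a duration in [DD-]HH:MM:SS (or MM:SS) form to seconds.
# to_seconds is a hypothetical helper name used only for this sketch.
to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4)      print $1*86400 + $2*3600 + $3*60 + $4   # DD-HH:MM:SS
        else if (NF == 3) print $1*3600 + $2*60 + $3              # HH:MM:SS
        else if (NF == 2) print $1*60 + $2                        # MM:SS
    }'
}
```

The fields themselves could be listed with something like: sacct -j <jobid> --format=JobID,Start,End,Elapsed,Suspended,TotalCPU and each duration then passed to to_seconds for comparison.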

Re: [slurm-users] Query about Compute + GPUs

2017-11-30 Thread Markus Köberl
On Tuesday, 21 November 2017 16:38:48 CET Ing. Gonzalo E. Arroyo wrote: > I have a problem detecting RAM and Arch (maybe some more), check this... > > NodeName=fisesta-21-3 Arch=x86_64 CoresPerSocket=1 >CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.01 >AvailableFeatures=rack-21,2CPUs >ActiveF

Re: [slurm-users] jobs stuck in ReqNodeNotAvail,

2017-11-30 Thread Christian Anthon
Okay, how is slurm handling the following situation: The queue has a ton of single-core jobs and somebody submits a high priority multi-core job, will the multi-core job not run before all single-core jobs are done or will slurm free up a node? Cheers, Christian. On 30-11-2017 07:57, Ch

Re: [slurm-users] slurm conf with single machine with multi cores.

2017-11-30 Thread david vilanova
Sorry for the delay, I was trying to fix it but it is still not working. The node is always down. The master machine is also the compute machine. It's a single server that I use for that. 1 node and 12 cpus. In the log below I see this line [2017-11-30T09:24:41.764] agent/is_node_resp: node:linuxcluster