Re: [slurm-users] Jobs waiting while plenty of cpu and memory available

2019-07-08 Thread Ole Holm Nielsen
Hi Edward, The squeue command tells you about job status. You can get extra information using format options (see the squeue man-page). I like to set this environment variable for squeue: export SQUEUE_FORMAT="%.18i %.9P %.6q %.8j %.8u %.8a %.10T %.9Q %.10M %.10V %.9l %.6D %.6C %m %R" Wh…
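
For example, a minimal sketch of putting that to use (the format string is the one quoted above; add it to your shell profile to make it persistent):

    # squeue reads SQUEUE_FORMAT from the environment automatically
    export SQUEUE_FORMAT="%.18i %.9P %.6q %.8j %.8u %.8a %.10T %.9Q %.10M %.10V %.9l %.6D %.6C %m %R"
    squeue -u $USER    # now prints the extended columns for your jobs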

Re: [slurm-users] Hints, Cheatsheets, etc

2019-07-08 Thread Ole Holm Nielsen
Hi Edward, Besides my Slurm Wiki page https://wiki.fysik.dtu.dk/niflheim/SLURM, I have written a number of tools which we use for monitoring our cluster; see https://github.com/OleHolmNielsen/Slurm_tools. I recommend in particular these tools: * pestat Prints a Slurm cluster node status wi…
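
A sketch of fetching and trying the tools (the -p flag is an assumption based on typical usage; check the repository README for the actual options):

    git clone https://github.com/OleHolmNielsen/Slurm_tools.git
    cd Slurm_tools/pestat
    ./pestat                  # compact status table for all nodes
    ./pestat -p mypartition   # restrict to one partition (flag assumed; see README)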

[slurm-users] Jobs waiting while plenty of cpu and memory available

2019-07-08 Thread Edward Ned Harvey (slurm)
I have a cluster where I submit a bunch (600) of jobs, but the cluster only runs about 20 at a time. By using pestat, I can see there are a bunch of systems with plenty of available CPU and memory. Hostname Partition Node Num_CPU CPUload Memsize Freemem Sta…
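
A common first diagnostic for this symptom is to ask squeue why the jobs are pending; a sketch (%r prints the scheduler's Reason field):

    # List pending jobs together with the reason the scheduler is holding them
    squeue -t PENDING -o "%.18i %.9P %.8u %.10T %r"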

[slurm-users] scavenger partition/qos

2019-07-08 Thread Hanu Pathuri
Hello, I am trying to set up my SLURM cluster. One of the things I want to achieve is to schedule jobs that run only when there are no high-priority tasks. My understanding is that this can be achieved by either configuring a partition with preempt mode 'Suspend/Requeue' with priority for this…
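
A minimal slurm.conf sketch of the partition-priority approach (partition names, node names, and the PriorityTier values are illustrative; QOS-based preemption is the documented alternative):

    # slurm.conf (excerpt): jobs in the lower PriorityTier partition are
    # requeued when jobs in the higher-tier partition need the resources
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE
    PartitionName=normal    Nodes=node[01-10] PriorityTier=10
    PartitionName=scavenger Nodes=node[01-10] PriorityTier=1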

Re: [slurm-users] sbatch tasks stuck in queue when a job is hung

2019-07-08 Thread Robert Kudyba
Thanks, Brian, indeed we did have it set in bytes. I set it to the MB value. Hoping this takes care of the situation. > On Jul 8, 2019, at 4:02 PM, Brian Andrus wrote: > > Your problem here is that the configuration for the nodes in question has an > incorrect amount of memory set for them. Loo…
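
One way to verify the corrected value is actually in effect (node name illustrative):

    # Compare the configured memory against what slurmd detects on the node
    scontrol show node node001 | grep -E 'RealMemory|FreeMem'
    slurmd -C    # run on the compute node; prints the hardware values slurmd sees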

Re: [slurm-users] Substitutions for "see META file" in slurm.spec file of 15.08.11-1 release

2019-07-08 Thread Pariksheet Nanda
Hi Samuel, On Mon, Jul 8, 2019 at 8:19 PM Fulcomer, Samuel wrote: > > The underlying issue is database schema compatibility/regression. Each upgrade is only intended to provide the capability to successfully upgrade the schema from two versions back. --snip-- > ...and you should follow the upgrade i…
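
Under that two-versions-back rule, a 15.08-to-19 move has to be done in hops; a sketch of one hop, assuming an RPM-based install and the documented daemon order (slurmdbd, then slurmctld, then slurmd):

    systemctl stop slurmdbd
    yum upgrade 'slurm*'       # package names vary by version and packaging
    systemctl start slurmdbd   # the database schema is converted on first start
    # then upgrade and restart slurmctld, and finally slurmd on the compute nodes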

Re: [slurm-users] Hints, Cheatsheets, etc

2019-07-08 Thread mercan
Hi; There is an official page that gives a lot of links to third-party solutions you can use: https://slurm.schedmd.com/download.html In my opinion, the best Slurm page for system administration is: https://wiki.fysik.dtu.dk/niflheim/SLURM On this page you can find a lot of links and inf…

Re: [slurm-users] Substitutions for "see META file" in slurm.spec file of 15.08.11-1 release

2019-07-08 Thread Fulcomer, Samuel
Hi Pariksheet, Note that an "upgrade", in the sense that retained information is converted to new formats, is only relevant for the slurmctld/slurmdbd (and backup) node. If you're planning downtime in which you quiesce job execution (i.e., schedule a maintenance reservation), and have image conf…

Re: [slurm-users] Substitutions for "see META file" in slurm.spec file of 15.08.11-1 release

2019-07-08 Thread Pariksheet Nanda
Hi Brian, On Mon, Jul 8, 2019 at 8:09 PM Brian Andrus wrote: > > Yours are probably simple enough: > > Name: slurm > Version: 15.08.11 > Release: 1 > > which becomes slurm-15.08.11-1 > You may see some issues with License and/or changelog, as the format of SPEC files changed a little while back, s…

Re: [slurm-users] Substitutions for "see META file" in slurm.spec file of 15.08.11-1 release

2019-07-08 Thread Brian Andrus
Yours are probably simple enough: Name: slurm Version: 15.08.11 Release: 1 which becomes slurm-15.08.11-1 You may see some issues with License and/or changelog, as the format of SPEC files changed a little while back, so the latest rpmbuild may not like things. However, I highly suggest you u…
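
So the preamble in question would look roughly like this at the top of slurm.spec (a sketch; the real file carries many more tags):

    Name:     slurm
    Version:  15.08.11
    Release:  1
    # rpmbuild composes these into the package name slurm-15.08.11-1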

Re: [slurm-users] sbatch tasks stuck in queue when a job is hung

2019-07-08 Thread Brian Andrus
Your problem here is that the configuration for the nodes in question has an incorrect amount of memory set for them. Looks like you have it set in bytes instead of megabytes. In your slurm.conf you should look at the RealMemory setting: *RealMemory* Size of real memory on the node in megab…
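
A sketch of the corrected node definition (node names, CPU count, and the memory figure are illustrative; the point is that RealMemory is given in megabytes):

    # slurm.conf -- before (bytes, wrong): RealMemory=134217728000
    # after (megabytes):
    NodeName=node[001-003] CPUs=24 RealMemory=128000 State=UNKNOWN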

[slurm-users] Substitutions for "see META file" in slurm.spec file of 15.08.11-1 release

2019-07-08 Thread Pariksheet Nanda
Hi SLURM devs, TL;DR: What magic incantations are needed to preprocess the slurm.spec file in SLURM 15? Our cluster is currently running SLURM version 15.08.11. We are planning some downtime to upgrade to 17 and then to 19, and in preparation for the upgrade I'm simulating the upgrade steps in l…
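
For reference, the usual way to exercise the spec file is to let rpmbuild process it straight from the release tarball; a sketch, assuming the 15.08.11 tarball is at hand:

    # Build binary RPMs using the slurm.spec embedded in the tarball
    rpmbuild -ta slurm-15.08.11.tar.bz2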

[slurm-users] Hints, Cheatsheets, etc

2019-07-08 Thread Edward Ned Harvey (slurm)
I am an experienced sysadmin, new to being a Slurm admin, and I'm encountering some difficulty: if you have a simple question such as "how many CPUs are currently being used in the foobar partition," or "give me an overview of the waiting jobs and the reasons they're waiting," I don't…
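
Both example questions do map onto one-liners; sketches (partition name illustrative):

    # Allocated/Idle/Other/Total CPUs in one partition
    sinfo -p foobar -o "%C"
    # Overview of waiting jobs and the scheduler's reason for each
    squeue -t PENDING -o "%.18i %.8u %.10T %r"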

[slurm-users] sbatch tasks stuck in queue when a job is hung

2019-07-08 Thread Robert Kudyba
I’m new to Slurm, and we have a 3-node + head node cluster running CentOS 7 and Bright Cluster 8.1. Their support sent me here, as they say Slurm is configured optimally to allow multiple tasks to run. However, at times a job will hold up new jobs. Are there any other logs I can look at and/or sett…
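
Typical starting points for this kind of diagnosis, sketched below (job ID, node name, and log path are illustrative; Bright may place logs elsewhere):

    scontrol show job 12345          # check the Reason= field and requested resources
    scontrol show node node001       # compare AllocMem against RealMemory, and node State
    tail -f /var/log/slurmctld.log   # controller-side messages (path set by SlurmctldLogFile)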

Re: [slurm-users] Problem with sbatch

2019-07-08 Thread Michael Gutteridge
Hi, I can't find the reference here, but if I recall correctly the preferred user for slurmd is actually root; it is the default. > I assume this can be fixed by modifying the configuration so "SlurmdUser=root", but does this imply that anything run with `srun` will be actually executed by root?
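
The distinction in slurm.conf, sketched: SlurmdUser is who runs the slurmd daemon itself, while job steps still run as the submitting user; SlurmUser governs slurmctld:

    # slurm.conf (excerpt)
    SlurmdUser=root   # slurmd needs root to set up jobs; the jobs run as the submitter
    SlurmUser=slurm   # slurmctld runs as the unprivileged slurm account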

Re: [slurm-users] Problem with sbatch

2019-07-08 Thread Goetz, Patrick G
Sudo is more flexible than that; for example, you can give the slurmd user sudo access to the chown command and nothing else. On 7/8/19 11:37 AM, Daniel Torregrosa wrote: > You are right. The critical part I was missing is that chown does not > work without sudo. > > I assume this can be fix…
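
A sketch of that restriction in /etc/sudoers (account name and chown path are illustrative; always edit with visudo):

    # Allow the slurm account to run chown as root, and nothing else
    slurm ALL=(root) NOPASSWD: /usr/bin/chown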

Re: [slurm-users] Problem with sbatch

2019-07-08 Thread Daniel Torregrosa
You are right. The critical part I was missing is that chown does not work without sudo. I assume this can be fixed by modifying the configuration so "SlurmdUser=root", but does this imply that anything run with `srun` will actually be executed by root? This seems dangerous. Thanks a lot. On Mon…

[slurm-users] Problem with sbatch

2019-07-08 Thread Daniel Torregrosa
Hi all, I am currently testing Slurm (slurm-wlm 17.11.2 from a newly installed and updated Ubuntu Server LTS). I managed to make it work on a very simple configuration of 1 master node and 2 compute nodes. All three nodes have the same users (namely root, slurm, and test), with slurm running both slur…