[slurm-users] How to get an estimate of job completion for planned maintenance?

2021-11-05 Thread Ahmad Khalifa
If I plan maintenance on a certain day, how long before that day should I set the queue to drain mode? Is there a way to estimate the completion date and time of the currently running jobs? Regards.
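
One possible way to approach the question above, offered as a minimal sketch and not as an answer from the thread: `squeue` can print each job's expected end time with the `%e` format field, and the latest of those times is a rough upper bound on when the running jobs will have finished. The helper names `latest_expected_end` and `running_job_end_times` are hypothetical, and the sketch assumes the end times arrive as `YYYY-MM-DDTHH:MM:SS` strings; anything else (for example a placeholder for a job with no time limit) is skipped.

```python
import subprocess
from datetime import datetime
from typing import Iterable, List, Optional


def latest_expected_end(lines: Iterable[str]) -> Optional[datetime]:
    """Return the latest timestamp found in the given lines, or None.

    Each line is assumed to hold one end time as printed by squeue's
    %e field (YYYY-MM-DDTHH:MM:SS). Lines that do not parse are skipped.
    """
    latest = None
    for line in lines:
        try:
            end = datetime.strptime(line.strip(), "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue  # blank line or a non-timestamp placeholder
        if latest is None or end > latest:
            latest = end
    return latest


def running_job_end_times() -> List[str]:
    """Ask squeue for the expected end time of every running job.

    Only works on a host where the squeue command is available.
    """
    result = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-o", "%e"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

On a login node, `latest_expected_end(running_job_end_times())` would give the estimate. The expected end time is derived from each job's time limit, so jobs that finish early make the real completion time earlier than this bound.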

Re: [slurm-users] Possible to get cluster utilization by partition?

2021-11-05 Thread Chin,David
Hi, Ole: Thanks for the link to your tools. I'll take a closer look at them. I wanted to make sure it wasn't something Slurm already had before I started writing my own. Perhaps someone out there can make that feature request and sponsor its development at SchedMD. I feel like it would be a gene

Re: [slurm-users] Wrong hwloc detected?

2021-11-05 Thread Diego Zuccato
They aren't using modules, so it must be something system-wide :( But not all jobs are impacted, and it seems to be a bit random (it doesn't always happen). I'm out of ideas, currently :( On 05/11/2021 13:10, Ole Holm Nielsen wrote: On 11/5/21 12:47, Diego Zuccato wrote: Some users are report

Re: [slurm-users] Parallel sbatch

2021-11-05 Thread Loris Bennett
Hi Marcus, I would advise against putting a loop in your job script. The job array mechanism is designed exactly for this purpose: https://slurm.schedmd.com/job_array.html If you have very small jobs, it is usually better to run them separately so that the individual jobs can fill gaps in the
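
As an illustration of the job array suggestion above (a sketch under stated assumptions, not the poster's own setup): Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable in each array task, so a small wrapper can use it to choose which of the independent scripts that task should run. The function name `script_for_task` and the script names in the test are hypothetical, and the sketch assumes the array indices start at 0.

```python
import os
from typing import Mapping, Sequence


def script_for_task(
    scripts: Sequence[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the script that belongs to the current array task.

    SLURM_ARRAY_TASK_ID is set by Slurm inside each array task; the
    index is assumed to be zero-based and to match the list order.
    """
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(scripts):
        raise IndexError(f"array task {task_id} has no matching script")
    return scripts[task_id]
```

With 70 scripts, the job would be submitted with `--array=0-69`, and each task would run only the script returned for its own index, so the scheduler can place the tasks independently.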

Re: [slurm-users] Parallel sbatch

2021-11-05 Thread Richard Lefebvre
I would suggest using GNU Parallel (https://www.gnu.org/software/parallel/). Also, if you run that many "srun" commands in a row on a very large cluster where slurmctld is under heavy load, some of the srun calls might time out and not run. Richard On Fri, Nov 5, 2021 at 05:45, Marcus Pedersén wrote: > Hi

Re: [slurm-users] Wrong hwloc detected?

2021-11-05 Thread Ole Holm Nielsen
On 11/5/21 12:47, Diego Zuccato wrote: Some users are reporting this error: slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5 slurmstepd-str957-mtx-01: error: task[0] unable to set taskset

[slurm-users] Wrong hwloc detected?

2021-11-05 Thread Diego Zuccato
Hello all. Some users are reporting this error: slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5 slurmstepd-str957-mtx-01: error: task[0] unable to set taskset '0x0' I checked on that nod

Re: [slurm-users] Parallel sbatch

2021-11-05 Thread Sean McGrath
Hi Marcus, Is something like staskfarm, https://github.com/paddydoyle/staskfarm, https://www.tchpc.tcd.ie/node/1127 any use for your needs? Sorry if not. Regards Sean On Fri, Nov 05, 2021 at 10:42:32AM +0100, Marcus Pedersén wrote: > Hi all, > I have set up a basic slurm system and been testin

[slurm-users] Parallel sbatch

2021-11-05 Thread Marcus Pedersén
Hi all, I have set up a basic slurm system and been testing out a number of things. The latest thing I started to test is the parallel parts. What I have is about 70 independent scripts that would be ideal to run in parallel. For testing purposes I have created 20 dummy scripts that print script name

Re: [slurm-users] Possible to get cluster utilization by partition?

2021-11-05 Thread Ole Holm Nielsen
Hi Dave, On 11/4/21 21:47, Chin,David wrote: I am running Slurm 20.02.7. I would like to generate cluster utilization report based on the billing TRES, but separated by partition. I can get full cluster utilization using:     sreport cluster utilization -T billing start=2021-01-01 end=2021-06