Re: [slurm-users] squeue: compact pending job-array in one partition, but not in other

2021-02-23 Thread Loris Bennett
Hi Jeffrey, Yes, those jobs (and elements of another array which were also listed one per line) had indeed been preempted and requeued. So is this behaviour intended/documented or is it a bug? Cheers, Loris Jeffrey T Frey writes: > Did those four jobs > > >6577272_21 scavenger PD
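For anyone hitting the same display, sacct can confirm whether array tasks were preempted and requeued, since requeued tasks keep duplicate records in the accounting database. A minimal sketch using the job ID quoted in this thread (output columns vary by site):

    sacct -j 6577272 --duplicates --format=JobID,State,Submit,Start,End,NodeList

Requeued tasks typically show an earlier PREEMPTED or REQUEUED record alongside the current PENDING one.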

Re: [slurm-users] [External] exempting a node from Gres Autodetect

2021-02-23 Thread Prentice Bisbal
Correction/addendum: If the node you want to exclude has RPMs that were built without NVML autodetection, you probably want that gres.conf to look like this: NodeName=a1-10 Name=gpu File=/dev/nvidia0 I'm guessing that if it was built without autodetection, the AutoDetect=off option wouldn't be und

Re: [slurm-users] [External] exempting a node from Gres Autodetect

2021-02-23 Thread Prentice Bisbal
How many nodes are we talking about here? What if you gave each node its own gres.conf file, where all of them said AutoDetect=nvml except the one you want to exclude, which would have this in gres.conf: NodeName=a1-10 AutoDetect=off Name=gpu File=/dev/nvidia0 It seems to me like Autodetect
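A sketch of how those per-node files might look, reusing the node name and device path quoted in this thread; whether a per-line AutoDetect=off override is honoured depends on the Slurm version, so check the gres.conf man page for your release:

    # gres.conf on every node except a1-10
    AutoDetect=nvml

    # gres.conf on a1-10 only
    NodeName=a1-10 AutoDetect=off Name=gpu File=/dev/nvidia0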

Re: [slurm-users] [External] Re: exempting a node from Gres Autodetect

2021-02-23 Thread Prentice Bisbal
I don't see how that bug is related. That bug is about requiring the libnvidia-ml.so library for an RPM that was built with NVML Autodetect enabled. His problem is the opposite - he's already using NVML autodetect, but wants to disable that feature on a single node, where it looks like that nod

Re: [slurm-users] squeue: compact pending job-array in one partition, but not in other

2021-02-23 Thread Jeffrey T Frey
Did those four jobs

    6577272_21 scavenger PD 0:00 1 (Priority)
    6577272_22 scavenger PD 0:00 1 (Priority)
    6577272_23 scavenger PD 0:00 1 (Priority)
    6577272_28 scavenger PD 0:00 1 (Priority)

run before and get requeued? Seems

[slurm-users] squeue: compact pending job-array in one partition, but not in other

2021-02-23 Thread Loris Bennett
Hi, Does anyone have an idea why pending elements of an array job in one partition should be displayed compactly by 'squeue' but those of another in a different partition are displayed one element per line? Please see below (compact display in 'main', one element per line in 'scavenger'). This is
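For context, squeue folds pending array tasks into a single JOBID_[range] line by default and only expands them on request (or once the tasks have become independent job records, e.g. after a requeue). A quick way to compare the two displays, assuming a squeue that supports -r/--array:

    # default: pending tasks shown compactly, e.g. 6577272_[21-28]
    squeue -u $USER -t PENDING

    # one line per array task
    squeue -r -u $USER -t PENDING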

[slurm-users] modify all users QoS associations

2021-02-23 Thread Lu Weizheng
Hi, I created a new QoS and want everybody to be able to use it. Is there a way to make all users use the new QoS, rather than adding the association one user at a time, like: sacctmgr modify user crock set qos+=alligator
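One blanket approach is a where clause that matches every user association instead of naming users one at a time; an untested sketch, with 'mycluster' and 'alligator' as placeholders:

    sacctmgr modify user where cluster=mycluster set qos+=alligator

New associations normally inherit their parent account's QOS list, so adding the QoS at the account (or root) level may also cover users created later.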

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread Luke Sudbery
> -Original Message- > From: slurm-users On Behalf Of > ole.h.niel...@fysik.dtu.dk > Sent: 23 February 2021 15:04 > > Just a thought: Do you run a recent Slurm version? Which version of > MariaDB/MySQL do you run? > /Ole We're currently running Slurm 20.02.6-1 and MariaDB 10.3.28. But

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread Luke Sudbery
Yes, we suspected something like that... we have already increased innodb_buffer_pool_size from 32G to 64G (and have new DB nodes on the way) but it didn't help. There aren't dedicated DB nodes though. We assumed it must be some tipping point thing, hence looking into purging. But like in the

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread Ole Holm Nielsen
On 23-02-2021 15:19, mercan wrote: Hi; Maybe the database no longer fits in the InnoDB buffer. If there is room to increase innodb_buffer_pool_size, you can try increasing it to find out. The details of modifying the InnoDB parameters are described in https://wiki.fys
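The InnoDB settings discussed there end up in my.cnf along these lines; the sizes are illustrative only and need tuning to the host's RAM and database size:

    [mysqld]
    innodb_buffer_pool_size = 4096M
    innodb_log_file_size = 64M
    innodb_lock_wait_timeout = 900

Note that on older MariaDB versions changing innodb_log_file_size requires a clean shutdown (and removal of the old ib_logfile* files) before restarting.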

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread mercan
Hi; Maybe the database no longer fits in the InnoDB buffer. If there is room to increase innodb_buffer_pool_size, you can try increasing it to find out. Ahmet M. On 23.02.2021 17:03, Luke Sudbery wrote: That's great, thanks. We were thinking about staging it lik
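A rough way to check whether the accounting database still fits in the buffer pool (sketch; slurm_acct_db is the default schema name and may differ locally):

    -- total data + index size of the accounting database, in GB
    SELECT table_schema,
           ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
      FROM information_schema.tables
     WHERE table_schema = 'slurm_acct_db'
     GROUP BY table_schema;

    -- configured buffer pool size, in bytes
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';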

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread Luke Sudbery
That's great, thanks. We were thinking about staging it like that, and using days is simpler to trigger than waiting for the month. We will also need to increase innodb_lock_wait_timeout first so we don't hit the problems described in https://bugs.schedmd.com/show_bug.cgi?id=4295. Anyone know why
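For reference, innodb_lock_wait_timeout is a dynamic variable, so it can be raised before the first large purge and persisted in my.cnf afterwards; 900 seconds is a value commonly suggested for slurmdbd, but treat it as a starting point:

    -- applies to connections opened after the change
    SET GLOBAL innodb_lock_wait_timeout = 900;
    SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';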

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread Ole Holm Nielsen
On 2/23/21 1:25 PM, Luke Sudbery wrote: We have suddenly got bad performance from sreport: querying a 1-hour period (in the last 24 hours) for TopUsage went from taking under a minute to timing out after the 15-minute max slurmdbd query time – although the SQL query on the DB server continued

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread Luke Sudbery
The command in question is: sreport --parsable2 user topusage topcount=3 start=10/15/19 end=10/16/19 This is similar to https://bugs.schedmd.com/show_bug.cgi?id=2315, where the problem eventually just 'went away'. We also have >12000 associations and see a large number of them (>9000) listed in the S
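For comparison with other sites, the association count can be pulled straight from sacctmgr; a small sketch (-n drops the header, -P gives parsable output):

    sacctmgr -n -P list associations format=cluster,account,user,partition | wc -l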

[slurm-users] Slurmdbd purge settings

2021-02-23 Thread Luke Sudbery
We have suddenly got bad performance from sreport: querying a 1-hour period (in the last 24 hours) for TopUsage went from taking under a minute to timing out after the 15-minute max slurmdbd query time - although the SQL query on the DB server continued long after that. So firstly we were wond
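For reference, purging is controlled by the Purge* options in slurmdbd.conf; a sketch with placeholder retention periods rather than recommendations:

    # slurmdbd.conf - prune old accounting records
    PurgeEventAfter=12months
    PurgeJobAfter=12months
    PurgeResvAfter=12months
    PurgeStepAfter=12months
    PurgeSuspendAfter=12months
    PurgeTXNAfter=12months
    PurgeUsageAfter=24months

Restarting slurmdbd after shortening these triggers the purge, which is why the replies in this thread suggest stepping the values down gradually and raising innodb_lock_wait_timeout first.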