Even after rebooting, sometimes nodes are stuck because of "completing
jobs".
What helps then is to set the node down and resume it afterwards:
scontrol update nodename=<nodename> state=drain reason=stuck; scontrol update nodename=<nodename> state=resume
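The two commands above can be wrapped in a small shell function. This is only a sketch: the function name cycle_node, the SCONTROL override variable and the node name node01 are assumptions made for illustration and are not part of Slurm.

```shell
# Sketch: drain and then resume a node that is stuck on "completing" jobs.
# SCONTROL defaults to the real scontrol binary; set SCONTROL=echo for a
# dry run that only prints the arguments (an assumption of this sketch).
cycle_node() {
    node="$1"
    reason="${2:-stuck}"
    # Resume only if the drain request succeeded.
    "${SCONTROL:-scontrol}" update nodename="$node" state=drain reason="$reason" &&
    "${SCONTROL:-scontrol}" update nodename="$node" state=resume
}
```

Usage would look like "cycle_node node01" or, with a custom reason, "cycle_node node01 completing-hang".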
Best
Marcus
On 20.09.2023 at 09:11, Ole Holm Nie
Hi Frank,
use Features on the nodes, every cpu node gets e.g. "cpu", every gpu
node e.g. "gpu".
If a job asks for no gpus, set an additional constraint "cpu" for the job.
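As a rough illustration of the idea above, here is a client-side sketch. It is not the server-side job_submit approach discussed in the thread, only a wrapper helper; the function name extra_constraint and the node names in the comments are made up, and only the "=" forms of the GPU options are recognized.

```shell
# Sketch only. The matching slurm.conf node definitions would carry the
# features, for example (node names are hypothetical):
#   NodeName=cpu[01-04] Features=cpu
#   NodeName=gpu[01-02] Features=gpu
#
# Hypothetical helper: print the extra sbatch option for a job that does
# not ask for GPUs, and print nothing for a GPU job.
extra_constraint() {
    for arg in "$@"; do
        case "$arg" in
            --gres=*gpu*|--gpus=*|--gpus-per-node=*|--gpus-per-task=*)
                return 0 ;;   # GPU job: add nothing
        esac
    done
    printf '%s\n' '--constraint=cpu'
}
```

A submit wrapper could then call sbatch $(extra_constraint "$@") "$@". Note that this sketch does not merge with a constraint the user already passed; a real setup would have to handle that case.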
Best
Marcus
On 29.03.2023 at 01:24, Frank Pari wrote:
Well, I wanted to avoid using lua. But, it looks like that's goi
it host is configured to be
lower than it is on the worker nodes; Slurm gets this error and shows
it to you as you were using the --propagate option?
Regards,
Hermann
On 3/23/23 08:00, Wagner, Marcus wrote:
Hi Folks,
has anyone ever stumbled upon such an error:
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767202 from submit
host: Invalid argument
Does anyone know where that comes from?
Any hints are welcome.
Best
Marcus
Thank you,
Nícolas
*From:* slurm-users on behalf of Wagner, Marcus
*Sent:* Tuesday, 14 March 2023 07:25
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Partition Hold/Release
Hi
Hi Nicolas,
you could use the PriorityTier setting for partitions:
PriorityTier
Jobs submitted to a partition with a higher PriorityTier
value will be evaluated by the scheduler before pending jobs in a
partition with a lower PriorityTier value. They will
also b
Hi Amir,
as far as I can tell, there is no way to create resources dynamically.
In general, a gres is a generic resource, e.g. tmp-Space, or whatever
could be scheduled and must be restricted by a job.
A feature is more like a binary switch. You could for example set a
feature "amd" on all a
Hmm…
That one is strange.
Can you try just hwloc-ls?
I wonder how slurmd would get that information if it is not hwloc-based.
Best
Marcus
Sent while on the go.
> On 15.12.2022 at 16:00, Paul Raines wrote:
>
>
> Nice find!
>
> Unfortunately this does not work on the original box this