Re: [slurm-users] help with canceling or deleteing a job

2023-09-20 Thread Wagner, Marcus
Even after rebooting, nodes are sometimes stuck because of "completing jobs". What helps then is to set the node down and resume it afterwards: scontrol update nodename= state=drain reason=stuck; scontrol update nodename= state=resume Best, Marcus On 20.09.2023 at 09:11, Ole Holm Nie
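The drain/resume pair from the message above could be wrapped roughly as follows. This is only a sketch: the function name `unstick_node` and the `RUN` switch are made up for illustration, and the node name is a placeholder because the original post leaves it empty. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: print (or, with RUN=1, execute) the two scontrol commands
# quoted in the message for a single node.
# unstick_node and RUN are hypothetical names, not part of Slurm.
unstick_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: unstick_node <nodename>" >&2
        return 1
    fi
    for cmd in \
        "scontrol update nodename=$node state=drain reason=stuck" \
        "scontrol update nodename=$node state=resume"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd            # relies on word splitting to run the command
        else
            echo "$cmd"     # dry run: only show what would be executed
        fi
    done
}
```

Calling `unstick_node node042` prints both commands for review; `RUN=1 unstick_node node042` would run them on a host where `scontrol` is available and the caller has admin rights.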

Re: [slurm-users] Keep CPU Jobs Off GPU Nodes

2023-03-29 Thread Wagner, Marcus
Hi Frank, use Features on the nodes: every CPU node gets e.g. "cpu", every GPU node e.g. "gpu". If a job asks for no GPUs, set an additional constraint "cpu" for the job. Best, Marcus On 29.03.2023 at 01:24, Frank Pari wrote: Well, I wanted to avoid using lua.  But, it looks like that's goi
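The decision rule in the message ("no GPUs requested, so add the constraint cpu") can be illustrated with a small submit-side helper. This is a swapped-in stand-in, not what the thread proposes: the thread points towards doing this inside Slurm with a Lua script, whereas the hypothetical function `extra_constraint` below only prints the extra `sbatch` option. It assumes the CPU nodes already carry the feature "cpu".

```shell
#!/bin/sh
# Sketch: given a job's --gres value (empty if none was requested),
# print the additional sbatch option suggested in the message.
# extra_constraint is a hypothetical helper name.
extra_constraint() {
    case "$1" in
        *gpu*) ;;                          # GPU job: add nothing
        *)     echo "--constraint=cpu" ;;  # no GPUs: keep the job on CPU nodes
    esac
}
```

A wrapper could then call something like `sbatch $(extra_constraint "$GRES") job.sh`, where `GRES` and `job.sh` are placeholders. Unlike a server-side check, this does nothing for users who call `sbatch` directly.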

Re: [slurm-users] RLIMIT_NPROCS

2023-03-23 Thread Wagner, Marcus
it host is configured to be lower than it is on the worker nodes; Slurm gets this error and shows it to you as you were using the --propagate option? Regards, Hermann On 3/23/23 08:00, Wagner, Marcus wrote: Hi Folks, has anyone ever stumbled upon such an error: slurmstepd: error: Can'

[slurm-users] RLIMIT_NPROCS

2023-03-23 Thread Wagner, Marcus
Hi Folks, has anyone ever stumbled upon an error like this: slurmstepd: error: Can't propagate RLIMIT_NPROC of 767202 from submit host: Invalid argument Does anyone know where that comes from? Any hints are welcome. Best, Marcus

Re: [slurm-users] Partition Hold/Release

2023-03-15 Thread Wagner, Marcus
     1 n24/ / / Thank you, Nícolas *From:* slurm-users on behalf of Wagner, Marcus *Sent:* Tuesday, 14 March 2023 07:25 *To:* slurm-users@lists.schedmd.com *Subject:* Re: [slurm-users] Partition Hold/Release Hi

Re: [slurm-users] Partition Hold/Release

2023-03-14 Thread Wagner, Marcus
Hi Nicolas, you could use the PriorityTier for partitions: PriorityTier: Jobs submitted to a partition with a higher PriorityTier value will be evaluated by the scheduler before pending jobs in a partition with a lower PriorityTier value. They will also b
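As a rough illustration of adjusting PriorityTier at runtime, the sketch below builds the corresponding `scontrol update` command for a partition. The helper name `tier_cmd`, the partition name, and the tier value are all hypothetical; nothing here is taken from the thread beyond the parameter name itself. The command is printed so it can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: print the scontrol command that sets a partition's PriorityTier.
# tier_cmd is a hypothetical helper; partition and tier are site-specific.
tier_cmd() {
    part="$1"
    tier="$2"
    case "$tier" in
        ''|*[!0-9]*)
            echo "tier must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "scontrol update PartitionName=$part PriorityTier=$tier"
}
```

For example, `tier_cmd urgent 10` prints a command that would place the (made-up) partition `urgent` in tier 10, so its pending jobs are considered before those of partitions with a lower value.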

Re: [slurm-users] Does Slurm have any equivalent to LSF elim for generating dynamic node resources

2023-03-03 Thread Wagner, Marcus
Hi Amir, as far as I can tell, there is no way to create resources dynamically. In general, a gres is a generic resource, e.g. tmp-Space, or whatever could be scheduled and must be restricted by a job. A feature is more like a binary switch. You could for example set a feature "amd" on all a

Re: [slurm-users] CPUSpecList confusion

2022-12-15 Thread Wagner, Marcus
Hmm… That one is strange. Can you try just hwloc-ls? I wonder how slurmd would get that information if it is not hwloc-based. Best, Marcus Sent while on the move. > On 15.12.2022 at 16:00, Paul Raines wrote: > >  > Nice find! > > Unfortunately this does not work on the original box this