There seems to be an issue with the TmpDisk value reporting in Slurm 24.05.3.
While the correct value is displayed using the scontrol show nodes command,
sinfo appears to report an incorrect value under certain conditions.
For example, the TmpDisk parameter for my compute nodes is configured as
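A minimal sketch for comparing the two views; the node name and TmpDisk value below are hypothetical, not taken from the original report:

    # slurm.conf, hypothetical node definition
    NodeName=node01 CPUs=64 RealMemory=256000 TmpDisk=900000

    # scontrol reports the configured value
    scontrol show node node01 | grep -o 'TmpDisk=[0-9]*'

    # sinfo's %d field prints TmpDisk (in MB) and is the value that appears inconsistent
    sinfo -N -n node01 -o "%N %d"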
Hello,
I observe strange behavior of advanced reservations that have OVERLAP in their
flags list.
If I create two advanced reservations on different sets of nodes, and a particular
username is configured to have access only to the one carrying the OVERLAP flag,
then that username can also run jobs on n
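A hedged sketch of the kind of setup described; the reservation names, users and node ranges are hypothetical:

    # reservation the user is not listed in
    scontrol create reservation ReservationName=res_other Users=otheruser \
        Nodes=node[01-02] StartTime=now Duration=7-00:00:00

    # reservation the user is listed in, carrying the OVERLAP flag
    scontrol create reservation ReservationName=res_user Users=someuser \
        Nodes=node[03-04] Flags=OVERLAP StartTime=now Duration=7-00:00:00

    # the user submits into their own reservation
    sbatch --reservation=res_user job.sh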
Hello Brian,
thanks a lot for the info.
>
> You may be able to use the alternate approach that I was able to do as well.
>
I would be interested in any alternatives. Could you point me to some documentation?
Best wishes
Gizo
> Brian Andrus
>
>
> On 2/28/2023 7:44 AM, Gizo Na
Hello,
it seems that if Slurm power saving is enabled, the parameter
"Weight" is ignored for nodes that are in a powered-down state.
Is there any way to make this option work on a cluster that runs Slurm
in power saving mode?
I am aware of the note on the Weight option in the
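For context, a minimal sketch of the kind of configuration being discussed; node names, weights and script paths are hypothetical:

    # slurm.conf
    NodeName=fast[01-10] CPUs=64 Weight=10    # preferred nodes (lower weight is scheduled first)
    NodeName=slow[01-10] CPUs=64 Weight=100   # intended to be used only when the fast nodes are busy

    # power saving
    SuspendTime=600
    SuspendProgram=/usr/local/sbin/suspend_node.sh
    ResumeProgram=/usr/local/sbin/resume_node.sh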
--
___
Dr. Gizo Nanava
Group Leader, Scientific Computing
Leibniz Universität IT Services
Leibniz Universität Hannover
Schlosswender Str. 5
D-30159 Hannover
Tel +49 511 762 7919085
http://www.luis.uni-hannover.de
, presumably
caused by some race condition -
in very rare cases, salloc works without this issue.
I see that the Slurm power saving documentation mentions salloc, but not its
interactive use case.
Thank you & best regards
Gizo
> On 27/10/22 4:18 am, Gizo Nanava wrote:
>
>
Hello,
we ran into another issue when using salloc interactively on a cluster where
Slurm power saving is enabled. The problem seems to be caused by the job_container
plugin and occurs when the job starts on a node that boots from a powered-down state.
If I resubmit a job immediately after the
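A minimal sketch of the pieces involved, assuming the job_container/tmpfs plugin; the base path is hypothetical:

    # slurm.conf
    JobContainerType=job_container/tmpfs

    # job_container.conf
    AutoBasePath=true
    BasePath=/local/job_containers

    # interactive allocation that lands on a node resuming from power down
    salloc -N1 -t 00:30:00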
Please ignore the question - the option SchedulerParameters=salloc_wait_nodes
solves the issue.
kind regards
Gizo
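For reference, the option mentioned above is a slurm.conf SchedulerParameters entry (appended to any existing list of parameters):

    # slurm.conf
    SchedulerParameters=salloc_wait_nodes
    # salloc then blocks until the allocated nodes have booted and are ready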
> Hello,
>
> it seems that in a cluster configured for power saving, salloc does not wait
> until the nodes
> assigned to the job recover from the power down state and go back
Hello,
it seems that in a cluster configured for power saving, salloc does not wait
until the nodes assigned to the job recover from the powered-down state and go
back to normal operation. Although the job is in the state CONFIGURING and the
nodes are still in IDLE+NOT_RESPONDING+POWERING_UP,
th
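One way to observe the behavior described above; the node name is a placeholder:

    # in one shell: request an interactive allocation on a powered-down node
    salloc -N1 -t 00:30:00

    # in another shell: the job shows CONFIGURING while the node powers up
    squeue -u $USER -o "%i %T %N"
    scontrol show node <nodename> | grep State
    # e.g. State=IDLE+NOT_RESPONDING+POWERING_UP while the node boots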
lurmctl is identified through a DNS SRV record. If I run sinfo without prior
execution of salloc, then it works.
Do cluster login nodes still require a slurm.conf file?
Thank you.
Best regards
Gizo
--
___
Dr. Gizo Nanava
Leibniz Universitaet IT Services
Leibniz Universitaet Ha
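For reference, a sketch of the "configless" pieces being described; the domain, host name and TTL are hypothetical:

    ; DNS SRV record pointing clients at slurmctld (configless mode)
    _slurmctld._tcp.cluster.example.com. 3600 IN SRV 0 0 6817 ctld.cluster.example.com.

    # slurm.conf on the controller must allow configless operation
    SlurmctldParameters=enable_configless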
Hi,
it seems cons_res and cons_tres allocate CPUs across nodes differently. The
doc here https://slurm.schedmd.com/cpu_management.html#Overview says:
"When using SelectType=select/cons_res, the default allocation method across
nodes is block allocation (allocate all available CPUs in a node before u
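A minimal sketch for comparing the placement produced by the two plugins; node counts, the job script and job id are hypothetical:

    # slurm.conf, the two settings being compared
    SelectType=select/cons_res     # or: select/cons_tres
    SelectTypeParameters=CR_Core

    # a job spanning two nodes; then inspect how many CPUs land on each node
    sbatch -N2 -n8 job.sh
    scontrol -d show job <jobid> | grep CPU_IDs   # per-node CPU placement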