Hi,
I'm wondering if there is any built-in option to autoset a job TimeLimit
to fit within a defined reservation.
For now, it seems to me that the time limit must be explicitly provided by
the user when invoking the srun or sbatch command, in agreement with the
end of the reservation.
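For illustration, here is a rough sketch of the kind of wrapper this currently requires (the reservation name gpu_res and the job script job.sh are placeholders, not real names):

RES=gpu_res
# EndTime of the reservation, e.g. EndTime=2021-04-05T20:00:00
END=$(scontrol show reservation "$RES" | grep -oP 'EndTime=\S+' | cut -d= -f2)
# remaining minutes until the reservation ends, used as the job time limit
MINUTES=$(( ($(date -d "$END" +%s) - $(date +%s)) / 60 ))
sbatch --reservation="$RES" --time="$MINUTES" job.sh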
> # echo "$(((NEXTRES - NOW) / 3600)) hours left until reservation begins"
> 178 hours left until reservation begins
>
> Cheers,
> Florian
>
> *From:* slurm-users on behalf of Jeremy Fix
> *Sent:* Monday, 29 March 2021
Hello !
I'm facing a weird issue. With one user, call it gpupro_user, if I log in
with ssh on a compute node, I can successfully run a vncserver (see command
[1] below), in my case a TigerVNC server. However, if I allocate the exact
same node through srun (see command [2] below), running the VNC server fails.
Actually, I solved the issue by observing that the user had created a
file "~/.vnc/xstartup*.sh*" while it should have been "~/.vnc/xstartup";
simply removing the extension, vncserver starts successfully, even within
an srun allocation!
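In case it helps anyone else, the fix boiled down to something like this (the node name and the srun options are placeholders, since the original commands [1]/[2] are not shown here):

mv ~/.vnc/xstartup.sh ~/.vnc/xstartup       # vncserver only runs ~/.vnc/xstartup
chmod +x ~/.vnc/xstartup
srun --nodelist=node01 --pty vncserver :1   # now starts fine inside the allocation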
Best,
Jeremy.
Hi,
I'm unable to run an X11 application when the SlurmctldHost is remote. Let
us call myfrontalnode the node from which the user runs the Slurm commands;
it is different from the SlurmctldHost.
What fails is the following:
ssh -X myfrontalnode
srun --x11 xclock
which fails.
> … “home_xauthority”
> 5. Update your slurm cluster and restart.
>
> Steps 3 & 4 seemed to be the key ones I originally missed – especially 4
> (https://slurm.schedmd.com/slurm.conf.html#OPT_X11Parameters)
>
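For anyone landing on this thread later, the slurm.conf side of this presumably amounts to something like the following (a sketch based on the linked documentation, not the exact lines from the original reply):

# slurm.conf (same file on the controller and on every compute node)
PrologFlags=X11                  # enables Slurm's built-in X11 forwarding
X11Parameters=home_xauthority    # write the xauth cookie to ~/.Xauthority instead of a per-job file

followed by a restart of slurmctld and of slurmd on the compute nodes.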
Hello everyone,
we are facing a weird issue. On a regular basis, some compute nodes go
from *idle* -> *idle** -> *down* and then loop back to idle on their own.
Slurm manages several nodes, and this state cycle appears only for some
pools of nodes.
We get a trace on the compute node such as:
[2022
Thanks for your help,
Jeremy.
That looks like a DNS issue.
Verify all your nodes are able to resolve the names of each other.
Check /etc/resolv.conf, /etc/hosts and /etc/slurm/slurm.conf on the
nodes (including head/login nodes) to ensure they all match.
Brian Andrus
On 2/1/2022 1:37 AM, Jeremy Fix wrote:
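A quick sketch of the kind of checks meant here (the node names are placeholders):

# run on every node, including the head/login nodes
getent hosts node001 node002 headnode    # should resolve to the same addresses everywhere
md5sum /etc/slurm/slurm.conf             # checksum should be identical on all nodes
grep -i '^nameserver' /etc/resolv.conf   # same resolvers on every node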
Hi,
A follow-up. I thought some of the nodes were OK, but that's not the case.
This morning, another pool of consecutive compute nodes (why consecutive,
by the way? they always fail consecutively) is idle*. And now some of the
nodes which were drained came back to life in idle and are now switching
states again.
Jeremy,
What is the value of TreeWidth in your slurm.conf? If there is no
entry then I recommend setting it to a value a bit larger than the
number of nodes you have in your cluster and then restarting slurmctld.
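For example (the node count of 120 below is just an assumption, adjust it to your cluster):

# slurm.conf: with, say, 120 nodes in the cluster
TreeWidth=130          # a bit larger than the total node count

systemctl restart slurmctld   # assuming a systemd-based install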
Best,
Steve
On Wed, Feb 2, 2022 at 12:59 AM Jeremy Fix wrote: