Hi;
As the noki user, could you try to read the
/var/run/slurm-llnl/slurmd.pid and /var/run/slurm-llnl/slurmctld.pid
files? Are these files present, readable, and writable? Maybe the
parent directories don't grant read/execute permission.
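
One way to run that check is sketched below. It is only a minimal sketch: check_pid_file is a hypothetical helper name, not part of Slurm, and the script is assumed to be run as the noki user (for example through sudo -u noki).

```shell
# Hypothetical helper (not part of Slurm): report whether a file exists and
# whether the current user can read and write it.
check_pid_file() {
    f="$1"
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    if [ -r "$f" ]; then echo "readable: $f"; else echo "not readable: $f"; fi
    if [ -w "$f" ]; then echo "writable: $f"; else echo "not writable: $f"; fi
}

# Check both PID files named in this thread; keep going if one is missing.
for f in /var/run/slurm-llnl/slurmd.pid /var/run/slurm-llnl/slurmctld.pid; do
    check_pid_file "$f" || true
done

# Show owner and mode of the parent directories; the user needs the x bit to
# enter a directory and the r bit to list it.
ls -ld /var/run /var/run/slurm-llnl 2>&1 || true
```

If a file is reported as missing although the daemon is running, the directory listing at the end shows whether the noki user can enter /var/run/slurm-llnl at all.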
Regards;
Ahmet M.
On 19.06.2019 07:26, Noki
Hi,
It just shows
"Node $NODE not found"
whereas all the others work as expected (i.e., they are running).
Without knowing the internals of Slurm, it feels like nodes that are turned
off and in the cloud state don't exist in the system until they are on?
Any other ideas?
Thanks
Nathan
On Wed., 19 Jun. 2019, 4:
On Tuesday, 18 June 2019 9:36:56 PM PDT nathan norton wrote:
> Just tried running that command, but it only shows nodes that are up and
> running, doesn’t tell me about any nodes that are down and turned off, as
> an example please see below. There is a job running that should be using
> the 100 n
Hi,
Just tried running that command, but it only shows nodes that are up and
running, doesn’t tell me about any nodes that are down and turned off, as
an example please see below. There is a job running that should be using
the 100 nodes but only 52 are allocated (plus 2 down* (that I know about
Hi;
Sorry, as you can see, I made a mistake again. I wrote two different
directories:
"The owner of the /var/run/slurm-llnl directory and the
slurmctld.pid and slurmd.pid files should be "noki" user.
chown -R noki:root /var/spool/slurm-llnl"
You should run:
chown -R noki:root /var/run/slurm
Hi, slurm-users and mercan.
I tried what you said.
noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
total 92
-rw--- 1 noki root 198 Jun 19 11:36 assoc_mgr_state
-rw--- 1 noki root 198 Jun 18 20:31 ass
Greetings --
We're running Slurm 17.02.2.
- We have implemented OnDemand in our cluster, including the Jupyter app
across all the compute nodes. The Interactive Desktop application, however,
is installed on a small set of compute nodes during an extended validation
period. Installatio
Hi;
I did not notice the
SlurmUser=noki
line. The owner of the /var/run/slurm-llnl directory and the
slurmctld.pid and slurmd.pid files should be "noki" user.
chown -R noki:root /var/spool/slurm-llnl
Regards;
Ahmet M.
On 18.06.2019 15:15, mercan wrote:
Hi;
The owner of the /var/run/slurm-l
Hi;
The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and
slurmd.pid files should be the "slurm" user. Your files are owned by root
and noki.
chown -R slurm:slurm /var/spool/slurm-llnl
Regards;
Ahmet M.
On 18.06.2019 15:03, Noki Lee wrote:
Though SLURM works fine for job su
Though SLURM works fine for job submitting, running, and queueing, I got a
minor error below.
sudo systemctl status slurmd
Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service: Can't
open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after start: No such
file or directory
sudo sy
Hi Nathan,
The command I use to get the reason for failed nodes is ... 'sinfo -Ral'. If
you need to extend the width of the output then ... 'sinfo -Ral -O
reason:35,user,timestamp,statelong,nodelist'.
Using the timestamp of the failure, look in the slurmd or slurmctld logs.
---
Sam Gallop
Hi all,
I am using Slurm with a cloud provider, and it is all working a treat.
Let's say I have 100 nodes, all working fine and able to be scheduled;
everything works fine.
$ srun -N100 hostname
works fine.
For some unknown reason after machines shut down for example over the
weekend if no jobs g