>> Also why aren't you using the Slurm commands to run things?
> Which command?
srun or sbatch
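For concreteness, a minimal sketch of both (workbench.sh is the wrapper script shown later in this thread; the resource flags are illustrative, not from the thread):

# interactive: get a shell on a compute node under Slurm's control
srun -N1 -n4 --pty bash

# batch: submit the wrapper script itself; sbatch accepts the
# resource flags on the command line or as #SBATCH lines in the script
sbatch -N1 -n4 ./workbench.sh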
> Also why aren't you using the Slurm commands to run things?
Which command?
Regards,
Mahmood
On Monday, 29 April 2019 5:18:56 AM PDT Mahmood Naderan wrote:
> [mahmood@rocks7 ~]$ rocks run host compute-0-1 "file
> /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2"
Given that file says it's a shell script, try running it with this to see what
doesn't work:
rocks run host compute [...]
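One concrete form of that suggestion (a sketch; bash -x prints each command as the script executes, and the path is the one quoted above):

rocks run host compute-0-1 "bash -x /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2"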
I see two separate, unrelated problems here:
Problem 1:
Warning: untrusted X11 forwarding setup failed: xauth key data not
generated
What have you done to investigate this xauth problem further?
I know there have been discussions about this problem in the past on
this mailing list. Did you [...]
[mahmood@rocks7 ~]$ rocks run host compute-0-1 "file
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2"
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2: POSIX shell
script, ASCII text executable
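That warning usually means the xauth binary is missing on the remote side, or the key cannot be written to ~/.Xauthority. A few checks worth running (a sketch; the package name assumes the CentOS base that Rocks is built on):

# is xauth present on the compute node?
rocks run host compute-0-1 "which xauth"

# if not, install it (CentOS/Rocks assumption)
rocks run host compute-0-1 "yum -y install xorg-x11-xauth"

# then retry with trusted X11 forwarding
ssh -Y compute-0-1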
On 27/4/19 2:20 am, Mahmood Naderan wrote:
> ./workbench.sh: line 4:
> /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2: No such file
> or directory
That doesn't look like it's related to Slurm to me; if the file itself
exists, then my suspicion is that it's a script and the interpreter it
specifies is missing on that node.
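A quick way to test that suspicion (a sketch reusing the rocks run host pattern from this thread):

# show the interpreter line the script asks for ...
rocks run host compute-0-1 "head -1 /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2"

# ... and confirm that interpreter exists on the node
# (/bin/sh is a placeholder for whatever the first line reports)
rocks run host compute-0-1 "ls -l /bin/sh"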
> More constructively - maybe the list can help you get the X11
> applications to run using Slurm.
> Could you give some details please?
For example, I cannot run this GUI program with salloc:
[mahmood@rocks7 ~]$ cat workbench.sh
#!/bin/bash
unset SLURM_GTIDS
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
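If the goal is to get that GUI running under Slurm, one possible route (a sketch, assuming a Slurm version with native X11 support, 17.11 or later, and PrologFlags=x11 set in slurm.conf - neither is confirmed in this thread):

# log in to the head node with trusted X11 forwarding,
# then ask Slurm for a node and forward the display through it
ssh -Y rocks7
srun --x11 --pty bash
./workbench.sh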
I would suggest that if those applications really are not possible with
Slurm - then reserve a set of nodes for interactive use and disable the
Slurm daemon on them.
Direct users to those nodes.
More constructively - maybe the list can help you get the X11 applications
to run using Slurm.
Could you give some details please?
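One way to fence such nodes off so Slurm stops scheduling onto them (a sketch; the node name is taken from this thread):

# drain the node: running jobs finish, no new ones start
scontrol update NodeName=compute-0-1 State=DRAIN Reason="reserved for interactive use"

# optionally stop the Slurm daemon on it as well
ssh compute-0-1 systemctl stop slurmd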
Thanks for the info.
The thing is that I don't want to mark the whole node as unhealthy. Assume the
following scenario:
compute-0-0 running Slurm jobs, system load 15 (32 cores)
compute-0-1 running non-Slurm jobs, system load 25 (32 cores)
Then a new Slurm job should be dispatched to compute-0-0, since its load is lower.
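Slurm does have a partial knob for this (a sketch, not mentioned in the thread): the CR_LLN selection parameter places jobs on the least-loaded node. Note that "load" here means CPUs already allocated by Slurm, not the OS load average, so it still cannot see non-Slurm work:

# slurm.conf
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_LLN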
On 4/23/19 2:47 AM, Mahmood Naderan wrote:
> Hi,
> How can I change the job distribution policy? Since some nodes are
> running non-Slurm jobs, it seems that the dispatcher isn't aware of
> system load. Therefore, it assumes that the node is free.
> I want to change the policy based on the system load.
Hi Mahmood,
Try the LBNL Node Health Check tool. Nodes which are determined to be
"unhealthy" can be marked as down or offline so as to prevent jobs from being
scheduled or run on them.
https://github.com/mej/nhc/blob/master/README.md#lbnl-node-health-check-nhc
Regards,
Richard
@cnscfr
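For reference, the Slurm side of wiring NHC in looks like this (a sketch: HealthCheckProgram and HealthCheckInterval are standard slurm.conf parameters, but the NHC check line and threshold below are assumptions to verify against the NHC README linked above):

# slurm.conf
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300

# /etc/nhc/nhc.conf - e.g. flag a node whose load average is too high
# (check name and argument assumed; see the NHC README)
* || check_ps_loadavg 25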
Hi,
How can I change the job distribution policy? Since some nodes are running
non-slurm jobs, it seems that the dispatcher isn't aware of system load.
Therefore, it assumes that the node is free.
I want to change the policy based on the system load.
Regards,
Mahmood