Re: [slurm-users] Allow SFTP on a specific compute node

2022-07-11 Thread Jake Jellinek
I cannot think of any way to do this within the Slurm configuration. I would solve this by having a wrapper run at boot time that starts a new sshd process on a different port, which you secure (i.e. only that user can connect), and then start this as part of your boot-time scripts. If your scrip

Re: [slurm-users] Allow SFTP on a specific compute node

2022-07-11 Thread Ole Holm Nielsen
On 7/12/22 06:51, Ratnasamy, Fritz wrote: Currently, our cluster does not allow ssh to compute nodes for users unless they have a running job on that compute node. I believe a system admin has set up a PAM module that does the blocking. When trying ssh, this is the message returned: Access denied

[slurm-users] Allow SFTP on a specific compute node

2022-07-11 Thread Ratnasamy, Fritz
Hello, Currently, our cluster does not allow ssh to compute nodes for users unless they have a running job on that compute node. I believe a system admin has set up a PAM module that does the blocking. When trying ssh, this is the message returned: Access denied by pam_slurm_adopt: you have no active

[slurm-users] Is there a way to create reservations w/o being Operator or Admin?

2022-07-11 Thread David Henkemeyer
I would like to remove the restriction that users must be at least operator level to do "scontrol create reservation". So, either I could: - Change the default AdminLevel to operator. Is that possible? - Remove the restriction that a user has to be operator to create a reservation. Is

Re: [slurm-users] Frontend node mode issues identified in v22.05.2

2022-07-11 Thread Jordi Blasco
Thanks Ole, I checked /etc/nsswitch.conf and I have even set up a dnsmasq service, just in case. [root@slurm-simulator /]# cat /etc/nsswitch.conf | grep hosts # Valid databases are: aliases, ethers, group, gshadow, hosts, hosts: files dns myhostname [root@slurm-simulator /]# ping slurm-si

Re: [slurm-users] Frontend node mode issues identified in v22.05.2

2022-07-11 Thread Ole Holm Nielsen
On 7/11/22 12:54, Jordi Blasco wrote: I use the front-end node mode to emulate a real cluster in order to validate the Slurm configuration in a Docker container and develop custom plugins. With versions 21.08.8-2 and 22.05.2, slurmd is complain

[slurm-users] Frontend node mode issues identified in v22.05.2

2022-07-11 Thread Jordi Blasco
Hi, I use the front-end node mode to emulate a real cluster in order to validate the Slurm configuration in a Docker container and develop custom plugins. With versions 21.08.8-2 and 22.05.2, slurmd is complaining about not being able to find the f

Re: [slurm-users] Reply: how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread Kamil Wilczek
I think that the previous answer from ole.h.niel...@fysik.dtu.dk might be helpful in that case. But whether this is in the same building or in some more distant location, the timeout shouldn't exceed a second or two. I do not understand why the timeouts are set so high by default -- workloads mess

[slurm-users] Reply: how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread taleintervenor
Hello, Kamil Wilczek: Well, I agree that the non-responding case may be caused by an unstable network, since our Slurm cluster has two groups of nodes that are geographically distant, with only an Ethernet link between them. The reported nodes are all in one building, while the slurmctld node is in another building. But

Re: [slurm-users] how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread Ole Holm Nielsen
On 7/11/22 09:32, taleinterve...@sjtu.edu.cn wrote: Recently we found some strange log entries in slurmctld.log about nodes not responding, such as: [2022-07-09T03:23:10.692] error: Nodes node[128-168,170-178] not responding [2022-07-09T03:23:58.098] Node node171 now responding [2022-07-09T03:23:58.09

Re: [slurm-users] how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread Kamil Wilczek
Hello, I know that this is not quite the answer, but you could additionally (and maybe you already did this :)) check if this is not a network problem: * Are the nodes available outside of Slurm during that time? SSH, ping? * If you have a monitoring system (Prometheus, Icinga, etc.), are ther

[slurm-users] how do slurmctld determine whether a compute node is not responding?

2022-07-11 Thread taleintervenor
Hi, all: Recently we found some strange log entries in slurmctld.log about nodes not responding, such as: [2022-07-09T03:23:10.692] error: Nodes node[128-168,170-178] not responding [2022-07-09T03:23:58.098] Node node171 now responding [2022-07-09T03:23:58.099] Node node165 now responding [2022-07-0

Re: [slurm-users] limit the queued jobs

2022-07-11 Thread Ole Holm Nielsen
On 7/11/22 07:55, Purvesh Parmar wrote: Hi, Thank you for the response. However, MaxJobsAccrue only limits the assignment of priorities to jobs, i.e. the maximum number of pending jobs able to accrue age priority at any given time. It does not stop a user from submitting further jobs. I s