[slurm-users] Re: Lots of RPC calls and REQUEST_GETPW calls

2025-05-07 Thread Patryk Bełzak via slurm-users
nabled by default in SLURM. > > Regards > Patryk. > > On 25/05/07 10:13AM, Ole Holm Nielsen via slurm-users wrote: > > On 5/7/25 09:57, Patryk Bełzak via slurm-users wrote: > > > Hi, > > > why you think it's an authentication requests? As far as I unders

[slurm-users] Re: Lots of RPC calls and REQUEST_GETPW calls

2025-05-07 Thread Patryk Bełzak via slurm-users
, Ole Holm Nielsen via slurm-users wrote: > On 5/7/25 09:57, Patryk Bełzak via slurm-users wrote: > > Hi, > > why you think it's an authentication requests? As far as I understand > > multiple UIDs are asking for job and partition info. It's unlikely that all > >

[slurm-users] Re: Lots of RPC calls and REQUEST_GETPW calls

2025-05-07 Thread Patryk Bełzak via slurm-users
Hi, why do you think these are authentication requests? As far as I understand, multiple UIDs are asking for job and partition info. It's unlikely that all of them perform that kind of request the same way and at the same time, so I think you should look for some external program that may do that - i

[slurm-users] Re: MinTRES in QoS and power saving

2025-02-26 Thread Patryk Bełzak via slurm-users
Hi, there was this issue raised some time ago: https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg10799.html We're experiencing exactly the same issue now with GPU nodes in power saving: some (but not all) jobs don't start because of that, and it's annoying users - badly. Anyone

[slurm-users] Re: Print Slurm Stats on Login

2024-08-21 Thread Patryk Bełzak via slurm-users
Hi, what Ole wrote is exactly what crossed my mind. I had an episode with stats at login too: I put reportseff in the motd script, and it was a bad idea. It turned out that if, for any reason, the Slurm controller took longer to respond, it delayed user login, which annoyed them more than they apprecia
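The login delay described in this message can be bounded by running the stats command under a hard timeout. The following is a minimal sketch, not the setup used in the thread: the helper name `run_with_timeout` and the 2-second default are assumptions chosen for illustration, and the command to run is passed on the command line rather than hard-coded.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=2.0):
    """Run cmd and return its stdout, or None if it fails or exceeds timeout_s.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode output to str
            timeout=timeout_s,    # the child is killed once this many seconds pass
            check=True,           # a non-zero exit status raises CalledProcessError
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        # Slow controller, failing command, or missing binary: show nothing
        # rather than hold up the login.
        return None
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to run is taken from the arguments, e.g. the stats tool.
    output = run_with_timeout(sys.argv[1:])
    if output:
        print(output, end="")
```

Saved under a hypothetical name such as `motd_stats.py`, it could be called from a login script as `python3 motd_stats.py reportseff`. If the controller does not answer within the timeout, the script prints nothing and login proceeds.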

[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-24 Thread Patryk Bełzak via slurm-users

[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-22 Thread Patryk Bełzak via slurm-users
Hi, we've been facing the same issue for some time. At the beginning the missing socket error happened every 20 minutes, later once per hour, and now it happens a few times a day. The only downside was that the controller was unresponsive for a couple of seconds - up to 60, if I remember well.

[slurm-users] Re: Problems with gres.conf

2024-06-04 Thread Patryk Bełzak via slurm-users
Hi, I believe that setting cores in gres.conf explicitly gives you better control over the hardware configuration; I wouldn't trust Slurm on that one. We have gres.conf with "Cores" set; all you have to do is proper NUMA discovery (as long as your hardware has NUMA), and then assign the correct co

[slurm-users] Re: srun weirdness

2024-05-17 Thread Patryk Bełzak via slurm-users
make sense? > > I also missed that setting in slurm.conf so good to know it is possible to > change the default behaviour. > > Tom > > From: Patryk Bełzak via slurm-users > Date: Friday, 17 May 2024 at 10:15 > To: Dj Merrill > Cc: slurm-users@lists.schedmd.co

[slurm-users] Re: srun weirdness

2024-05-17 Thread Patryk Bełzak via slurm-users
Hi, I wonder where these problems come from. Perhaps I am missing something, but we never had such issues with limits, since we have them set on worker nodes in /etc/security/limits.d/99-cluster.conf: ``` * soft memlock 4086160 #Allow more Memory Locks for MPI * hard memlock