Getting back to the original question - I just noticed that there is a 
special option, AuditRPCs, in DebugFlags for the controller, so perhaps you 
can determine the source of the RPC calls without breaking things.
https://slurm.schedmd.com/slurm.conf.html#OPT_AuditRPCs
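
For reference, enabling it might look roughly like the sketch below. The flag 
name comes from the documentation above; whether it can also be toggled at 
runtime with scontrol depends on your Slurm version, so treat the details as 
assumptions to verify on your setup:

```
# slurm.conf - audit the origin of inbound RPCs to slurmctld
# (apply with a slurmctld restart or "scontrol reconfigure")
DebugFlags=AuditRPCs
```

or, at runtime:

```
scontrol setdebugflags +AuditRPCs
# ...investigate, then turn it off again:
scontrol setdebugflags -AuditRPCs
```

With that enabled, slurmctld should log the source address and authenticated 
user of each incoming RPC, which ought to point at whatever is generating the 
REQUEST_JOB_INFO_SINGLE / REQUEST_PARTITION_INFO storm.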

Regards
Patryk.

On 25/05/07 10:47AM, Patryk Bełzak via slurm-users wrote:
> > IMHO the RPC rate limiting should be considered a best practice, and I
> > wouldn't think that it's a "dirty" configuration.  You need Slurm 23.02 or
> > later for this.  Some details are discussed in this Wiki page:
> Dirty in the sense that the limits are set so low that they break some other 
> service, just so you can see which service was making those calls. You know, 
> breaking things isn't the best practice ;)
> I totally agree that RPC rate limiting is a good practice overall, and 
> perhaps it should be enabled by default in Slurm.
> 
> Regards
> Patryk.
> 
> On 25/05/07 10:13AM, Ole Holm Nielsen via slurm-users wrote:
> > On 5/7/25 09:57, Patryk Bełzak via slurm-users wrote:
> > > Hi,
> > > why do you think these are authentication requests? As far as I understand, 
> > > multiple UIDs are asking for job and partition info. It's unlikely that 
> > > all of them would perform that kind of request in the same way and at the 
> > > same time, so I think you should look for some external program that may 
> > > be doing that - e.g. some monitoring or reporting tool? I'm not sure if 
> > > API calls are also registered as RPCs in the controller logs.
> > > 
> > > A dirty (but maybe effective) way of discovering what makes all of those 
> > > calls is to set the RPC rate limit to some low value and see what stops 
> > > working ;) https://slurm.schedmd.com/slurm.conf.html#OPT_rl_enable
> > 
> > IMHO the RPC rate limiting should be considered a best practice, and I
> > wouldn't think that it's a "dirty" configuration.  You need Slurm 23.02 or
> > later for this.  Some details are discussed in this Wiki page:
> > 
> > https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#rpc-rate-limiting
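> > 
> > For reference, the basic setup is a SlurmctldParameters line in slurm.conf; 
> > here is a minimal sketch (the bucket/refill numbers are illustrative 
> > assumptions, not recommendations - tune them for your site, and see the 
> > slurm.conf man page for the full list of rl_* options):
> > 
> > ```
> > # slurm.conf - per-user RPC rate limiting (Slurm 23.02 or later)
> > SlurmctldParameters=rl_enable,rl_bucket_size=50,rl_refill_period=5,rl_refill_rate=25
> > ```
> > 
> > When a user exceeds the limit, slurmctld logs a rate-limit message naming 
> > the offending UID (the exact wording varies by version), which by itself 
> > can help identify the noisy client.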
> > 
> > IHTH,
> > Ole
> > 
> > 
> > > On 25/05/06 02:38PM, Guillette, Jeremy via slurm-users wrote:
> > > > Hello,
> > > > I’m trying to figure out why we’ve been seeing an increase in network 
> > > > traffic in our AWS-based cluster, which uses Amazon’s ParallelCluster 
> > > > tool. After an incident a couple of weeks ago, I turned on debug2 logging 
> > > > on the slurmd processes, and I’m seeing huge numbers of `REQUEST_GETPW` 
> > > > and `REQUEST_GETGR` requests going to the slurmd processes. I briefly 
> > > > turned on debug2 logging for `slurmctld` as well, and I’m seeing lots 
> > > > of RPC calls, but not as many as the `REQUEST_GETPW` requests that I’ve 
> > > > seen on compute node slurmd processes.
> > > > Here’s a sample from the slurmctld log:
> > > > ```
> > > > [2025-04-28T15:11:05.436] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=20
> > > > [2025-04-28T15:11:05.450] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2971
> > > > [2025-04-28T15:11:05.451] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2971
> > > > [2025-04-28T15:11:05.451] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=16
> > > > [2025-04-28T15:11:05.461] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2788
> > > > [2025-04-28T15:11:05.461] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2788
> > > > [2025-04-28T15:11:05.461] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=12
> > > > [2025-04-28T15:11:05.517] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2916
> > > > [2025-04-28T15:11:05.518] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2916
> > > > [2025-04-28T15:11:05.518] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=14
> > > > [2025-04-28T15:11:05.628] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3405
> > > > [2025-04-28T15:11:05.629] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3405
> > > > [2025-04-28T15:11:05.629] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=14
> > > > [2025-04-28T15:11:05.740] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2189
> > > > [2025-04-28T15:11:05.740] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2189
> > > > [2025-04-28T15:11:05.740] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=15
> > > > [2025-04-28T15:11:05.845] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2209
> > > > [2025-04-28T15:11:05.846] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2209
> > > > [2025-04-28T15:11:05.846] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=4106
> > > > [2025-04-28T15:11:05.846] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=14
> > > > [2025-04-28T15:11:05.847] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=4106
> > > > [2025-04-28T15:11:05.847] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=11
> > > > [2025-04-28T15:11:05.938] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3400
> > > > [2025-04-28T15:11:05.938] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3400
> > > > [2025-04-28T15:11:05.938] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=18
> > > > [2025-04-28T15:11:06.903] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3449
> > > > [2025-04-28T15:11:06.904] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3449
> > > > [2025-04-28T15:11:06.904] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=15
> > > > [2025-04-28T15:11:07.175] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3722
> > > > [2025-04-28T15:11:07.176] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3722
> > > > [2025-04-28T15:11:07.177] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=254
> > > > [2025-04-28T15:11:07.205] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=4040
> > > > [2025-04-28T15:11:07.206] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=4040
> > > > [2025-04-28T15:11:07.206] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=17
> > > > [2025-04-28T15:11:07.237] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2990
> > > > [2025-04-28T15:11:07.238] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2990
> > > > [2025-04-28T15:11:07.239] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=15
> > > > [2025-04-28T15:11:07.284] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2920
> > > > [2025-04-28T15:11:07.285] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2920
> > > > [2025-04-28T15:11:07.285] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=15
> > > > [2025-04-28T15:11:07.370] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3236
> > > > [2025-04-28T15:11:07.371] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3236
> > > > [2025-04-28T15:11:07.371] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=17
> > > > [2025-04-28T15:11:08.463] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2848
> > > > [2025-04-28T15:11:08.464] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2848
> > > > [2025-04-28T15:11:08.464] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=14
> > > > [2025-04-28T15:11:08.691] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=2627
> > > > [2025-04-28T15:11:08.692] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=2627
> > > > [2025-04-28T15:11:08.692] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=18
> > > > [2025-04-28T15:11:08.873] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3729
> > > > [2025-04-28T15:11:08.874] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3729
> > > > [2025-04-28T15:11:08.875] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=196
> > > > [2025-04-28T15:11:08.881] debug2: Processing RPC: 
> > > > REQUEST_JOB_INFO_SINGLE from UID=3461
> > > > [2025-04-28T15:11:08.882] debug2: Processing RPC: 
> > > > REQUEST_PARTITION_INFO from UID=3461
> > > > [2025-04-28T15:11:08.882] debug2: _slurm_rpc_dump_partitions, size=1253 
> > > > usec=10
> > > > ```
> > > > And from slurmd:
> > > > ```
> > > > [2025-04-27T19:45:01.353] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.475] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.491] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:02.497] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:02.497] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:02.501] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.504] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.507] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.513] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.518] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.518] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:02.606] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:02.607] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:04.988] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:04.992] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:04.995] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:04.999] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.011] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.016] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.033] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.045] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.048] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.057] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.073] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.077] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.110] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.143] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.143] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.144] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:05.152] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.152] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:05.167] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.167] [59253.batch] debug:  Handling REQUEST_GETGR
> > > > [2025-04-27T19:45:05.170] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.172] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.174] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.203] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.204] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.207] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.316] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.318] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > [2025-04-27T19:45:05.321] [59253.batch] debug:  Handling REQUEST_GETPW
> > > > ```
> > > > This level of debugging makes the logs pretty huge, but if seeing a 
> > > > whole log file is helpful, I can make something available.
> > > > Any ideas on next steps for figuring out what’s going on? It seems like 
> > > > something is asking for authentication a whole lot, but it’s not clear 
> > > > to me what or why. We do use munge for Slurm authentication, and SSSD 
> > > > to work with LDAP for user authentication.
> > > > -Jeremy Guillette
> > > > —
> > > > 
> > > > Jeremy Guillette
> > > > 
> > > > Software Engineer, FAS Academic Technology | Academic Technology
> > > > Harvard University Information Technology
> > > > P: (617) 998-1826 | W: huit.harvard.edu
> > > > (he/him/his)
> > 

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
