Hi,
why do you think these are authentication requests? As far as I understand, 
multiple UIDs are asking for job and partition info. It's unlikely that all of 
them would make the same kind of requests in the same way and at the same 
time, so I would look for some external program that may be doing this: a 
monitoring tool, perhaps, or a reporting tool? I'm not sure whether API calls 
are also registered as RPCs in the controller logs.
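
A quick way to check whether the traffic is concentrated on a few UIDs is to 
tally the `from UID=` fields in the slurmctld log and then resolve the top 
offenders to usernames. This is just a sketch; the log path is an assumption 
and the sample lines below stand in for your real log:

```shell
# In practice, feed the real slurmctld log, e.g. (path is an assumption):
#   sed -n 's/.*from UID=\([0-9]*\)$/\1/p' /var/log/slurmctld.log \
#     | sort | uniq -c | sort -rn | head
# Sample lines stand in for the log here:
printf '%s\n' \
  'debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2971' \
  'debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2788' \
  'debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2971' |
  sed -n 's/.*from UID=\([0-9]*\)$/\1/p' | sort | uniq -c | sort -rn
# Then resolve a UID to a username (this goes through SSSD/LDAP if that is
# your NSS source):
#   getent passwd 2971
```

If the same handful of UIDs dominate, checking their crontabs and login 
sessions is probably the fastest path to the culprit.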

A dirty (but maybe effective) way of discovering what is making all of these 
calls is to set the RPC rate limit to some low value and see what stops 
working ;) 
https://slurm.schedmd.com/slurm.conf.html#OPT_rl_enable
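
For reference, something like the following in slurm.conf should do it 
(requires Slurm 23.02 or newer; the numbers are illustrative, not a 
recommendation, so tune them for your site):

```
# Per-user token-bucket rate limiting of client RPCs to slurmctld.
# Requests beyond the bucket are rejected and the controller logs the
# offending user, which should reveal the chatty client.
SlurmctldParameters=rl_enable,rl_bucket_size=30,rl_refill_rate=2,rl_refill_period=1,rl_log_freq=10
```

Well-behaved clients (sbatch, squeue, etc.) back off and retry when rate 
limited, so a cautious value is usually safe to try.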

Regards
Patryk.

On 25/05/06 02:38PM, Guillette, Jeremy via slurm-users wrote:
> Hello,
> I’m trying to figure out why we’ve been seeing an increase in network traffic 
> in our AWS-based cluster, which uses Amazon’s parallel cluster tool. After an 
> incident a couple weeks ago, I turned on debug2 logging on the slurmd 
> processes, and I’m seeing huge numbers of `REQUEST_GETPW` and `REQUEST_GETGR` 
> requests going to the slurmd processes. I briefly turned on debug2 logging 
> for `slurmctld` as well, and I’m seeing lots of RPC calls, but not as many as 
> the `REQUEST_GETPW` requests that I’ve seen on compute node slurmd processes.
> Here’s a sample from the slurmctld log:
> ```
> [2025-04-28T15:11:05.436] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=20
> [2025-04-28T15:11:05.450] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2971
> [2025-04-28T15:11:05.451] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2971
> [2025-04-28T15:11:05.451] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=16
> [2025-04-28T15:11:05.461] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2788
> [2025-04-28T15:11:05.461] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2788
> [2025-04-28T15:11:05.461] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=12
> [2025-04-28T15:11:05.517] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2916
> [2025-04-28T15:11:05.518] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2916
> [2025-04-28T15:11:05.518] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=14
> [2025-04-28T15:11:05.628] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3405
> [2025-04-28T15:11:05.629] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3405
> [2025-04-28T15:11:05.629] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=14
> [2025-04-28T15:11:05.740] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2189
> [2025-04-28T15:11:05.740] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2189
> [2025-04-28T15:11:05.740] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=15
> [2025-04-28T15:11:05.845] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2209
> [2025-04-28T15:11:05.846] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2209
> [2025-04-28T15:11:05.846] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=4106
> [2025-04-28T15:11:05.846] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=14
> [2025-04-28T15:11:05.847] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=4106
> [2025-04-28T15:11:05.847] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=11
> [2025-04-28T15:11:05.938] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3400
> [2025-04-28T15:11:05.938] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3400
> [2025-04-28T15:11:05.938] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=18
> [2025-04-28T15:11:06.903] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3449
> [2025-04-28T15:11:06.904] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3449
> [2025-04-28T15:11:06.904] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=15
> [2025-04-28T15:11:07.175] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3722
> [2025-04-28T15:11:07.176] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3722
> [2025-04-28T15:11:07.177] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=254
> [2025-04-28T15:11:07.205] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=4040
> [2025-04-28T15:11:07.206] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=4040
> [2025-04-28T15:11:07.206] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=17
> [2025-04-28T15:11:07.237] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2990
> [2025-04-28T15:11:07.238] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2990
> [2025-04-28T15:11:07.239] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=15
> [2025-04-28T15:11:07.284] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2920
> [2025-04-28T15:11:07.285] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2920
> [2025-04-28T15:11:07.285] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=15
> [2025-04-28T15:11:07.370] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3236
> [2025-04-28T15:11:07.371] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3236
> [2025-04-28T15:11:07.371] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=17
> [2025-04-28T15:11:08.463] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2848
> [2025-04-28T15:11:08.464] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2848
> [2025-04-28T15:11:08.464] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=14
> [2025-04-28T15:11:08.691] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=2627
> [2025-04-28T15:11:08.692] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=2627
> [2025-04-28T15:11:08.692] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=18
> [2025-04-28T15:11:08.873] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3729
> [2025-04-28T15:11:08.874] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3729
> [2025-04-28T15:11:08.875] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=196
> [2025-04-28T15:11:08.881] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE 
> from UID=3461
> [2025-04-28T15:11:08.882] debug2: Processing RPC: REQUEST_PARTITION_INFO from 
> UID=3461
> [2025-04-28T15:11:08.882] debug2: _slurm_rpc_dump_partitions, size=1253 
> usec=10
> ```
> And from slurmd:
> ```
> [2025-04-27T19:45:01.353] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.475] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.491] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:02.497] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:02.497] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:02.501] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.504] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.507] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.513] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.518] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.518] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:02.606] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:02.607] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:04.988] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:04.992] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:04.995] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:04.999] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.011] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.016] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.033] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.045] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.048] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.057] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.073] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.077] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.110] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.143] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.143] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.144] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:05.152] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.152] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:05.167] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.167] [59253.batch] debug:  Handling REQUEST_GETGR
> [2025-04-27T19:45:05.170] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.172] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.174] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.203] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.204] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.207] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.316] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.318] [59253.batch] debug:  Handling REQUEST_GETPW
> [2025-04-27T19:45:05.321] [59253.batch] debug:  Handling REQUEST_GETPW
> ```
> This level of debugging makes the logs pretty huge, but if seeing a whole log 
> file is helpful, I can make something available.
> Any ideas on next steps for figuring out what’s going on? It seems like 
> something is asking for authentication a whole lot, but it’s not clear to me 
> what or why. We do use munge for Slurm authentication, and SSSD to work with 
> LDAP for user authentication.
> -Jeremy Guillette
> —
> 
> Jeremy Guillette
> 
> Software Engineer, FAS Academic Technology | Academic Technology
> Harvard University Information Technology
> P: (617) 998-1826 | W: huit.harvard.edu
> (he/him/his)




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
