Hello,
I’m trying to figure out why we’ve been seeing an increase in network traffic 
in our AWS-based cluster, which is built with AWS ParallelCluster. After an 
incident a couple of weeks ago, I turned on debug2 logging on the slurmd 
processes, and I’m seeing huge numbers of `REQUEST_GETPW` and `REQUEST_GETGR` 
requests going to the slurmd processes. I briefly turned on debug2 logging for 
`slurmctld` as well, and I’m seeing lots of RPC calls there too, but not as 
many as the `REQUEST_GETPW` requests on the compute-node slurmd processes.
Here’s a sample from the slurmctld log:
```
[2025-04-28T15:11:05.436] debug2: _slurm_rpc_dump_partitions, size=1253 usec=20
[2025-04-28T15:11:05.450] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2971
[2025-04-28T15:11:05.451] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2971
[2025-04-28T15:11:05.451] debug2: _slurm_rpc_dump_partitions, size=1253 usec=16
[2025-04-28T15:11:05.461] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2788
[2025-04-28T15:11:05.461] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2788
[2025-04-28T15:11:05.461] debug2: _slurm_rpc_dump_partitions, size=1253 usec=12
[2025-04-28T15:11:05.517] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2916
[2025-04-28T15:11:05.518] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2916
[2025-04-28T15:11:05.518] debug2: _slurm_rpc_dump_partitions, size=1253 usec=14
[2025-04-28T15:11:05.628] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3405
[2025-04-28T15:11:05.629] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3405
[2025-04-28T15:11:05.629] debug2: _slurm_rpc_dump_partitions, size=1253 usec=14
[2025-04-28T15:11:05.740] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2189
[2025-04-28T15:11:05.740] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2189
[2025-04-28T15:11:05.740] debug2: _slurm_rpc_dump_partitions, size=1253 usec=15
[2025-04-28T15:11:05.845] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2209
[2025-04-28T15:11:05.846] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2209
[2025-04-28T15:11:05.846] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=4106
[2025-04-28T15:11:05.846] debug2: _slurm_rpc_dump_partitions, size=1253 usec=14
[2025-04-28T15:11:05.847] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=4106
[2025-04-28T15:11:05.847] debug2: _slurm_rpc_dump_partitions, size=1253 usec=11
[2025-04-28T15:11:05.938] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3400
[2025-04-28T15:11:05.938] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3400
[2025-04-28T15:11:05.938] debug2: _slurm_rpc_dump_partitions, size=1253 usec=18
[2025-04-28T15:11:06.903] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3449
[2025-04-28T15:11:06.904] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3449
[2025-04-28T15:11:06.904] debug2: _slurm_rpc_dump_partitions, size=1253 usec=15
[2025-04-28T15:11:07.175] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3722
[2025-04-28T15:11:07.176] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3722
[2025-04-28T15:11:07.177] debug2: _slurm_rpc_dump_partitions, size=1253 usec=254
[2025-04-28T15:11:07.205] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=4040
[2025-04-28T15:11:07.206] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=4040
[2025-04-28T15:11:07.206] debug2: _slurm_rpc_dump_partitions, size=1253 usec=17
[2025-04-28T15:11:07.237] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2990
[2025-04-28T15:11:07.238] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2990
[2025-04-28T15:11:07.239] debug2: _slurm_rpc_dump_partitions, size=1253 usec=15
[2025-04-28T15:11:07.284] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2920
[2025-04-28T15:11:07.285] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2920
[2025-04-28T15:11:07.285] debug2: _slurm_rpc_dump_partitions, size=1253 usec=15
[2025-04-28T15:11:07.370] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3236
[2025-04-28T15:11:07.371] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3236
[2025-04-28T15:11:07.371] debug2: _slurm_rpc_dump_partitions, size=1253 usec=17
[2025-04-28T15:11:08.463] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2848
[2025-04-28T15:11:08.464] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2848
[2025-04-28T15:11:08.464] debug2: _slurm_rpc_dump_partitions, size=1253 usec=14
[2025-04-28T15:11:08.691] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2627
[2025-04-28T15:11:08.692] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2627
[2025-04-28T15:11:08.692] debug2: _slurm_rpc_dump_partitions, size=1253 usec=18
[2025-04-28T15:11:08.873] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3729
[2025-04-28T15:11:08.874] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3729
[2025-04-28T15:11:08.875] debug2: _slurm_rpc_dump_partitions, size=1253 usec=196
[2025-04-28T15:11:08.881] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=3461
[2025-04-28T15:11:08.882] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=3461
[2025-04-28T15:11:08.882] debug2: _slurm_rpc_dump_partitions, size=1253 usec=10
```
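To see which users account for most of these RPCs, the slurmctld log can be tallied by RPC type and UID. The following is only a rough sketch: the function name `tally_rpcs` is made up for illustration, and the regular expression assumes the `Processing RPC: <TYPE> from UID=<n>` wording shown in the sample above.

```python
import re
from collections import Counter

# Matches "Processing RPC: <TYPE> from UID=<n>". The \s+ before "UID="
# tolerates the line wrapping that mail clients insert at that point.
RPC_RE = re.compile(r"Processing RPC: (\w+) from\s+UID=(\d+)")


def tally_rpcs(log_text):
    """Return a Counter keyed by (rpc_type, uid) for one log excerpt."""
    return Counter(RPC_RE.findall(log_text))


# Three lines copied from the slurmctld sample above.
sample = """\
[2025-04-28T15:11:05.450] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2971
[2025-04-28T15:11:05.451] debug2: Processing RPC: REQUEST_PARTITION_INFO from UID=2971
[2025-04-28T15:11:05.461] debug2: Processing RPC: REQUEST_JOB_INFO_SINGLE from UID=2788
"""

counts = tally_rpcs(sample)
for (rpc, uid), n in counts.most_common():
    print(f"{n:6d}  {rpc:28s} UID={uid}")
```

Run against a full log (read the file into a string first), the top few rows should show whether a handful of UIDs are polling job and partition info in a loop.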
And from slurmd:
```
[2025-04-27T19:45:01.353] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.475] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.491] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:02.497] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:02.497] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:02.501] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.504] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.507] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.513] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.518] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.518] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:02.606] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.607] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:04.988] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:04.992] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:04.995] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:04.999] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.011] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.016] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.033] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.045] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.048] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.057] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.073] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.077] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.110] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.143] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.143] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.144] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:05.152] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.152] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.159] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:05.167] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.167] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:05.170] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.172] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.174] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.203] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.204] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.207] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.316] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.318] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:05.321] [59253.batch] debug:  Handling REQUEST_GETPW
```
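To put a number on "huge", the slurmd log can be bucketed per second, per job step, and per request type. This is a rough sketch rather than a finished tool: `requests_per_second` is a made-up name, and the pattern assumes the `[timestamp] [step] debug:  Handling REQUEST_...` layout shown in the sample above.

```python
import re
from collections import Counter

# Captures the timestamp truncated to whole seconds, the job step tag
# (e.g. "59253.batch"), and the request type.
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+\] \[([^\]]+)\] "
    r"debug:\s+Handling (REQUEST_GETPW|REQUEST_GETGR)"
)


def requests_per_second(log_text):
    """Count GETPW/GETGR lines per (second, step, request type)."""
    counts = Counter()
    for second, step, req in LINE_RE.findall(log_text):
        counts[(second, step, req)] += 1
    return counts


# Four lines copied from the slurmd sample above.
sample = """\
[2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:02.496] [59253.batch] debug:  Handling REQUEST_GETGR
[2025-04-27T19:45:02.501] [59253.batch] debug:  Handling REQUEST_GETPW
[2025-04-27T19:45:04.988] [59253.batch] debug:  Handling REQUEST_GETPW
"""

per_second = requests_per_second(sample)
for (second, step, req), n in sorted(per_second.items()):
    print(second, step, req, n)
```

Grouping by step makes it easier to tell whether the lookups come from one job's processes or are spread across everything running on the node.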
This level of debugging makes the logs pretty huge, but if seeing a whole log 
file is helpful, I can make something available.
Any ideas on next steps for figuring out what’s going on? It seems like 
something is asking for authentication a whole lot, but it’s not clear to me 
what or why. We do use munge for Slurm authentication, and SSSD to work with 
LDAP for user authentication.
-Jeremy Guillette
—

Jeremy Guillette

Software Engineer, FAS Academic Technology | Academic Technology
Harvard University Information Technology
P: (617) 998-1826 | W: huit.harvard.edu
(he/him/his)